I would like a way to swap out the wave variations in a sound definition without using traditional ring-buffer audio streaming.

In other words, I’d like the wave variations of a sound definition to be disk-bound rather than memory-bound, without requiring constant streaming bandwidth.

Traditional ring-buffer audio streaming is one solution, but some game projects will be very heavy on level geometry and texture streaming, so disk bandwidth is not available consistently enough to use audio ring-buffer streaming everywhere. Wave swap works around this because it can defer its refresh loads entirely until level streaming bandwidth drops low enough for them to complete. Also, audio streaming is great for long sounds such as music or lengthy dialog, but it is rather inefficient for short sounds.
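To illustrate how refresh loads could defer to level streaming, here is a minimal sketch assuming a hypothetical shared disk queue in which each frame ("tick") has a fixed read budget and level-streaming requests always outrank wave swap refresh loads. The names DiskScheduler, Request, LEVEL, WAVE_SWAP and bytes_per_tick are illustrative only; they are not part of FMOD or any engine.

```python
import heapq
from dataclasses import dataclass, field

LEVEL, WAVE_SWAP = 0, 1  # hypothetical priorities: lower value is served first


@dataclass(order=True)
class Request:
    priority: int
    seq: int                               # FIFO tie-break within a priority
    name: str = field(compare=False)
    remaining: int = field(compare=False)  # bytes still to read


class DiskScheduler:
    """Hypothetical shared disk queue: level streaming always goes first."""

    def __init__(self, bytes_per_tick):
        self.bytes_per_tick = bytes_per_tick
        self.heap = []
        self.seq = 0

    def submit(self, name, size_bytes, priority):
        heapq.heappush(self.heap, Request(priority, self.seq, name, size_bytes))
        self.seq += 1

    def pending(self):
        return sorted(r.name for r in self.heap)

    def tick(self):
        """Spend one tick of bandwidth in priority order; return finished names."""
        budget = self.bytes_per_tick
        finished = []
        while self.heap and budget > 0:
            head = self.heap[0]
            spent = min(budget, head.remaining)
            head.remaining -= spent  # not a compare field, so heap order holds
            budget -= spent
            if head.remaining == 0:
                heapq.heappop(self.heap)
                finished.append(head.name)
        return finished
```

Because the refresh request only consumes leftover bandwidth, heavy level streaming starves it rather than competing with it. That is acceptable for wave swap, since the variations already in memory stay playable while the refresh waits.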

Here is how wave swap would work. The wave variations of a sound definition are placed in their own wave bank. This bank gets a new mode, called wave swap, plus a field N that specifies how many of its waves are loaded at any one time. When the sound definition is loaded, FMOD allocates only enough memory to hold the N largest waves in the bank, so any N of its waves fit in that allocation. Once the sound definition and N of its waves are loaded, it can be played immediately, with no streaming hit or disk seek latency. Over time, each wave variation that is played gets marked as dirty, and once N/2 waves are dirty, a low-priority request is made to swap the dirty waves for fresh variations from the bank, following the sound definition's play mode (random or sequential). As long as the sound definition is not played so often that it exhausts the N loaded waves, it should be able to refresh its variations without stepping on the game's level streaming very much. If all loaded waves are dirty before the refresh completes, a flag would control whether they are simply played again or the sound is dropped.
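The bookkeeping described above can be sketched in a few lines of Python. This is a minimal sketch, not an implementation: the class and method names (WaveSwapBank, Wave, Mode, play, complete_refresh, replay_when_exhausted) are hypothetical and do not correspond to any FMOD API. Memory is budgeted as the sum of the N largest waves, played slots become dirty, a refresh is requested at N/2 dirty slots, and complete_refresh stands in for the low-priority load finishing whenever bandwidth allows.

```python
import random
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SEQUENTIAL = "sequential"
    RANDOM = "random"


@dataclass(frozen=True)
class Wave:
    name: str
    size_bytes: int


class WaveSwapBank:
    """Hypothetical wave swap bank: only N of its waves are resident at once."""

    def __init__(self, waves, n, mode=Mode.SEQUENTIAL,
                 replay_when_exhausted=True, seed=None):
        if not 0 < n <= len(waves):
            raise ValueError("n must be in 1..len(waves)")
        self.waves = list(waves)
        self.n = n
        self.mode = mode
        self.replay_when_exhausted = replay_when_exhausted
        self.rng = random.Random(seed)
        # Reserve memory for the N largest waves, so any N waves always fit.
        sizes = sorted((w.size_bytes for w in self.waves), reverse=True)
        self.budget_bytes = sum(sizes[:n])
        self.cursor = 0                 # next bank index in sequential mode
        self.slots = []                 # slot -> bank index of resident wave
        for _ in range(n):
            self.slots.append(self._pick(set(self.slots)))
        self.queue = list(range(n))     # unplayed slots, in load order
        self.dirty = set()              # slots whose wave has been played
        self.refresh_requested = False  # low-priority swap pending?

    def _pick(self, exclude):
        """Choose a bank index not in `exclude`, per the sound def's mode."""
        count = len(self.waves)
        allowed = [i for i in range(count) if i not in exclude]
        if self.mode is Mode.RANDOM:
            return self.rng.choice(allowed)
        # Sequential: first allowed index at or after the cursor (wrapping).
        i = min(allowed, key=lambda j: (j - self.cursor) % count)
        self.cursor = (i + 1) % count
        return i

    def resident_bytes(self):
        return sum(self.waves[i].size_bytes for i in self.slots)

    def play(self):
        """Return the wave to play, or None if the request is dropped."""
        if self.queue:
            k = 0 if self.mode is Mode.SEQUENTIAL else self.rng.randrange(len(self.queue))
            slot = self.queue.pop(k)
        elif self.replay_when_exhausted:
            slot = self.rng.randrange(self.n)  # all dirty: replay an old wave
        else:
            return None                        # all dirty: drop the sound
        self.dirty.add(slot)
        if len(self.dirty) >= self.n / 2:
            self.refresh_requested = True      # ask for a deferred swap
        return self.waves[self.slots[slot]]

    def complete_refresh(self):
        """Called once bandwidth allows: swap every dirty slot for a fresh wave."""
        if not self.refresh_requested:
            return 0
        clean = {self.slots[s] for s in range(self.n) if s not in self.dirty}
        played = {self.slots[s] for s in self.dirty}
        swapped = sorted(self.dirty)
        for slot in swapped:
            exclude = clean | played
            if len(exclude) >= len(self.waves):  # small bank: allow repeats
                exclude = set(clean)
            self.slots[slot] = self._pick(exclude)
            clean.add(self.slots[slot])
            self.queue.append(slot)
        self.dirty.clear()
        self.refresh_requested = False
        return len(swapped)
```

Sizing the allocation by the N largest waves, rather than by whichever N happen to be loaded first, is what lets any refresh proceed without reallocating memory.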

Consider, for example, wave variations for crowd dialog in an open-city game. When these waves are memory-bound, there is very little space for the things each individual can say. If the wave variations are disk-bound, people can say many more things and be less repetitive. When you are walking through the city, you have more time to hear the crowd dialog variations, and level streaming is less intensive. When you are driving through the city, you don’t have time to hear much crowd dialog, which works out well because level streaming is more intense the faster you move through the world.

Another example is a sports game with spectator one-shots (whistles, shots, claps, etc.). With wave swap, these one-shots take up less RAM while still having a huge number of variations on disk, without requiring lots of ring buffers and constant disk seeks between relatively small wave variations.
