Hello, everyone. After wrapping a pretty big open world PS360 game (34,000 lines of voice per language), I thought I’d write up our experiences getting all that voice to fit in our memory/IO/quality budgets. I’m posting this in the FMOD Designer forum because Designer was our baseline for all audio, but a lot of this is relevant even if you’re using the raw Ex API.

[b:1ypga5ud]Event Overhead and Programmer Sounds[/b:1ypga5ud]
We started off (like many people do) with a big Voice fdp with an Event for every line in the game. That’s fine if you don’t have many lines, but the per-Event memory overhead for even a few thousand lines is really significant. The key observation is that even when you have tens of thousands of voice lines, you really only have a dozen or so unique Events.

The solution to this Event overhead is programmer sounds. The details of implementing them have been well covered elsewhere in the forums and docs, but suffice to say it’s pretty easy, though you’ll probably need to add another piece to your audio pipeline to build the data structure that maps your in-game line identifier to the raw sound asset (like an FSB index or file path).
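As a rough illustration of that mapping piece, here’s a minimal sketch. All the names here are hypothetical, and the table itself would be emitted by your build step alongside the FSB; the returned FMOD::Sound* is what you’d hand back via your programmer sound callback.

[code:1ypga5ud]
#include <map>
#include <string>
#include <fmod.hpp>

// Hypothetical build-generated table: in-game line identifier -> subsound
// index inside the voice FSB. Your pipeline tool would emit this alongside
// the FSB itself.
static std::map<std::string, int> gLineToSubsound;

// Resolve a line ID to the FMOD::Sound* you supply to your programmer sound.
// 'voiceFsb' is the voice bank, opened once via System::createSound().
FMOD::Sound *GetVoiceSound(FMOD::Sound *voiceFsb, const std::string &lineId)
{
    std::map<std::string, int>::const_iterator it = gLineToSubsound.find(lineId);
    if (it == gLineToSubsound.end())
    {
        return 0; // unknown line ID; log it and fall back to subtitles, etc.
    }

    FMOD::Sound *subsound = 0;
    FMOD_RESULT result = voiceFsb->getSubSound(it->second, &subsound);
    return (result == FMOD_OK) ? subsound : 0;
}
[/code:1ypga5ud]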

Hindsight being 20/20, I’d recommend that anyone who expects to have more than a thousand or so voice lines bite the bullet as early as possible and make the transition to programmer sounds. You’ll save a lot of time and heartache if you do.

[b:1ypga5ud]FSB Generation[/b:1ypga5ud]
Once you’ve gotten your Event memory in line, the next thing you’ll run into is how to build your voice FSB(s) efficiently. If you’re already using Designer, it’s very tempting to use it as your voice FSB build tool too. This seems like a great way to save some build infrastructure, but trust me, you do not want to do this.

The problem is that every entry in an FSB has a header. By default, that header is about 80 bytes. Normally that’s totally fine, but with (tens of) thousands of voice lines, that overhead really adds up. It’s especially problematic if you stream from an FSB (see below), since opening a stream allocates the headers for every entry in the FSB, once per open stream.

The solution is to build your FSB with "small headers", which drops that overhead down to around 8 bytes per entry (if memory serves). Small headers require that all items in the bank have an identical sample rate, channel count, etc. More importantly, you cannot make small-header FSBs in Designer (though it’d be a great feature to add!). That means you’ll need to use fsbank to compile your voice FSBs. Fortunately, this dovetails nicely with the additional build work you need to do to support programmer sounds.
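To put rough illustrative numbers on it: at our 34,000 lines, full headers would cost about 34,000 × 80 bytes ≈ 2.7 MB per open stream, versus roughly 270 KB with small headers.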

[b:1ypga5ud]To Stream Or Not To Stream[/b:1ypga5ud]
Given the large amount of voice in our game and our expectation of very tight memory budgets, our original solution was to stream all voice. This worked OK, and might have been shippable, but the big problem was that we were still paying as much as 300 KB per voice stream. On PS3 that memory had to come out of our very precious main memory pool. It was also a drag that the latency on combat voice was sometimes pretty noticeable, especially on player reactions when the game was doing a lot of IO.
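For a sense of scale (illustrative numbers, not our exact budget): at 300 KB per stream, even eight concurrent voice streams would tie up about 2.4 MB of main memory.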

The solution we ended up using was to create an LRU cache (in RSX memory on PS3) for voice data. Whenever it was time to play voice, we’d stream the data into this cache and play it from there. With proper preloading, this scheme worked great. It also dovetails very nicely with programmer sounds: since programmer sounds require you to supply FMOD with FMOD::Sound pointers, it was easy to point those at LRU entries rather than at streams. The big upside of all this was substantially reduced IO and a large reduction in main audio memory on PS3.

I also want to emphasize that our LRU implementation was very, very simple, but had a hit rate of around 75%. We didn’t even use a separate pool of memory. Whenever a voice line was requested, we first looked for it in the LRU. If we found it, awesome. If not, we used FMOD::System::createSound() and kept track of the pointer and the in-memory size of that sound. The LRU had a max size and a buffer size. Whenever our total voice allocation bled into the buffer, we’d start evicting the oldest sounds, one at a time. As long as the buffer was sufficiently large (1 MB in our case), there was always enough room even when a bunch of new voice was requested at once. There are certainly many improvements one could make to our implementation, but it’s telling that such a simple approach (implemented in about two days) worked so well.
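For anyone curious what that looks like in code, here’s a minimal sketch along the lines of what I described. The class and method names are illustrative (not our shipped code), sizing via FMOD_TIMEUNIT_RAWBYTES is just one way to approximate the in-memory footprint, and the platform-specific placement in RSX memory is omitted.

[code:1ypga5ud]
#include <list>
#include <map>
#include <string>
#include <fmod.hpp>

// Minimal LRU voice cache along the lines described above. Error handling
// is mostly omitted for brevity.
class VoiceCache
{
public:
    VoiceCache(FMOD::System *system, unsigned int maxBytes, unsigned int bufferBytes)
        : mSystem(system), mMaxBytes(maxBytes), mBufferBytes(bufferBytes), mTotalBytes(0) {}

    FMOD::Sound *Get(const std::string &lineId, const char *path)
    {
        std::map<std::string, Entry>::iterator it = mEntries.find(lineId);
        if (it != mEntries.end())
        {
            // Cache hit: move this line to the front of the recency list.
            mRecency.remove(lineId);
            mRecency.push_front(lineId);
            return it->second.sound;
        }

        // Cache miss: load the whole sound into memory (no FMOD_CREATESTREAM).
        FMOD::Sound *sound = 0;
        if (mSystem->createSound(path, FMOD_DEFAULT, 0, &sound) != FMOD_OK)
            return 0;

        // Track the in-memory size. FMOD_TIMEUNIT_RAWBYTES is one way to
        // approximate it; your accounting may differ.
        unsigned int bytes = 0;
        sound->getLength(&bytes, FMOD_TIMEUNIT_RAWBYTES);

        Entry e = { sound, bytes };
        mEntries[lineId] = e;
        mRecency.push_front(lineId);
        mTotalBytes += bytes;

        // Once total allocation bleeds into the buffer zone, evict the
        // oldest sounds one at a time (never the one we just loaded).
        while (mTotalBytes > mMaxBytes - mBufferBytes && mRecency.size() > 1)
            EvictOldest();

        return sound;
    }

private:
    struct Entry { FMOD::Sound *sound; unsigned int bytes; };

    void EvictOldest()
    {
        const std::string oldest = mRecency.back();
        mRecency.pop_back();
        Entry &e = mEntries[oldest];
        mTotalBytes -= e.bytes;
        e.sound->release();
        mEntries.erase(oldest);
    }

    FMOD::System *mSystem;
    unsigned int mMaxBytes, mBufferBytes, mTotalBytes;
    std::list<std::string> mRecency;       // front = most recently used
    std::map<std::string, Entry> mEntries; // line ID -> cached sound
};
[/code:1ypga5ud]

A real implementation would also want to make sure nothing evicted is still playing, but this captures the shape of it.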

Thanks for that. We’re looking into improving our speech systems for the next iteration of a recently finished project. We used programmer sounds from the start because, as you say, the Event data overhead is enormous. Our big problem has been streaming things efficiently from optical media (at the same time as the rest of the world is streaming), combined with a vast amount of concatenated speech, so we’re looking into preloading and buffering options at the moment.

[quote:7qbp1pze]The solution we ended up using was to create an LRU cache (in RSX memory on PS3) for voice data. Whenever it was time to play voice, we’d stream the data into this cache and play it from there. With proper preloading, this scheme worked great. It also dovetails very nicely with programmer sounds: since programmer sounds require you to supply FMOD with FMOD::Sound pointers, it was easy to point those at LRU entries rather than at streams. The big upside of all this was substantially reduced IO and a large reduction in main audio memory on PS3.

I also want to emphasize that our LRU implementation was very, very simple, but had a hit rate of around 75%. We didn’t even use a separate pool of memory. Whenever a voice line was requested, we first looked for it in the LRU. If we found it, awesome. If not, we used FMOD::System::createSound() and kept track of the pointer and the in-memory size of that sound. The LRU had a max size and a buffer size. Whenever our total voice allocation bled into the buffer, we’d start evicting the oldest sounds, one at a time. As long as the buffer was sufficiently large (1 MB in our case), there was always enough room even when a bunch of new voice was requested at once. There are certainly many improvements one could make to our implementation, but it’s telling that such a simple approach (implemented in about two days) worked so well.[/quote:7qbp1pze]

+1 Rep
That’s a cracking suggestion!
