Say I want to make a LAN peer-to-peer voice chat program (under Win32, using Borland C++ Builder) that isn’t complex on the data transmission side (only using Winsock; one side acts as the server, the other as the client). The questions I’ve got are:
1. How do I handle lag?
2. Does FMOD have any functions to avoid lag (as in buffering)? (I saw something like FSOUND_Stream_Net.) If not, what’s the best way to implement something like that?
Is one possible solution to send bigger chunks of data, say 64 KB, play it all and wait until the next 64 KB is available? And how is that done? I saw FMOD has a built-in function to stream from the internet, but that’s only for HTTP and a fixed address; I want to be able to send the data myself through Winsock. Can I split a constantly recording sample into smaller samples of 64 KB? How?
I realize I’ve asked too many questions, so I would be grateful for links to documentation as well if my questions cover too wide an area.
- raul_ asked 14 years ago
Ok, so on your server the only FMOD stuff you need to do is mix the streams which you can do with FSOUND_DSP_MixBuffers. This function is fast but you’ll still experience delays/latency because of the network and encoding, though speex is probably pretty good for encoding seeing as how it’s optimised for speech. It wouldn’t be too hard to knock up a test case using FSOUND_DSP_MixBuffers and speex to see how much CPU time they’ll take, that way you can see if it’ll be feasible for your game.
This is an interesting topic that I wouldn’t mind researching a bit more. (possibly for FMOD4 when it’s a little more mature)
Basically, you could do it by using an FMOD DSP unit to read chunks of a sample that is currently recording (cf. the record example), send these chunks over the network, and feed them to a custom stream at the other end to play them back.
That’s fairly simple if you understand FMOD dsp units and streams and whatnot but the real hard part is, as you pointed out, the network side of things. How do you avoid lag? How do you minimise bandwidth usage? You’ll probably want to do your own buffering logic and also encode/decode the data in realtime to reduce the size of it.
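To make the “do your own buffering logic” part a bit more concrete, here is a minimal sketch of a playout buffer in plain C++. It is not FMOD or Winsock code: `PlayoutBuffer`, `push` and `pull` are hypothetical names, and the assumption is that the network side hands over 16-bit PCM chunks while a stream callback pulls fixed-size blocks. The buffer holds back playback until a chosen amount of audio has arrived, and plays silence when it runs dry.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical playout buffer (illustration only, not an FMOD type).
class PlayoutBuffer {
public:
    // prebufferSamples: how much audio must arrive before playback starts.
    explicit PlayoutBuffer(std::size_t prebufferSamples)
        : prebuffer_(prebufferSamples), primed_(false) {}

    // Network side: append a received chunk of PCM samples.
    void push(const std::vector<std::int16_t>& chunk) {
        queue_.insert(queue_.end(), chunk.begin(), chunk.end());
        if (queue_.size() >= prebuffer_) primed_ = true;
    }

    // Audio side: always fills `count` samples, padding with silence.
    // Returns how many of them were real (received) samples.
    std::size_t pull(std::int16_t* out, std::size_t count) {
        std::size_t real = 0;
        if (primed_) {
            while (real < count && !queue_.empty()) {
                out[real++] = queue_.front();
                queue_.pop_front();
            }
            // Queue drained: wait for the prebuffer to refill before resuming.
            if (queue_.empty()) primed_ = false;
        }
        for (std::size_t i = real; i < count; ++i) out[i] = 0;
        return real;
    }

    std::size_t buffered() const { return queue_.size(); }

private:
    std::deque<std::int16_t> queue_;
    std::size_t prebuffer_;
    bool primed_;
};
```

The prebuffer size is the trade-off knob: a larger value hides more network jitter but adds the same amount of delay to every word spoken. If the network and audio sides run on different threads, the calls would also need a lock around them.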
It’s a reasonably advanced topic but there’s a fair bit of info on the ‘net about it. Make sure you understand FMOD streams and dsp units too. Oh, and if you do a good implementation of it then it’ll probably be worth money to someone cuz it ain’t trivial. 😀
Well… I’ll give it a try. I’m doing it for fun actually, just for me and a friend (we have a peer-to-peer LAN). At first, I’ll see if sending just raw data at a lower sample rate (~11025 Hz, mono) works fine (I’ll do the buffering on my own then), because with only two of us on this LAN there’s a good chance of it working without compression. If not, I’ll consider some Ogg Vorbis compression.
Unfortunately I’m an FMOD newbie. I just downloaded it one day ago… but it’s not that hard to use at all, I like it.
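A quick back-of-the-envelope check of the raw-data plan, as a small sketch. The helper names are made up for illustration, and 16-bit samples are an assumption (the post only gives the sample rate and mono).

```cpp
#include <cstddef>

// Bytes per second of uncompressed PCM audio.
inline std::size_t pcmBytesPerSecond(std::size_t sampleRateHz,
                                     std::size_t channels,
                                     std::size_t bitsPerSample) {
    return sampleRateHz * channels * (bitsPerSample / 8);
}

// How many milliseconds of audio a chunk of the given size holds.
inline double chunkMillis(std::size_t chunkBytes, std::size_t bytesPerSecond) {
    return 1000.0 * chunkBytes / bytesPerSecond;
}
```

With 11025 Hz, mono, 16-bit, `pcmBytesPerSecond(11025, 1, 16)` gives 22050 bytes per second, which is very little for a LAN. The same numbers show why 64 KB chunks are a poor fit for conversation: `chunkMillis(64 * 1024, 22050)` is close to three seconds of audio, so each side would hear the other roughly that late. A 441-byte chunk is 20 ms at this rate.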
- raul_ answered 14 years ago
I might be looking into replacing a really old 3D sound library from Wulfram, and am considering FMOD. One of the things I’m kind of looking to do is implement team / wingman chat.
I’ve not thought about it a great deal, but it would have to go something like this: each client records and encodes/compresses a single voice stream as fast as possible and sends it to the server. The server then decodes/decompresses each stream, mixes it with the other voice streams, and sends the result on its way to whichever clients want to hear it.
I’d do this with a protocol that would allow packet loss… voice is realtime data, and there is no sense in delaying the whole stream just because a packet got dropped. So that also means that whatever the coding method used, it would have to deal with chunky data loss… (obviously at a cost to voice coherency).
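One way to picture a loss-tolerant receiver is sketched below. Everything here is hypothetical: the `VoicePacket` layout, the per-packet sequence number, and the choice to fill gaps with silence (a real codec may offer better concealment). The sketch assumes one fixed-size frame of decoded 16-bit PCM per packet and ignores sequence-number wraparound.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packet layout, for illustration only.
struct VoicePacket {
    std::uint32_t seq;                  // increases by one per packet sent
    std::vector<std::int16_t> samples;  // one frame of decoded PCM
};

class LossTolerantReceiver {
public:
    explicit LossTolerantReceiver(std::size_t frameSamples)
        : frameSamples_(frameSamples), nextSeq_(0), started_(false) {}

    // Appends playable audio for this packet to `out`.
    // Returns the number of missing frames replaced by silence,
    // or -1 if the packet was late or a duplicate and was discarded.
    int receive(const VoicePacket& p, std::vector<std::int16_t>& out) {
        if (!started_) { nextSeq_ = p.seq; started_ = true; }
        if (p.seq < nextSeq_) return -1;  // its slot has already been played

        // Never wait for missing packets: fill the gap and move on.
        // (A real receiver would also cap the gap size.)
        int lost = static_cast<int>(p.seq - nextSeq_);
        out.insert(out.end(), static_cast<std::size_t>(lost) * frameSamples_,
                   static_cast<std::int16_t>(0));
        out.insert(out.end(), p.samples.begin(), p.samples.end());
        nextSeq_ = p.seq + 1;
        return lost;
    }

private:
    std::size_t frameSamples_;
    std::uint32_t nextSeq_;
    bool started_;
};
```

The point of the design is that the stream’s timing never depends on a retransmission: a dropped packet costs one frame of silence, not a stall.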
I don’t know much about DSP and filters… so that would be a bit of a problem I guess. I’d hope that there would be some reasonably high-level functionality I could use from fmod that would allow me to make a client and a server that can do a lot of the stuff that needs to happen at the DSP/sound level.
The networking and client/server generic stuff is trivial to me.
Any thoughts on that?
For playback of data coming from the network you could use a user-stream (created with FSOUND_Stream_Create) with a stream DSP unit attached to it. As the data comes in, you just feed it into the stream via the DSP. Check out the samples/stream2 example program to see how that works.
You will need to pass in uncompressed PCM data to the DSP unit so you’ll have to decode the compressed stuff off the network by yourself. Also, FMOD is a library for playback only – it doesn’t do encoding – so you’ll also have to encode/compress the data yourself before sending it across the network.
Use google to check out blade_enc.dll, lame_enc.dll and maybe MAD (MPEG Audio Decoder).
Found a good voice codec here (I’ve not yet used it, but it looks promising): http://www.speex.org
The thing is, I’m thinking about how this is going to work on the server side… there are going to be various people joined or “tuned” to various channels.
Thus, on the server, I’d want to do something like so:
for each voice input stream belonging to the group:
— mix the voice together with the other voices
then encode the result and send it on its way
The problem with that is the time it takes to mix things. Say there are N active speakers (people talking at the same time that we’d like to mix together); then the server needs to prepare N+1 voice streams (assuming there are more listeners in the group than speakers). The point of this is that someone who is speaking should not get their own voice coming back to them time-delayed. That would be bad. So each speaker needs to get back a stream that is unique to that speaker: everyone else’s voices without their own. Thus, I’d need to do N+1 mixes and encodings on the server side (and N decodings too, to begin with).
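The mixing half of that cost can be kept small with a “mix-minus” approach: sum all speakers once, then subtract each speaker’s own frame from the total. The sketch below is plain C++ and does not use FMOD; `mixMinus` and `Frame` are made-up names, and it assumes every speaker contributes a decoded 16-bit PCM frame of exactly `frameSamples` samples.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<std::int16_t> Frame;

// Clamp a 32-bit sum back into the 16-bit sample range.
static std::int16_t clamp16(std::int32_t v) {
    return static_cast<std::int16_t>(
        std::max<std::int32_t>(-32768, std::min<std::int32_t>(32767, v)));
}

// Returns N+1 frames for N speakers:
//   result[i] for i < N : everyone except speaker i (sent back to speaker i)
//   result[N]           : the full mix (sent to pure listeners)
std::vector<Frame> mixMinus(const std::vector<Frame>& speakers,
                            std::size_t frameSamples) {
    const std::size_t n = speakers.size();

    // One pass to build the total in 32-bit so it cannot overflow.
    std::vector<std::int32_t> total(frameSamples, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t s = 0; s < frameSamples; ++s)
            total[s] += speakers[i][s];

    // Each per-speaker mix is just the total minus that speaker.
    std::vector<Frame> result(n + 1, Frame(frameSamples));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t s = 0; s < frameSamples; ++s)
            result[i][s] = clamp16(total[s] - speakers[i][s]);
    for (std::size_t s = 0; s < frameSamples; ++s)
        result[n][s] = clamp16(total[s]);
    return result;
}
```

This makes the mixing work grow linearly with the number of speakers rather than quadratically. It does nothing for the N+1 encodes, which are likely to be the more expensive part.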
It looks like speex can turn off a datastream when it detects no voice. There might be some problems with background noise (coming from the speakers themselves…)
Would be funny to experience a feedback loop (with network and client/server delays built in).
I’m guessing the act of mixing the speakers together, and then encoding the result may be expensive. Especially on a game process that’s trying to do a whole game at the same time.
I could restrict this so that only a single ‘speaker’ speaks at a time, but I’m unsure how I would choose between ‘collisions’. Perhaps the server side would detect when someone else is already speaking, and not allow other clients to send speech until it was determined that the active speaker was done. That would be what I think is called half-duplex operation.
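One simple way to resolve collisions is first come, first served with a silence timeout, sketched below. This is only one possible policy; `FloorControl` and its methods are hypothetical names. It assumes clients stop sending packets while silent (for example via the codec’s voice detection), so a pause in packets means the speaker is done.

```cpp
#include <cstdint>

// Hypothetical server-side "floor" arbiter: one speaker at a time.
class FloorControl {
public:
    enum { kNobody = -1 };

    // silenceTimeoutMs: how long the holder may stay quiet before
    // someone else is allowed to take over.
    explicit FloorControl(std::uint32_t silenceTimeoutMs)
        : timeoutMs_(silenceTimeoutMs), holder_(kNobody), lastVoiceMs_(0) {}

    // Call for every incoming voice packet.
    // Returns true if the packet should be forwarded to listeners.
    bool onVoicePacket(int clientId, std::uint32_t nowMs) {
        // Release the floor if the current holder has gone quiet.
        if (holder_ != kNobody && holder_ != clientId &&
            nowMs - lastVoiceMs_ >= timeoutMs_) {
            holder_ = kNobody;
        }
        if (holder_ == kNobody) holder_ = clientId;  // first come, first served
        if (holder_ != clientId) return false;       // someone else is talking
        lastVoiceMs_ = nowMs;
        return true;
    }

    int holder() const { return holder_; }

private:
    std::uint32_t timeoutMs_;
    int holder_;
    std::uint32_t lastVoiceMs_;
};
```

With a single speaker at a time the server needs no mixing at all, and could even forward the encoded packets untouched, at the cost of people being unable to talk over one another.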