I’m creating an application that slows down speech (it’s linked to a graphical talking head) so that we can look at how the articulators work while keeping the pitch of the speech the same. I’m reasonably happy with the results from combining setFrequency with the pitch shifter DSP effect. However, I now want to freeze/sustain the speech (i.e. in a waveform) when the user presses a key, at any arbitrary point and for an unspecified amount of time. This would let us still hear the correct pitch while zooming in on the lips, mouth, vocal tract, etc.
Currently I’m using setLoopPoints, which loops over a given number of PCM samples from the key press. The number of samples can be calculated using our own logic; however, I was wondering if there is a more suitable method built into FMOD that will essentially keep playing the last "note" when a key is pressed?
Just using loop points like this can result in very mechanical sounds. I wonder if some sort of reverb effect would smooth/soften this?
Thanks for your time,
- cjwuea asked 8 years ago
setLoopPoints for sustaining a section of the waveform sounds to me like the best way to achieve what you describe; perhaps adjusting how many samples are looped will improve how it sounds. However, I don’t have any further advice for making the sustained section sound less mechanical.
I think perhaps you may want to look at how time-stretching algorithms work, for example PSOLA, which if I remember correctly is optimized for speech.
There will be a couple of reasons why your current method of setting loop points sounds robotic:
- You are looping too few samples. Voices are incredibly complex, with varying frequencies, attacks, etc., so when you only take a short run of samples you lose the inherent quality of the voice.
- You are not setting your loop points at zero-amplitude points (zero crossings), so the waveform jumps at the loop seam and you hear a click on every repeat.
- You could be doing a straight loop as opposed to a ping-pong loop (switching might help a little).
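As a rough illustration of the zero-crossing point, here is a minimal sketch that snaps a requested loop start and end to the nearest rising zero crossings before they are handed to setLoopPoints. The function names are made up for this example, and it assumes you already have the decoded mono PCM data as a list of floats. It is not part of FMOD.

```python
def nearest_rising_zero_crossing(samples, index):
    """Return the index nearest to `index` where the signal goes from
    negative to non-negative, or None if there is no such point."""
    best = None
    for i in range(1, len(samples)):
        if samples[i - 1] < 0.0 <= samples[i]:
            if best is None or abs(i - index) < abs(best - index):
                best = i
    return best


def snap_loop_points(samples, start, end):
    """Move both loop points to rising zero crossings so that the jump
    from the end of the loop back to its start is (nearly) continuous."""
    s = nearest_rising_zero_crossing(samples, start)
    e = nearest_rising_zero_crossing(samples, end)
    if s is None or e is None or e <= s:
        return start, end  # no usable crossings; keep the original request
    return s, e
```

The loop here is meant as the half-open range from `s` up to but not including `e`. Check whether the API you pass the points to treats the end point as inclusive, and subtract one if it does.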
One solution would be to create a DSP that does the following:
- At the moment the "pause" button is pressed, determine the average pitch of the voice being played. This can be done with FFTs, etc.
- Determine the number of samples to loop from the pitch, so that the loop covers a whole number of periods of the waveform.
- Copy that series of samples into a set of staggered ring buffers and then combine them by ramping up the volume of each buffer while decreasing the volume of the previous buffer.
- This is similar to how some time-stretching algorithms work.
- Additional optimizations can be made by determining if you are at the onset of an attack.
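To make those steps a bit more concrete, here is a simplified offline sketch rather than an actual FMOD DSP. It swaps in a plain autocorrelation search for the pitch estimate (instead of an FFT) and a single crossfade at the loop seam (instead of several staggered ring buffers), so treat it as an approximation of the idea, not the full method. All names and default values are made up for the example, and it assumes mono float samples.

```python
def estimate_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the highest mean autocorrelation.
    min_lag/max_lag bound the search to plausible pitch periods."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        if n <= 0:
            break
        score = sum(frame[i] * frame[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def build_sustain_loop(samples, position, period, periods=8):
    """Cut `periods` pitch periods ending at `position` and crossfade the
    tail of the loop into the audio just before the loop start, so the
    wrap-around from the last sample to the first is smooth."""
    length = periods * period
    fade = period  # crossfade over one pitch period
    start = position - length
    if start - fade < 0:
        raise ValueError("not enough audio before the freeze position")
    loop = list(samples[start:position])
    for i in range(fade):
        t = (i + 1) / fade  # ramps up to 1.0 on the final sample
        tail = length - fade + i
        loop[tail] = (1.0 - t) * loop[tail] + t * samples[start - fade + i]
    return loop
```

For speech you would probably estimate the period on a short frame ending at the key press, with the lag bounds chosen from the expected pitch range and your sample rate. The resulting buffer could then be played as its own looping sound.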
Or you can do a hacky method as follows:
- Do a search on the internet for a tool called "Paulstretch". It uses an algorithm that can stretch audio to extreme lengths while maintaining pitch.
- Pre-process the audio to stretch it by a large factor.
- During normal playback, play your regular audio.
- When the user hits the "pause" button, switch from the normal playback to the stretched playback. Determine your loop start point from the current position in the normal audio multiplied by the stretch factor, and the end point from your existing length algorithm, again multiplied by the stretch factor.
This will result in you looping over an already extremely stretched portion of the audio, so it should retain the original characteristics of the voice. Plus, since you are preprocessing the audio, instead of doing it live, you’ll probably get a performance gain.
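The position mapping for the pre-stretched approach is just a scaling. A small sketch, assuming the stretched file has the same sample rate as the original and was stretched uniformly by a known factor (the function and parameter names are made up for this example):

```python
def stretched_loop_points(position, loop_length, stretch_factor, stretched_total=None):
    """Map a position and loop length (in PCM samples of the normal audio)
    to loop start/end points in the pre-stretched audio."""
    start = int(round(position * stretch_factor))
    end = start + int(round(loop_length * stretch_factor))
    if stretched_total is not None:
        # Keep the end point inside the stretched file.
        end = min(end, stretched_total - 1)
    return start, end


# Key pressed at sample 44100, 2048-sample loop, audio stretched 8x.
print(stretched_loop_points(44100, 2048, 8.0))  # (352800, 369184)
```

The two values would then go to setLoopPoints on the stretched sound, with playback switched over to that sound at `start`.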
Hope this helps!
- CuriousG answered 8 years ago