Hi all,
My aim is to create a function, which has to capture sound from the microphone and if a certain noise level is reached (for example somebody starts speaking) it should execute a callback (or simply return with some non-zero value).

Based on the record sample and some others posted in this forum, I’ve written a small app which captures sound using FSOUND_Record_StartSample and copies the recorded data into a tmp buffer using FSOUND_Sample_Lock.

But I can’t understand what is the meaning of the data I’ve locked in a buffer …
How can I determine the "noise level"?
For instance: I’m initializing fmod in FSOUND_MONO | FSOUND_16BITS mode.
then call
FSOUND_Record_StartSample(samp, TRUE);
followed by
FSOUND_Sample_Lock(samp, position, length, &ptr1, &ptr2, &len1, &len2);
Am I right that every 2 bytes in ptr1 (or ptr2) are the sound value at a certain point in time, and that the bigger the value is, the louder the sound was?
Maybe there is a better algorithm for this kind of "noise-detection" task?

You are on the right track. There are 2 ways you can attempt this:

[1] Peak detection. This is the method you are thinking of. You evaluate each sample value and check whether it goes over some threshold (defined by you). Remember also that an audio wave has both + and – phases, so you’ll need to use the absolute value of the sample data (ignore the + or – signs).

You might find peak detection is pretty flaky in some applications…the threshold can be triggered by very short spikes with very little power in the signal.
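
As a rough sketch of the peak approach: the function names below are made up for illustration, and it assumes the locked buffer holds signed 16-bit mono samples (which is what I would expect from FSOUND_MONO | FSOUND_16BITS, but check your init flags). You would cast ptr1 to an int16_t pointer and pass len1 / 2 as the sample count, then do the same for ptr2 if it is not NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the largest absolute sample value in a block of signed 16-bit
   mono PCM. The sample format is an assumption, not something this
   code can verify. */
int peak_level(const int16_t *samples, size_t count)
{
    int peak = 0;
    assert(samples != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        int v = samples[i];   /* widen first: -32768 has no int16_t negation */
        if (v < 0)
            v = -v;           /* ignore the sign, keep the magnitude */
        if (v > peak)
            peak = v;
    }
    return peak;
}

/* Returns 1 if any sample in the block is above the threshold, else 0. */
int peak_exceeds(const int16_t *samples, size_t count, int threshold)
{
    return peak_level(samples, count) > threshold;
}
```

Note that a single stray sample is enough to trip this gate, which is exactly the flakiness described above.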

[2] RMS ‘power’. Another approach to evaluating signal strength is to calculate the RMS value. You could divide the audio wave into small windows (you already have a buffer) and calculate the root mean square of each one. This gives you a measure of the average power of the signal in the window, which you can use to represent the current strength of the signal (against your threshold).
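
A minimal sketch of the RMS gate, again assuming signed 16-bit mono samples and using hypothetical function names. Since RMS > threshold is the same as mean square > threshold squared (for a non-negative threshold), the comparison is done on the squared values and the square root is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean of the squared sample values over one window. The RMS value is
   the square root of this number. */
double mean_square(const int16_t *samples, size_t count)
{
    double sum_sq = 0.0;
    assert(samples != NULL || count == 0);
    if (count == 0)
        return 0.0;
    for (size_t i = 0; i < count; i++) {
        double v = (double)samples[i];
        sum_sq += v * v;      /* squaring also removes the sign */
    }
    return sum_sq / (double)count;
}

/* Returns 1 if the RMS of the window is above rms_threshold, else 0.
   Compares squared quantities so no sqrt() call is needed. */
int rms_exceeds(const int16_t *samples, size_t count, double rms_threshold)
{
    assert(rms_threshold >= 0.0);
    return mean_square(samples, count) > rms_threshold * rms_threshold;
}
```

For example, a 100-sample window that is silent except for one sample of 20000 has an RMS of 2000, so a threshold of 5000 is not triggered even though a peak detector with the same threshold would fire.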

Thx for the hint. Maybe you can also help me with another question …

A 1 sec. long recording results in 22110 samples allocated (44Kb at a 22050 record rate) in my tmp buffer.

I’ve found in another topic of this forum that for voice detection I’d have to analyze only a limited bandwidth interval (4kHz/8kHz) - is that right?
How should I check if the recorded sample value fits the interval or not?
How can I create something like a spectrum (FSOUND_DSP_GetSpectrum) using the data stored in my buffer?

There is good reason to narrow down the signal to sub-bands…at a 22KHz sample rate, you can capture frequencies up to ~11KHz…the human voice doesn’t really contain too much information outside the 200Hz to 2KHz range…I’m surprised someone mentioned 4KHz to 8KHz, that is the high end of the human voice.

To implement it….

  • Break up the signal into small windows (you might use your whole buffer)
  • Calculate the spectrum for the window
  • Each element in the resulting array represents a "frequency bin"
  • Then simply add the values for each bin you are interested in, and compare it to your threshold.

Your threshold gate is now frequency dependent!
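
The last two steps could look something like the sketch below. It does not compute the spectrum itself (use whatever FFT you have for that). It assumes you already have an array of magnitudes whose bins are spread evenly from 0 Hz up to half the sample rate. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the spectrum values for every bin whose lower edge lies in
   [lo_hz, hi_hz). Assumes `bins` entries covering 0 Hz up to
   sample_rate / 2 in equal steps. */
double band_sum(const float *spectrum, size_t bins, double sample_rate,
                double lo_hz, double hi_hz)
{
    double bin_width;
    double sum = 0.0;
    assert(spectrum != NULL && bins > 0 && sample_rate > 0.0);
    bin_width = (sample_rate / 2.0) / (double)bins;  /* Hz per bin */
    for (size_t i = 0; i < bins; i++) {
        double f = (double)i * bin_width;            /* lower edge of bin i */
        if (f >= lo_hz && f < hi_hz)
            sum += spectrum[i];
    }
    return sum;
}
```

With a 22050 Hz sample rate and 512 bins, each bin covers roughly 21.5 Hz, so a call like band_sum(spectrum, 512, 22050.0, 200.0, 2000.0) adds up the bins for the voice range mentioned above, and you compare the result to your threshold.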
