Hello,

I’m working on a project that needs speech recognition. To get something functional, I plan to do the following:
– record the input sound into a buffer using FMOD
– pass the buffer to [url=http://cmusphinx.sourceforge.net/:yy2qerct]PocketSphinx[/url:yy2qerct] to process its content

The thing is: PocketSphinx only accepts a specific sound format. According to the [url=http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx#decoding_a_file_stream:yy2qerct]documentation[/url:yy2qerct]:
[quote:yy2qerct]It needs to be a single-channel (monaural), little-endian, unheadered 16-bit signed PCM audio file sampled at 16000 Hz.[/quote:yy2qerct]

Therefore, I configured my new FMOD sound to match the specification. As far as I can tell, this can all be done with the [b:yy2qerct]FMOD_CREATESOUNDEXINFO[/b:yy2qerct] structure:
[code:yy2qerct]
FMOD_CREATESOUNDEXINFO exinfo;
memset(&exinfo, 0, sizeof(FMOD_CREATESOUNDEXINFO));
exinfo.cbsize = sizeof(FMOD_CREATESOUNDEXINFO);

exinfo.numchannels = 1; // "single-channel (monaural)" checked
exinfo.defaultfrequency = 16000; // "sampled at 16000 Hz" checked
exinfo.format = FMOD_SOUND_FORMAT_PCM16; // "unheadered 16-bit signed PCM audio file" checked

// 5 seconds buffer
exinfo.length = exinfo.defaultfrequency * exinfo.numchannels * sizeof(int16_t) * 5;
[/code:yy2qerct]

Below is a simplified version of my code:
[code:yy2qerct]
FMOD_SYSTEM *system;
FMOD_SOUND *sound;
FMOD_CREATESOUNDEXINFO exinfo;
unsigned int elapsed;
void *ptr1, *ptr2;
unsigned int len1, len2;

FMOD_System_Create(&system);
FMOD_System_SetDriver(system, 0);
FMOD_System_Init(system, 32, FMOD_INIT_NORMAL, NULL);
FMOD_System_CreateSound(system, 0, FMOD_2D | FMOD_SOFTWARE | FMOD_OPENUSER, &exinfo, &sound);

FMOD_System_RecordStart(system, 0, sound, 0);
for (;;) {} // Placeholder: wait for user input or for 5 seconds to elapse, then stop; store the elapsed time (in milliseconds) in 'elapsed'
FMOD_System_RecordStop(system, 0);

unsigned int length = exinfo.defaultfrequency * exinfo.numchannels * sizeof(int16_t) * (elapsed / 1000.);
FMOD_Sound_Lock(sound, 0, length, &ptr1, &ptr2, &len1, &len2);

// Relevant PocketSphinx call
ps_process_raw(pocket_sphinx_object, ptr1, len1, FALSE, TRUE);
[/code:yy2qerct]

My problem is that the PocketSphinx results are completely irrelevant when I proceed this way. But I know the sound is recorded correctly, as I can hear myself when I play it back using:
[code:yy2qerct]
FMOD_System_PlaySound(system, FMOD_CHANNEL_FREE, sound, 0, NULL);
[/code:yy2qerct]

So I have the following leads:
– even though I set what I believe are the correct sound settings, that is not enough, and something is still missing for PocketSphinx to be able to read the buffer
– [b:yy2qerct]FMOD_Sound_Lock[/b:yy2qerct] doesn’t properly give me access to the raw PCM buffer
– or maybe there is an endianness issue, but that should be fine as I am testing on OS X x86_64 (little-endian)
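
One way to rule out the endianness lead is to check the host byte order at runtime and, if needed, decode samples explicitly as little-endian. The following is only a minimal sketch; the helper names `host_is_little_endian` and `sample_from_le_bytes` are made up for illustration and are not part of FMOD or PocketSphinx.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise.
   (Hypothetical helper, not an FMOD or PocketSphinx function.) */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);   /* look at the lowest-addressed byte */
    return first_byte == 1;
}

/* Decodes one signed 16-bit sample stored as two little-endian bytes,
   independent of the host byte order. (Hypothetical helper.) */
static int16_t sample_from_le_bytes(const unsigned char bytes[2])
{
    const unsigned int u = (unsigned int)bytes[0] | ((unsigned int)bytes[1] << 8);
    /* Map 0x8000..0xFFFF onto the negative range without relying on
       implementation-defined narrowing conversions. */
    return (int16_t)(u >= 0x8000u ? (int)u - 0x10000 : (int)u);
}
```

If `host_is_little_endian()` returns 1, a buffer of native 16-bit samples is already stored in the byte order PocketSphinx asks for, so this lead can be set aside.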

Do you guys have any idea?

Thanks for your time

All your FMOD usage looks fine to me.

After a quick read of the PocketSphinx docs, it appears that ps_process_raw expects a length in samples, not bytes.
The ‘len1’ returned from FMOD_Sound_Lock is in bytes. As the sound is mono PCM16, simply divide it by sizeof(int16_t) to get the sample count.
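
As a rough sketch of that conversion, assuming mono PCM16 as configured in the question (the helper name `pcm16_mono_bytes_to_samples` is invented for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Converts a byte count, such as the 'len1' reported by FMOD_Sound_Lock,
   into a count of mono 16-bit samples. A trailing partial sample, if any,
   is dropped by the integer division. (Hypothetical helper.) */
static size_t pcm16_mono_bytes_to_samples(unsigned int byte_count)
{
    return (size_t)byte_count / sizeof(int16_t);
}

/* Intended use with the variables from the question (not compiled here,
   since it needs the FMOD and PocketSphinx headers):

   ps_process_raw(pocket_sphinx_object, ptr1,
                  pcm16_mono_bytes_to_samples(len1), FALSE, TRUE);
*/
```

For the full 5-second buffer from the question, 160000 bytes become 80000 samples.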

Hey Mathew,

It seems it was indeed the issue.

Thanks for pointing it out! It was driving me crazy.
