
Hello. What type of quantitative information makes up a song? I’m trying to analyse songs and create a list of numerical data (a signature) for each song. What information should I look at?
Thanks


Hi guys.
Thank you for your replies. I’ll be using neural networks for the comparison aspect, and FMOD for the audio feature extraction, as I know nothing about manually calculating FFTs and such. That’s beyond me :). As a quick experiment, I used GetSpectrum() to get the frequency spectrum and stored it in a 512-element buffer. Am I correct in assuming that the information obtained is from that given moment in time, and that I would need to obtain 512 (or fewer, for optimization) values every X seconds for the entire song?
Thanks again for the help, much appreciated.


[quote:2u4chf1h]Am I correct in assuming that the information obtained is from that given moment in time[/quote:2u4chf1h]
Yes, you are correct. getSpectrum only describes the audio at the moment it is called, so you will have to call it many times while the song is playing.
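
To illustrate the point outside FMOD, here is a minimal plain-Python sketch that cuts a signal into frames and computes one magnitude spectrum per frame. It is only an analogy for calling getSpectrum repeatedly: the function names, the test signal and the naive DFT are all assumptions made for the example, not FMOD's implementation.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 of one frame (O(N^2), fine for a demo)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags

def spectra_over_time(samples, frame_size, hop):
    """One spectrum per frame; each spectrum describes only that slice of the signal."""
    return [magnitude_spectrum(samples[start:start + frame_size])
            for start in range(0, len(samples) - frame_size + 1, hop)]

def tone(freq, count, sample_rate):
    """Hypothetical helper: `count` samples of a sine wave at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(count)]

# Made-up test signal: 0.1 s at 1 kHz followed by 0.1 s at 2 kHz, sampled at 8 kHz
SAMPLE_RATE = 8000
samples = tone(1000, 800, SAMPLE_RATE) + tone(2000, 800, SAMPLE_RATE)

frames = spectra_over_time(samples, frame_size=64, hop=64)
# Index of the loudest bin in each frame; the bin width is 8000 / 64 = 125 Hz
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in frames]
print(peak_bins[0], peak_bins[-1])
```

The loudest bin changes partway through the signal, which a single snapshot could never show. That is why the spectrum has to be sampled repeatedly over the length of the song.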

-Pete


Thanks for the clarification, Peter. I’ve been reading around, and I think 256 values may be better than 512, as a lot of the information may be redundant (for my needs). Also, if I use the NOSOUND_NRT output type (FMOD_OUTPUTTYPE_NOSOUND_NRT), is it possible for me to zip through a file and obtain all the frequencies? I assume I have to update the mixer?
In this context, where I’m trying to identify similarities between songs, would it make sense to use the average frequency per X seconds (the average being calculated from each 512/256-value buffer)? Or does that make the frequency information useless?
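
Whether averaging destroys the information depends on which axis is averaged. The sketch below uses made-up four-bin spectra to compare three options: averaging all bins of one buffer, averaging each bin across several buffers, and a magnitude-weighted mean frequency (often called the spectral centroid). The function names are hypothetical, and the bin-to-Hz mapping is an assumption stated in the comments.

```python
def mean_over_bins(spectrum):
    """Collapse one spectrum to a single number (overall level, no frequency detail)."""
    return sum(spectrum) / len(spectrum)

def mean_over_frames(spectra):
    """Average each bin across several frames, keeping one value per bin."""
    count = len(spectra)
    return [sum(frame[b] for frame in spectra) / count for b in range(len(spectra[0]))]

def spectral_centroid(spectrum, sample_rate):
    """Magnitude-weighted mean frequency in Hz.
    Assumes bin b sits at b * (sample_rate / 2) / len(spectrum) Hz."""
    bin_hz = (sample_rate / 2.0) / len(spectrum)
    total = sum(spectrum)
    if total == 0:
        return 0.0
    return sum(b * bin_hz * mag for b, mag in enumerate(spectrum)) / total

# Two made-up spectra with the same total energy in very different places
bass_heavy = [4.0, 0.0, 0.0, 0.0]
treble_heavy = [0.0, 0.0, 0.0, 4.0]

print(mean_over_bins(bass_heavy), mean_over_bins(treble_heavy))                     # 1.0 1.0
print(spectral_centroid(bass_heavy, 8000), spectral_centroid(treble_heavy, 8000))   # 0.0 3000.0
print(mean_over_frames([bass_heavy, treble_heavy]))                                 # [2.0, 0.0, 0.0, 2.0]
```

A plain mean over the bins cannot tell the two spectra apart, so it is probably too lossy for similarity work. Averaging per bin over time, or using a weighted measure such as the centroid, keeps at least some of the spectral shape.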


After reading around, I think mel-frequency cepstral coefficients (MFCCs) might be just what I need for my task. However, I need some help calculating them. Hopefully someone can guide me here. Here’s what I got from Wikipedia, along with my pseudocode for each step:
1) Take the Fourier transform of (a windowed excerpt of) a signal.
[code:1e7ubmf6]I use getSpectrum with the FMOD_DSP_FFT_WINDOW_TRIANGLE parameter
[/code:1e7ubmf6]
2) Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.

I go through the spectrum array and use this equation on each value:
[code:1e7ubmf6]mel = 1127.01048 * ln(1 + f / 700)[/code:1e7ubmf6]

3) Take the logs of the powers at each of the mel frequencies.
I then go through the new mel array and take the log of each value:
[code:1e7ubmf6]mLog[i] = Math.log(melArray[i])[/code:1e7ubmf6]

4) Take the discrete cosine transform of the list of mel log powers, as if it were a signal. (Then find the amplitudes of the DCT result.)
I’m not sure what to do here. How do I calculate the DCT?
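
For reference, steps 2 to 4 can be sketched in plain Python as below. This is one common reading of the Wikipedia recipe, not a definitive implementation. In it, f in the mel formula is the frequency of a spectrum bin in Hz (derived from the bin index), not the amplitude stored in that bin. The function names, the filter count, the bin-to-Hz mapping and the unnormalised DCT-II are all assumptions made for the example.

```python
import math

def hz_to_mel(f):
    """Convert a frequency in Hz to mels (f is a bin's frequency, not its amplitude)."""
    return 1127.01048 * math.log(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(m / 1127.01048) - 1.0)

def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale.
    Assumes bin b of the spectrum sits at b * (sample_rate / 2) / num_bins Hz."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # num_filters + 2 edge points: each filter spans three consecutive points
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = nyquist / num_bins
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for b in range(num_bins):
            f = b * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def dct_ii(values):
    """Unnormalised DCT-II: X[k] = sum_i x[i] * cos(pi * k * (i + 0.5) / N)."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n)]

def mfcc(spectrum, sample_rate, num_filters=20, num_coeffs=12):
    """MFCCs for one magnitude spectrum (one frame)."""
    bank = mel_filterbank(num_filters, len(spectrum), sample_rate)
    # Step 2: weighted sum of the power (magnitude squared) under each triangle
    energies = [sum(w * mag * mag for w, mag in zip(weights, spectrum)) for weights in bank]
    # Step 3: log of each mel-band energy (small offset avoids log(0))
    log_energies = [math.log(e + 1e-10) for e in energies]
    # Step 4: DCT of the log energies; keep the first few coefficients
    return dct_ii(log_energies)[:num_coeffs]

# Made-up input: a flat 256-value spectrum at 44.1 kHz
coeffs = mfcc([1.0] * 256, 44100)
print(len(coeffs))
```

The result is one short vector per spectrum, so calling this on every buffer gives a sequence of MFCC vectors over the course of the song.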

I hope I’m on the right path there.
Thanks for any help.


Take a look at this library:

It might have the tools you require to extract the MFCCs you want.

cheers,
Templar


Hi Templar,
Thanks for the link. I’m going through the documentation at the moment. As far as I can tell, it doesn’t load MP3 files, which is what I need for my project. Are there any tools (not necessarily open source) which output MP3 MFCCs? I could use those for the time being, to test other aspects of the project.
Thanks


[quote="dcr":tow1pooj]Hi Templar,
Thanks for the link. I’m going through the documentation at the moment. As far as I can tell, it doesn’t load MP3 files, which is what I need for my project. Are there any tools (not necessarily open source) which output MP3 MFCCs? I could use those for the time being, to test other aspects of the project.
Thanks[/quote:tow1pooj]

What do you mean by ‘outputs MP3 MFCC’? The MFCC data from an MP3 file?
If so, what you need to do is decode the MP3 into a PCM buffer (which FMOD can do) and use the library I linked to before to process the PCM buffer and calculate the MFCCs.

Anyone else got any ideas?

cheers,
Templar


I’m using FMOD and C# (as I’m not overly familiar with C++). I’m trying to get the MFCC data from a bunch of MP3 files in order to see how similar they are to each other. I’ve been trying to use the FFT values obtained, but they don’t give good similarity results.
Thanks for the help.


Try looking up quefrency and cepstrum.


[quote="dcr":2xacjovj]Hello. What type of quantitative information makes up a song? I’m trying to analyse songs and create a list of numerical data (a signature) for each song. What information should I look at?
Thanks[/quote:2xacjovj]

Matt’s suggestions are good. The mel-frequency cepstrum (MFC) is/was popular in voice recognition systems, and does a pretty good job of describing what is happening in the frequency spectrum within a given window of time. You can also use basic features such as fundamental frequency, harmonic spectral centroid, harmonic vs. inharmonic spectra, beat analysis, etc.

The key point is that you can’t simply measure these features once for the entire file. When I last looked at this stuff, you needed to calculate each feature repeatedly and place the results in a vector (to describe the change over time). Then, to find similar songs, you would use some maths (above my understanding :) ) that compares the vectors of two songs and tells you how close they are.
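
As a rough idea of what that maths can look like, the sketch below summarises each song by the mean of its per-frame feature vectors and then compares the summaries with Euclidean distance and cosine similarity. These are two common choices, not necessarily the ones used in the literature mentioned here. The feature values are made up, and the function names are hypothetical.

```python
import math

def mean_vector(frames):
    """Summarise a song as the per-dimension mean of its per-frame feature vectors."""
    count = len(frames)
    return [sum(frame[i] for frame in frames) / count for i in range(len(frames[0]))]

def euclidean_distance(a, b):
    """Straight-line distance between two vectors; 0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Made-up per-frame features (for example, the first three MFCCs of two frames per song)
song_a = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7]]
song_b = [[1.1, 2.1, 0.4], [0.9, 1.9, 0.6]]
song_c = [[-2.0, 0.1, 3.0], [-1.8, 0.3, 2.6]]

sig_a, sig_b, sig_c = mean_vector(song_a), mean_vector(song_b), mean_vector(song_c)
print(euclidean_distance(sig_a, sig_b), euclidean_distance(sig_a, sig_c))
print(cosine_similarity(sig_a, sig_b), cosine_similarity(sig_a, sig_c))
```

With these numbers, songs A and B come out much closer to each other than A and C. Taking the mean throws away the ordering of the frames, so it is only a starting point. The same distance functions could equally feed a neural network or a nearest-neighbour search.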

A Google search for ‘content based analysis’ and audio will definitely set you in the right direction... and you can pretty much ignore anything with my name on it!

cheers,
Templar
