Comparing similar audio files, FFT?

Hello, I am trying to compare two similar audio files (WAV). From what I have read, I need to sample both audio files at certain frequencies, run these through an FFT and then compare the results. Can anyone advise me whether this is the correct approach, and also describe the steps I need to take to get to the stage where I can compare the files? TIA, Kieran

Reply to
kieran

That is one way to compare the files. But what are you trying to do with the comparison?

You need to do classical DSP work.

Use a low-pass filter to prevent aliasing. Take a power-of-two number of samples (i.e. 128, 256, 512, 1024, ...). Run an FFT on the samples; this will give you frequency-domain data from your time-domain data. Each data point is referred to as a bin, and the frequencies that fall into a bin depend on the sample rate (the clock frequency of the samples) and the number of samples: with N samples taken at a rate fs, each bin is fs/N Hz wide.

Plot the spectrum of the 2 audio files.
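The steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not code from the thread: the recursive radix-2 FFT shows why a power-of-two length is needed, the Hann window is an assumed (common) choice to reduce spectral leakage, and `spectral_distance` is a hypothetical log-spectral distance that turns "compare the two spectra" into a single number. In practice you would use a library FFT.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two (128, 256, 512, ...)")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bin_frequency(k, sample_rate, n):
    """Centre frequency of bin k; each bin is sample_rate / n Hz wide."""
    return k * sample_rate / n


def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum, bins 0 .. n/2 (DC up to Nyquist)."""
    n = len(frame)
    window = [0.5 * (1 - math.cos(2 * math.pi * i / n)) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(frame, window)])
    return [abs(c) for c in spectrum[: n // 2 + 1]]


def spectral_distance(a, b):
    """Hypothetical metric: RMS difference of two log-magnitude spectra in dB
    (0 dB = identical spectra)."""
    eps = 1e-12  # avoids log(0) in empty bins
    diffs = [20 * math.log10((x + eps) / (y + eps)) for x, y in zip(a, b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

To apply this to the WAV files, you could read the samples with Python's standard `wave` module (`wave.open`, `readframes`), convert them to floats, and compare frames taken from the same position in each file. Note that a pure level difference also shows up in this distance (half the amplitude gives about 6 dB), so you may want to normalise the frames first.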

Even Excel can do FFTs, but it is not obvious what it is doing unless you are familiar with FFTs. Maybe there is some free FFT software you can grab.

Reply to
Steve

Take a look at

formatting link

... the one that tells the truth about green pens and all that!

Chris

Reply to
christofire

On a sunny day (Wed, 8 Oct 2008 19:01:32 +0100) it happened "christofire" wrote in :

Looks like a 2008 copycat of what I wrote around 2000 (Linux):

formatting link
I used this to cancel common background in translated tracks.

It also aligns, matches amplitude, and subtracts. I wrote quite a few more audio utilities actually; most are here:

formatting link

You would still need to understand digital audio and audio in general to use these of course.

Reply to
Jan Panteltje

Maybe the other suggestions are good enough. Personally, I suspect that tempo-adjusting software, with at least one dial to keep the two recordings synchronized, would do: put one signal in each ear and listen. The brain's software is much better than any available package.

Reply to
JosephKK

You failed to indicate the criteria of the comparison. Just what in these files do you want to compare?

Reply to
miso

Hi TIA, This seems to be a good approach. What I am trying to do is automate the comparison of audio files. The two files I will be comparing will be audio recorded from an IVR system. The first file will be a high-quality recording, checked by ear; the second file will be recorded every hour to ensure the IVR is working correctly, i.e. if the two files sound similar, I can consider the IVR to be working. I will give this a go and let you know the results. Thanks for your help, Kieran

Reply to
kieran

Could you put a test mode in your IVR? Perhaps have it respond with something easy to detect like DTMF?
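A DTMF test tone is easy to check in software. A minimal sketch, assuming 8 kHz mono samples as floats: the Goertzel algorithm computes the signal power at just the eight DTMF frequencies, and the strongest row tone and strongest column tone select the key. `detect_dtmf` and its `min_energy` threshold are hypothetical, illustrative choices; a production detector would also check that the two tones stand clearly above the others and that their levels are balanced.

```python
import math

ROWS = (697, 770, 852, 941)      # low-group DTMF frequencies, Hz
COLS = (1209, 1336, 1477, 1633)  # high-group DTMF frequencies, Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, fs):
    """Squared magnitude of the DFT of `samples` at `freq` Hz (Goertzel)."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples, fs=8000, min_energy=1e-3):
    """Return the DTMF key present in a block of samples, or None if the block
    is near-silent (min_energy is an arbitrary illustrative threshold)."""
    if sum(x * x for x in samples) / len(samples) < min_energy:
        return None
    row = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i], fs))
    col = max(range(4), key=lambda i: goertzel_power(samples, COLS[i], fs))
    return KEYS[row][col]
```

For example, a block of 205 samples at 8 kHz containing 770 Hz and 1336 Hz returns `"5"`. Goertzel is used instead of a full FFT because only eight frequencies are of interest.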

...or perhaps figure out a way to subtract one recording from the other: once you allow for some gain adjustment and phase offset, the result should be close to silence. Calculate the amplitude of the result and check that it is low.
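A minimal sketch of the subtraction test, assuming both recordings are lists of float samples at the same rate and interpreting "phase offset" as a whole-sample time offset (an assumption). `subtract_and_measure` is a hypothetical helper: for each candidate offset it fits the least-squares gain, subtracts, and reports the residual energy relative to the reference in dB.

```python
import math


def best_gain(ref, test):
    """Least-squares gain g that minimises sum((ref[i] - g * test[i]) ** 2)."""
    num = sum(r * t for r, t in zip(ref, test))
    den = sum(t * t for t in test)
    return num / den if den else 0.0


def subtract_and_measure(ref, test, max_lag=400):
    """Search whole-sample offsets in [-max_lag, max_lag]; for each, gain-match
    `test` to `ref`, subtract, and keep the offset with the smallest residual.

    Returns (lag, gain, residual_db) where test[i + lag] * gain ~ ref[i] and
    residual_db is residual energy relative to reference energy (the more
    negative, the closer the difference is to silence)."""
    best = None
    for lag in range(-max_lag, max_lag + 1):
        r = ref if lag >= 0 else ref[-lag:]
        t = test[lag:] if lag >= 0 else test
        n = min(len(r), len(t))
        if n < len(ref) // 2:  # skip offsets that leave too little overlap
            continue
        r, t = r[:n], t[:n]
        energy = sum(a * a for a in r)
        if energy == 0:
            continue
        g = best_gain(r, t)
        residual = sum((a - g * b) ** 2 for a, b in zip(r, t))
        db = 10 * math.log10(max(residual, 1e-30 * energy) / energy)
        if best is None or db < best[2]:
            best = (lag, g, db)
    return best
```

How negative the residual must be to count as "low" would have to be found by experiment. This brute-force search costs roughly (length × number of offsets) operations, so it is slow in pure Python for long files.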

Reply to
Steve

Well, yeah, but... what's the similarity criterion?

In some sense, an FFT will tell you the voice of the singer or the instrument(s), but might not distinguish between different works performed on the same instrument. Similarly, a time/amplitude breakdown might easily pick out the 'Surprise' symphony from other works, but can't tell you whether it was performed by an orchestra or a kazoo band.

A two-minute selection from a CD has about 10 million samples (120 s × 44,100 samples/s × 2 channels ≈ 10.6 million), which means it selects a point in a 10-million-dimension vector space. What makes two such points similar?

Reply to
whit3rd

There is a simple problem with this: there is no way to adjust the phase, because phase only makes sense in the context of periodic signals. A time-domain signal like the one above is not periodic, but one can pluck components from the frequency domain of each signal and look at their phases.

In other words, suppose a speaker were offered US$100 to produce, by speaking into the IVR, a sampled digital signal that is more or less the same as an earlier one, so that merely shifting signal2 a little relative to signal1 would align the two for comparison. He would fail. The reason is that, even at the relatively low sample rate of 8 kHz, no human is able to begin speaking at exactly the right instant, let alone control the physiology of the speech path to generate more or less the exact same signal. Any attempt to find out when a signal begins is hopeless in the time domain. Is it the first non-zero sample? The second? The third? Is that noise or voice? Is it when the "hump" is really high? Almost really high? One cannot know.

This is a classic problem in speech recognition and related areas. I responded to the OP in comp.dsp with an outline of what he needs to do:

formatting link

-Le Chaud Lapin-

Reply to
Le Chaud Lapin

I didn't think he was trying to use a human in this instance; I assumed the IVR plays exactly the same speech each time. So wouldn't you be able to do a cross-correlation?
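If the IVR plays the same prompt each time, a cross-correlation can find the alignment. A minimal sketch, assuming float sample lists at the same rate: `xcorr_align` (a hypothetical name) slides the recording over the master one sample at a time and computes the normalised correlation coefficient of the overlapping parts, which does not depend on level differences. Taking the largest magnitude also tolerates a polarity inversion on the line, which is an assumption rather than something stated in the thread.

```python
import math


def xcorr_align(ref, test, max_lag=400):
    """Find the whole-sample offset `lag` (test[i + lag] ~ ref[i]) that
    maximises the magnitude of the normalised cross-correlation between the
    overlapping parts.  Returns (lag, rho): rho = 1.0 means identical shape
    at any level, rho = -1.0 identical shape with inverted polarity."""
    best = None
    for lag in range(-max_lag, max_lag + 1):
        r = ref if lag >= 0 else ref[-lag:]
        t = test[lag:] if lag >= 0 else test
        n = min(len(r), len(t))
        if n < len(ref) // 2:  # skip offsets that leave too little overlap
            continue
        r, t = r[:n], t[:n]
        dot = sum(a * b for a, b in zip(r, t))
        norm = math.sqrt(sum(a * a for a in r) * sum(b * b for b in t))
        rho = dot / norm if norm else 0.0
        if best is None or abs(rho) > abs(best[1]):
            best = (lag, rho)
    return best
```

For long recordings the same correlation is normally computed with FFTs, which is much faster than this direct loop; the peak value of rho can then serve as a similarity score.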

Reply to
Steve

Yes, I guess that would work too, as long as the signals are normalized first, as you pointed out in your second post.

You got me thinking about the pros and cons of the cross-correlation method versus the mean-squared-error method and the minimum-distance estimator.
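One way to see how the first two relate: take a fixed pair of equal-length segments, x from the master and y from the recording, with inner product ⟨x, y⟩ and norm ‖·‖. The squared error left after choosing the best gain g is tied directly to the normalised cross-correlation ρ:

```latex
\begin{aligned}
\rho &= \frac{\langle x, y\rangle}{\lVert x\rVert\,\lVert y\rVert},
\qquad g^{*} = \arg\min_{g}\lVert x - g\,y\rVert^{2} = \frac{\langle x, y\rangle}{\lVert y\rVert^{2}},\\
\min_{g}\lVert x - g\,y\rVert^{2} &= \lVert x\rVert^{2} - \frac{\langle x, y\rangle^{2}}{\lVert y\rVert^{2}} = \lVert x\rVert^{2}\,(1 - \rho^{2}),\\
\lVert x\rVert = \lVert y\rVert = 1 \;&\Rightarrow\; \lVert x - y\rVert^{2} = 2\,(1 - \rho).
\end{aligned}
```

So, for a fixed master segment, the gain-matched squared error falls exactly as |ρ| rises, and the two criteria pick the same alignment. Without gain matching, plain MSE agrees with correlation only after both signals have been normalised (the last line).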

-Le Chaud Lapin-

Reply to
Le Chaud Lapin

Hi all, thanks for your posts. They have helped me a great deal and have definitely steered me in the right direction. Some more info: I should have explained that I am comparing the same recording of the voice, but the differences I am trying to identify are caused by interference from the mobile phone network, i.e. lost audio and noise. I will be listening to one of the samples (the master, or reference) by ear to ensure the recording is clear and without interference. I will then record the same piece of audio at various times throughout the day and compare it to the master. The comparison should identify which recordings are of high quality (low interference) and which are of low quality (lots of interference and lost audio). Kieran
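For this goal (spotting lost audio and added noise relative to a checked master), one possible sketch is a frame-by-frame comparison after the recording has been aligned and gain-matched to the master, for example with a cross-correlation. `frame_report` and its thresholds are hypothetical, illustrative choices, not established values: 160-sample frames (20 ms at an assumed 8 kHz telephone rate), a frame counted as lost if the recording is more than 20 dB quieter than the master while the master is within 40 dB of its loudest frame.

```python
import math


def frame_report(ref, test, frame_len=160, dropout_db=-20.0, silence_db=-40.0):
    """Compare an aligned, gain-matched recording `test` with the master `ref`.
    Needs at least one full frame.  Returns (dropout_frames, snr_db):
      dropout_frames: indices of frames where the master is active but the
                      recording is more than `dropout_db` below it (lost audio)
      snr_db:         master energy over difference energy for all other
                      frames (higher = less added noise/distortion)"""
    def energy(x):
        return sum(v * v for v in x)

    starts = range(0, min(len(ref), len(test)) - frame_len + 1, frame_len)
    loudest = max(energy(ref[i:i + frame_len]) for i in starts)
    active_floor = loudest * 10 ** (silence_db / 10)
    dropouts, signal, noise = [], 0.0, 0.0
    for k, i in enumerate(starts):
        r, t = ref[i:i + frame_len], test[i:i + frame_len]
        er = energy(r)
        if er > active_floor and energy(t) < er * 10 ** (dropout_db / 10):
            dropouts.append(k)  # master has sound here, recording (nearly) silent
            continue
        signal += er
        noise += sum((a - b) ** 2 for a, b in zip(r, t))
    if noise == 0:
        return dropouts, math.inf
    if signal == 0:
        return dropouts, -math.inf
    return dropouts, 10 * math.log10(signal / noise)
```

Recordings could then be ranked by number of dropouts and by SNR; where to draw the pass/fail line would have to be found by listening to a few known good and bad examples.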

Reply to
kieran

Ooh. In that case, maybe you should look for audio forensics software. I hear Diamond Cut AC5 can be useful.

Reply to
JosephKK

This appears to be the successor to the product I heard about:

formatting link

Reply to
JosephKK

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.