Google Answers: Recognition of sounds from a group of known sounds

Q: Recognition of sounds from a group of known sounds (No Answer, 2 Comments)

Question

Subject: Recognition of sounds from a group of known sounds
Category: Computers > Algorithms
Asked by: carbon-ga
List Price: $15.00

Posted: 19 May 2004 22:35 PDT
Expires: 27 May 2004 23:26 PDT
Question ID: 349180

I'm working on writing an application that will recognize a sound from
a group of sounds.

I might start this hypothetical application in training mode, play the
sound of a duck quacking into the microphone, and tell it that's a
"duck". I might then play a short Dan Rather sample into the
microphone, and tell it that's "Dan Rather". When this application is
started in recognition mode, it will print "duck" if it hears
something close enough to the duck quack, or "Dan Rather" if it hears
something sufficiently close to the Dan Rather sample.

So this application is different and simpler than voice recognition or
voice to text. It just recognizes certain sounds from a group of
sounds it has previously been trained to recognize. It has to pick
these sounds out from a constant stream of audio that it doesn't know
about, however. Imagine a computer with a microphone in a room where
people are working and making normal noises. Someone walks up to the
microphone and plays the Dan Rather sample into it. The program prints
"Dan Rather". Hopefully the task is clear.

What I want is a description of a procedure and the names of
algorithms that will help me accomplish this. I'm not a bad
programmer, so I don't want references to code; I need a mid-level
description of how such a system would work, and some clarification on
the harder pieces.

In particular, I have the impression from a lot of stuff I've read
that I should be using Fourier transforms or wavelet transforms to
help me recognize the sounds.

What do I give to the function that does the Fourier transform?
What do I get back?
Does this help me derive some kind of signature for the incoming sound
that I can compare against signatures in my group of known sounds?
That is, how does FFT help me determine how similar the incoming sound
is to the sounds I know about?
How do I deal with the fact that I am trying to pick pieces out of a
constant stream of audio?
Those kinds of questions.

Of course, if FFT is not the best way, I am completely open to other
methods. I just need block diagrams and the names of algorithms.

Answer

There is no answer at this time.

Comments

Subject: Re: Recognition of sounds from a group of known sounds
From: jamesjyu-ga on 27 May 2004 14:30 PDT

It sounds like you want to recognize very broad types of sound (i.e.,
not just voice).  For voice recognition, for example, many techniques
involve analyzing signatures in the voice (formants, pitch, etc.) to
judge whether two speakers are the same.  In broader cases like yours,
such signatures are usually very inconsistent.

Since you are only checking for a specific sound within a signal, your
problem is one of pattern recognition.  Some techniques use the
Fourier transform (via FFT), but there may be a simpler method you can
try first, which involves correlation.

http://mathworld.wolfram.com/CorrelationCoefficient.html

Correlation basically measures how related two signals are.  In your
case it can be calculated simply by multiplying the two signals
point-by-point and summing the result.  For example, if you are
searching for the duck sound (length M) within a signal X of length N,
you would take a window of length M in X, multiply it point-by-point
by the duck signal, and sum the result.  Then you would slide the
window over by one point and multiply and sum again.  Eventually, you
will end up with N-M+1 correlation numbers.
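
As a rough illustration of the sliding multiply-and-sum described
above, here is a minimal sketch in plain Python.  The function name
sliding_correlation and the tiny made-up signals are assumptions for
the example only; real audio would be thousands of samples long, and
this direct loop would be slow on it.

```python
def sliding_correlation(x, template):
    """Return the N-M+1 raw correlation values of `template` against `x`.

    Each value is the point-by-point product of the template and one
    window of x, summed up. No normalization is applied.
    """
    n, m = len(x), len(template)
    if m == 0 or m > n:
        return []
    scores = []
    for start in range(n - m + 1):
        window = x[start:start + m]
        scores.append(sum(a * b for a, b in zip(window, template)))
    return scores


# Hypothetical toy data: a 3-sample "duck" hidden in a 7-sample signal.
duck = [1.0, -1.0, 1.0]
x = [0.0, 0.0, 1.0, -1.0, 1.0, 0.0, 0.0]

scores = sliding_correlation(x, duck)
best_offset = scores.index(max(scores))
print(scores, best_offset)
```

With these toy values the largest score lands at the offset where the
duck samples were placed in x.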

Now, you may treat these correlations as another signal.  Local
maxima in this signal mark the times in X that may contain the duck
sound.  Some kind of heuristic threshold must be chosen to decide
whether a given maximum indicates a duck sound.  This will have to be
set by hand.
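
The peak-picking step might look something like the sketch below.
The function name find_detections, the example scores, and the
threshold values are all made up for illustration; a usable threshold
would have to be tuned by hand against real recordings.

```python
def find_detections(scores, threshold):
    """Return indices of local maxima in `scores` that reach `threshold`.

    A point counts as a local maximum if it is greater than its left
    neighbor and at least as large as its right neighbor. The ends of
    the list are treated as having a neighbor of minus infinity.
    """
    hits = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if s >= threshold and s > left and s >= right:
            hits.append(i)
    return hits


# Hypothetical correlation values, one per window position in X.
scores = [1.0, -2.0, 3.0, -2.0, 1.0]

strict = find_detections(scores, threshold=2.5)
loose = find_detections(scores, threshold=0.5)
print(strict, loose)
```

A threshold that is too low lets the small bumps at the edges through
as well, which is why the value matters.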

Keep in mind that this technique is not as robust as some frequency
domain techniques.  For example, if the person did not speak at the
same rate as in the sample, or there was a lot of noise in X, then
this method is likely to fail.  But it is a start, and may be accurate
enough for your purposes.
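
One small refinement, which is an addition here rather than part of
the method described above: the MathWorld page linked earlier
describes the correlation coefficient, which subtracts the means and
divides by the signal energies so the score falls between -1 and 1.
A sketch, assuming a hypothetical helper named
correlation_coefficient, is below.  It only compensates for loudness
differences; it does nothing for the rate and noise problems.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and equal in length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    numerator = sum(p * q for p, q in zip(da, db))
    denominator = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    # A constant (e.g. silent) window has no variance; report no match.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical data: the same shape recorded at one tenth the volume.
duck = [1.0, -1.0, 1.0]
quiet_duck = [0.1 * v for v in duck]

r = correlation_coefficient(duck, quiet_duck)
print(r)
```

Applied to each window of X in place of the raw sum, this would make a
single hand-set threshold less sensitive to how loudly the sample is
played.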

Subject: Re: Recognition of sounds from a group of known sounds
From: carbon-ga on 27 May 2004 23:26 PDT

Thanks for the advice. I'll read. When I didn't get an answer
immediately, I started researching, and got what I think is a pretty
good block diagram description of how to go about doing what I want.
