Subject: Recognition of sounds from a group of known sounds
Category: Computers > Algorithms
Asked by: carbon-ga
List Price: $15.00
Posted: 19 May 2004 22:35 PDT
Expires: 27 May 2004 23:26 PDT
Question ID: 349180
I'm working on writing an application that will recognize a sound from a group of sounds. I might start this hypothetical application in training mode, play the sound of a duck quacking into the microphone, and tell it that's a "duck". I might then play a short Dan Rather sample into the microphone, and tell it that's "Dan Rather". When this application is started in recognition mode, it will print "duck" if it hears something close enough to the duck quack, or "Dan Rather" if it hears something sufficiently close to the Dan Rather sample.

So this application is different from, and simpler than, voice recognition or voice to text. It just recognizes certain sounds from a group of sounds it has previously been trained to recognize. It has to pick these sounds out of a constant stream of audio that it doesn't know about, however. Imagine a computer with a microphone in a room where people are working and making normal noises. Someone walks up to the microphone and plays the Dan Rather sample into it. The program prints "Dan Rather". Hopefully the task is clear.

What I want is a description of a procedure and the names of algorithms that will help me accomplish this. I'm not a bad programmer, so I don't want references to code; I need a mid-level description of how such a system would work, and some clarification on the harder pieces.

In particular, I have the impression from a lot of what I've read that I should be using Fourier transforms or wavelet transforms to help me recognize the sounds. What do I give to the function that does the Fourier transform? What do I get back? Does this help me derive some kind of signature for the incoming sound that I can compare against the signatures in my group of known sounds? That is, how does the FFT help me determine how similar the incoming sound is to the sounds I know about? How do I deal with the fact that I am trying to pick pieces out of a constant stream of audio? Those kinds of questions.

Of course, if the FFT is not the best way, I am completely open to other methods. I just need block diagrams and the names of algorithms.
There is no answer at this time.
Subject: Re: Recognition of sounds from a group of known sounds
From: jamesjyu-ga on 27 May 2004 14:30 PDT
It sounds like you want to recognize very broad types of sound (i.e. not just voice). For voice recognition, for example, many techniques involve analyzing signatures in the voice (formants, pitch, etc.) to judge whether two speakers are the same. In broader cases like yours, signatures are usually very inconsistent. Since you are only checking for a specific sound within a signal, your question is one of pattern recognition. There are some techniques that use the Fourier transform (via the FFT); however, there may be a simpler method that you can try, which involves correlation.

http://mathworld.wolfram.com/CorrelationCoefficient.html

Correlation basically measures how related two signals are and, in your case, can be calculated in its simplest (unnormalized) form by multiplying the two signals point-by-point and summing up the result. For example, if you are searching for the duck sound (length M) within a signal X of length N, you would take a window of length M in X, multiply it point-by-point by the duck signal, and sum the result. Then you would slide the window over by one point and multiply again. Eventually, you will end up with N-M+1 correlation numbers.

Now, you may treat these correlations as another signal. Local maxima in this signal represent the times in X that may contain the duck sound. Some kind of heuristic threshold must be chosen to determine whether a maximum indicates a duck sound. This will have to be set by hand.

Keep in mind that this technique is not as robust as some frequency-domain techniques. For example, if the person did not speak at the same rate, or there was a lot of noise in X, then this method will fail. But it is a start, and may be accurate enough for your purposes.
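The sliding-window procedure described above might be sketched roughly as follows. This is only an illustration, assuming Python with NumPy; the function names `sliding_correlation` and `find_matches`, the toy signals, and the threshold value are all made up for the example and would need tuning on real audio.

```python
import numpy as np

def sliding_correlation(signal, template):
    """Slide the template over the signal one sample at a time and return
    the N-M+1 sums of point-by-point products (unnormalized correlation)."""
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    # mode="valid" keeps only offsets where the template fully overlaps.
    return np.correlate(signal, template, mode="valid")

def find_matches(scores, threshold):
    """Return the offsets that are local maxima and reach the threshold.
    The threshold is a hand-picked heuristic, as noted in the post."""
    hits = []
    for i in range(len(scores)):
        left = scores[i - 1] if i > 0 else -np.inf
        right = scores[i + 1] if i + 1 < len(scores) else -np.inf
        if scores[i] >= threshold and scores[i] > left and scores[i] >= right:
            hits.append(i)
    return hits

# Toy data: a 3-sample "duck" template buried at offset 4 of a silent signal.
duck = np.array([1.0, -1.0, 1.0])
x = np.zeros(10)
x[4:7] = duck

scores = sliding_correlation(x, duck)      # 10 - 3 + 1 = 8 values
matches = find_matches(scores, threshold=2.5)
```

Because the raw sum of products grows with loudness, a fixed threshold like this one is sensitive to recording level; dividing each score by the energy of the window and of the template (which gives something close to the correlation coefficient linked above) is one common way to reduce that sensitivity.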
Subject: Re: Recognition of sounds from a group of known sounds
From: carbon-ga on 27 May 2004 23:26 PDT
Thanks for the advice. I'll read. When I didn't get an answer immediately, I started researching, and got what I think is a pretty good block diagram description of how to go about doing what I want. |