Google Answers: specific sound recognition

Q: specific sound recognition (Answered, 4 out of 5 stars)

Question

Subject: specific sound recognition
Category: Science > Technology
Asked by: jberry-ga
List Price: $40.00

Posted: 15 Sep 2003 17:06 PDT
Expires: 15 Oct 2003 17:06 PDT
Question ID: 257111

I would like to find out what kinds of devices are available that can recognize, or compare and match, small prerecorded sound bits. For example: provide a recorded sample (> 1 second in duration), and the device would then scan and listen, and signal when it hears an exact repeat of that sample. If no such device is available, what would be required to make one? Thank you. jb-kaisertech

Clarification of Question by jberry-ga on 15 Sep 2003 17:07 PDT
My question should have said less than 1 second in duration (< 1 second in duration). Sorry.

Request for Question Clarification by sublime1-ga on 15 Sep 2003 23:16 PDT
jb... I think it will benefit the research if you can specify what format of sound the device, or program, would be scanning once it has been given the sample. Will it be scanning live audio in a room, or from a speaker, or will it be scanning a file format, and, if so, which one (WAV, MIDI, AIFF, etc.)? The better you can define its intended function, the easier it will be to determine whether it exists.

Request for Question Clarification by hedgie-ga on 16 Sep 2003 03:58 PDT
The phrase 'an exact repeat of that sample' needs clarification. In 'voice recognition' applications, which the commenter refers to, 'match' means one thing. 'Same waveform' means something else when evaluating the fidelity of a recording (and is much easier). Matching the 'same sound with different backgrounds' is yet another task. If you explain the application, a researcher may be able to guess which 'waveform metrics' one should select to define 'close' or 'almost exact', and then suggest a method. A computer (microprocessor) with A/D and a suitable program can do most such comparisons. Do you want some references on that?

Clarification of Question by jberry-ga on 17 Sep 2003 21:54 PDT
Sorry for the delay in this clarification. I would like to find a device into which I could say a word or two that it could thereafter recognize whenever spoken, and either signal or count. Preferably it would be able to pick out the specific words in normal speech. I hope this helps. Thank you. Jan

Request for Question Clarification by hedgie-ga on 18 Sep 2003 03:35 PDT
Thanks for the clarification. That is a Voice Recognition System, as described e.g. by kik-ga in a comment. It may be what you want. A computer is a 'device'. It can be made small (a chip), and the task is easier if there is only one speaker. If what you have from kik is enough, you may close the question. If you want more details on the 'device', please clarify further (size of device, cost, number of speakers, ...). hedgie

Clarification of Question by jberry-ga on 18 Sep 2003 20:44 PDT
It doesn't really need to have a library; it has to be able to accept a sample sound or "control words" at any time. It wouldn't need to have a speaker, just a signaling device like a buzzer, and a microphone. For example: I would like to be able to say "five ten" and it would either count or signal every time it heard that, but an hour later I might want it to listen for "seventeen fifty". It would also be good if it could differentiate between my voice and that of another person. I would also like to know whether the device or "chip" can be used, or configured to be used, in a small portable device that can be carried around easily. The ideal setup would be a small device in your pocket (like a tiny voice recorder) with a microphone attached to your lapel. Whenever or wherever you wanted, you could press a button and record a word or two, and it would beep every time it heard that sound until you gave it another sound to listen for. Thanks for your patience with me....I'm new at this. Jan

Answer

Subject: Re: specific sound recognition
Answered By: hedgie-ga on 20 Sep 2003 00:33 PDT
Rated: 4 out of 5 stars

You are welcome, jberry.

I will give you some examples of Voice Command Recognition Technology - original speaker-dependent or speaker-independent voice recognition algorithm, as offered e.g. here:
http://www.speechpro.com/eng/technologies/recognition.html

The size of the vocabulary, the ability to understand different speakers and to differentiate between them, the tolerance for errors, and the range of speakers (children, people with accents, ...) determine the demands on the processing power of the CPU (processing unit) and on the memory of the device. That will determine the size of the final device. Here, by 'speaker' we mean 'a person who speaks', not a loudspeaker.

The ability to understand numbers exists today in quite a small package. It is used in the mobile phone DiVo and described here:
http://www.speechpro.com/eng/products/divo.htm
Here the vocabulary is quite small: ten digits + some controls.

It looks like you are using numbers. If you can express 17 50 as "one seven break five zero end", the task is much easier than if you need to say "seventeen", and the device can be smaller.

Speaker Verification is a demanding feature. Its complexity depends on the error tolerance. If you can use a password instead, the task is much easier.
http://www.speechpro.com/eng/technologies/restriction.html

Here is a free-to-try program one can experiment with: IVOS - Intelligent Voice Operating System, which runs on a typical home computer:
http://download.com.com/3000-7239-10070918.html

Here is a commercial device people use to control the TV and lights in the home:
http://www.smarthome.com/1470.html

The microprocessor (popularly known as a computer on a chip) can be built into a small portable device and used to activate devices such as buzzers or light indicators. There are many types, which vary in cost from a few dollars to thousands. The power, meaning the ability to do complex tasks, depends on the cost.
http://dictionary.reference.com/search?q=microprocessor

Here is a paper from the year 1998, called THE FUTURE IS HERE!
http://www.dinf.ne.jp/doc/english/Us_Eu/conf/csun_98/csun98_013.htm

Here is a current, expensive EADL product, Nemo, a portable device that accepts voice commands and turns them into signals that will control your home:
http://www.enablemart.com/products_detail.asp?id=212
and here is a list of such devices designed for Handicapped Independence:
http://polio.dyndns.org/chip/mobvrec.html

These are a few examples of this rapidly evolving technology. You may enter the terms below into a search engine to get more examples.

Search Terms
voice command control system microprocessor voice command control, portable device voice recognition

hedgie
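
The digit-by-digit phrasing suggested in the answer ("one seven break five zero end") can be illustrated with a short Python sketch. It is only an illustration of why a 12-token vocabulary is simple to handle: it assumes some recognizer has already turned the audio into word tokens, and the names DIGITS, CONTROLS and parse_digit_tokens are hypothetical, not part of any product linked above.

```python
# Hypothetical 12-token vocabulary: ten digits plus two control words.
DIGITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
CONTROLS = {"break", "end"}


def parse_digit_tokens(tokens):
    """Turn already-recognized tokens into a list of integers.

    'break' closes the current number, 'end' closes it and stops parsing.
    Any token outside the 12-word vocabulary is rejected.
    """
    numbers = []
    current = ""
    for token in tokens:
        if token in DIGITS:
            current += str(DIGITS[token])
        elif token in CONTROLS:
            if current:
                numbers.append(int(current))
                current = ""
            if token == "end":
                break
        else:
            raise ValueError(f"token outside vocabulary: {token!r}")
    if current:  # tolerate a missing final 'end'
        numbers.append(int(current))
    return numbers


print(parse_digit_tokens("one seven break five zero end".split()))
```

With this scheme "seventeen fifty" never has to be recognized as such; the recognizer only ever distinguishes twelve sounds, and the grouping into 17 and 50 is plain bookkeeping.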
Request for Answer Clarification by jberry-ga on 21 Sep 2003 02:03 PDT
The device I'm looking for (or maybe it is software that I'm looking for) doesn't need to ever "understand" anything; it just needs to scan and match short sounds. I don't want to control anything, I just want to know how many times I make a certain sound. Maybe I would want it to let me know how many times an hour I clear my throat, or sniffle, or say a certain word.

The answer given was close, in that one of the links (to speechpro.com) has a product that can spot words in a word stream. It listens for words or phrases like "bin laden" in a news broadcast, but the input appears to be text input and it is somewhat independent of the individual speaker. Usually speech recognition software attempts to find ways to understand words spoken by anyone, not just one individual, and it seems that this kind of computing/software power isn't really needed at all for this application.

The closest thing I found in the links you gave me were references to the cell phone voice record/recognize function, where you input a phone number and then record the control command that it will recognize when heard, and dial the number. Unfortunately it doesn't work when the word is part of a stream.

Can you get any closer or direct me to additional research? ...Compared to the monumental task of voice recognition this seems so easy, but on closer examination I guess it isn't. Thanks for your efforts. Jan

Thanks for any additional input or direction for more research.
Clarification of Answer by hedgie-ga on 21 Sep 2003 03:38 PDT
Jan,

Thanks for the clarification; it helped a bit. We are getting closer, but we still have a problem using the same (too technical?) language, and so in understanding each other.

You say
> doesn't need to ever "understand"
Well, yes. These gadgets never 'understand' anything in the human sense. The job they do is to RECOGNIZE a given sound pattern, such as an English word, and differentiate it from other sounds, such as background noise and other words.

> I don't want to control anything
That's OK. The idea is for the processor to 'recognize' the 'sound pattern'. Let's call that sound a word. Once it is recognized, you can use it to throw a switch, on an appliance (control) or on a counter. The processor can do the counting and connect to a display. That's the application, and that's easy.

What may be more or less difficult is the process of recognition. The critical parameter (which determines the cost, size, complexity, ... of the device or software) is the number of such different words; let's call it 'vocabulary size'.

We talked about numbers such as 'seventeen': vocabulary size is about 120.
I asked whether you can limit that to 'digits': vocabulary size is about 12, AND it was already done for voice dialing.
And then you bring in the sniffles! What is the vocabulary size? How do you define what is a sniffle and what is a cough or background? Vocabulary undefined!

It is possible to get ANN (artificial neural network) software for which you do not need to define the vocabulary in advance. You can 'train it', meaning you make a sound, it guesses 'noise'?, and you enter: NO, a sniffle_type_1. Eventually the ANN will learn which sounds are 'in your vocabulary' and recognize them.

> isn't really needed to ... understand words spoken by anyone
OK. We are dealing with a 'single speaker' application here. That is indeed simpler.

The question still is: is such a portable device available? I am still guessing what the task is, but the likely answer is: yes. A portable computer (laptop, PDA or wearable computer) with a mike and suitable software should be able to do that: to listen, to count the 'words' in a stream, and to produce a variety of reports and displays.

I am guessing you would not do the programming yourself (not even the high-level part), so custom programming may be needed. This service, GA, can point you to resources for that. The GA format is not suitable for actually doing it.

It is not necessary, but it would help to know a bit more about the project goals and what the actual application is: is it a business, a hobby, or a one-of-a-kind device for personal use? Are we looking for an 'off the shelf' commercial product, a design specification, or an interest group of people who work on these issues for fun? What are the available resources (cost, time, skills, ...)?

I will give you some links to reports to read, which may help you to formulate the technical parameters of the project:

Wearable computers:
http://www.redwoodhouse.com/wearable/index.html?subid=21
http://www.media.mit.edu/wearables/

ANN software:
http://cslu.cse.ogi.edu/tutordemos/nnet_training/tutorial.html
and more with search terms:
://www.google.com/search?hl=en&ie=ISO-8859-2&q=Neural+Network+Training+%2C+vocabulary

The question remains: how complex is the task? That will determine how much hardware we need (CPU speed, memory, ...) and therefore how big and expensive the final device will be. Do we need to operate in real time? Perhaps it is enough to make a recording of the stream in the field, bring the tape home, and then count the 'words' and produce the histogram?

hedgie
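
For the simplest reading of the task (one speaker, one short recorded sample, offline counting on a recording), the "listen and count" step can be sketched as a sliding-window comparison. The Python below is a rough sketch under strong assumptions: the audio is already available as a list of numeric samples, the repeated sound has nearly the same waveform each time, and the function names and the 0.9 threshold are made up for illustration. Real speech varies far more than this, which is why the answer above talks about recognition software and ANN training rather than plain waveform matching.

```python
import math


def normalized_correlation(a, b):
    """Correlation coefficient between two equal-length sample lists (-1..1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0  # flat (silent) windows never match


def count_matches(stream, template, threshold=0.9):
    """Count non-overlapping windows of `stream` that resemble `template`."""
    count = 0
    i = 0
    m = len(template)
    while i + m <= len(stream):
        if normalized_correlation(stream[i:i + m], template) >= threshold:
            count += 1
            i += m  # skip past the matched window
        else:
            i += 1
    return count


# Toy data: the "recorded sample", then a stream containing it twice
# (the second time at double volume) with quiet noise in between.
template = [0.0, 1.0, -1.0, 0.5, -0.5, 0.0]
stream = (
    [0.0] * 4
    + template
    + [0.1, -0.1, 0.0, 0.05]
    + [2 * x for x in template]
    + [0.0] * 3
)
print(count_matches(stream, template))
```

Normalizing by the window's mean and energy makes the comparison insensitive to volume, which is why the louder repeat still counts. It does nothing for changes in speed, pitch or background, so this is closer to the 'same waveform' case than to word spotting in normal speech.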
Request for Answer Clarification by jberry-ga on 21 Sep 2003 15:55 PDT
To be honest, and it is probably kind of silly, I had the idea for this device while listening to a teenage friend of mine who was giving a talk in front of a large room full of people. Every few words she would say "and like"; I was somewhat embarrassed for her. After thinking about it for a minute I realized that probably most of us have a bad vocal habit that we would like to eliminate, and yet it is hard, especially when the habit is well ingrained, to catch ourselves doing it.

Examples: maybe, as with the girl above, we use a word or phrase more often than we should; or maybe someone has said we talk about ourselves too much, so we want to count how many times in a 15-minute conversation we say "I"; or maybe we say "uuhh" a lot, or we use a cuss word or clear our throat out of nervous habit.

...So I got to thinking that a device that could let you know when it occurs would be a handy tool... All it would have to do is allow you to record the offending sound and set it to listen mode, and it would signal every occurrence with a beep, making you more aware of the frequency of use, which, hopefully, will help you stop doing it. (Kind of like a shock collar for a dog, but a little less harsh....ha ha).

I didn't think at the time that building this device small enough and cheap enough for public consumption would be that difficult, compared to real speech recognition, given today's technology. But I have tried on many occasions since I had the idea a few years ago to find such a device, and have had no success. As I said, it is turning out not to be as simple a task as it seemed.

The additional links you've provided are helpful and I appreciate all your efforts. If you have any other info to offer it will be appreciated. Otherwise I will rate the answer certainly A for effort, and answered correctly because the exact device doesn't exist.....yet! Thanks very much.

I think "Google Answers" is a fantastic information resource...and you researchers are top notch. I can't wait to have another burning question!!!! (believe me it won't be long I'm sure!) Jan
Clarification of Answer by hedgie-ga on 22 Sep 2003 06:46 PDT
That's an interesting idea, Jan. Probably original, too. The closest I have found is
http://www.halfbakery.com/idea/low-end_20speech_20recognition#1017177084

This is a 'real time' application which does not have to be a wearable device. A notebook computer would do. It would sit on the lectern and display the feedback. You do not want it to beep; it would flash an icon for 'Repetition' and other such 'sins' on the screen.

The critical issue is who decides what is "a sin". It is critical in determining how much such a device would cost today.

____________________________
Case 1: Human intelligence is used
____________________________

For example,

a) You know in advance that your friend has this nervous habit, and you ask her to say those few phrases into a microphone attached to the notebook before the lecture. Then the task is solved by today's "command recognition" programs. Some software customization would be needed to take the output of the recognition software and 'command the icon to flash'. It's easy to do (for a programmer).

b) It is also possible that a coach in the audience could push a "repetition" or 'cut-this-out' button, and the screen would display such interactive feedback. You would just need a wireless device to transmit the signal to the notebook. A mobile (cell phone) with Bluetooth can do that.

Blue tooth, wireless devices - Systems Analysis and Design
http://teachers.sdmesa.sdccd.cc.ca.us/~gmerx/CISC210-SystemsAnalysis/CIS210ClassSessions.htm
http://www.geekzone.co.nz/content.asp?contentid=1364

Method b) is probably better, since human intelligence, not artificial intelligence (AI), decides what is 'annoying' to others. It might be interesting to compare the hi-tech solution, with an electronic gadget 'deciding', with a low-tech solution, in which a 'coach' sitting in the last row signals by hand or by raising a sign such things as: slow, cut-this, louder, explain ...

c) Expanding on your idea, a notebook could even show graphically on the screen how many people in the audience want 'more explanation', less explanation, louder, cut-THIS, this is BORING.... This idea would be interesting to try, and is definitely doable today. Each person would have a mobile phone (with Bluetooth) and a list of codes, and would transmit directly to the notebook. It would need about a week of easy programming, depending on the features it would have. There would be no AI involved. It may help the lecturer or 'freak her out'. One would have to try it.

(A professor of physics I know was using interactive feedback during his lectures. Not to be told he was BORING, of course. He would ask a question, and students could use an infra-red clicker (like a TV remote) to select an answer. He put his system together on a shoestring using an old Linux computer. So it is possible and may be worth trying.)

______________________________________________________
Case 2: Software has to recognize what is an annoying nervous habit vs. what is a legitimate repetition.
______________________________________________________

This would need AI software (AI = Artificial Intelligence). For example:

1) Any repetition: it would keep statistics of all sound segments, and compare and count. This is possible, and would need a BIG memory.

2) You could compile a universal list of the usual 'fillers', a bit like the "Style Checkers" offered for some word processors. These are supposed to "increase reader comprehension and document usability". Controlled English uses a defined product or service vocabulary. "This control removes ambiguity, and adds clarity, consistency and readability."
http://www.smartny.com/Default.htm
I tried it and found it annoying. The AI was so dumb, it was correcting the proper use of technical terms, calling it 'jargon'.

Here are a few links on the limits and potential of AI:
http://www.geocities.com/scimah/AI.htm
http://www.cs.swarthmore.edu/~eroberts/cs91/projects/ethics-of-ai/sec9.html
http://yudkowsky.net/archive-sl4/0108/0047.html

So, in conclusion:

Quite a few gadgets can be designed which would fit the general description given in your question. Some can be built today with a small to moderate programming effort on today's notebooks. Some would require AI, years of research and software development, and would probably not work too well. (Suitable for big corporation R&D departments and PhD thesis authors :-)

When designing a new gadget, we should always look to see whether a current, perhaps low-tech, way exists which can accomplish the goal.

The best gadget for improving lectures today is a video camera and a dry run. You see yourself. This is quite valuable because some annoying habits are silent. Your coach or friends tell you what you can improve, and also point out what really works well. You get less nervous and slowly eliminate all the 'hard spots' in the lecture where you tend to fall back on your pet fillers.

A gadget without actual human interaction which would do all that would also need a camera, pattern recognition and a lot of AI. Personally, I prefer one of the lower-tech solutions that includes a friendly human being or two, as in Case 1, a or b, or the video method.

Good luck on your project!

hedgie
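
The 'universal list of fillers' idea from Case 2, option 2 is easy to sketch once the hard part is assumed away. The Python below assumes a text transcript of the talk already exists (produced by a person or by some speech-to-text program, which is exactly the difficult step discussed in this thread). The FILLERS list and the function name count_fillers are illustrative choices, not taken from any product mentioned above.

```python
import re
from collections import Counter

# Hypothetical filler list; in practice the speaker or a coach would choose it.
FILLERS = ["and like", "you know", "uh", "um"]


def count_fillers(transcript, fillers=FILLERS):
    """Count whole-word occurrences of each filler phrase in a transcript."""
    text = transcript.lower()
    counts = Counter()
    for phrase in fillers:
        # \b keeps 'um' from matching inside words such as 'umbrella'.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text))
    return counts


sample = "And like, I went there, um, and like it was, you know, fine. Umbrella."
for phrase, n in count_fillers(sample).items():
    print(f"{phrase!r}: {n}")
```

Note that such a list cannot tell a nervous "and like" from a legitimate one; that judgment is the part that, as argued above, either needs a human or a good deal of AI.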

jberry-ga rated this answer: 4 out of 5 stars

Thank you.  I think the answer was good, though the point of the
device was somewhat misunderstood.  Overall it was fine and I received
some valuable information in the links provided.  thank you.

Comments

Subject: Re: specific sound recognition
From: kik-ga on 16 Sep 2003 01:22 PDT

Are you looking for something like this?

"Voice Control Systems VCS1000 is a multifunctional speaker-independent voice
recognition module for IBM PCs.  The VCS 1000 comes with specialized
vocabularies such as the "Voice Director" module which allows the recognition
of 41 control words (such as On, Off, Begin, Stop, Faster, Slower, Left, Right,
Up, Down, Forward, Backward, 1,2,3,4,5,6,7,8,9 and zero). This module is
useful for factory machine control. Other modules are available and it is also
possible to connect the VCS 1000 to the telephone line for speaker independent
voice recognition over public switched telephone network. 
[Voice Control Systems, 14140 Midway Rd Ste 100, Dallas, TX 75244 214-386-5555] "
