Q: specific sound recognition ( Answered 4 out of 5 stars,   1 Comment )
Question  
Subject: specific sound recognition
Category: Science > Technology
Asked by: jberry-ga
List Price: $40.00
Posted: 15 Sep 2003 17:06 PDT
Expires: 15 Oct 2003 17:06 PDT
Question ID: 257111
I would like to find out what kinds of devices are available that can
recognize or compare and match small prerecorded sound bits.  For
example: provide a recorded sample (> 1 second in duration) and the
device would then scan and listen and when it hears an exact repeat of
that sample it would signal.
If no such device is available, what would be required to make
one?
Thank you.
jb-kaisertech

Clarification of Question by jberry-ga on 15 Sep 2003 17:07 PDT
My question should have said less than 1 second in duration (< 1
second in duration). Sorry.

Request for Question Clarification by sublime1-ga on 15 Sep 2003 23:16 PDT
jb...

I think it will benefit the research if you can specify what
format of sound the device, or program, would be scanning,
once it has been given the sample. Will it be scanning
live audio in a room, or from a speaker, or will it be
scanning a file format, and, if so, which one (wav, midi,
aiff, etc). The better you can define its intended function,
the easier it will be to determine if it exists.

Request for Question Clarification by hedgie-ga on 16 Sep 2003 03:58 PDT
The phrase
               'an exact repeat of that sample'

 needs clarification.

 In 'voice recognition' applications, which the commenter refers to,
 'match' means one thing. 'Same waveform' means something else when
 evaluating the fidelity of a recording (and is much easier to test).
 Matching the 'same sound with different backgrounds' is a different
 task again.

 If you explain the application, a researcher may be able to guess which
 'waveform metrics' one should select to define 'close' or 'almost exact',
 and then suggest a method.
 A computer (microprocessor) with an A/D converter and a suitable program
 can do most such comparisons. Do you want some references on that?
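
 One of the simplest 'waveform metrics' of this kind is normalized
 cross-correlation: slide the recorded sample along the incoming audio
 and signal when the similarity score is high. The sketch below is only
 an illustration in Python, not a recommendation of a product or a
 finished method. The function names, the 0.9 threshold and the toy
 sample values are all assumptions. It covers the 'same waveform' case
 only and would not cope well with a changed background or with a word
 spoken at a different speed.

```python
import math


def normalized_correlation(a, b):
    """Correlation in [-1, 1] between two equal-length lists of samples."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A flat (silent) window has no shape to compare, so score it as 0.
    return num / den if den > 0 else 0.0


def find_matches(stream, template, threshold=0.9):
    """Return the start indices where the template matches the stream."""
    m = len(template)
    hits = []
    i = 0
    while i + m <= len(stream):
        if normalized_correlation(stream[i:i + m], template) >= threshold:
            hits.append(i)
            i += m  # jump past this match so it is counted only once
        else:
            i += 1
    return hits


# Toy data: the "sample" appears twice in the stream, once at half volume.
template = [0.0, 3.0, -2.0, 5.0, -4.0, 1.0]
stream = (
    [0.0] * 10 + template + [0.0] * 10
    + [0.5 * x for x in template] + [0.0] * 4
)
matches = find_matches(stream, template)
print(len(matches), "matches at", matches)
```

 Because the score is normalized, the quieter second copy still counts
 as a match. A practical version would probably also require a minimum
 loudness, since very quiet windows can correlate with the sample by
 accident.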

Clarification of Question by jberry-ga on 17 Sep 2003 21:54 PDT
Sorry for the delay in this clarification.
I would like to find a device into which I could say a word or two
that it could thereafter recognize whenever spoken and either signal
or count.    Preferably it would be able to pick out the specific
words in normal speech.
I hope this helps.
thank you.
Jan

Request for Question Clarification by hedgie-ga on 18 Sep 2003 03:35 PDT
Thanks for clarification.

 That is a Voice Recognition System 
 as described e.g. by kik-ga in a comment.

 It may be what you want. A computer is a 'device'.
 It can be made small (a chip), and the task is easier
 if there is only one speaker. If what you have
 from kik is enough, you may close the question.
 If you want more details on the 'device', please clarify further
 (size of device, cost, number of speakers, ...).

hedgie

Clarification of Question by jberry-ga on 18 Sep 2003 20:44 PDT
It doesn't really need to have a library, it has to be able to accept
a sample sound or "control words" at any time; and it wouldn't need to
have a speaker just a signaling device like a buzzer and a microphone.
 For example:  I would like to be able to say "five ten" and it would
either count or signal every time it heard that, but an hour later I
might want it to listen for "seventeen fifty". It would also be good
if it could differentiate between my voice and that of another person.
I would also like to know if the device or "chip" can be used or
configured to be used in a small portable device that can be carried
around easily.  The ideal setup would be a small device in your
pocket (like a tiny voice recorder) with a microphone attached to your
lapel.  Whenever or wherever you wanted, you could press a button
and record a word or two and it would beep every time it heard that
sound until you gave it another sound to listen for.

Thanks for your patience with me....I'm new at this.

Jan
Answer  
Subject: Re: specific sound recognition
Answered By: hedgie-ga on 20 Sep 2003 00:33 PDT
Rated:4 out of 5 stars
 
You are welcome  jberry   

 I will give you some examples of 

 Voice Command Recognition Technology
- original speaker-dependent or speaker-independent 
voice recognition algorithm,

 as offered e.g. here:
            http://www.speechpro.com/eng/technologies/recognition.html

The size of the vocabulary, the ability to understand different speakers
and to differentiate between them, the tolerance for errors, and the range
of speakers (children, people with accents, ...) determine the demands on
the processing power of the CPU (processing unit) and on the memory of the
device. Those in turn determine the size of the final device.

 Here, by 'speaker' we mean 'a person who speaks', not a loudspeaker.

The ability to understand numbers exists today in quite a small
package.
It is used in the mobile phone DiVo and is described here:

     http://www.speechpro.com/eng/products/divo.htm  

Here the vocabulary is quite small: ten digits plus some controls.

It looks like you are using numbers. If you can express

  17 50   as   " one seven break five zero end "

the task is much easier than if you need to say "seventeen", and the
device can be smaller.


 Speaker verification is a demanding feature. Its complexity
 depends on the error tolerance. If you can use a password instead,
 the task is much easier.
http://www.speechpro.com/eng/technologies/restriction.html    

 Here is a free to try program one can experiment with: 
  IVOS - Intelligent Voice Operating System    
   runs on a typical home computer
          http://download.com.com/3000-7239-10070918.html

 Here is a commercial device people use to control the TV and lights in
 the home:
  http://www.smarthome.com/1470.html   


  The microprocessor (popularly known as a computer on a chip)
 can be built into a small portable device and used to activate
 devices such as buzzers or light indicators. There are many types,
 which vary in cost from a few dollars to thousands. The power,
 that is the ability to do complex tasks, depends on the cost.
 http://dictionary.reference.com/search?q=microprocessor  

Here is a paper from year 1998, called THE FUTURE IS HERE! 
                            
http://www.dinf.ne.jp/doc/english/Us_Eu/conf/csun_98/csun98_013.htm

 Here is a current, expensive EADL product, Nemo, a portable device
 that accepts voice commands and turns them into signals that will
 control your home.
         http://www.enablemart.com/products_detail.asp?id=212   

and here is a list of such devices designed for Handicapped
Independence
      http://polio.dyndns.org/chip/mobvrec.html  

 These are a few examples of this rapidly evolving technology.
 You may enter the terms below into a search engine to get more
 examples.

Search Terms 
 voice command control system
microprocessor
 voice command control, portable device
  voice recognition 


  hedgie

Request for Answer Clarification by jberry-ga on 21 Sep 2003 02:03 PDT
The device I'm looking for (or maybe it is software that I'm looking
for) doesn't need to ever "understand" anything, it just needs to scan
and match short sounds.  I don't want to control anything I just want
to know how many times I make a certain sound.  Maybe I would want it
to let me know how many times an hour I clear my throat, or sniffle,
or say a certain word.

The answer given was close in that one of the links (to speechpro.com)
has a product that can spot words in a word stream.  It listens for
words or phrases like "bin laden" in a news broadcast, but the input
appears to be text input and is somewhat individual independent. 
Usually speech recognition software attempts to find ways to
understand words spoken by anyone not just one individual and it seems
that this kind of computing/software power isn't really needed at all
for this application.

The closest thing I found in the links you gave me were references to
the cell phone voice record/recognize function where you input a phone
number and then record the control command that it will recognize when
heard and dial the number.  Unfortunately it doesn't work when the
word is part of a stream.  Can you get any closer or direct me to
additional research?

...compared to the monumental task of voice recognition this seems so
easy but on closer examination I guess it isn't.

Thanks for your efforts

Jan

Thanks for any additional input or direction for more research.

Clarification of Answer by hedgie-ga on 21 Sep 2003 03:38 PDT
Jan
   
     Thanks for the clarification; it helped a bit.
 
   We are getting closer, but we still have a problem using the
   same (too technical?) language, and so in understanding each other.
  
 You say
 
 >                 doesn't need to ever "understand"     
 
 Well, yes. These gadgets never 'understand' anything in the human
 sense.
 The job they do is to RECOGNIZE a given sound pattern, such as an
 English word, and differentiate it from other sounds, such as
 background noise and other words.

 
>                     I don't want to control  anything 

    That's OK. The idea is for the processor to 'recognize' the
'sound pattern'. Let's call that sound a word. Once it is recognized,
you can use it to throw a switch, on an appliance (control) or on a
counter. The processor can do the counting and connect to a display.
That's the application, and that's easy.
   
What may be more or less difficult is the process of recognition:

    The critical parameter (which determines the cost, size, complexity, ...
    of the device or software) is the number of such different words.
    Let's call it 'vocabulary size'.

   We talked about numbers such as 'seventeen': vocabulary size is about 120.
   I asked, can you limit that to 'digits'? Vocabulary size is then about 12,
   AND it was already done for voice dialing.
   And then you bring in the sniffles!
   What is the vocabulary size? How do you define what is a sniffle,
   what is a cough and what is background? Vocabulary undefined!
 
    It is possible to get ANN (artificial neural network) software for
   which you do not need to define the vocabulary in advance.
   You can 'train it', meaning you make a sound and it will guess: 'noise'?
   You then enter: NO, a sniffle_type_1.
   Eventually the ANN will learn which sounds are 'in your vocabulary'
   and recognize them.
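
   The 'guess, then correct' training loop can be shown with something
   much simpler than a neural network. The Python sketch below swaps in a
   nearest-neighbour lookup in place of an ANN, purely to show the
   interaction; it is not how any particular ANN package works. The class
   name, the method names and the two-number 'features' are hypothetical.
   A real system would first have to turn each sound into such numbers,
   which is not shown here.

```python
import math


class CorrectableClassifier:
    """Guesses a label for a sound and learns from the user's corrections."""

    def __init__(self, default_label="noise"):
        self.default_label = default_label
        self.examples = []  # stored (feature_vector, label) pairs

    def guess(self, features):
        """Return the label of the closest stored example."""
        if not self.examples:
            return self.default_label
        nearest = min(self.examples, key=lambda ex: math.dist(ex[0], features))
        return nearest[1]

    def train_step(self, features, true_label):
        """Guess first; if the guess is wrong, remember the correction."""
        guessed = self.guess(features)
        if guessed != true_label:
            self.examples.append((list(features), true_label))
        return guessed


clf = CorrectableClassifier()
# Made-up feature values standing in for two kinds of sound.
first_guess = clf.train_step([0.2, 0.9], "sniffle_type_1")  # guesses 'noise'
clf.train_step([0.8, 0.1], "cough")
print(first_guess, "->", clf.guess([0.25, 0.85]))
```

   After a few corrections the stored examples act as the 'vocabulary',
   so it never has to be defined in advance.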

>        isn't really needed   to    ... understand words spoken by anyone

  OK. We are dealing with a 'single speaker' application here. That is
  indeed simpler.

    The question still is: is such a portable device available?

     I am still guessing what the task is, but the likely answer is yes.
     A portable computer (laptop, PDA or wearable computer)
     with a mike and suitable software should be able to do that:
     to listen, to count the 'words' in a stream and to produce a variety
     of reports and displays.

    I am guessing you would not do the programming yourself (not even the
    high level part), and so custom programming may be needed. This service,
    GA, can point you to resources for that. The GA format is not suitable
    for actually doing it.
   
  It is not necessary, but it would help to know a bit more about the
  project goals. What is the actual application: is it a business, a hobby
  or just a one-of-a-kind device for personal use? Are we looking for an
  'off the shelf' commercial product, a design specification or an
  interest group of people who work on these issues for fun? What are the
  available resources (cost, time, skills, ...)?

  I will give you some links to reports to read which may help you to
  formulate the technical parameters of the project:
  
Wearable computers:
        http://www.redwoodhouse.com/wearable/index.html?subid=21 
         http://www.media.mit.edu/wearables/    

 ANN software:  
                  http://cslu.cse.ogi.edu/tutordemos/nnet_training/tutorial.html
and more with search terms:
 ://www.google.com/search?hl=en&ie=ISO-8859-2&q=Neural+Network+Training+%2C+vocabulary

  The question remains: how complex is the task? That will determine how
  much hardware we need (CPU speed, memory, ...) and therefore how big and
  expensive the final device will be.
  Do we need to operate in real time? Perhaps it is enough to make a
  recording of the stream in the field, bring the tape home, and then count
  the 'words' and produce the histogram?
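
  The counting and histogram part of such an offline approach is the easy
  bit. As a rough Python sketch, assume some recognizer (not shown) has
  already produced a list of times, in seconds from the start of the
  recording, at which the 'word' was heard. The function names and the
  sample times are made up for illustration.

```python
from collections import Counter


def hourly_histogram(event_times_s):
    """Count detected events in each hour of the recording."""
    counts = Counter(int(t // 3600) for t in event_times_s)
    last_hour = max(counts) if counts else -1
    # Include hours with no events so the gaps stay visible.
    return [counts.get(hour, 0) for hour in range(last_hour + 1)]


def render(histogram):
    """Draw one text bar per hour."""
    return "\n".join(
        f"hour {hour}: {'#' * n} ({n})" for hour, n in enumerate(histogram)
    )


# Hypothetical detection times from a three-hour recording.
times = [12.0, 950.5, 3601.0, 7300.0, 7450.2, 10799.9]
print(render(hourly_histogram(times)))
```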

hedgie

Request for Answer Clarification by jberry-ga on 21 Sep 2003 15:55 PDT
To be honest, and it is probably kind of silly, but I had the idea for
this device while listening to a teenage friend of mine that was
giving a talk in front of a large room full of people.  Every few
words she would say, "and like"; I was somewhat embarrassed for her.

After thinking about it for a minute I realized that probably most of
us have a bad vocal habit that we would like to eliminate, and yet, it
is hard, especially when the habit is well engrained, to catch
ourselves doing it.

Examples: Maybe, as with the girl above, we use a word or phrase more
often than we should, or, maybe someone has said we talk about
ourselves too much so we want to count how many times in a 15 minute
conversation we say "I", or maybe we say "uuhh" a lot, or we use a cuss
word or clear our throat out of nervous habit.

...So I got to thinking that a device that could let you know when it
occurs would be a handy tool... All it would have to do is allow you
to record the offending sound, set it to listen mode, and it would
signal every occurrence with a beep; making you more aware of the
frequency of use which, hopefully, will help you stop doing it.  (Kind
of like a shock collar for a dog but a little less harsh....ha ha).

I didn't think at the time, compared to real speech recognition, that
building this device small enough and cheap enough for public
consumption would be that difficult given today's technology.  But, I
have tried on many occasions since I had the idea a few years ago, to
find such a device but have had no success.  As I said, it is turning
out not to be as simple a task as it seemed.

The additional links you've provided are helpful and I appreciate all
your efforts.  If you have any other info to offer it will be
appreciated.  Otherwise I will rate the answer certainly A for effort,
and answered correctly because the exact device doesn't exist.....yet!

Thanks very much.  I think "Google Answers" is a fantastic information
resource...and you researchers are top notch.  I can't wait to have
another burning question!!!! (believe me it won't be long I'm sure!)

Jan

Clarification of Answer by hedgie-ga on 22 Sep 2003 06:46 PDT
That's an interesting idea, Jan 
   Probably original, too. The closest I have found is  

       http://www.halfbakery.com/idea/low-end_20speech_20recognition#1017177084
    
 This is a 'real time' application which does not have to be a
 wearable device. A notebook computer would do. It would sit on the
 lectern and display the feedback. You do not want it to beep; it would
 flash an icon for 'Repetition' and other such 'sins' on the screen.

   The critical issue is who decides what is 'a sin'.
   It is critical in determining how much such a device would cost today.
____________________________
Case 1: Human intelligence is used 
____________________________
            For example, 

 a)  You know in advance that your friend has this nervous
 habit and ask her to say those few phrases into a microphone attached
 to the notebook before the lecture. Then the task is solved by today's
 "command recognition" programs. Some software customization would be
 needed to take the output of the recognition software and 'command the
 icon to flash'. It's easy to do (for a programmer).
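
 To give an idea of how small that customization could be, here is a
 Python sketch of the step after recognition. It assumes the recognition
 software hands over the words it heard as plain text, which is an
 assumption about the recognizer rather than a feature of any product
 named here. The function name and the sample sentence are made up; in a
 real program each new hit would trigger the icon.

```python
def count_phrase(words, phrase):
    """Count non-overlapping occurrences of a phrase in a list of words."""
    target = [w.lower() for w in phrase.split()]
    if not target:
        return 0
    # Ignore case and trailing punctuation the recognizer may attach.
    heard = [w.lower().strip(".,!?") for w in words]
    count = 0
    i = 0
    while i + len(target) <= len(heard):
        if heard[i:i + len(target)] == target:
            count += 1
            i += len(target)
        else:
            i += 1
    return count


# Hypothetical recognizer output for one stretch of a talk.
spoken = "so and like we went there and like it was, like, great and then".split()
print("'and like' was said", count_phrase(spoken, "and like"), "times")
```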

b)   It is also possible that a coach in the audience could push a
      "repetition" or 'cut-this-out' button and the screen would display
      such interactive feedback.

      You would just need a wireless device to transmit the signal to
      the notebook. A mobile (cell phone) with Bluetooth can do that.
      Bluetooth, wireless devices  -  Systems Analysis and Design
         http://teachers.sdmesa.sdccd.cc.ca.us/~gmerx/CISC210-SystemsAnalysis/CIS210ClassSessions.htm
        http://www.geekzone.co.nz/content.asp?contentid=1364

  Method b) is probably better, since human intelligence, not
  artificial intelligence (AI), decides what is 'annoying' to others.

   It might be interesting to compare the hi-tech solution, with an
   electronic gadget 'deciding', with a low-tech solution in which a
   'coach' sitting in the last row signals by hand or by raising a
   sign such things as: slow, cut-this, louder, explain ...

 c)  Expanding on your idea, a notebook could even show graphically on
    the screen how many people in the audience want 'more explanation',
    less explanation, louder, cut-THIS, this is BORING ...
    This idea would be interesting to try, and is definitely do-able
    today. Each person would have a mobile phone (with Bluetooth) and a
    list of codes, and would transmit directly to the notebook.
    It would need about a week of easy programming, depending on the
    features it would have.
    There would be no AI involved. It may help the lecturer or 'freak
    her out'. One would have to try it.
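
    The tallying on the notebook could be as simple as the Python sketch
    below. It leaves out the wireless link entirely and assumes the codes
    have already arrived as (time, sender, code) records. The code table,
    the one-minute window and the rule that only a person's latest code
    counts are illustrative choices, not part of any existing system.

```python
from collections import Counter

# Hypothetical list of codes handed out to the audience.
FEEDBACK_CODES = {1: "louder", 2: "slower", 3: "explain more", 4: "cut this"}


def tally(messages, window_start, window_end):
    """Count requests in a time window, one (the latest) per sender.

    messages: iterable of (timestamp_seconds, sender_id, code) tuples.
    """
    latest = {}
    for timestamp, sender, code in sorted(messages):
        if window_start <= timestamp < window_end and code in FEEDBACK_CODES:
            latest[sender] = code  # a later message replaces an earlier one
    return Counter(FEEDBACK_CODES[code] for code in latest.values())


# Made-up messages: "a" changes their mind, "d" sends an unknown code,
# and "c" falls outside the first minute.
received = [(5, "a", 1), (7, "b", 1), (9, "a", 3), (65, "c", 4), (8, "d", 99)]
for request, votes in tally(received, 0, 60).items():
    print(request, votes)
```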

       (A professor of physics I know was using interactive feedback
       during his lectures. Not to be told he was BORING, of course. He
       would ask a question and students could use an infra-red clicker
       (like a TV remote) to select an answer. He put his system together
       on a shoestring using an old Linux computer. So it is possible and
       may be worth trying.)

______________________________________________________            
 Case 2:  Software has to recognize what is an annoying nervous habit
               vs. what is a legitimate repetition.
______________________________________________________
 
        This would need AI software (AI= Artificial Intelligence) 

        For example 

 1) Any repetition: it would keep statistics of all sound segments,
     compare them and count. This is possible, and would need a BIG memory.

 2) You could compile a universal list of the usual 'fillers', a bit like
     the "Style Checkers" offered for some word processors.

     These are supposed to "increase reader comprehension and document
     usability". Controlled English uses a defined product or service
     vocabulary.
     "This control removes ambiguity, and adds clarity, consistency and
     readability."
        http://www.smartny.com/Default.htm

     I tried it and found it annoying. The AI was so dumb, it was
     correcting the proper use of technical terms, calling it 'jargon'.
     
     Here are a few links on the limits and potential of AI:
                      http://www.geocities.com/scimah/AI.htm  
                       
http://www.cs.swarthmore.edu/~eroberts/cs91/projects/ethics-of-ai/sec9.html
                       http://yudkowsky.net/archive-sl4/0108/0047.html

   So, in conclusion: quite a few gadgets can be designed which would
   fit the general description given in your question. Some can be built
   today with a small to moderate programming effort on today's notebooks.
   Some would require AI, years of research and software development, and
   would probably not work too well. (Suitable for big corporation R&D
   departments and PhD thesis authors  :-)
              
   When designing a new gadget, we should always look to see whether a
   current, perhaps low-tech, way exists which can accomplish the goal.

   The best gadget for improving lectures today is a video camera and
   a dry run. You see yourself. This is quite valuable because some
   annoying habits are silent. Your coach or friends tell you what you
   can improve and also point out what really works well. You get less
   nervous and slowly eliminate all the 'hard spots' in the lecture where
   you tend to fall back on your pet fillers.

   A gadget without actual human interaction which would do all that
   would also need a camera, pattern recognition and a lot of AI.
   Personally, I prefer one of the lower-tech solutions that includes a
   friendly human being or two, as in Case 1, a or b, or the video method.

   Good luck on your project!
     
   hedgie
jberry-ga rated this answer:4 out of 5 stars
Thank you.  I think the answer was good, though the point of the
device was somewhat misunderstood.  Overall it was fine and I received
some valuable information in the links provided.  thank you.

Comments  
Subject: Re: specific sound recognition
From: kik-ga on 16 Sep 2003 01:22 PDT
 
Are you looking for something like this?

"Voice Control Systems VCS1000 is a multifunctional speaker-independent voice
recognition module for IBM pc's.  The VCS 1000 comes with specialized
vocabularies such as the "Voice Director" module which allows the recognition
of 41 control words such as On, Off, Begin, Stop, Faster, Slower, Left, Right,
Up, Down, Forward, Backward, 1,2,3,4,5,6,7,8,9 and zero). This module is
useful for factory machine control. Other modules are available and it is also
possible to connect the VCS 1000 to the telephone line for speaker independent
voice recognition over public switched telephone network. 
[Voice Control Systems, 14140 Midway Rd Ste 100, Dallas, TX 75244 214-386-5555] "
