I would like to find out what kinds of devices are available that can
recognize or compare and match small prerecorded sound bits. For
example: provide a recorded sample (> 1 second in duration) and the
device would then scan and listen, and when it hears an exact repeat of
that sample it would signal.
If no such device is available, what would be required to make
one?
Thank you.
jb-kaisertech |
Clarification of Question by
jberry-ga
on
15 Sep 2003 17:07 PDT
My question should have said less than 1 second in duration (<1
second in duration). Sorry.
|
Request for Question Clarification by
sublime1-ga
on
15 Sep 2003 23:16 PDT
jb...
I think it will benefit the research if you can specify what
format of sound the device, or program, would be scanning,
once it has been given the sample. Will it be scanning
live audio in a room, or from a speaker, or will it be
scanning a file format, and, if so, which one (wav, midi,
aiff, etc). The better you can define its intended function,
the easier it will be to determine if it exists.
|
Request for Question Clarification by
hedgie-ga
on
16 Sep 2003 03:58 PDT
The phrase
'an exact repeat of that sample'
needs clarification.
In 'voice recognition' applications, which the commenter
refers to, 'match' means one thing. 'Same waveform' means something else
when evaluating the fidelity of a recording (and is much easier).
The 'same sound with different backgrounds' is yet another task.
If you explain the application, a researcher may be able to guess which
'waveform metrics' one should select to define 'close' or 'almost exact'
and then suggest a method.
A computer (microprocessor) with A/D and a suitable program
can do most of such comparisons. Do you want some references on that?
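As a rough illustration of one possible 'waveform metric' for such a
comparison, here is a minimal sketch in Python. It slides a stored
sample over a stream of digitized values and reports where the
normalized correlation reaches a threshold. The function names, the 0.9
threshold and the sample values are all assumptions made up for this
example; a real device would work on A/D output from a microphone and
would probably need something more robust once background noise is
involved.

```python
import math

def normalized_correlation(a, b):
    """Correlation of two equal-length sample lists, in the range -1..1."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0

def find_matches(stream, template, threshold=0.9):
    """Slide the template over the stream; return offsets scoring >= threshold."""
    m = len(template)
    hits = []
    i = 0
    while i + m <= len(stream):
        if normalized_correlation(stream[i:i + m], template) >= threshold:
            hits.append(i)
            i += m  # skip past this match so it is reported only once
        else:
            i += 1
    return hits

# Hypothetical data: the sample appears twice, the second time at half volume.
template = [0.0, 0.8, -0.6, 0.9, -0.7, 0.3, -0.2, 0.1]
quiet = [0.01, -0.02, 0.015, -0.01, 0.02, -0.015]
stream = quiet + template + quiet + [0.5 * s for s in template] + quiet

print(find_matches(stream, template))  # [6, 20]
```

Because the correlation is normalized, the quieter repeat still matches.
This only covers the 'same waveform' case; it would not cope with the
same word spoken at a different speed or pitch.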
|
Clarification of Question by
jberry-ga
on
17 Sep 2003 21:54 PDT
Sorry for the delay in this clarification.
I would like to find a device into which I could say a word or two
that it could thereafter recognize whenever spoken and either signal
or count. Preferably it would be able to pick out the specific
words in normal speech.
I hope this helps.
thank you.
Jan
|
Request for Question Clarification by
hedgie-ga
on
18 Sep 2003 03:35 PDT
Thanks for the clarification.
That is a Voice Recognition System,
as described e.g. by kik-ga in a comment.
It may be what you want. A computer is a 'device'.
It can be made small (a chip), and the task is easier
if there is only one speaker. If what you have
from kik is enough, you may close the question.
If you want more details on the 'device', please clarify further
(size of device, cost, number of speakers, ...).
hedgie
|
Clarification of Question by
jberry-ga
on
18 Sep 2003 20:44 PDT
It doesn't really need to have a library, it has to be able to accept
a sample sound or "control words" at any time; and it wouldn't need to
have a speaker just a signaling device like a buzzer and a microphone.
For example: I would like to be able to say "five ten" and it would
either count or signal every time it heard that, but an hour later I
might want it to listen for "seventeen fifty". It would also be good
if it could differentiate between my voice and that of another person.
I would also like to know if the device or "chip" can be used or
configured to be used in a small portable device that can be carried
around easily. The ideal setup would be a small device in your
pocket (like a tiny voice recorder) with a microphone attached to your
lapel. Whenever or wherever you wanted, you could press a button
and record a word or two, and it would beep every time it heard that
sound until you gave it another sound to listen for.
Thanks for your patience with me....I'm new at this.
Jan
|
You are welcome jberry
I will give you some examples of
Voice Command Recognition Technology
- original speaker-dependent or speaker-independent
voice recognition algorithm,
as offered e.g. here:
http://www.speechpro.com/eng/technologies/recognition.html
The size of the vocabulary, the ability to understand different speakers
and to differentiate between them, the tolerance for errors, and the
range of speakers (children, people with an accent, ...) determine the
demands on the processing power of the CPU (processing unit) and the
memory of the device. That will determine the size of the final device.
Here, by 'speaker' we mean 'a person who speaks', not a loudspeaker.
The ability to understand numbers exists today in quite a small
package.
It is used in the mobile phone DiVo, described here:
http://www.speechpro.com/eng/products/divo.htm
Here the vocabulary is quite small: ten digits plus some controls.
It looks like you are using numbers. If you can express
17 50 as "one seven break five zero end",
the task is much easier than if you need to say "seventeen", and the
device can be smaller.
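To make the digit-by-digit idea concrete, here is a small Python sketch
that spells a number group by group, using only the ten digit words plus
the two control words 'break' and 'end' from the example above. The
function name and the input format (groups separated by spaces) are
assumptions for illustration only.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def spell_digits(groups):
    """Spell digit groups one digit at a time: 'break' between groups, 'end' last."""
    words = []
    for index, group in enumerate(groups.split()):
        if index > 0:
            words.append("break")
        words.extend(DIGIT_WORDS[int(ch)] for ch in group)
    words.append("end")
    return " ".join(words)

# The whole vocabulary the recognizer would have to know in this scheme.
VOCABULARY = set(DIGIT_WORDS) | {"break", "end"}

print(spell_digits("17 50"))  # one seven break five zero end
print(len(VOCABULARY))        # 12
```

Any number can be spoken with those twelve words, which is why the
recognizer can stay small.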
Speaker verification is a demanding feature. Its complexity
depends on the error tolerance. If you can use a password instead,
the task is much easier.
http://www.speechpro.com/eng/technologies/restriction.html
Here is a free to try program one can experiment with:
IVOS - Intelligent Voice Operating System
runs on a typical home computer
http://download.com.com/3000-7239-10070918.html
Here is a commercial device people use to control the TV and lights in
the home:
http://www.smarthome.com/1470.html
A microprocessor (popularly known as a computer on a chip)
can be built into a small portable device and used to activate
devices such as buzzers or light indicators. There are many types,
which vary in cost from a few dollars to thousands. The power, i.e. the
ability to do complex tasks, depends on the cost.
http://dictionary.reference.com/search?q=microprocessor
Here is a paper from year 1998, called THE FUTURE IS HERE!
http://www.dinf.ne.jp/doc/english/Us_Eu/conf/csun_98/csun98_013.htm
Here is a current, expensive EADL product, Nemo, a portable device
that accepts voice commands and turns them into signals that will
control your home:
http://www.enablemart.com/products_detail.asp?id=212
and here is a list of such devices designed for Handicapped
Independence:
http://polio.dyndns.org/chip/mobvrec.html
These are a few examples of this rapidly evolving technology.
You may enter the terms below into a search engine to get more
examples.
Search Terms
voice command control system
microprocessor
voice command control, portable device
voice recognition
hedgie |
Request for Answer Clarification by
jberry-ga
on
21 Sep 2003 02:03 PDT
The device I'm looking for (or maybe it is software that I'm looking
for) doesn't need to ever "understand" anything, it just needs to scan
and match short sounds. I don't want to control anything I just want
to know how many times I make a certain sound. Maybe I would want it
to let me know how many times an hour I clear my throat, or sniffle,
or say a certain word.
The answer given was close in that one of the links (to speechpro.com)
has a product that can spot words in a word stream. It listens for
words or phrases like "bin laden" in a news broadcast, but the input
appears to be text input and is somewhat individual independent.
Usually speech recognition software attempts to find ways to
understand words spoken by anyone not just one individual and it seems
that this kind of computing/software power isn't really needed at all
for this application.
The closest thing I found in the links you gave me were references to
the cell phone voice record/recognize function where you input a phone
number and then record the control command that it will recognize when
heard and dial the number. Unfortunately it doesn't work when the
word is part of a stream. Can you get any closer or direct me to
additional research?
...compared to the monumental task of voice recognition this seems so
easy but on closer examination I guess it isn't.
Thanks for your efforts
Jan
Thanks for any additional input or direction for more research.
|
Clarification of Answer by
hedgie-ga
on
21 Sep 2003 03:38 PDT
Jan
Thanks for the clarification; it helped a bit.
We are getting closer, but we still have a problem using the
same (too technical?) language and so in understanding each other.
You say
> doesn't need to ever "understand"
Well, yes. These gadgets never 'understand' anything in the human
sense.
The job they do is RECOGNIZE a given sound pattern, such as an
English word, and differentiate it from other sounds, such as
background noise and other words.
> I don't want to control anything
That's OK. The idea is for the processor to 'recognize' the
'sound pattern'. Let's call that sound a word. Once it is recognized,
you can use it to throw a switch, on an appliance (control) or on a
counter. The processor can do the counting and connect to a display.
That's the application, and that's easy.
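To show how little is needed on the application side, here is a minimal
Python sketch of such a counter. It assumes a recognizer (not shown)
that calls on_recognized() with a timestamp each time it hears the
target sound. The class name, the method name and the half-second
'refractory' window, which keeps one utterance from being counted
twice, are illustrative assumptions and not part of any real product.

```python
class DetectionCounter:
    """Count recognition events, ignoring repeats within a short window."""

    def __init__(self, refractory_seconds=0.5):
        self.refractory_seconds = refractory_seconds
        self.count = 0
        self._last_counted = None  # time of the last event that was counted

    def on_recognized(self, timestamp):
        """Register one recognizer hit; return True if it was counted."""
        if (self._last_counted is not None
                and timestamp - self._last_counted < self.refractory_seconds):
            return False  # too close to the previous hit, treat as the same sound
        self._last_counted = timestamp
        self.count += 1
        return True

counter = DetectionCounter(refractory_seconds=0.5)
for t in [1.0, 1.2, 3.0, 3.1, 10.0]:  # made-up event times, in seconds
    if counter.on_recognized(t):
        print(f"beep at {t:.1f} s")   # stand-in for a buzzer or display update
print("total:", counter.count)        # total: 3
```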
What may be more or less difficult is the process of recognition.
The critical parameter (which determines the cost, size, complexity, ...
of the device or software) is the number of such different words;
let's call it 'vocabulary size'.
We talked about numbers such as 'seventeen': vocabulary size is about
120.
I asked whether you can limit that to 'digits': vocabulary size is
about 12, AND it has already been done for voice dialing.
And then you bring in the sniffles!
What is the vocabulary size? How do you define what is a sniffle
and what is a cough or background noise? Vocabulary undefined!
It is possible to get ANN (artificial neural network) software for
which you do not need to define the vocabulary in advance.
You can 'train it', meaning you make a sound and it will guess
'noise'?
and you enter: NO - a sniffle_type_1.
Eventually the ANN will learn which sounds are 'in your vocabulary'
and recognize them.
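The guess-and-correct training loop described above can be sketched in
a few lines of Python. This is only a toy: it uses a single-layer
perceptron, about the simplest kind of neural network, and it assumes
that some front end (not shown) has already turned each sound into a
short list of numbers. The class name, the two-number features and the
labels are all invented for illustration.

```python
class OnlinePerceptron:
    """Multi-class perceptron whose label set grows as the user corrects it."""

    def __init__(self, n_features):
        self.n_features = n_features
        # label -> weights; the last entry of each list is the bias
        self.weights = {"noise": [0.0] * (n_features + 1)}

    def _score(self, label, features):
        w = self.weights[label]
        return sum(wi * xi for wi, xi in zip(w, features)) + w[-1]

    def guess(self, features):
        """Return the best-scoring known label (ties go to the first in sorted order)."""
        return max(sorted(self.weights), key=lambda lb: self._score(lb, features))

    def correct(self, features, true_label):
        """Apply the user's answer; return True if the guess was already right."""
        if true_label not in self.weights:
            self.weights[true_label] = [0.0] * (self.n_features + 1)
        guessed = self.guess(features)
        if guessed == true_label:
            return True
        inputs = list(features) + [1.0]  # the constant 1.0 updates the bias
        for i, x in enumerate(inputs):
            self.weights[true_label][i] += x  # pull the right label closer
            self.weights[guessed][i] -= x     # push the wrong guess away
        return False

# Hypothetical training sounds, already reduced to two features each.
samples = [([1.0, 0.0], "sniffle"), ([0.0, 1.0], "cough"), ([0.1, 0.1], "noise")]
learner = OnlinePerceptron(n_features=2)
for _ in range(10):  # repeat the corrections until a pass has no mistakes
    mistakes = sum(not learner.correct(f, label) for f, label in samples)
    if mistakes == 0:
        break

print([learner.guess(f) for f, _ in samples])  # ['sniffle', 'cough', 'noise']
```

New labels such as 'sniffle' are added the first time the user enters
them, so the vocabulary does not have to be fixed in advance. Real
sounds would need far richer features and many more examples.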
> isn't really needed to ... understand words spoken by
anyone
OK. We are dealing with a 'single speaker' application here. That is
indeed simpler.
The question still is: Is such a portable device available?
I am still guessing what the task is, but the likely answer is: Yes.
A portable computer (laptop, PDA or wearable computer)
with a mike and suitable software should be able to do that:
to listen, to count the 'words' in a stream and to produce a variety
of reports and displays.
I am guessing you would not do the programming yourself (not even
the high-level part), and so custom programming may be needed. This
service, GA, can point you to resources for that. The GA format is not
suitable for actually doing that.
It is not necessary, but it would help to know a bit more about the
project goals and the actual application: is it a business, a hobby
or just a one-of-a-kind device for personal use? Are we looking for an
'off the shelf' commercial product, a design specification or an
interest group of people who work on these issues for fun? What are
the available resources (cost, time, skills, ...)?
I will give you some links to reports to read which may help you
to formulate the technical parameters of the project:
Wearable computers:
http://www.redwoodhouse.com/wearable/index.html?subid=21
http://www.media.mit.edu/wearables/
ANN software:
http://cslu.cse.ogi.edu/tutordemos/nnet_training/tutorial.html
and more with search terms:
://www.google.com/search?hl=en&ie=ISO-8859-2&q=Neural+Network+Training+%2C+vocabulary
The question remains: how complex is the task? That will determine how
much hardware we need (CPU speed, memory, ...) and therefore how big
and expensive the final device will be.
Do we need to operate in real time? Perhaps it is enough to make a
recording of the stream in the field, bring the tape home, and then
count the 'words' and produce the histogram?
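For the offline variant, the counting and the histogram are the easy
part. Here is a minimal Python sketch that assumes the recognition step
has already produced a list of times (in seconds from the start of the
recording) at which the 'word' was heard. The function names, the
15-minute bin width and the example times are assumptions for
illustration.

```python
from collections import Counter

def histogram(timestamps_seconds, bin_minutes=15):
    """Count detections per time bin; keys are bin start times in minutes."""
    bin_seconds = bin_minutes * 60
    counts = Counter(int(t // bin_seconds) * bin_minutes
                     for t in timestamps_seconds)
    return dict(sorted(counts.items()))

def render(hist):
    """Draw one text bar per bin."""
    return "\n".join(f"{start:4d} min | {'#' * n} ({n})"
                     for start, n in hist.items())

# Made-up detection times from a recording of roughly 45 minutes.
detections = [30, 95, 400, 910, 1000, 2700.5]
hist = histogram(detections)
print(hist)          # {0: 3, 15: 2, 45: 1}
print(render(hist))
```

Bins with no detections are simply left out of the result.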
hedgie
|
Request for Answer Clarification by
jberry-ga
on
21 Sep 2003 15:55 PDT
To be honest, and it is probably kind of silly, but I had the idea for
this device while listening to a teenage friend of mine that was
giving a talk in front of a large room full of people. Every few
words she would say, "and like"; I was somewhat embarrassed for her.
After thinking about it for a minute I realized that probably most of
us have a bad vocal habit that we would like to eliminate, and yet, it
is hard, especially when the habit is well engrained, to catch
ourselves doing it.
Examples: Maybe, as with the girl above, we use a word or phrase more
often than we should, or, maybe someone has said we talk about
ourselves too much so we want to count how many times in a 15 minute
conversation we say "I", or maybe we say "uuhh" a lot, or we use a cuss
word or clear our throat out of nervous habit.
...So I got to thinking that a device that could let you know when it
occurs would be a handy tool... All it would have to do is allow you
to record the offending sound, set it to listen mode, and it would
signal every occurrence with a beep, making you more aware of the
frequency of use which, hopefully, will help you stop doing it. (Kind
of like a shock collar for a dog but a little less harsh....ha ha).
I didn't think at the time, compared to real speech recognition, that
building this device small enough and cheap enough for public
consumption would be that difficult given today's technology. But, I
have tried on many occasions since I had the idea a few years ago, to
find such a device but have had no success. As I said, it is turning
out not to be as simple a task as it seemed.
The additional links you've provided are helpful and I appreciate all
your efforts. If you have any other info to offer it will be
appreciated. Otherwise I will rate the answer certainly A for effort,
and answered correctly because the exact device doesn't exist.....yet!
Thanks very much. I think "Google Answers" is a fantastic information
resource...and you researchers are top notch. I can't wait to have
another burning question!!!! (believe me it won't be long I'm sure!)
Jan
|
Clarification of Answer by
hedgie-ga
on
22 Sep 2003 06:46 PDT
That's an interesting idea, Jan.
Probably original, too. The closest I have found is
http://www.halfbakery.com/idea/low-end_20speech_20recognition#1017177084
This is a 'real time' application which does not have to be a
wearable device.
A notebook computer would do. It would sit on the lectern and
display the feedback. You do not want it to beep; it would flash an
icon for 'Repetition' and other such 'sins' on the screen.
The critical issue is who decides what is 'a sin'.
It is critical in determining how much such a device would cost
today.
____________________________
Case 1: Human intelligence is used
____________________________
For example,
a) You know in advance that your friend has this nervous
habit and ask her to say those few phrases into a microphone attached
to the notebook before the lecture. Then the task is solved by today's
"command recognition" programs.
Some software customization would be needed to take the output of
the recognition software and 'command the icon to flash'. It's easy
to do (for a programmer).
b) It is also possible that a coach in the audience could push a
'repetition' or 'cut-this-out' button and the screen would display
such interactive feedback.
You would just need a wireless device to transmit the signal to
the notebook.
A mobile (cell phone) with Bluetooth can do that.
Bluetooth, wireless devices - Systems Analysis and Design
http://teachers.sdmesa.sdccd.cc.ca.us/~gmerx/CISC210-SystemsAnalysis/CIS210ClassSessions.htm
http://www.geekzone.co.nz/content.asp?contentid=1364
Method b) is probably better, since human intelligence, not
artificial intelligence (AI), decides what is 'annoying' to others.
It might be interesting to compare the hi-tech solution, with
an electronic gadget 'deciding', with a low-tech solution, in which a
'coach' sitting in the last row signals by hand or by raising a
sign such things as: slow, cut-this, louder, explain ...
c) Expanding on your idea, a notebook could even show on the screen
graphically how many people in the audience want 'more explanation',
'less explanation', 'louder', 'cut-THIS', 'this is BORING' ...
This idea would be interesting to try, and is definitely doable
today. Each person would have a mobile phone (with Bluetooth) and a
list of codes, and transmit directly to the notebook.
It would need about a week of easy programming, depending on the
features it would have.
There would be no AI involved. It may help the lecturer or 'freak
her out'. One would have to try it.
(A professor of physics I know was using interactive feedback
during his lectures. Not to be told he was BORING, of course. He would
ask a question and students could use an infra-red clicker (like a TV
remote) to select an answer. He put his system together on a
shoestring using an old Linux computer. So it is possible and may be
worth trying.)
______________________________________________________
Case 2: Software has to recognize what is an annoying nervous habit
vs. what is a legitimate repetition.
______________________________________________________
This would need AI software (AI = Artificial Intelligence).
For example:
1) Any repetition: it would keep statistics of all sound segments,
compare them and count. This is possible, and would need a BIG memory.
2) You could compile a universal list of the usual 'fillers', a bit
like the "Style Checkers" offered for some word processors.
These are supposed to "increase reader comprehension and
document usability".
Controlled English uses a defined product or service vocabulary.
"This control removes ambiguity, and adds clarity, consistency
and readability."
http://www.smartny.com/Default.htm
I tried it and found it annoying. The AI was so dumb, it was
correcting the proper use of technical terms, calling them 'jargon'.
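Going back to option 1) above, the 'keep statistics of all segments'
idea can be sketched in Python as follows. The sketch assumes each sound
segment has already been reduced to a short list of numbers, and it
treats two segments as 'the same' when they round to the same coarse
grid cell. That grid rounding is a crude stand-in chosen for brevity,
not an established method; the function names, the step size and the
sample values are invented for illustration.

```python
from collections import Counter

def quantize(features, step=0.25):
    """Round each feature to a coarse grid so near-identical segments share a key."""
    return tuple(round(x / step) for x in features)

def repeated_segments(segments, min_count=3, step=0.25):
    """Return the grid keys seen at least min_count times, with their counts."""
    counts = Counter(quantize(seg, step) for seg in segments)
    return {key: n for key, n in counts.items() if n >= min_count}

# Made-up segments: three are nearly identical, two are unrelated.
segments = [[0.51, 0.98], [0.49, 1.02], [0.1, 0.2], [0.5, 1.0], [0.9, 0.1]]
print(repeated_segments(segments))  # {(2, 4): 3}
```

The table of counts grows with every distinct sound ever heard, which is
where the BIG memory comes from. Note that nothing here can tell an
annoying filler from a legitimate repetition.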
Here are a few links on the limits and potential of AI:
http://www.geocities.com/scimah/AI.htm
http://www.cs.swarthmore.edu/~eroberts/cs91/projects/ethics-of-ai/sec9.html
http://yudkowsky.net/archive-sl4/0108/0047.html
So, in conclusion: Quite a few gadgets can be designed which would
fit the general description given in your question. Some can be built
today with a small to moderate programming effort on today's
notebooks. Some would require AI, years of research and software
development, and would probably not work too well. (Suitable for
big corporation R&D departments and PhD thesis authors :-)
When designing a new gadget, we should always look to see whether a
current, perhaps low-tech, way exists which can accomplish the goal.
The best gadget for improving lectures today is a video camera and
a dry run.
You see yourself. This is quite valuable because some annoying
habits are silent.
Your coach or friends tell you what you can improve and also point
out what really works well.
You get less nervous and slowly eliminate all the 'hard spots' in
the lecture, where you tend to fall back on your pet fillers.
A gadget without actual human interaction which would do all that
would also need a camera, pattern recognition and a lot of AI.
Personally, I prefer one of the lower-tech solutions that
includes a friendly human being or two, as in Case 1, a or b, or
the video method.
Good luck on your project!
hedgie
|
Are you looking for something like this?
"Voice Control Systems VCS1000 is a multifunctional speaker-independent voice
recognition module for IBM pc's. The VCS 1000 comes with specialized
vocabularies such as the "Voice Director" module which allows the recognition
of 41 control words such as On, Off, Begin, Stop, Faster, Slower, Left, Right,
Up, Down, Forward, Backward, 1,2,3,4,5,6,7,8,9 and zero). This module is
useful for factory machine control. Other modules are available and it is also
possible to connect the VCS 1000 to the telephone line for speaker independent
voice recognition over public switched telephone network.
[Voice Control Systems, 14140 Midway Rd Ste 100, Dallas, TX 75244 214-386-5555] " |