Speaker-independent speech recognition?

I've got a very specific speech recognition application in mind, and I'm looking for a reference that will indicate whether it's feasible. I want to recognize just one magic word, which would be a well-solved, high-accuracy problem if we were talking about a boom mike and a silent environment. The difficulty is that there may be lots of other noise in the background, other people saying things, etc.

The application is something like a telematics device where you get its attention by saying "Computer...", except that the word in question can be assumed to be a unique word nobody would ever use for any other purpose. However, the specifics of this application are along the lines of:

- If the computer doesn't recognize that you want its attention, a ninja will beat you to death with a frozen muskrat, and

- If the computer hears your dog barking and thinks it was you trying to get its attention, you'll be charged $1,000 for the CPU time.

Is there an article someone can reference for me that will give some feel for the best I can expect from today's technology? Ideally, some information on the upper practical % limit to catching validly spoken words, and the lower practical limit to the number of false positives I'll see on other noises.

I see a lot of information about % recognition accuracy on the vendor websites, but they refer mostly to noise-free environments and of course to large dictionaries.

Reply to
larwe

Maybe something here will help:

formatting link

Good luck. Richard

Reply to
Richard Seriani

Lots to read here, thanks for the pointer. Not sure this will directly give me the statistic I need, but I may be able to gather enough samples of my "magic word" being spoken in different conditions to run it through this s/w and generate some of my own stats.
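For anyone wanting to do the same, here is a rough sketch of how those home-grown stats could be tallied once the recordings have been run through a recognizer. It is not tied to any particular package: the function names, the idea of reporting false alarms per hour of background audio, and the numbers in the usage line are all assumptions made up for illustration. The Wilson score interval is included because a small number of test utterances leaves a lot of uncertainty around the measured miss rate.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def spotting_stats(keyword_hits, keyword_trials, false_alarms, background_hours):
    """Summarise one test run of a single-word detector.

    keyword_hits / keyword_trials: detections out of recordings that really
    contain the magic word.
    false_alarms / background_hours: triggers on audio that does not contain it.
    """
    misses = keyword_trials - keyword_hits
    return {
        "miss_rate": misses / keyword_trials,
        "miss_rate_ci": wilson_interval(misses, keyword_trials),
        "false_alarms_per_hour": false_alarms / background_hours,
    }


# Hypothetical run: 95 of 100 spoken samples caught, 3 false triggers in 10 hours.
print(spotting_stats(95, 100, 3, 10.0))
```

With only 100 samples the interval on the miss rate is wide, so a vendor-style single percentage would need far more recordings per noise condition to mean much.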

Reply to
larwe

Sounds a lot like "word spotting" from the old (Cold War) days. Lots of unencrypted voice radio transmissions in Russian were recorded. People were trained to listen in and pick key words out of all the uninteresting routine conversations. Any recording that possibly contained something of value was then listened to further by people who actually understood the language. There was funding in the 1970s/80s to do that initial step more cheaply by computer. Doubtful if anything useful came out of it.

MfG JRD

Reply to
Rafael Deliano

It's not the same kind of application at all - really it's more like a voice-operated "clapper" switch than anything else - but the requirements are similar. The costs of a false negative and a false positive are both pretty high, though a false negative is much more costly.

I think cheap DSP technology has come a long way in the past 20-30 years :)

Reply to
larwe

I'll take the muskrat, Alex. Actually, it sounds like an old Firesign Theatre line.

Scott

Reply to
Not Really Me

Never really got into Firesign Theatre. I prefer the Goon Show, Hancock's Half Hour, etc.

Anyway, I was trying to demonstrate (flippantly) the real fact that both a false hit and a false miss have real costs in this application: a false miss is dangerous, a false hit is financially expensive.
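To make the trade-off concrete: if the detector produces a confidence score per sound, the unequal costs can drive the choice of acceptance threshold. The sketch below is only an illustration under assumptions of my own. The function name, the score lists, and the cost figures are hypothetical, and it simply picks the threshold with the lowest total cost on a labelled test set rather than describing how any real recognizer is tuned.

```python
def best_threshold(keyword_scores, noise_scores, cost_miss, cost_false_alarm):
    """Return (threshold, total_cost) with the lowest cost on the samples.

    A sound is accepted when its score >= threshold.
    keyword_scores: detector scores for genuine utterances of the magic word.
    noise_scores:   detector scores for everything else (dog, TV, other people).
    """
    candidates = sorted(set(keyword_scores) | set(noise_scores))
    # Also try a threshold above every score, i.e. reject everything.
    candidates.append(max(candidates) + 1.0)

    best = None
    for t in candidates:
        misses = sum(1 for s in keyword_scores if s < t)
        false_alarms = sum(1 for s in noise_scores if s >= t)
        cost = misses * cost_miss + false_alarms * cost_false_alarm
        if best is None or cost < best[1]:
            best = (t, cost)  # ties keep the lower threshold
    return best


# Hypothetical scores; a miss is taken to cost ten times a false alarm.
keyword = [0.9, 0.8, 0.6, 0.4]
noise = [0.5, 0.3, 0.2, 0.1]
print(best_threshold(keyword, noise, cost_miss=10, cost_false_alarm=1))
```

Making misses expensive pushes the threshold down and accepts more false alarms; swapping the two costs pushes it up. Either way, the result is only as good as the test recordings behind the scores.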

Reply to
larwe

I suggest anti-muskrat armor and deep pockets.

Some of the fellows over on comp.dsp may have some pointers -- go ask over there, see what you find out.

--
Tim Wescott
Control systems and communications consulting
http://www.wescottdesign.com

Need to learn how to apply control theory in your embedded system?
"Applied Control Theory for Embedded Systems" by Tim Wescott
Elsevier/Newnes, http://www.wescottdesign.com/actfes/actfes.html
Reply to
Tim Wescott

Looks like Sphinx:

formatting link

Funny enough, I heard of it first in the context of a speech/VR interface for Infocom 'adventure'-type games.

--
		Przemek Klosowski, Ph.D.
Reply to
przemek klosowski

On Thu, 08 Jan 2009 17:48:34 +0100, larwe wrote: I've got a very specific speech recognition application in mind, and I'm looking for a reference that will indicate if it's feasible. I want to recognize just one magic word, [...]

- If the computer doesn't recognize that you want its attention, a ninja will beat you to death with a frozen muskrat, and

- If the computer hears your dog barking and thinks it was you trying to get its attention, you'll be charged $1,000 for the CPU time.

Sounds like it could be a military application: firing too late is dangerous and firing for no reason is costly. Or remote assistance: screaming "help" too late is dangerous and rescuing you with a helicopter for no reason is costly.

[...]

I see a lot of information about % recognition accuracy on the vendor websites, but they refer mostly to noise-free environments and of course to large dictionaries.

I think accuracy will be a lot higher in the case of whistled languages like Silbo.

formatting link

Can your users be trained to whistle?


Reply to
Boudewijn Dijkstra
