I've got a very specific speech recognition application in mind, and I'm looking for a reference that will indicate whether it's feasible. I want to recognize just one magic word, which would be a well-solved, high-accuracy problem if we were talking about a boom mike and a silent environment. The difficulty is that there may be lots of other noise in the background, other people talking, etc.
The application is something like a telematics device where you get its attention by saying "Computer...", except that the word in question can be assumed to be a unique word nobody would ever use for any other purpose. However, the specifics of this application are along the lines of:
- If the computer doesn't recognize that you want its attention, a ninja will beat you to death with a frozen muskrat, and
- If the computer hears your dog barking and thinks it was you trying to get its attention, you'll be charged $1,000 for the CPU time.
Can someone point me to an article that will give some feel for the best I can expect from today's technology? Ideally, it would give the practical upper limit on the percentage of validly spoken words that get caught, and the practical lower limit on the number of false positives I'll see from other noises.
I see a lot of information about % recognition accuracy on the vendor websites, but those figures mostly refer to noise-free environments and, of course, to large dictionaries.
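
To be concrete about the two numbers I'm after, here is a minimal sketch of how I'd expect them to be computed from a test run. The function name and the counts are made up for illustration; I'm assuming false positives are reported per hour of audio, since noises like a dog barking aren't naturally countable as "attempts".

```python
def detection_rates(true_accepts, misses, false_alarms, hours_of_audio):
    """Return (detection rate, false alarms per hour) for a one-word spotter.

    true_accepts:   times the magic word was spoken and recognized
    misses:         times the magic word was spoken and NOT recognized
    false_alarms:   times something else (dog, other speech) triggered it
    hours_of_audio: total length of the test recording, in hours
    """
    spoken = true_accepts + misses
    if spoken == 0 or hours_of_audio <= 0:
        raise ValueError("need at least one spoken word and some audio")
    detection_rate = true_accepts / spoken
    false_alarms_per_hour = false_alarms / hours_of_audio
    return detection_rate, false_alarms_per_hour


# Hypothetical counts, not measurements from any real system.
rate, fa_per_hour = detection_rates(
    true_accepts=97, misses=3, false_alarms=2, hours_of_audio=10.0
)
print(f"caught {rate:.1%} of magic words, {fa_per_hour:.2f} false alarms/hour")
```

The first number is the one the ninja cares about, the second is the one that costs $1,000 per event; a reference that reports both for noisy conditions is what I'm looking for.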