Hi,
Sorry for the poor choice of subject line. :(
I have a few different speech synthesizers that I've been refining for a product. All are intended to be *very* lightweight (minimal run-time resources). Voice quality, pronunciation, etc. aren't essential (but I don't want to deliberately hamper performance).
As I can operate them in semi-limited domains (since I *tend* to be the source of the text they utter), I've opted to adopt some of the classic approaches to the text-to-phoneme portion of the algorithms instead of trying to begin a study in linguistics, etc. (even bloated synthesizers have problems with speech so why waste effort trying to *approach* their performance levels with 1% of *their* resources?!)
Most of the work I'm adopting is decades old. Current trends rely on having *lots* of resources available (big dictionaries, MIPS, etc.) so they've all gone off in a different direction. So, contacting original authors is a dubious proposition ("Hey, do you remember that work you did 30 years ago? I've got some niggly little detail that I need help resolving. Off the top of your head...")
[I've had *some* success -- thx DM!]

Many of the documents are N-th generation photocopies, fiche, etc., so there are lots of artifacts in them ("Is that a speck of lint or a backslash?"). But I've been able to resolve many of the "unintended additions" with a bit of careful examination of the details involved. The worst cases are the long lists of (hundreds of) rules -- any of which might be corrupted by a speck of paper lint, a crease in the original when it was photocopied, etc.
But, there are some things that simply can't be attributed to copying errors. I.e., cases where glyphs are obviously *missing*. And, others where a symbol is present in a legend -- yet never occurs elsewhere! Still other ambiguities exist (Is this instance of "YL" to be interpreted as the legend symbol "YL"? Or, as the legend symbol "Y" followed by the legend symbol "L"? And, what is the *effective* difference??)
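FWIW, the "YL" vs. "Y"+"L" question is exactly the sort of thing a greedy longest-match tokenizer decides *silently* -- which is why the effective difference can be invisible until you try both readings. A minimal sketch (the symbol inventory here is made up, NOT the real legend):

```python
# Hypothetical legend inventory -- substitute the actual symbol set.
SYMBOLS = {"Y", "L", "YL", "AX", "K"}
MAXLEN = max(len(s) for s in SYMBOLS)

def tokenize(s):
    """Split s into legend symbols, always preferring the longest match."""
    out, i = [], 0
    while i < len(s):
        # Try the longest candidate first, down to single characters.
        for n in range(min(MAXLEN, len(s) - i), 0, -1):
            if s[i:i+n] in SYMBOLS:
                out.append(s[i:i+n])
                i += n
                break
        else:
            raise ValueError(f"no legend symbol matches at {s[i:]!r}")
    return out
```

Under this policy tokenize("YL") yields ["YL"], never ["Y", "L"] -- so if the document's author intended the two-symbol reading anywhere, a longest-match scanner will quietly get it wrong. Running both readings through the synthesizer and diffing the phoneme output is one way to see whether the distinction ever *matters*.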
[This sort of crap happens when people aren't careful preparing docs. And, when they don't (or can't?) "cut and paste" from the ACTUAL SOURCE CODE into the final documentation but, instead, try to transcribe things manually: "Is that a lowercase L or a digit 1?"]

I *think* the only way I can *hope* (no guarantee) to resolve these sorts of things is to throw lots of data at the rules and look for a pattern in the failures that result. Perhaps even instrumenting my code so that I can flag each datum that tickles a "suspicious rule". Then, hope I can fathom what they have in common and how to resolve the error.
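The instrumentation can be pretty cheap: tag the rules you distrust by index and log every word that fires one. A minimal sketch, with rules simplified to (fragment, phonemes) pairs -- the classic rule sets also carry left/right context, omitted here for brevity:

```python
from collections import defaultdict

def transcribe(word, rules, suspect, hits):
    """First-matching-rule transcription; log words that fire a suspect rule.

    rules   -- ordered list of (fragment, phoneme-list) pairs (first match wins)
    suspect -- set of rule indices whose printed form is questionable
    hits    -- defaultdict(list) mapping rule index -> words that fired it
    """
    phonemes, i = [], 0
    while i < len(word):
        for idx, (frag, out) in enumerate(rules):
            if word.startswith(frag, i):
                if idx in suspect:
                    hits[idx].append(word)
                phonemes.extend(out)
                i += len(frag)
                break
        else:
            i += 1  # no rule matched; skip the letter
    return phonemes

# Usage: after running a large word list through, hits[n] holds every
# word that exercised suspect rule n -- the set you then stare at.
```

Grouping the flagged words per rule (rather than one flat list) makes the "what do these have in common?" step much easier.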
This is complicated by the fact that the algorithms aren't "perfect" to begin with. So, the idea of comparing computed pronunciations against a *dictionary* of pronunciations would be ineffective as it would flag all of the "semi-acceptable" pronunciations as "errors". I don't have ready access to the original data from which the rules were derived (nor the "private notes" by which they decided to trade off performance of one rule vs. another in certain instances).
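One way to keep the "semi-acceptable" cases from drowning out the real errors -- *if* you do end up comparing against a dictionary anyway -- might be to rank mismatches by phoneme-level edit distance instead of flagging every inexact match: a one-phoneme slip is probably a tolerable tradeoff, while a distance of four or five smells like a corrupted rule. A minimal sketch:

```python
def phoneme_distance(a, b):
    """Levenshtein distance over phoneme *lists* (not letters)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # delete pa
                           cur[j - 1] + 1,       # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute
        prev = cur
    return prev[-1]
```

Sorting the dictionary mismatches by this score would at least put the grossly-wrong outputs at the top of the pile, without pretending the dictionary is the one true reference.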
[It's actually fascinating to look at word spellings in detail and the big differences in their pronunciations! E.g., water/pater/later; valentine/aborigine/clandestine; etc.]

Does this approach seem to make sense? I.e., tag each input that tickles a suspicious rule and try to resolve the problems by "staring at them"? Any other suggestions that might be more productive? Especially given that we each view words as having specific pronunciations and, without religiously consulting a "reference", can easily dismiss what *appears* to be a problem as a NON-problem (e.g., most folks seem to mispronounce "salmon", so they wouldn't notice if the algorithm ALSO mispronounced it!)
[N.B. when I refer to examining the "flagged output", I don't mean *audio* output but, rather, phonemic transcriptions of the input]

Thx!
PS: I didn't bother with the *.speech.* groups as they all appear to be moribund