OT (but related): Alexa

Hello, all. My question is about where Amazon's "Alexa" voice responses
are generated. Are they generated by a synthesizer on the local device
(e.g. an Echo or Dot) or do they originate at the Amazon central server
and then are sent/streamed to the local device to be played like a music
file (e.g. mp4 or aac)? Thanks for your time and comment. Sincerely,
--
J. B. Wood	            e-mail: arl_123234@hotmail.com
J.B. Wood
They appear to be streamed, as it's possible for 'skill' developers to upload custom audio to Amazon's server to be streamed out along with the speech.

Owain
spuorgelgoog
Don't know about Alexa, but the docs for the Google AIY kit indicate that the Pi/HAT generates text and receives text in return. When text-to-speech apps are so easily available, it would seem very inefficient and bandwidth-intensive to stream audio.
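To put rough numbers on the bandwidth point, here is a small Python sketch comparing the size of a one-sentence reply sent as text with the same reply streamed as compressed audio. The sample sentence, the 2.5 words/s speaking rate and the 48 kbit/s bitrate are all assumptions picked for illustration, not figures from Amazon or Google.

```python
# Illustrative payload comparison for one spoken reply.
# All constants are assumptions for the example, not vendor figures.

REPLY_TEXT = "The weather today is sunny with a high of fifteen degrees."
SPEECH_RATE_WPS = 2.5        # assumed speaking rate, words per second
AUDIO_BITRATE_BPS = 48_000   # assumed compressed-audio bitrate, bits per second


def text_bytes(text: str) -> int:
    """Size of the reply if sent as UTF-8 text for local synthesis."""
    return len(text.encode("utf-8"))


def audio_bytes(text: str, words_per_sec: float, bitrate_bps: int) -> int:
    """Approximate size of the same reply if streamed as compressed audio."""
    duration_s = len(text.split()) / words_per_sec
    return round(duration_s * bitrate_bps / 8)  # bits -> bytes


if __name__ == "__main__":
    t = text_bytes(REPLY_TEXT)
    a = audio_bytes(REPLY_TEXT, SPEECH_RATE_WPS, AUDIO_BITRATE_BPS)
    print(f"text: {t} bytes, audio: {a} bytes, ratio ~{a // t}x")
```

Under these assumptions the audio is a few hundred times larger than the text, though still only tens of kilobytes per reply.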
ray carter
Speech synthesis, as opposed to speech recognition, is not very processor-intensive. The synthetic-voice ones require the most computation, while the natural-voice ones depend more on memory bandwidth and storage latency.
We were able to run many synths on 200MHz iPAQ StrongARM PDAs, and had all the big-name synthetic and natural-voice synthesisers running easily on early 400MHz XScale Windows CE mobile phones. Even a Pi1 wouldn't have problems running those, and I suspect it should also be able to cope with the latest versions.
Although nowhere near the same quality league, the 8-bit 2MHz BBC Micro was able to run the Superior Software synthetic speech synthesiser by modulating the 4-bit volume of the sound chip at about 8kHz.
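A minimal Python sketch of the "play samples through the volume register" idea: 8-bit samples are reduced to the 16 levels a 4-bit volume control can express, one value per tick of an assumed 8kHz update rate. The sine tone stands in for speech, and the linear mapping is a simplification; on a chip whose volume steps are logarithmic attenuation, a real player would map samples through a lookup table instead, and would write each level to the hardware rather than returning a list.

```python
import math

SAMPLE_RATE_HZ = 8_000  # assumed update rate, matching the "about 8kHz" above


def make_tone(freq_hz: float, duration_s: float, rate_hz: int = SAMPLE_RATE_HZ) -> list[int]:
    """Generate an unsigned 8-bit (0-255) sine wave as stand-in 'speech'."""
    n = int(duration_s * rate_hz)
    return [
        int(127.5 + 127.5 * math.sin(2 * math.pi * freq_hz * i / rate_hz))
        for i in range(n)
    ]


def to_volume_levels(samples_8bit: list[int]) -> list[int]:
    """Quantise 8-bit samples to 16 steps by keeping the top 4 bits (0..15).

    Simplification: assumes a linear volume scale.
    """
    return [s >> 4 for s in samples_8bit]


if __name__ == "__main__":
    levels = to_volume_levels(make_tone(440, 0.01))
    print(levels[:16])  # the first few 4-bit values a player would write out
```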
---druck
druck
On Sat, 17 Mar 2018 20:12:48 +0000, druck declaimed the following:
The 8MHz Commodore Amiga used to have a translator (which converted normal text to encoded phonemes, including numeric intonation data and conversion of "c" to "s" or "k" as appropriate) and a narrator device. However, I think they lost the distribution license for the libraries by the time of AmigaOS 3. The synthesizer device could return mouth height and width data to a running program, intended, I'm sure, to let the program animate a mouth to match the syllables.
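The "c" to "s" or "k" conversion is the classic soft/hard C rule. A toy Python version, assuming the simplified rule that "c" is soft before e, i or y and hard otherwise, might look like the following; it is not the Amiga translator's actual code, and it ignores "ch", "ck" and exceptions such as "cello".

```python
SOFT_C_FOLLOWERS = set("eiy")  # assumed simplified rule: 'c' is soft before these


def c_sound(word: str, index: int) -> str:
    """Return 'S' or 'K' for the letter 'c' at word[index]."""
    nxt = word[index + 1].lower() if index + 1 < len(word) else ""
    return "S" if nxt in SOFT_C_FOLLOWERS else "K"


def transcribe_cs(word: str) -> list[str]:
    """Phoneme for every 'c' in the word, in order of appearance."""
    return [c_sound(word, i) for i, ch in enumerate(word) if ch.lower() == "c"]


if __name__ == "__main__":
    for w in ("cat", "city", "circus", "cycle"):
        print(w, transcribe_cs(w))
```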
--
	Wulfraed                 Dennis Lee Bieber         AF6VN 
	wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
Dennis Lee Bieber
To be fair, IIUC, at least the early Amigas had some special-purpose hardware assistance (the Copper and a couple of other chips) rather than doing _everything_ in software. My older daughter was a little nonplussed that it pronounced her name "Jen-knife-er" when converting plain text to speech.
Interestingly, I was thinking just this morning about the early Amigas' ability to synthesize music (four-voice, IIRC) mixed with pre-recorded vocals off a ~1.8-2.0MB floppy.
--
Robert Riches 
spamtrap42@jacob21819.net 
Robert Riches
Well, since we're exploring the performance threshold for software speech synthesis, the 1MHz Apple II ran SAM (Software Automatic Mouth), which in its first incarnation drove a speaker from an 8-bit DAC, but in later versions was fully software, using ultrasonic PWM to output quite acceptable speech through the 1-bit built-in speaker interface.
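Roughly how PWM gets an analogue-ish level out of a 1-bit output: each sample becomes one short burst of on/off ticks whose duty cycle tracks the sample value, and provided those bursts repeat above the audible range, the speaker and the ear smooth out the carrier. A Python sketch follows; the 16 ticks per period is an arbitrary assumption, not SAM's actual timing, and the pwm_average helper only simulates that smoothing so the result can be checked.

```python
def pwm_encode(samples_8bit: list[int], ticks_per_period: int = 16) -> list[int]:
    """Turn each 8-bit sample into one PWM period of 1-bit speaker states."""
    bits: list[int] = []
    for s in samples_8bit:
        # Number of 'on' ticks proportional to the sample (rounded to nearest).
        on_ticks = (s * ticks_per_period + 127) // 255
        bits.extend([1] * on_ticks + [0] * (ticks_per_period - on_ticks))
    return bits


def pwm_average(bits: list[int], ticks_per_period: int = 16) -> list[int]:
    """Simulate low-pass filtering: average each period back to a 0-255 level."""
    return [
        round(255 * sum(bits[i:i + ticks_per_period]) / ticks_per_period)
        for i in range(0, len(bits), ticks_per_period)
    ]


if __name__ == "__main__":
    original = [0, 64, 128, 200, 255]
    print(pwm_average(pwm_encode(original)))  # close to the original levels
```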
Virtually all vintage text-to-speech synthesizers relied upon a text-to-phoneme algorithm created by the Naval Postgraduate School in Monterey, California. It worked reasonably well, allowed for pitch and other "hints", and was public domain.
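Text-to-phoneme rule sets of that era are commonly written as context rules of the form left[letters]right = phonemes, tried in order with the first match winning. A minimal Python sketch of such a matcher, assuming a tiny hypothetical rule table (not the actual published rules) with literal contexts only and " " marking a word boundary; letters with no rule simply fall back to themselves.

```python
# Each rule: (left context, letters matched, right context, phonemes).
# Hypothetical rules for illustration; real rule sets are far larger and use
# wildcard classes (e.g. "one or more consonants") that this sketch omits.
RULES: list[tuple[str, str, str, str]] = [
    ("", "ch", "", "CH"),
    ("", "c", "e", "S"),
    ("", "c", "i", "S"),
    ("", "c", "y", "S"),
    ("", "c", "", "K"),
    ("", "a", "t", "AE"),
    ("", "i", "ty", "IH"),
    ("", "t", "", "T"),
    (" ", "y", "", "Y"),   # word-initial y is a consonant
    ("", "y", " ", "IY"),  # word-final y is a vowel
    ("", "e", " ", ""),    # silent final e
]


def to_phonemes(word: str) -> list[str]:
    text = " " + word.lower() + " "  # pad so word boundaries can be matched
    pos = 1
    out: list[str] = []
    while pos < len(text) - 1:
        for left, match, right, phon in RULES:
            if (text.startswith(match, pos)
                    and text[:pos].endswith(left)
                    and text.startswith(right, pos + len(match))):
                if phon:
                    out.append(phon)
                pos += len(match)
                break
        else:
            out.append(text[pos].upper())  # no rule: fall back to the letter
            pos += 1
    return out


if __name__ == "__main__":
    for w in ("cat", "city", "chat", "yes", "cite"):
        print(w, to_phonemes(w))
```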
As has been noted, synthesis is relatively easy; analysis is hard. ;-)
--
-michael - NadaNet 3.1 and AppleCrate II:  http://michaeljmahon.com
Michael J. Mahon
On a sunny day (Sat, 17 Mar 2018 22:04:18 -0400) it happened Dennis Lee Bieber wrote in :
I am running a Pi1B (I think it is; very old, 2 x USB out) and the 'festival' speech synthesizer to speak alarm messages for a navigation system.
The Pi also does a whole lot of other stuff at the same time.

Have not got a pi 3 yet, maybe waiting for the 3.141593 as it will be more accurate.
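For anyone wanting to do the same from a script, a minimal Python sketch that pipes a message to Festival's --tts mode, which reads the text to speak from standard input when no file is given. It assumes Festival is installed and on the PATH; the speak helper is a hypothetical name made up for this example.

```python
import shutil
import subprocess


def speak(message: str) -> bool:
    """Speak `message` with Festival; return False if it is missing or fails.

    Hypothetical helper for illustration, relying on `festival --tts`
    reading its text from standard input.
    """
    if shutil.which("festival") is None:
        return False  # Festival not installed or not on PATH
    result = subprocess.run(["festival", "--tts"], input=message, text=True)
    return result.returncode == 0
```

Usage would be something like speak("Approaching waypoint"), called from whatever part of the program raises the alarm.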
Jan Panteltje
I remember seeing a speech synthesis package on a 6800 system running FLEX2. The audio card was relatively simple and installed on its SS-30 peripheral bus. This board could be driven from a BASIC program. Details of the hardware are hazy now, but I remember that you had to misspell some words for them to be pronounced correctly.
--
Martin    | martin at 
Gregorie  | gregorie dot org
Martin Gregorie
On 18 Mar 2018 03:14:41 GMT, Robert Riches declaimed the following:
Mostly for the graphics -- the blitter could handle moving sprites and overlaying windows... And DMA operations. I believe the main CPU still had to set up the sound forms.
I'd rounded to 8MHz but in reality, the Amiga base was closer to 7.2MHz -- but the chips handling the display allowed it to achieve better performance than the 8MHz Mac of the time (since the Mac processor had to handle the display scan, user code only got dedicated time during retrace intervals).
--
	Wulfraed                 Dennis Lee Bieber         AF6VN 
	wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
Dennis Lee Bieber
Gosh, that'll be TSC eXtended BASIC, presumably?
Dave
Dave Higton
7.16 MHz, to be exact. This was twice the 3.58MHz color burst subcarrier frequency used in NTSC video, which gave the Amiga a leg up in video processing. Amigas were often used in local cable stations for things like public service announcements, and the Video Toaster enabled the Amiga to create all sorts of video effects which were quite impressive, especially for the time. Todd Rundgren's music video "Change Myself" was created with a bank of ten Amigas.
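The 2x relationship, worked through with the NTSC definition of the color subcarrier as 315/88 MHz, gives the 7.16 MHz figure:

```latex
f_{\mathrm{sc}} = \frac{315}{88}\,\mathrm{MHz} \approx 3.579545\,\mathrm{MHz},
\qquad
f_{\mathrm{CPU}} = 2 f_{\mathrm{sc}} = \frac{315}{44}\,\mathrm{MHz} \approx 7.159091\,\mathrm{MHz}
```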
The quirks in the Amiga's voice synthesis system were quite amusing, especially here in the city of Van-cow-ver.
--
/~\  cgibbs@kltpzyxm.invalid (Charlie Gibbs) 
\ /  I'm really at ac.dekanfrus if you read it the right way. 
Charlie Gibbs
Can't remember. It wasn't my system.
I only had FLEX09 and whatever the standard BASIC was, but I didn't use that much. Almost everything I did under FLEX09 was written with the excellent Windrush MACE assembler and PL/9 compiler. However, I might have used the TSC assembler when I wrote my 4KB monitor, first for a 24x80 screen and then again when I swapped floppy disk controllers.
--
Martin    | martin at 
Gregorie  | gregorie dot org
Martin Gregorie
I should not forget to mention Jonathan Duddington's espeak, developed on RISC OS & ARM back in 1995, and available on all popular Linux distros.
---druck
druck
