OT(but related):Alexa


J.B. Wood 8 years ago

Hello, all. My question is about where Amazon's "Alexa" voice responses are generated. Are they generated by a synthesizer on the local device (e.g. an Echo or Dot), or do they originate at Amazon's central server and then get sent/streamed to the local device to be played like a music file (e.g. mp4 or aac)? Thanks for your time and comment. Sincerely,

J. B. Wood e-mail: arl_123234@hotmail.com


spuorgelgoog 8 years ago

They appear to be streamed, as it's possible for 'skill' developers to upload custom audio to Amazon's server to be streamed out along with the speech.
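For illustration only (this sketch is not from the original post): Alexa skills return a JSON response whose speech can be SSML, and SSML can embed a hosted clip with an `<audio>` tag. The helper name `build_alexa_response` and the example URL below are hypothetical placeholders, and the response is trimmed to a minimal shape.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def build_alexa_response(speech_text: str, audio_url: str) -> dict:
    """Build a minimal skill response mixing spoken text with a hosted audio clip."""
    # Escape the text and quote the attribute so the SSML stays well-formed.
    ssml = f"<speak>{escape(speech_text)} <audio src={quoteattr(audio_url)}/></speak>"
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": True,
        },
    }


if __name__ == "__main__":
    # The URL is a placeholder, not a real clip.
    reply = build_alexa_response("Here is your chime.", "https://example.com/chime.mp3")
    print(json.dumps(reply, indent=2))
```

The skill only sends text and a URL; the audio the device plays is fetched or rendered elsewhere, which fits the "streamed" reading above.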


Owain


ray carter 8 years ago

Don't know about Alexa, but the docs for the Google AIY kit indicate that the Pi/HAT generates text and receives text in return. With text-to-speech apps so easily available, it would seem very inefficient and bandwidth-intensive to stream.
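A rough back-of-the-envelope sketch of that bandwidth argument, in Python. The reply sentence, the 4-second duration and the 48 kbit/s bitrate are all assumed figures chosen for illustration, not measurements of any real service.

```python
def text_payload_bytes(sentence: str) -> int:
    """Size of a reply if only the text is sent (UTF-8 encoded)."""
    return len(sentence.encode("utf-8"))


def audio_payload_bytes(duration_s: float, bitrate_kbps: float) -> int:
    """Size of the same reply sent as compressed audio at a constant bitrate."""
    return int(duration_s * bitrate_kbps * 1000 / 8)


if __name__ == "__main__":
    reply = "The weather today is sunny with a high of twelve degrees."
    duration_s = 4.0     # assumed speaking time for the sentence
    bitrate_kbps = 48.0  # assumed compressed-audio bitrate

    text_size = text_payload_bytes(reply)
    audio_size = audio_payload_bytes(duration_s, bitrate_kbps)
    print(f"text:  {text_size} bytes")
    print(f"audio: {audio_size} bytes (about {audio_size / text_size:.0f}x larger)")
```

Under these assumptions the audio is a few hundred times larger than the text, though still only tens of kilobytes per reply.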


druck 8 years ago

Speech synthesis, as opposed to speech recognition, is not very processor intensive. The synthetic ones require the most computation, and the natural voice ones are more dependent on memory bandwidth and storage latency.

We were able to run many synths on 200MHz iPaq StrongARM PDAs, and had all the big-name synthetic and natural voice synthesisers running easily on early 400MHz XScale Windows CE mobile phones. Even a Pi1 wouldn't have problems running those, and I suspect it should also be able to cope with the latest versions.

Although nowhere near the same quality league, the 8-bit 2MHz BBC Micro was able to run the Superior Software synthetic speech synthesiser, by modulating the 4-bit volume of the sound chip at about 8kHz.
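As a rough illustration of the volume-modulation trick (not the actual Superior Software code), the sketch below quantizes a waveform to 16 volume levels at an 8 kHz update rate. The linear mapping is a simplification chosen for clarity; a real sound chip's volume steps may not be linear, and all names here are made up for the example.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed update rate, "about 8kHz" as in the post
LEVELS = 16            # a 4-bit volume register has 16 settings


def quantize_to_4bit(sample: float) -> int:
    """Map a sample in [-1.0, 1.0] to a volume level in 0..15 (linear, simplified)."""
    clamped = max(-1.0, min(1.0, sample))
    return round((clamped + 1.0) / 2.0 * (LEVELS - 1))


def tone_as_volume_levels(freq_hz: float, duration_s: float) -> list:
    """Render a sine tone as the sequence of volume values to write, one per tick."""
    n = int(SAMPLE_RATE_HZ * duration_s)
    return [
        quantize_to_4bit(math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
        for i in range(n)
    ]


if __name__ == "__main__":
    levels = tone_as_volume_levels(440.0, 0.5)
    print(len(levels), "volume writes; first ten:", levels[:10])
```

At this rate a 2MHz CPU has roughly 250 cycles between volume writes, which is why the approach was feasible but left little time for anything else.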

---druck


Dennis Lee Bieber 8 years ago

On Sat, 17 Mar 2018 20:12:48 +0000, druck declaimed the following:

The 8MHz Commodore Amiga used to have a translator (which converted normal text to encoded phonemes, with numeric intonation data and conversion of "c" to "s" or "k" as appropriate) and a narrator device. However, I think they lost the distribution license for the libraries by the time of AmigaOS 3. The synthesizer device had the ability to return height and width data to a running program, intended, I'm sure, to allow the program to animate a mouth to match the syllables.
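To make the "c" to "s" or "k" step concrete, here is a toy letter-to-sound rule in Python. It is not the Amiga translator's rule set: it only looks at the next letter, ignores digraphs such as "ch", and the function names are invented for this example.

```python
SOFT_FOLLOWERS = frozenset("eiy")  # "c" is usually soft before these letters


def c_sound(word: str, index: int) -> str:
    """Pick 's' or 'k' for the letter 'c' at word[index] using a one-letter lookahead."""
    following = word[index + 1].lower() if index + 1 < len(word) else ""
    return "s" if following in SOFT_FOLLOWERS else "k"


def respell_c(word: str) -> str:
    """Replace every 'c' in the word with its guessed sound, leaving other letters alone."""
    return "".join(
        c_sound(word, i) if ch.lower() == "c" else ch
        for i, ch in enumerate(word)
    )


if __name__ == "__main__":
    for word in ("cat", "city", "cycle", "music"):
        print(word, "->", respell_c(word))
```

Context-only rules like this are cheap to run but easy to fool, which is how such systems end up with amusing mispronunciations.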

Wulfraed Dennis Lee Bieber AF6VN wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/


Robert Riches 8 years ago

To be fair, IIUC, at least the early Amigas had some special-purpose hardware assistance (the Copper and a couple of other chips) rather than doing _everything_ in software. My older daughter was a little nonplussed that it pronounced her name Jen-knife-er when going from plain text to speech.

Interestingly, I was thinking just this morning about the early Amigas' ability to synthesize music (four-voice, IIRC) mixed with pre-recorded vocals off a ~1.8-2.0MB floppy.

Robert Riches spamtrap42@jacob21819.net (Yes, that is one of my email addresses.)


Michael J. Mahon 8 years ago

Well, since we're exploring the performance threshold for software speech synthesis, the 1MHz Apple II ran SAM (Software Automatic Mouth), which in its first incarnation drove a speaker from an 8-bit DAC, but in later versions was fully software, using ultrasonic PWM to output quite acceptable speech through the 1-bit built-in speaker interface.

Virtually all vintage text-to-speech synthesizers relied upon a text-to-phoneme algorithm created by the Naval Postgraduate School in Monterey, California. It worked reasonably well, allowed for pitch and other "hints", and was public domain.
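The PWM idea can be sketched as follows. This is a simplified model, not SAM's code: each sample becomes a short frame of on/off states whose duty cycle matches the sample value, and if frames repeat faster than the ear can follow, the speaker averages them into an analogue level. The frame length of 32 ticks and the function names are assumptions for the example.

```python
def pwm_frame(sample: float, ticks_per_frame: int = 32) -> list:
    """Encode one sample in [0.0, 1.0] as a frame of 1-bit output states."""
    clamped = max(0.0, min(1.0, sample))
    high_ticks = round(clamped * ticks_per_frame)
    # Output is high for the first part of the frame, low for the rest.
    return [1] * high_ticks + [0] * (ticks_per_frame - high_ticks)


def average_level(frame: list) -> float:
    """The level a low-pass filter (speaker cone, ear) would roughly recover."""
    return sum(frame) / len(frame)


if __name__ == "__main__":
    for sample in (0.0, 0.25, 0.5, 1.0):
        frame = pwm_frame(sample)
        print(sample, "->", "".join(map(str, frame)), average_level(frame))
```

With 32 ticks per frame the output has 33 distinct levels, so resolution trades directly against how fast the CPU can toggle the bit.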

As has been noted, synthesis is relatively easy; analysis is hard. ;-)

-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com


Jan Panteltje 8 years ago

On a sunny day (Sat, 17 Mar 2018 22:04:18 -0400) it happened Dennis Lee Bieber wrote:

I am running a Pi1B (I think it is; very old, 2 x USB out) with the 'festival' speech synthesizer to speak alarm messages for a navigation system. The Pi also does a whole lot of other stuff at the same time.

Have not got a pi 3 yet, maybe waiting for the 3.141593 as it will be more accurate.
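For anyone wanting to try something like the festival alarm setup described above, a minimal sketch: festival reads text on standard input when started with `--tts`. The wrapper names and the alarm message are made up for this example, and this is not the poster's actual navigation code.

```python
import shutil
import subprocess


def festival_command() -> list:
    """Command line that makes festival speak whatever arrives on stdin."""
    return ["festival", "--tts"]


def speak(message: str) -> bool:
    """Speak a message through festival; return False if festival is not installed."""
    if shutil.which("festival") is None:
        return False
    subprocess.run(festival_command(), input=message, text=True, check=True)
    return True


if __name__ == "__main__":
    # Hypothetical alarm text, for illustration only.
    if not speak("Warning. Depth below two metres."):
        print("festival not found on PATH")
```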


Martin Gregorie 8 years ago

I remember seeing a speech synthesis package on a 6800 system running FLEX2. The audio card was relatively simple and installed on its SS-30 peripheral bus. This board could be driven from a BASIC program. Details of the hardware are hazy now, but I remember that you had to misspell some words for them to be pronounced correctly.

Martin | martin at Gregorie | gregorie dot org


Dennis Lee Bieber 8 years ago

On 18 Mar 2018 03:14:41 GMT, Robert Riches declaimed the following:

Mostly for the graphics -- the blitter could handle moving sprites and overlaying windows... And DMA operations. I believe the main CPU still had to set up the sound forms.

I'd rounded to 8MHz, but in reality the Amiga base clock was closer to 7.2MHz. Still, the chips handling the display allowed it to achieve better performance than the 8MHz Mac of the time (since the Mac processor had to handle the display scan, user code only got dedicated time during retrace intervals).

Wulfraed Dennis Lee Bieber AF6VN wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/


Dave Higton 8 years ago

Gosh, that'll be TSC eXtended BASIC, presumably?

Dave


Charlie Gibbs 8 years ago

7.16 MHz, to be exact. This was twice the 3.58MHz color burst subcarrier frequency used in NTSC video, which gave the Amiga a leg up in video processing. Amigas were often used in local cable stations for things like public service announcements, and the Video Toaster enabled the Amiga to create all sorts of video effects which were quite impressive, especially for the time. Todd Rundgren's music video "Change Myself" was created with a bank of ten Amigas.
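The arithmetic behind that figure, as a quick check. The NTSC colour subcarrier is defined as 315/88 MHz; the constant and function names below are just labels for this example.

```python
NTSC_COLOR_BURST_HZ = 315e6 / 88  # NTSC colour subcarrier, about 3.579545 MHz


def amiga_ntsc_cpu_clock_hz() -> float:
    """CPU clock taken as exactly twice the colour burst frequency."""
    return 2 * NTSC_COLOR_BURST_HZ


if __name__ == "__main__":
    burst_mhz = NTSC_COLOR_BURST_HZ / 1e6
    cpu_mhz = amiga_ntsc_cpu_clock_hz() / 1e6
    print(f"colour burst: {burst_mhz:.6f} MHz")
    print(f"CPU clock:    {cpu_mhz:.5f} MHz")
```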

The quirks in the Amiga's voice synthesis system were quite amusing, especially here in the city of Van-cow-ver.

/~\  cgibbs@kltpzyxm.invalid (Charlie Gibbs)
\ /  I'm really at ac.dekanfrus if you read it the right way.
 X   Top-posted messages will probably be ignored.  See RFC1855.
/ \  HTML will DEFINITELY be ignored.  Join the ASCII ribbon campaign!


Martin Gregorie 8 years ago

Can't remember. It wasn't my system.

I only had FLEX09 and whatever the standard BASIC was, but I didn't use that much. Almost everything I did under FLEX09 was written with the excellent Windrush MACE assembler and PL/9 compiler. However I might have used the TSC assembler when I wrote my 4KB monitor, first for a 24x80 screen and then when I swapped floppy disk controllers.

Martin | martin at Gregorie | gregorie dot org


druck 8 years ago

I should not forget to mention Jonathan Duddington's eSpeak, developed on RISC OS & ARM back in 1995, and available on all popular Linux distros.

---druck
