CPU selection

For a sensor application, I need to select a CPU able to:

- wake up from power-down;

- sample a sound signal for at least 20 sec;

- process the signal in time and frequency domain (at least, FIR and FFT);

- send some result on a serial port;

- fall back to sleep for at least 2 min.

The application should be solar powered, so it needs to be very, very low power.

I have considered some 8-bit processors, in particular the Atmel ATmega128L. These processors offer very good power-down current, but awful processing performance. This means that, for instance, an FFT requires more time and, in consequence, more energy.

Other choice: a low end DSP. I have considered:

- TI - TMS320C2407A;

- Analog Devices - ADSP21990;

- Microchip - dsPIC30F3012.

The first two DSPs offer a lot of processing power, but need about 200 uA in power-down. The last one is, in a certain sense, a middle way.

Finally, I've considered a 32-bit processor: the Intel PXA270 (XScale). Good processing, but it needs a lot of power to enter and exit power-down, and it has a lot of peripherals I don't need which, in the end, use power. Also, its von Neumann architecture should mean bus contention between data and code, which reduces processing power compared with the DSPs listed above.

I'm leaning toward Texas Instruments, leaving Microchip as a possibility.

Does anyone have any suggestions? Does anyone know of benchmarks to consult (for example, how much time an FIR or an FFT requires on different platforms)?

Thanks a lot. Best regards.

francesco

It is a difficult trade-off between idle power and speed of processing. But with 20 seconds of sampling, I would assume your sleep time is much longer than that, so the sleep power is very important. In that case I would suggest a very low power processor such as the TI MSP430. They have very limited RAM, however. You don't say what your RAM requirements are.

You can get around the high sleep power of any device by using an external device to turn power on and off. An RTC chip can do this.

You might look at any of a number of ARM MCU chips. TI has the TMS470, Atmel has the AT91SAM7, Analog Devices has a family as well, and Philips has the LPC2000 line. These chips can all go much lower than 200 uA in low-power modes and should process much faster than the 8-bit MCUs. Check out the ARM Yahoo groups.

If you have other c...

rickman

Why not use a two-stage system, split into:

- One ultra-low-consumption stage using a cheap 8-bit MCU (like the 8052 family) to control sound acquisition into a memory store and to handle power and serial.

- One higher-consumption stage using any DSP you like that:

  - powers up during sound acquisition (the MCU triggers this event);

  - reads from the first-stage memory, computes the FFT and FIR fast, then stores the results in the memory;

  - powers down (sending a signal to the MCU that results are available).

The mcu will then send results over the serial, and wait for next system triggering event.

I've heard that there are good solar power management chips, but I can't remember any pointers to that info. A little searching on that subject won't hurt, since it could really be problematic.

I'd love to read more about this project.

Best regards

ricore

I think it is critical to know the sample rate and resolution. If we assume 48 kHz and 16 bit mono, you are looking at 2 MByte of data!!! If you use an 8 kHz 8 bit PCM codec, then you need 160 kB of SRAM.
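The two figures above can be checked with a few lines of arithmetic. This is only a sketch: the function name `buffer_bytes` is made up for illustration, and it assumes uncompressed PCM with each sample rounded up to whole bytes.

```python
def buffer_bytes(sample_rate_hz, bits_per_sample, seconds, channels=1):
    """Raw storage for an uncompressed PCM capture, in bytes.

    Assumption: each sample occupies a whole number of bytes.
    """
    bytes_per_sample = (bits_per_sample + 7) // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


# 20 seconds of audio, the minimum capture length from the original post.
print(buffer_bytes(48_000, 16, 20))  # 48 kHz, 16 bit mono: about 2 MByte
print(buffer_bytes(8_000, 8, 20))    # 8 kHz, 8 bit PCM: 160 kB
```

Anything between these two cases (for example 12 bit samples at 16 kHz) can be tried the same way before picking a part by its SRAM size.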

The best alternative I can think of is the AT91R40008, which incorporates 256 kB of SRAM. No nice peripherals, so you should add a small micro with an ADC, like an ATmega48, which samples the audio and sends it over the USART to the AT91. The AT91 runs in idle mode, with the CPU shut down and the UART enabled. The built-in DMA can receive up to 64 kB of audio before the AVR wakes up the ARM, so the ARM needs to be active three times during a 160 kB transfer.
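The "three times" figure is just a ceiling division of the capture size by the DMA block size. A minimal sketch, assuming 64 kB means 64 * 1024 bytes and using a hypothetical helper name:

```python
def dma_wakeups(total_bytes, dma_block_bytes=64 * 1024):
    """Number of times the CPU must wake to service a full DMA block.

    Ceiling division: a partially filled last block still needs one wake-up.
    """
    return -(-total_bytes // dma_block_bytes)


print(dma_wakeups(160_000))  # 160 kB transfer in 64 kB blocks
```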

If you sample less, and can fit all Audio samples into 64 kB, then the AT91SAM7S256 is a good choice. It can power down everything except a timer, which can periodically trigger an AD conversion. When the AD conversion is complete, the DMA controller will write the sample to internal SRAM. The CPU does not need to be powered up during the ADC process, so it will use very little power.

The SAM7S should draw 26 uA at 32 kHz. You need to use an external LDO to get the lowest power consumption.

An FPSLIC (AVR + FPGA) can sample the ADC autonomously and can do FIR processing in the FPGA portion at high speed.

--
Best Regards
Ulf at atmel dot com
Ulf Samuelsson

This sounds like a 2 device problem : One small uC, for Timing/power verify, and ideally ADC sampling/small buffering, and that wakes up the DSP core, only for the packet-crunching.

Small uC candidates would be SiLabs C8051F (good analog) and TI's MSP430 ( low RTC operation, but only average analog performance )

You will need power budgets for all parts, including serial and audio pre-amps. 20 secs on : 2 mins off is not a huge off-time, so run-time power may dominate.
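To see why run-time power may dominate at this duty cycle, a time-weighted average is enough. The sketch below uses made-up current figures (10 mA running, 200 uA asleep); they are placeholders, not datasheet values, and the function names are hypothetical.

```python
def average_current_ua(run_ua, sleep_ua, run_s=20.0, sleep_s=120.0):
    """Time-weighted average current over one run/sleep cycle, in microamps."""
    return (run_ua * run_s + sleep_ua * sleep_s) / (run_s + sleep_s)


def run_share(run_ua, sleep_ua, run_s=20.0, sleep_s=120.0):
    """Fraction of the charge per cycle that is consumed while running."""
    run_charge = run_ua * run_s
    return run_charge / (run_charge + sleep_ua * sleep_s)


# Hypothetical figures, NOT taken from any datasheet.
avg = average_current_ua(10_000, 200)
share = run_share(10_000, 200)
print(f"average {avg:.0f} uA, run phase uses {share:.0%} of the charge")
```

With these placeholder numbers the run phase accounts for most of the charge even though the sleep phase is six times longer, so shaving sleep current alone would not help much. Real figures for the chosen parts should replace the placeholders before drawing conclusions.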

The DSP vendors will have some info, but you will need to decide the precision that matters to you.

Jim Granville

Dear Francesco.

Don't look any further! MSP430!!!!!

16 bit power, 12bit A/D resolution, lots of peripherals and very, very low power with a 16bit by 16 bit multiplier unit.

Best regards.

Regis.

RS

MSP430F1611 has 48K flash, 10K ram. MSP430F1612 has 55K flash, 5K ram

dmm

Because your 'processing' requirements are non-trivial, I would strongly recommend the TI TMS320VC5501/2 DSP. Not so good on sleep current but great on run current, per MIP. You'll do more in 1 MIP with VC5502 than in 30 MIPS with an 8 bitter.

You could easily keep the VC5501 in standby or even power it down. If powered down, it will need to be booted each time which will cost time and current.

I doubt you'll find anything with a lower power/MIPS ratio than these guys. Even with MSP430.

-Andrew M

Andrew M

Well,

for power it is the MSP430, as mentioned several times; for processing power and a wider range of memory I would recommend ARM, like Ulf did. However, I would recommend a lower-cost device like the LPC2106 with 128k Flash and 64k SRAM, running from Flash very close to full speed in ARM mode (between 90 and 99% of SRAM speed). With this device you would have 64k SRAM for buffer while running out of Flash.

More information here:

formatting link

Great user group with lots of information here:

formatting link

Low cost boards from Olimex or with size limited compiler from IAR

Sleep current is typically between 10 and 20 uA, not as good as 8-bit but MUCH better than the DSP options.

An Schwob

An Schwob in USA

The 64k SRAM of the LPC2106 is not enough to store 20 seconds of uncompressed audio. That is why I proposed the AT91R40008, which has 256 kB of zero-wait-state 32-bit SRAM.

Would be curious to know if you could substantiate the claim of 90-99%. A non-sequential fetch takes 3 clocks, and a sequential fetch takes 1 clock. To reach 90% performance you have to execute on average 18 instructions between jumps (solve the equation 0.9 = n / (3 + (n - 1))). Even with conditional instructions, this is large...

To reach 99% performance you have to execute on average 198 instructions between jumps (solve the equation 0.99 = n / (3 + (n - 1))).
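The two equations can be solved in closed form: rearranging e = n / (b + (n - 1)) gives n = e * (b - 1) / (1 - e), where b is the clock count of a non-sequential fetch. A small sketch under that simple model (one 3-clock fetch after each jump, then 1-clock sequential fetches); the function names are illustrative only:

```python
from fractions import Fraction


def flash_efficiency(n, branch_clocks=3):
    """Speed relative to zero-wait memory for a run of n instructions.

    Model: the first fetch after a jump costs branch_clocks, the
    remaining n - 1 sequential fetches cost 1 clock each.
    """
    return Fraction(n, branch_clocks + (n - 1))


def run_length_for(efficiency, branch_clocks=3):
    """Instructions needed between jumps to reach the given efficiency."""
    e = Fraction(efficiency)
    return e * (branch_clocks - 1) / (1 - e)


# Exact arithmetic with Fraction avoids floating-point rounding.
print(run_length_for("0.9"), run_length_for("0.99"))
print(flash_efficiency(18), flash_efficiency(198))
```

This only models the simple 3-clock/1-clock case stated above; it does not account for wide flash reads, prefetch buffers, or alignment effects.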

Does Philips have a branch target cache? Otherwise there is no chance of meeting the claim.

--
Best Regards,
Ulf Samuelsson
Ulf Samuelsson

I have discussed this before with LPC2000 proponents and they are never able to support this claim. I guess they hear it and repeat it without considering the validity. The slowdown from branches is very significant since it adds multiple wait states. I think your equations do not fully calculate the negative impact a branch causes. Not only do you have to start a new flash read, but it is random where in the 4 word read execution will start. This can result in more waits to fetch the next 4 words.

I have yet to see anyone actually benchmark any code, flash vs. ram in the LPC2000 parts. Has Atmel provided any numbers for this on the SAM7?

Seems to me like several approaches could be combined. If the OKI parts with 8 kB cache were fitted with 128 bit wide flash it would get as close as possible to the full rate of these CPUs.

I will say that the Flash seems to be the weak link in the chain for all fast MCUs. Until this is properly solved, I guess we can expect ARM MCU performance to be limited to the 60 MHz ballpark.

rickman

I've benchmarked our Forth compilers and see about 95% of the no-wait performance. This depends somewhat on the code sequence and the loop alignment - in some small loops aligning the head of the loop to a 16 byte boundary is beneficial. All tests were performed on an LPC2106 set to 60MHz.

Stephen

-- Stephen Pelc, snipped-for-privacy@INVALID.mpeltd.demon.co.uk MicroProcessor Engineering Ltd - More Real, Less Time

133 Hill Lane, Southampton SO15 5AF, England tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691 web:
formatting link
- free VFX Forth downloads
Stephen Pelc

Stephen,

this is good data and is totally in line with what we see. Of the programs we tested, no reasonably sized one (>2k) ran slower than 90% of SRAM speed. The typical speed was around 95%; for small DSP algorithms we could get close to 99%.

The true benefit of the LPC2000 memory interface shows up when executing in ARM mode. While Thumb mode is more compact and should be used for all program parts that are not real-time critical, ARM mode is by definition of the architecture faster, but it gets slowed down by bus limitations in most embedded ARM Flash interfaces if the bus width is less than 64 bits and the clock is faster than the Flash access time, e.g. faster than 30 MHz.

It takes approximately 5 ARM instructions to perform the same functions as 7 Thumb instructions. Assuming no bandwidth limitation, this is a performance improvement of up to 40% for ARM over Thumb. Thumb mode (7 x 16 bit = 112 bit), however, saves 30% of code space over ARM (5 x 32 bit = 160 bit).

To get the best performance / code density trade-off with an ARM device, you want to have most of your code in Thumb mode, probably more than 90%, but you want to run the smaller, most real-time-critical part of the code in ARM mode. With the LPC2000 you can run it from Flash in ARM mode and still get the average 95%, while with other ARM7 implementations it is highly recommended to copy the fast routines into SRAM and execute from there. This is some hassle with the software structure, but well worth the effort to get up to 40% gain in execution speed.

Summarizing, you can get more speed out of a 60 MHz LPC2000 running out of Flash than from a 55 MHz device from another vendor running out of SRAM. This can save a lot of SRAM and code complexity overhead.
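The percentages quoted here follow directly from the 5-versus-7 instruction ratio. The sketch below just redoes that arithmetic; reading the 60 MHz versus 55 MHz comparison as "60 MHz at 95% of SRAM speed" is an interpretation of the post, not something it states explicitly, and all variable names are illustrative.

```python
ARM_INSTRUCTIONS, ARM_BITS = 5, 32      # 5 ARM instructions, 32 bits each
THUMB_INSTRUCTIONS, THUMB_BITS = 7, 16  # 7 Thumb instructions, 16 bits each

arm_size_bits = ARM_INSTRUCTIONS * ARM_BITS        # 160
thumb_size_bits = THUMB_INSTRUCTIONS * THUMB_BITS  # 112

# Fewer instructions for the same work: ARM speed gain over Thumb,
# assuming one instruction per clock and no fetch bandwidth limit.
arm_speed_gain_pct = 100 * (THUMB_INSTRUCTIONS - ARM_INSTRUCTIONS) / ARM_INSTRUCTIONS

# Smaller encoding: Thumb code size saving over ARM.
thumb_size_saving_pct = 100 * (arm_size_bits - thumb_size_bits) / arm_size_bits

# Assumed reading of the 60 MHz claim: 95% of zero-wait speed from Flash.
effective_mhz = 60 * 0.95

print(arm_speed_gain_pct, thumb_size_saving_pct, effective_mhz)
```

Under that reading, the effective rate from Flash lands a little above a 55 MHz part executing from SRAM, which matches the summary in the post.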

An Schwob

An Schwob in USA
