66MIPS 8bit microcontroller

- CBFalconer

Fri, Jul 27, 2007 6:57 PM

... snip ...

If you actually implement such FP operations, you will find that add/subtract is normally the slowest in the worst case, because of the extensive alignment time. Multiplication is quite efficient, and division is intermediate, assuming good fundamental integer arithmetic routines.
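To illustrate the alignment cost, here is a minimal sketch in C, assuming a hypothetical unpacked format (`unpacked_t`, with a 24-bit mantissa in `mant`) and a CPU with no barrel shifter, so the smaller operand is shifted one bit per pass. The names are made up for illustration and bits shifted out are simply dropped (no guard or sticky bits), so this is not a complete add routine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked float: value = mant * 2^exp, 24 significant bits in mant. */
typedef struct {
    uint32_t mant;
    int16_t  exp;
} unpacked_t;

/* Shift the operand with the smaller exponent right, one bit per pass,
 * until both exponents match. Returns the number of single-bit shifts,
 * a rough proxy for alignment cost on a CPU without a barrel shifter.
 * Bits shifted out are discarded (no rounding). */
static unsigned align_operands(unpacked_t *a, unpacked_t *b)
{
    assert(a && b && a != b);

    unpacked_t *small = (a->exp < b->exp) ? a : b;
    const unpacked_t *large = (small == a) ? b : a;
    unsigned shifts = 0;

    while (small->exp < large->exp && small->mant != 0) {
        small->mant >>= 1;   /* on an 8-bit core: one shift plus two rotates through carry */
        small->exp++;
        shifts++;
    }
    small->exp = large->exp; /* covers the early exit when the mantissa reached zero */
    return shifts;
}
```

With exponents 5 and 2 the loop runs three times; a worst-case exponent difference costs up to 24 passes, each touching three bytes on an 8-bit core. Real libraries usually shift by whole bytes first to cut this down.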

--
cbfalconer at maineline dot net

- Wilco Dijkstra

Fri, Jul 27, 2007 8:00 PM

The FP libraries I wrote for ARM take less than 30 cycles for float add/mul, and divide takes 60. Yes 8-bitters are slow, but few are 50 times slower!

An 8-bit core will need about 4-5 times as many instructions, so as long as you can use mostly 1 or 2 cycle instructions you can stay below 200 cycles. Without a multiply instruction, float mul will need up to 400 cycles. Float division can easily be done within 500 cycles, as it is as complex as multiply (Jonathan Kirwan wrote a float divide for the MSP taking 255 cycles).
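The estimate above is simple arithmetic, and can be sketched as a small C helper. This assumes (my assumption, not stated in the post) that the 30-cycle ARM routine is roughly 30 instructions; `estimate_cycles` and its parameters are hypothetical names, with the expansion factor and average cycles per instruction given in tenths to stay in integer math.

```c
#include <assert.h>

/* Back-of-envelope cycle estimate. All three inputs are assumptions,
 * not measurements:
 *   ref_instructions  instruction count of the reference (32-bit) routine
 *   expansion_tenths  8-bit instruction expansion factor, 45 means 4.5x
 *   cpi_tenths        average cycles per instruction, 13 means 1.3
 * Returns the estimated cycle count, rounded to nearest. */
static unsigned estimate_cycles(unsigned ref_instructions,
                                unsigned expansion_tenths,
                                unsigned cpi_tenths)
{
    assert(expansion_tenths > 0 && cpi_tenths > 0);
    return (ref_instructions * expansion_tenths * cpi_tenths + 50u) / 100u;
}
```

For example, `estimate_cycles(30, 45, 13)` gives 176, while `estimate_cycles(30, 50, 20)` gives 300, so staying under 200 at a 5x expansion needs an average of about 1.3 cycles per instruction or less.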

Wilco

- Jim Granville

Fri, Jul 27, 2007 9:09 PM

It has. See the CAP7 and CAP9. These target medium-to-high-volume customers, which is one of Atmel's strengths. Atmel have had Programmable Logic -> Mask Logic services for years.

It gives good revenue/effort, and there IS an opening here.

It's pretty simple to knock over an FPGA on price (die area) and Icc; all it needs is sufficient volume and design stability.

FPGA vendors will hate these devices, as just as a device ramps nicely, poof, that revenue has gone. Plus, it skews design decisions.

Lots of products qualify (perhaps not many in the embedded space). ST are doing a similar thing.

Correct, and that 'complication' is enough to make the volumes-per-die really struggle to hit critical mass. A lot of engineering, to create a device with a narrow focus, long design-cycles, and little return.

Also remember, this is all RAM based code, in a Niche device, going up against the monster FLASH microcontroller market.

A better design decision is to say "25% of the die CAN be used for a CPU, but we'll let the designer choose how much" - which is what SoftCPUs give designers.

But Triscend & Altera failed here, why would Atmel do any better?

Neither Xilinx nor Altera offer small devices with CPUs.

Atmel do not have the FPGA resources to compete with the top-end PPC core'd Xilinx devices.

The SoftCPUs have become so 'good' and these tap into ALL FPGA fabric flows, and so leverage ANY process shrink. Lattice have opensource SoftCPUs.

When you look at the revenues, the FPGA market is actually pretty tough. Growths are below the fabless industry average, and they need Ship-Loads of software & engineering.

Internal politics aside for a moment, what Atmel COULD do is offer an open-source AVR core. Would that generate more ASIC revenue than a possible small cost in AVR sales?

-jg

- Mark Borgerson

Fri, Jul 27, 2007 9:21 PM

Remember that the MSP430 series have 16-bit data registers. I would expect the MSP430 floating point to take fewer cycles than a true 8-bit processor.

IIRC, cycle times on a floating multiply are dependent on the mix of 1s and 0s in the operands, so the cycle times are not fixed.
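A naive shift-and-add multiply shows where that kind of data dependence would come from. This is only a sketch under my own assumptions (24-bit mantissas, a hypothetical `mul24_shift_add` that counts its conditional adds), not the routine of any particular library.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 24-bit mantissas by shift-and-add.
 * *adds receives the number of additions performed, which equals the
 * number of 1 bits in the low 24 bits of b: the data-dependent part. */
static uint64_t mul24_shift_add(uint32_t a, uint32_t b, unsigned *adds)
{
    assert(adds);

    uint64_t product = 0;
    uint64_t addend = a & 0xFFFFFFu;
    b &= 0xFFFFFFu;
    *adds = 0;

    for (int bit = 0; bit < 24; bit++) {
        if (b & 1u) {          /* add only when the multiplier bit is set */
            product += addend;
            (*adds)++;
        }
        addend <<= 1;
        b >>= 1;
    }
    return product;
}
```

Multiplying by 0x800000 takes one add and multiplying by 0xFFFFFF takes 24, for the same loop count. A routine can avoid this spread by doing the same work on both branches or by using a hardware multiplier.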

Mark Borgerson

- Jonathan Kirwan

Fri, Jul 27, 2007 9:33 PM

The MSP430 is an 8MHz beast and many instructions take more than a cycle -- though I tried to stay with short cycle instructions, of course. We also mix "slower" here, sometimes, with the idea of comparing cycle counts. Keep in mind that many of the 8 bit CPUs easily have a higher instruction rate, too. So if we use the term "slower" we need to know whether we are talking about time or cycle counts, as well.

All that said, and to add to your point, it probably will take more cycles on an 8-bit processor. Some of the reason has to do with available registers, too. It helps to have some register space to work with, in cases where instructions working with registers are faster than those using memory and where spills are required.

Not in the case of code I wrote that Wilco is discussing. The variations are something like +/- 1 cycle.

Jon

- Wilco Dijkstra

Sat, Jul 28, 2007 10:10 AM

24-bit arithmetic takes 2 instructions on a 16-bit CPU and 3 on an 8-bit CPU, so you'd expect fewer than 50% extra instructions. I used a factor of 2 to be safe, so division within 500 cycles is easy, and 400 should be feasible on some 8-bit CPUs.
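The 3-instruction case can be sketched in C as a byte-wise add with carry, mirroring an ADD, ADC, ADC sequence on an 8-bit core. `add24_bytes` and the little-endian byte layout are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 24-bit values stored as three little-endian bytes.
 * Each loop pass corresponds to one add-with-carry instruction on an
 * 8-bit CPU. Returns the carry out of the top byte (0 or 1). */
static uint8_t add24_bytes(uint8_t sum[3], const uint8_t a[3], const uint8_t b[3])
{
    assert(sum && a && b);

    uint16_t carry = 0;
    for (int i = 0; i < 3; i++) {
        uint16_t t = (uint16_t)(a[i] + b[i] + carry);
        sum[i] = (uint8_t)t;   /* low 8 bits are the result byte */
        carry = t >> 8;        /* bit 8 is the carry into the next byte */
    }
    return (uint8_t)carry;
}
```

A 16-bit CPU would do the same work as one 16-bit add plus one add-with-carry, which is where the 2 versus 3 comes from.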

Cycle timings of few functions are fixed; that is why I was quoting worst-case numbers. Floating point has a large number of special cases, most of which are much faster than the worst case.

Wilco

- steve

Sun, Jul 29, 2007 1:18 AM

200 cycles is about right for add, subtract and multiply; see Keil's compiler performance benchmarks for the 8051.
