Hi, I had better introduce myself: my name is Alex Tessarolo, and I am the C2000 product line architect, so this way you know that I am totally unbiased ;-)
I would like to comment on the CLA and power consumption topics brought up.
The CLA is actually a fully programmable FPU. More details will be made available over time.
Initially we will support the CLA with TI-supplied algorithms (kind of piece-together software building blocks). That's where the "hardcoded" terminology came in (it may not be the most appropriate terminology). Then we will support full programmability with appropriate tools. Full programmability may come out in conjunction with the hardcoded blocks or soon thereafter; we are still working out the details.
We are using this approach because we have users (one example: those transitioning from analog to digital power supplies) who face a steep learning curve, and hence we want to make the initial step as easy as possible.
The CLA itself is essentially a stripped-down version of the FPU on our F2833x devices, made to work independently of the C28 CPU. It can directly access peripherals such as the ADC and PWM and respond to interrupts directly without CPU intervention. You could almost say that the Piccolo devices, with CLA, are dual-core devices.
Why the CLA? Two main reasons: improve performance and reduce power. The performance we are targeting here is not solely your traditional "how many MIPS/MFLOPS" but quality of MIPS and response time. In particular, some applications, like digital power, look at "interrupt jitter" and "sample to output delay" as performance parameters. If you have a single CPU handling the control loops + communication tasks and such, it is very challenging to manage interrupt jitter and respond rapidly to real-time tasks. Adding the CLA offloads the time-critical tasks from the CPU, and interrupts are serviced faster and more predictably. So the overall system performance goes up significantly.
We use floating-point because we get a better quality of MIPS and it is just plain easier to program in floating-point. The CLA can execute (Y = M*X+B) in 5 cycles; that includes reading the values from memory and storing the result back to memory. It can do division (i.e. 1.234/13.765 = 0.0896476) in 11 cycles, again reading the values from memory and writing back, and so on. The CLA has, on average, about a 2x performance advantage over the C28 CPU on math tasks. So even though Piccolo is a 60MHz device, with the CLA the overall system performance is much higher.
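To put the cycle counts above into time units, here is a minimal sketch that converts them to nanoseconds. It assumes the CLA is clocked at the same 60 MHz as the device, which the post does not state explicitly; the function and constant names are made up for illustration.

```python
def cycles_to_ns(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / clock_hz * 1e9


# Assumption: the CLA runs at the 60 MHz device clock quoted for Piccolo.
CLOCK_HZ = 60e6

# Cycle counts quoted in the post (memory reads and write-back included).
MAC_CYCLES = 5    # Y = M*X + B
DIV_CYCLES = 11   # floating-point division

mac_ns = cycles_to_ns(MAC_CYCLES, CLOCK_HZ)
div_ns = cycles_to_ns(DIV_CYCLES, CLOCK_HZ)

# The division example from the post, evaluated on the host for reference.
quotient = 1.234 / 13.765

print(f"Y = M*X + B : {mac_ns:.1f} ns")
print(f"division    : {div_ns:.1f} ns")
print(f"1.234/13.765 = {quotient:.7f}")
```

Under that clock assumption, the multiply-add takes a little over 83 ns and the division a little over 183 ns per operation, memory accesses included.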
On the power aspect, quality of MIPS is not often factored into such discussions, nor is the energy consumption (average mW over a given time). For example: if I am running a control loop on the CLA and it takes only a quarter as many cycles to execute as on, say, a Cortex-M3 based device, then I have two options: I can run Piccolo at, say, a quarter of the MHz and hence reduce the power, or run Piccolo at full speed and turn off the CLA for 75% of the time. The CLA in fact automatically shuts down when its task is complete. Basically, performance and power are interdependent.
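The two options above can be compared with a very rough average-power model. This is only a sketch: the 100 mW active figure and the zero-power shutdown state are hypothetical placeholders, not data sheet values, and the quarter-clock case assumes power scales linearly with clock frequency, which ignores static leakage.

```python
def average_power_mw(active_mw: float, idle_mw: float, duty: float) -> float:
    """Time-averaged power for a block that is active for `duty` of the time."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    return active_mw * duty + idle_mw * (1.0 - duty)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
ACTIVE_MW = 100.0  # power while running at full clock (made up)
IDLE_MW = 0.0      # power while shut down (assumed negligible)

# Reference: a device that needs 4x the cycles is busy all the time.
always_on = average_power_mw(ACTIVE_MW, IDLE_MW, duty=1.0)

# Option 1: run at a quarter of the clock, busy all the time.
# Assumes power scales linearly with frequency.
quarter_clock = average_power_mw(ACTIVE_MW * 0.25, IDLE_MW, duty=1.0)

# Option 2: run at full clock, shut down for 75% of the time.
duty_cycled = average_power_mw(ACTIVE_MW, IDLE_MW, duty=0.25)

print(f"always on     : {always_on:.1f} mW")
print(f"quarter clock : {quarter_clock:.1f} mW")
print(f"25% duty      : {duty_cycled:.1f} mW")
```

With these idealized assumptions both options land on the same average power, a quarter of the always-on figure. On a real device the two would differ once leakage, wake-up cost, and peripheral power are included, which is part of why a single max power number says so little.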
We have applications where we do just that, and even though the data sheet may specify a max power number, it doesn't factor in the above. So comparing power between devices is complex. It's even more complicated by the fact that power is also measured differently for different devices. Running an intense math task will result in different power numbers than running, say, a Dhrystone benchmark (Dhrystone does not do any math, or only an insignificant amount). So comparing data sheet power numbers is not always an apples-to-apples comparison, and this level of detail is not always made available or apparent.
Anyway, I have probably rambled for too long. I hope the above sheds some light on our thought process.
Cheers, Alex T.