New sub-$2.00 60MHz Piccolo C2000 Family from Texas Instruments

Bill Giovino 17 years ago

Texas Instruments has introduced a new C2000 32-bit microcontroller family.

Codenamed Piccolo, it has the performance of a DSP with the ease-of-use of a Microcontroller architecture. It also has an FPU - not bad!

The news article shows a C2000 roadmap.

Bill Giovino Executive Editor

AZ Nomad 17 years ago

I'd rather eat sand than follow the recommendation of a spammer like you.

MK 17 years ago

Actually the Piccolo does NOT have an FPU - other members of the C2000 family do, but as the roadmap on the TI link shows, the Piccolos are on a different path.

It might not be a bad part, but it only clocks at 60MHz, so I would compare it very carefully with low pin count ARMs (especially Cortex parts), some of which have comparable peripheral sets but all of which have much better tools.

The Piccolo should win on pure DSP performance, but in applications where power and microcontroller features count it won't look so good.

Michael Kellett

Bill Giovino 17 years ago

Hi, Mike, nice to see someone else who remembers FPUs!

The Piccolo has a peripheral called a Control Law Accelerator (CLA). The CLA has hardcoded algorithms, and the algorithms are executed via an FPU that is part of the CLA.

But the CLA is not designed to be used as a general-purpose FPU without specific help from TI and in my conference call with TI I hammered at that particular point.

Mike, after reading your above comments today, I updated my article with the following text which can be found in the 3rd paragraph:

"A glance at the industry standard Dhrystone benchmarks for this core show that, when compared to a 72MHz ARM Cortex M3, a 60MHz Piccolo performs 25% faster while executing "general purpose" code, and up to 38% faster while executing "control algorithm" (PIC32) code. The Piccolo is, of course, expected to be significantly faster than any conventional microcontroller architecture when executing any sophisticated math code."

I agree - I wouldn't expect any sophisticated math core to surprise me with low power. But low power applications do not appear to be TI's target for the C2000.

However, the I/O configuration on the C2000 is specifically designed to be what you are used to seeing in any conventional microcontroller, so it can easily be used as a general purpose microcontroller (although that might be overkill for a purely I/O-control application).

The low power Cortex core winner is ST's STM32, which draws a ridiculously low 36mA at 72MHz.

(Hands down, the STMicroelectronics STM32 appears to be dominating the Cortex marketplace for any high volume application.)

Thanks for your comments,

- Bill Giovino Executive Editor

Boo 17 years ago

Can you tell me what algorithms exactly are encoded by the CLA ?

Thanks,

Boo

Ulf Samuelsson 17 years ago

[snip]

The following statement may need a review:

"The Piccolo is, of course, expected to be significantly faster than any conventional microcontroller architecture when executing any sophisticated math code."

The AVR32 with its single cycle MAC is also "significantly faster than any conventional microcontroller architecture" on DSP algorithms.

Best Regards, Ulf Samuelsson This is intended to be my personal opinion which may, or may not be shared by my employer Atmel Nordic AB

Alex T. 17 years ago

Hi, I'd better introduce myself: my name is Alex Tessarolo, and I am the C2000 product line architect, so this way you know that I am totally unbiased ;-)

I would like to comment on the CLA and power consumption topics brought up.

The CLA is actually a fully programmable FPU. More details will be made available over time.

Initially we will support the CLA with TI-supplied algorithms (kind of piece-together software building blocks). That's where the "hardcoded" terminology came in (maybe not the most appropriate terminology). Then we will support full programmability with appropriate tools. Full programmability may come out in conjunction with the hardcoded blocks or soon thereafter; we are still working out details.

We are using this approach because we have users (one example: those transitioning from analog to digital power supplies) that have a steep learning curve, and hence we want to make the initial step as easy as possible.

The CLA itself is essentially a stripped-down version of the FPU on our F2833x devices, made to work independently of the C28 CPU. It can directly access peripherals such as the ADC and PWM and respond to interrupts directly without CPU intervention. You could almost say that the Piccolo devices, with CLA, are dual core devices.

Why the CLA? Two main reasons: improve performance and reduce power. The performance we are targeting here is not solely your traditional "how many MIPS/MFLOPS" but quality of MIPS and response time. In particular, some applications, like digital power, look at "interrupt jitter" and "sample to output delay" as performance parameters. If you have a single CPU handling the control loops plus communication tasks and such, it is very challenging to manage interrupt jitter and respond rapidly to real time tasks. Adding the CLA offloads the time-critical tasks from the CPU, and interrupts are serviced faster and more predictably. So the overall system performance goes up significantly.

We use floating-point because we get a better quality of MIPS and it's just plain easier to program in floating-point. The CLA can execute (Y = M*X+B) in 5 cycles; that includes reading the values from memory and storing the result back to memory. It can do division (i.e. 1.234/13.765 = 0.0896476) in 11 cycles, again reading the values from memory and writing back, and so on. The CLA has on average about a 2x performance advantage over the C28 CPU on math tasks. So even though Piccolo is a 60MHz device, with the CLA the overall system performance is much higher.
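To put those cycle counts in time units, here is a small sketch of the arithmetic. It only assumes the 60MHz clock and the 5-cycle and 11-cycle figures quoted above; the function name is made up for illustration.

```python
CLOCK_HZ = 60e6  # Piccolo clock rate quoted in the thread


def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1e9 / clock_hz


# y = m*x + b, including the memory reads and the store of the result
mac_ns = cycles_to_ns(5)
# one floating-point division, including memory read and write-back
div_ns = cycles_to_ns(11)

print(f"y = m*x + b: {mac_ns:.1f} ns, division: {div_ns:.1f} ns")
```

So the multiply-add lands somewhere around 83 ns and the division around 183 ns, if the quoted cycle counts hold on real silicon.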

On the power aspect, quality of MIPS is not often factored into such discussions, nor is the energy consumption (average mW over a given time). For example: if I am running a control loop on the CLA and it only takes a quarter as many cycles to execute as on, say, a Cortex-M3 based device, then I have two options: I can run Piccolo at, say, a quarter of the MHz and hence reduce the power, or run Piccolo at full speed and turn off the CLA for 75% of the time. The CLA in fact automatically shuts down when its task is complete. Basically, performance and power are interdependent.
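A rough sketch of that trade-off, under two loud assumptions that are not from TI: active power scales linearly with clock frequency, and the idle (shut-down) power is zero. The 100 mW full-speed figure is purely hypothetical.

```python
FULL_SPEED_POWER_MW = 100.0  # hypothetical figure, not a data sheet number


def power_at_clock_fraction(full_power_mw, clock_fraction):
    """Assumed model: active power scales linearly with clock frequency."""
    return full_power_mw * clock_fraction


def average_power(active_power_mw, duty_cycle, idle_power_mw=0.0):
    """Average power when active for duty_cycle of the time, idle otherwise."""
    return active_power_mw * duty_cycle + idle_power_mw * (1.0 - duty_cycle)


# Option 1: run continuously at a quarter of the clock rate
slow_clock_mw = power_at_clock_fraction(FULL_SPEED_POWER_MW, 0.25)
# Option 2: run at full speed, but shut down for 75% of the time
duty_cycled_mw = average_power(FULL_SPEED_POWER_MW, 0.25)

print(slow_clock_mw, duty_cycled_mw)
```

Under this simplified model both options average out to the same figure. Real parts have static leakage and wake-up overhead, so the two options will differ in practice.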

We have applications where we do just that, and even though the data sheet may specify a max power number, it doesn't factor in the above. So comparing power between devices is complex. It's even more complicated by the fact that power is also measured differently for different devices. Running an intense math task will result in different power numbers than running, say, a Dhrystone benchmark (Dhrystone does not do any math, or only an insignificant amount). So comparing data sheet power numbers is not always an apples-to-apples comparison, and this level of detail is not always made available or apparent.

Anyway, I have probably rambled for too long. I hope the above sheds some light on our thought process.

Cheers, Alex T.

Grant Edwards 17 years ago

Is the JTAG support as completely messed up and non-standard as the MSP430, or did you guys actually follow the IEEE standard this time around?

Is debugging info available for the C2000 so that open-source and third-party tools will work, or is it all top-secret with several sub-families only partially supported like the MSP430?

Grant Edwards, grante at visi.com
Yow! Now, let's SEND OUT for QUICHE!!

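Bill Giovino 17 years ago

"Ulf Samuelsson" wrote:
> The following statement may need a review:
> "The Piccolo is, of course, expected to be significantly faster than any conventional microcontroller architecture when executing any sophisticated math code."

While a microcontroller with a MAC (Multiply Accumulate Unit) is of course available, that core configuration is not what I would call a "conventional microcontroller".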

-Bill.

rickman 17 years ago

...snip...

Thanks for the intro. So for the short term, it sounds like the CLA will be used a bit like the PSOC programmable analog is used, only the functions provided by TI are supported. Are the supported CLA functions documented at this time? I think that was the question that Boo was actually asking.

It looks like these parts are still *very* preliminary. Any idea when they will be shipping in volume?

Rick

Jim Granville 17 years ago

These days, "conventional microcontroller" means anything single chip, that can be _used_ as a "conventional microcontroller". So, forget legacy ideas about what should, or should not, be counted, and look instead at what the vendors are building, and how they might be used. AVR32 in the Single Chip Flash Models, certainly targets "conventional microcontroller" usage and customers.

-jg

Jim Granville 17 years ago

If this is not ROM based, then I would say "hardcoded" is certainly the wrong terminology.

Surely it would be smarter to tell users that ? (when the software has caught up, of course ;) )

This sounds more like the TPU scheme of Freescale ?

Q: How much memory resource does the CLA have (Code/Data/Flash/RAM) ?

Q: What Debug access to the CLA and Memory, is available in silicon ?

Q: What is the "interrupt jitter" of the CLA operating some example tasks ?

It seems a novel approach, as usually FPU has the widest latency of any operation, and thus many apps move that outside the tightest interrupts.

Q: How many Phases of PWM can the CLA/Peripherals manage ?

In power designs, above a certain power level, you need to start paralleling MOSFETs, and it can make much more sense, from a ripple and EMC viewpoint, to run a lot of phases. That can slash the ripple currents, so a few cents more on the silicon pays back big time in smaller, cheaper capacitors.
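For anyone unfamiliar with the multi-phase idea, a minimal sketch, assuming symmetric interleaving: N phases are spaced 360/N degrees apart, so the combined ripple appears at N times the per-phase switching frequency. The 250kHz switching frequency and the function names are hypothetical.

```python
def phase_offsets_deg(num_phases):
    """Evenly spaced PWM phase offsets for symmetric interleaving."""
    return [k * 360.0 / num_phases for k in range(num_phases)]


def effective_ripple_frequency_hz(num_phases, switching_hz):
    """With symmetric interleaving the summed ripple is at N times fsw."""
    return num_phases * switching_hz


offsets = phase_offsets_deg(4)
ripple_hz = effective_ripple_frequency_hz(4, 250e3)  # 250 kHz is made up

print(offsets, ripple_hz)
```

The higher effective ripple frequency (and partial cancellation between phases) is what allows the smaller capacitors; how much you actually save depends on duty cycle and layout.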

Then perhaps the industry needs a better benchmark ? Something like a Joules/operation ?

-jg

Alex T. 17 years ago

All of our C2000 devices use IEEE standard JTAG.

Development kits are available for third parties to develop applications. We have a number of third parties that have and are developing tools for these devices.

Alex T. 17 years ago

The Piccolo F2802x series of devices (without CLA) will be sampling by Nov/Dec this year. The devices are in manufacturing and on their way.

The Piccolo F2803x series of devices (with CLA) will be sampling probably around Mar/April next year. The design for the F2803x with CLA is actually done and is being debugged on FPGA as we speak. Adding the test infrastructure, doing the layout and then silicon checkout is actually quite time consuming (and the least fun).

The CLA functions are being worked on. We will be supporting the standard trig-type functions (div, sqrt, sin, cos, atan2, ...), standard DSP-type functions (FIR, IIR, ...), standard control functions (PID, PARK, IPARK, State Estimator, ...), and we will be porting our existing Motor Control library and Digital Power library.

The CLA will also allow users to write their own functions. Below is a simple example of coding the CLA in assembly:

    MMOV32   MR0,@m          ; MR0 = m (load value from memory)
    MMOV32   MR1,@x          ; MR1 = x (load value from memory)
    MMPYF32  MR1,MR1,MR0     ; MR1 = m*x, MR0 = b
 || MMOV32   MR0,@b          ;   (load value from memory)
    MADDF32  MR1,MR1,MR0     ; MR1 = m*x + b
    MMOV32   @y,MR1          ; y = m*x + b (store result to memory)
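For readers who want to step through that sequence on a PC, here is a rough Python model of the five instructions. It is only a sketch: it assumes the result registers hold IEEE 754 single-precision values and that the parallel MMOV32 loads MR0 after the multiply has read the old value. Neither the rounding behaviour nor the exact pipeline timing of the real hardware is stated in this thread.

```python
import struct


def f32(value):
    """Round a Python float to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def cla_line(m, x, b):
    """Model y = m*x + b using the register steps from the assembly."""
    mr0 = f32(m)                # MMOV32  MR0,@m
    mr1 = f32(x)                # MMOV32  MR1,@x
    product = f32(mr1 * mr0)    # MMPYF32 MR1,MR1,MR0 (reads the old MR0 = m)
    mr0 = f32(b)                # || MMOV32 MR0,@b (parallel load of b)
    mr1 = product               # multiply result lands in MR1
    mr1 = f32(mr1 + mr0)        # MADDF32 MR1,MR1,MR0
    return mr1                  # MMOV32  @y,MR1


y = cla_line(2.0, 3.0, 1.0)
print(y)
```

The point of the parallel load is that b arrives in MR0 "for free" while the multiply runs, which is how the whole line fits in five instructions.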

Cheers, Alex T.

rickman 17 years ago

Thanks for the info. To be honest, I would prefer to *not* have to program a DSP like coprocessor in assembly (or at all if possible). I'm not completely clear on how the CLA is to be used, but that's ok. If it won't be in full production for eight months or so, I'll wait before I spend a lot of time looking at it. TI has burned me before on new product introductions saying they were six months out for over a year.

I understand that this part is oriented toward control functions, but is there any plan to include a higher resolution ADC and DAC channels perhaps? I would like to have a single part to use for a variety of apps rather than different specialized devices requiring many different tools. This part with 16 or 18 bit dual ADC and DACs would have a much wider range of apps.

Rick

Alex T. 17 years ago

Kind of, the TPU is probably closer to the HET peripheral found in our TI-ARM based devices. The CLA is closer to a normal CPU/FPU that is tightly coupled to peripherals.

Initial devices (F2803x) will have all RAM memory for CLA. 8Kbytes of RAM for code and 4Kbytes of RAM for data. Future devices will have standard functions in ROM. At boot time the main C28-CPU would copy the code from its Flash to the code RAM of the CLA, configure the CLA and then let it run.

The CLA has its own debug mechanism (run/halt/single step). Through the debugger, the user can run the CLA and view memory and registers while the C28-CPU is halted or running. Even while both the CLA and CPU are running, you can view/modify memory.

As an example, I will use a PI control loop for an output power stage running at high frequency. The interrupt jitter is zero. The CLA, ADC and PWM peripherals are all synchronously clocked with the same clock (60MHz). The PWM triggers the ADC conversion, and when the ADC conversion finishes, the CLA is triggered and the CLA then updates the PWM (duty and/or frequency and/or phase). Some benchmark numbers we are getting are:

Sample To Output Delay (ADC->CLA(PI)->PWM)   Effective Control Loop B/W   Number Of Phases
433nSec                                      461KHz                       8

The number of phases is basically the number of PI control loops that can be run at 461KHz max.
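For reference, the kind of PI loop being benchmarked can be sketched as follows. This is a generic textbook PI update with a clamped integrator, not TI's library code; the gains, the 0..1 duty range and the class name are all hypothetical. On the real part each call to update() would correspond to one ADC-complete trigger.

```python
class PIController:
    """Generic discrete PI controller with a clamped integrator."""

    def __init__(self, kp, ki, out_min=0.0, out_max=1.0):
        self.kp = kp
        self.ki = ki
        self.out_min = out_min
        self.out_max = out_max
        self.integral = 0.0

    def _clamp(self, value):
        return min(max(value, self.out_min), self.out_max)

    def update(self, setpoint, measurement):
        """One control step: take an ADC sample, return the new PWM duty."""
        error = setpoint - measurement
        # Accumulate the integral term and clamp it (simple anti-windup)
        self.integral = self._clamp(self.integral + self.ki * error)
        return self._clamp(self.kp * error + self.integral)


pi = PIController(kp=0.5, ki=0.1)       # hypothetical gains
first = pi.update(setpoint=1.0, measurement=0.0)
second = pi.update(setpoint=1.0, measurement=1.0)
print(first, second)
```

The update is just a handful of multiplies and adds per sample, which is why it maps well onto a small FPU triggered directly by the ADC.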

Agree, FPUs tend to have higher latency. We designed the CLA to have single cycle operation for MPY and ADD/SUB operations with the result available in 1 cycle (as compared to most FPUs where there is either a 2 or 3 cycle latency for the result). MPY/ADD operations can also load/store in parallel, and we can also convert a 16/32 bit signed or unsigned integer value to a float in a single cycle as we load. Basically we tried to match the latency of a fixed-point machine so the FPU can operate inside the tightest interrupts.

As mentioned above, in theory the CLA running at 60MHz can do 8 phases (with a 461KHz control loop). Add on top of that that the C28-CPU can also operate in parallel and can do about 5 phases at 60MHz, and in theory there is enough horsepower to perform ~13 phases at 60MHz.
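A back-of-the-envelope check of what those numbers imply in clock cycles. This uses only the figures quoted in this post (60MHz, 461KHz, 433nSec, 8 and 5 phases); the per-loop cycle budget is my own inference from them, not a published TI figure.

```python
CLOCK_HZ = 60e6               # CLA, ADC and PWM clock
LOOP_RATE_HZ = 461e3          # effective control loop rate
SAMPLE_TO_OUTPUT_S = 433e-9   # ADC -> CLA(PI) -> PWM delay
CLA_PHASES = 8
CPU_PHASES = 5

# Clock cycles available in one control period
cycles_per_period = CLOCK_HZ / LOOP_RATE_HZ
# Implied cycle budget for each of the 8 PI loops on the CLA
cycles_per_loop = cycles_per_period / CLA_PHASES
# Sample-to-output delay expressed in clock cycles
delay_cycles = SAMPLE_TO_OUTPUT_S * CLOCK_HZ
# CLA and C28 CPU running in parallel
total_phases = CLA_PHASES + CPU_PHASES

print(round(cycles_per_period), round(cycles_per_loop),
      round(delay_cycles), total_phases)
```

That works out to roughly 130 cycles per control period, about 16 cycles per PI loop, and a sample-to-output delay of about 26 cycles.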

The first device will have 7 EPWM modules; each module can have independent phase/duty/frequency control, and modules can be synced up and phased. Five of these modules will have our High Resolution PWM technology with 150pSec resolution on both phase/duty and frequency.

So the peripheral will limit the number of phases to 7 (or 5 with High Res).

We do have devices currently available that support up to 16 phases in the peripheral (F28044/45).

Agree. We actually have Digital Power development kits that support multiple phases (up to 10) using existing devices. You can check them out on the TI C2000 web site links.

Agree, but unfortunately I don't think one benchmark could capture all of the variety of applications. Probably need a suite of benchmarks each more specific for given applications. Unfortunately for marketing and data sheet documentation, they like to keep things simple :-)

Cheers, Alex T.

Alex T. 17 years ago

Programming in assembly is one option for the CLA. For some of the applications we are targeting, where every cycle counts, there is little option but to program in assembly. We do, however, support sharing C header files with assembly code, which makes it easier to build assembly code within a main C application. The C28-CPU itself supports a C/C++ compiler, and users normally build their application in C/C++ and then, for the CLA, either link in pre-packaged building blocks or write their own.

Other programming models will be supported, but I cannot give details now.

I can understand your issue with new product announcements. In defence of our C2000 group, though, we have been fairly good at delivering on our promises.

Yes, we get that a lot. But at what cost? Integrating such technology is being worked on, but you probably wouldn't get it at the price range of Piccolo (at least in the near future).

rickman 17 years ago

I get this response a lot when I ask about improvements to product lines. My question to *you* is "at what cost"? I am not saying that a 16 bit ADC/DAC is needed in the TMS320F28035/27. But a different member with that capability would allow a lot more applications to be implemented with the same set of tools and without moving to a different architecture. Earlier in this thread these parts were being promoted as being useful for a range of applications comparing it to a "conventional microcontroller architecture". If a variety of peripherals are not provided within the family, then it is just another motor control oriented DSP chip, no matter what you do within the CPU.

I read that the sample rate of the 12 bit converter is pretty high. Have any tests been done on oversampling to provide higher bit resolutions? That still doesn't provide DACs though.

Rick

Alex T. 17 years ago

We are working on it, and it would be another member of the C2000 family.

At what cost? It's largely dependent on the target application and the ADC/DAC specs required. There is such a wide variety of specs that it's almost impossible to build a "generic" device that caters for a wide variety of applications. Just look at the wide variety of discrete ADC devices available. So, I cannot answer that question. What you will probably see in the future is devices with higher resolution ADCs but targeting a specific range of applications.

Yes, we have run oversampling tests on the 12-bit ADC and we have seen about two bits of extra resolution. Beyond that the THD starts to limit what we can achieve.
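As background on what "two extra bits" costs, here is a sketch of the usual oversample-and-decimate rule of thumb: each extra bit needs 4x more samples, assuming enough uncorrelated noise on the input. This is the generic technique, not necessarily how TI ran its tests, and the function names are made up.

```python
def oversampling_ratio(extra_bits):
    """Samples needed per output value: 4x for every extra bit."""
    return 4 ** extra_bits


def decimate(samples, extra_bits):
    """Sum one block of raw ADC codes and shift right by extra_bits."""
    needed = oversampling_ratio(extra_bits)
    if len(samples) != needed:
        raise ValueError(f"expected {needed} samples, got {len(samples)}")
    return sum(samples) >> extra_bits


ratio = oversampling_ratio(2)           # two extra bits on a 12-bit ADC
mid_scale = decimate([2048] * ratio, 2)  # mid-scale 12-bit input
print(ratio, mid_scale)
```

So two extra bits cost 16 raw conversions per result, and a mid-scale 12-bit code of 2048 becomes a mid-scale 14-bit code of 8192. The output rate drops by the same factor of 16.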

On the DAC side, I'm not sure what applications you are thinking about, but one area where we have a lot of interest is using the High Resolution PWM to drive audio speakers directly via a Class-D amp. We have an evaluation board available to demo this capability. So that is using High Res PWM to replace what was done with a high resolution DAC in the past.

steve 17 years ago

So the CLA only has access to its RAM and not the CPU's RAM? And the CLA RAM is basically dual-ported to the CPU?
