Faster than TI C2000?

C

Chris Carlen 18 years ago

Hi:

Even though I am able to do what I want with the existing 150MHz TMS320F2812 digital signal controller (DSC), I am always eager to have more speed, since applications that were otherwise inconceivable may become possible if more speed becomes available.

Is it likely that there will be further speed advances with this architecture?

Most microprocessors running at 100 MHz or greater speeds use cache. The C2000 doesn't, but rather has small segments of "SARAM" which can run at the full core speed. The slower flash tops out at effectively about 90 MIPS.

Even if the flash can't be made faster, wouldn't it be useful to have a faster core and faster SARAM able to keep up with the core, even if the flash remains at it's present speed limit?

For instance, would folks be interested in a 300MHz device with 16-32k of data+code space capable of running no wait states at a full 300MHz, where the larger 256-512k flash requires wait states making it only effectively 75-90MHz?

This would definitely expand the horizons for my applications, where a few interrupt service routines and computations have to be performed very fast, while a substantially larger blob of user interface code would be satisifed with a few 10s of MHz.

I'm also curious about the relative pros/cons of an architecture using cache that can achieve the higher speeds (like 300MHz) vs. a direct memory architecture like the C2000. Would the introduction of non-deterministic behavior typical of cache architectures make cache undesirable for the types of applications C2000 is geared towards--real time SMPS and motor control? Would that fact be a reason why TI might do exactly as I am picturing and advance the speed of the core and SARAMs even if the flash remains at its present speed?

Where do you see TI going next with C2000, and is it likely any other vendor will produce anything similar?

I took another wander around yesterday for similar DSCs with boatloads of peripherals such as the waveform generators, QEP interfaces, etc. that the C2000 has, which anything comparable seems to be found only on the Freescale DSCs. But Freescale are 16 bit and very slow compared to C2000. I also looked at Microchip's PIC32 and ST ARM/Cortex. It seems like C2000 towers above anything else in the market, and this has remained so for several years.

The F2812 remains, after 4 years, one of the ultimate microcontrollers on steroids!

Your input is of interest.

Good day! ____________________________________ CRC crobcREMOVETHIS@BOGUSsbcglobal.net NOTE, delete texts: "REMOVETHIS" and "BOGUS" from email address to reply.

Vote

J

Jim Granville 18 years ago

Did you look at the ADI BlackFin ? - or some of Infineon's top-end parts ?

ADi did have FLASH DSC for a brief time, but I think they saw the 'flash-barrier' looming, and went instead to a fast ram based design in their BlackFin series.(which I think now hits 400-600MHz)

Overall you are right, fast embedded controllers have rather stalled at the 'flash-barrier'. Above 100MHz is quite rare, and those that are (like a couple of ARM9's) use slow flash, and a cache to do so. (so you will hit the eratic cache issues)

Perhaps the future is Multi-core devices ?

Multiple uC is already quite common.

-jg

Vote

V

Vladimir Vassilevsky 18 years ago

What kind of application do you need the high speed for?

Unlikely.

It is actually slower then that; 90 MIPS is the ideal case. The realistic figure would be 70...80 MIPS. However the critical code can be loaded in SARAM as you noted.

This is what L1 memory is for.

This is BlackFin.

For the faster CPUs and DSPs, the limiting factor is the bus and the peripheral speed rather then the core speed.

It depends. The question is what speed and what latency do you really need.

TI will probably come up with some sort of click and drag programming environment (like LabWiew) which will generate the code for 28xx. The floating point 28xx parts seem to be designed for that.

TMS 28xx is not very fast, if compared to DSPIC or FreeScale 56xxx, or especially to BlackFin. It requires the dual power source with sequensing, it is power hungry, and it has only 3.3V I/O which is inconvenient for the control applications. The flash programming procedure is slow to ridicule. I have used 28xx in some projects, and I am not overly excited about it.

Huh? Same sort of crap like all other MCUs. The most sensible vendor seems to be FreeScale, but they are crap, too :)

Here is a bad thing about TMS28xx: not too long ago TI switched the silicon revision from "F" to "G". This introduced the incompatibility in the flash writing subroutines, and required the change of the software and the production procedures.

I don't understand your point.

Usually, the people are looking for a microcontroller which is suitable for the particular application. You seem to look for an application which is suitable for the microcontroller.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

Vote

V

Vladimir Vassilevsky 18 years ago

BlackFin core runs at 600MHz, however the peripheral clock speed is

133MHz max. Hence all peripheral activities, interrupt latencies, bit banging and such are limited to that clock.

Perhaps there is no real demand for the faster speed?

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

Vote

C

CC 18 years ago

At present I am generating 1us resolution arbitrary pulse sequences using compare matches. There must be no jitter so HW compare match to generate the edges is essential. I might need to make a waveform like:

START delay 4us high 15us low 12us high 30us low 60us high 10us low 14us ... STOP

Then on the next trigger the sequence might be totally different. This is related to research in internal combustion engines. These are fule injection pulses. No this is not production research, but fundamental science.

Party pooper.

Yes, that is so.

I am aware of much faster devices. What C2000 has is *peripherals*. I can get things done with this chip that would take much longer with a more microprocessor-like device, which would force me to integrate logic in an FPGA to do the highly symbiotic hardware+software tasks.

Not really. If you need to get into an ISR, compute some setup parameters for some hardware, then get out fast enough for the hardware to do it's think, like generate another compare match, then having more CPU cycles per peripheral event is still an improvement.

Well that's an interesting possibility. Though there are already at least two products of this sort available already.

I don't see how you can consider it slow compared to 40MHz 16 bit. It is way faster. It may have more cycles to go through to get into an ISR, but it will get done with the ISR faster, so still will outperform something at 40MHz.

Blackfin is a totally different animal. It's not a DSC. It doesn't have 4 waveform generation timers, QEP inputs, etc. It has lots of nice ways to get data in and out of the chip, but it is essentially a microprocessor. It has less GPIO than F2812 without going to a 400ball BGA. There are only 2 devices with LQFPs.

It requires the dual power source with

Yeah, it's not meant for cell phones and mp3 players. I work in a lab. I have to do many things, design electronics, fix lasers, align OPOs, and make lab instruments.

Yeah, probably crummy for production minded folks.

I work in a lab. I chose the F2812 because the power/complexity ratio seemed better than anything else when it came out. It still seems that way. There are a few options for more MHz, but all are far more complex just to get the CPU working, with memory controllers, the need for external non-volatile, etc.

I appreciate your response.

Yes, that is true. I chose to experiment with the F2812 with no application, but simply had read about it in comparison to DSPics, ARMs, SHARCs, and the new Blackfin, and it seemed like a good step up from the AVRs that I was using for little things, yet easy enough to understand and get to do some intersting things without having to first understand a great deal of supporting infrastructure of setting up MMUs, OSes, etc., or having to build PWM generators and QEP decoders in an FPGA.

Good day!

_____________________ CC snipped-for-privacy@REMOVE-THIS.sbcglobal.net SuSE 10.3 Linux 2.6.19

Vote

C

CC 18 years ago

I am interested in SHARC and Blackfin. Well, actually I'm interested in everything!

Actually, I probably have more use for more peripherals than CPU speed, but a little more speed would be valuable.

I could almost make use of 2xF2812 just to have more PWMs for waveform generation in my present project.

Thanks for the input.

Good day!

_____________________ CC crobc@REMOVE-THIS.sbcglobal.net SuSE 10.3 Linux 2.6.19

Vote

J

Jonathan Kirwan 18 years ago

I just thought I'd mention the possibility of using a processor that has a fixed (only one possibility) delay for timer interrupts relative to the instruction cycle. The ADSP-218x (and some earlier ones, can't say about the Blackfin), if you set things up carefully, can be entirely relied upon for fixed delay intervals. Since all instructions are one-cycle, it's very clean.

Just a thought, Jon

Vote

J

Jim Granville 18 years ago

A good 'hard real time' example :)

Some uC have FIFO/DMA/queued peripherals, that might solve this. Did you look at Freescale's TPU, for example ? The xmega has some sort of DMA/IO handling, but it may not be up to this level of task.

This is the sort of peripheral & time contraint, that lends itself to a small FPGA / Large CPLD.

You would then choose a MCU, with a FIFO/DMA based SSC or SPI (and those are relatively common), or map it as dualport Ram, or even stream it out of a Quad Width SerialFLASH memory, if you want if even more 'hard coded'

If you are doing research, a small companion FPGA board would be obvious. That can also capture info, with very precise time stamps as well.

We have done a numbers of designs, where programmable logic 'extended' the peripheral set of a uC. Makes the uC choice easier.

-jg

Vote

P

Paul Keinanen 18 years ago

How far in advance do you know the sequence ?

Would't it be simple to just use one shift register and preload the pattern and clock it out at 1 MHz ? Some synchronous serial controller (USRT) might also do the trick, provided that you can run it in raw mode without preamble, bit stuffing and CRC.

With two shift registers, one could contain the high bits and the other the low bits, making it easy to implement the start delay, when neither is active.

The other alternative would be to use two loadable synchronous down counters, one to count the high period and one to count the low period (and a third for the start delay) driven by a 1 MHz clock. Initially, load two counters and when the first counter expires, enabling the second counter and simultaneously generating an interrupt, which then reloads the first counter.

The only requirement is that the interrupt latency is less than the count time of the second counter so that it does not expire while the first one is being reloaded. At the next half cycle, the roles are interchanged.

Paul

Vote

V

Vladimir Vassilevsky 18 years ago

This is the traditional application for the state machines implemented in the hardware, such as PLD or FPGA. Some MCUs provide the programmable sequencers for that (68HC16, for example). It can be also done with the use of the DMA channels.

With BlackFin, you can have a pure software solution with the interrupt driven bit banging. The timing jutter could be less then 100ns, and the generation of the pulses of the minimum length of 1us with the resolution of ~10ns is no problem. The other option is DMAing directly to the port, with no jitter at all.

Yes, the timer event manager subsystem of TMS 28xx is impressive.

Not quite so. TMS 28xx is not a DSP, it is MCU. It is quite inefficient on the DSP operations like FIRs or IIRs.

BlackFin is fast MCU with good DSP capabilities.

The 12mm BGA is nice small package. The only problem is the need for 4+ layer board. BlackFin in LQFP can be put on the two layer board.

Good point about power/complexity ratio. I agree, F28xx is good by this parameter.

Why didn't you buy an arb generator or timer board from NI then? It could save you a lot of effort.

VLV

Vote

P

Paul Keinanen 18 years ago

The SCC transparent mode on the QUICC coprocessor on MC68360/MPC860 etc. should also be able to do it with four buffer descriptors (BDs) clocked at 1 MHz.

1st BD points to a long buffer of all 0x00 (LOW) 2nd BD points to a single byte (UP) 3rd BD points to a long buffer of all 0xFF (HIGH) 4th BD points to a single byte (DOWN)

The bulk of the high and low period can be adjusted by modifying the byte count in the BD. The exact transition within 8 bits can adjusted by writing the UP and DOWN bytes. The sequence is repeated as long as each BD is re-enabled after use. If running at 8 MHz serial clock, only 1st and 3rd BD would be required.

There is also an external input to start the sequence, in order to implement the start delay.

Paul

Vote

M

MK 18 years ago

Hello Chris,

Have you looked at Xmos

formatting link

- currently not quite out of the vapourware category but their approach might work for you.

(I think for your pulse generators you would do better with an FPGA and a lower performance micro-controller.)

Michael Kellett

formatting link

Vote

C

Chris Carlen 18 years ago

Thanks for the input.

The project is of course more complex than just the pulse sequence generation. In addition, the F2812 reads a QEP encoder. It generates a

16-bit (might upgrade to 32-bit) word every time the engine shaft angle matches some setpoint. After outputing that word, it resets the compare match value and an index to the waveform table. So basically, this is an angle domain waveform generation scheme, with maximum compare match rate of 86.4-345.6kHz, depending on whether I let them set events to the full resolution of the encoder (the quadrature states), or only the quadrature cycles. We have 1/4 degree encoders, and may top out at 3600RPM. It is Ok to have a few 100ns of jitter on the angular output, so a softwear write to ports is used.

One of the output bits of the angular waveform is also a trigger for the time domain pulse sequence. The way I do this on the F2812 is very simple and I have already proven it to work well. One EVM timer outputs

1MHz. I can also reset this timer so I can reset the phase of this timebase when the pulse gen is triggered.

A second EVM timer is clocked by the 1MHz, and the compare match is set to the initial time delay desired to 1us resolution. The compare match generates the output rising transition, so this occurs perfectly jitter free. The compare match interrupt then adjusts the timer value so that overflow will occur in the desired number of microseconds for the high period. At overflow, the output edge falls (by hardware) and the overflow (actually I use the underflow interrupt) ISR sets the compare match value for the rising edge once again to the new low delay period value.

Thus, all edges are generated directly by the PWM hardware for no jitter. All the CPU does is interrupt after each transition and set up the next compare match/timer value.

The real-time challenge is that I must ensure that in the worst case that the sum of the times for both a pulse generator interrupt and an angular compare match interrupt complete in less than 1us. But the actual work done in the interrupts is relatively little so this is fairly comfortable at 150MHz. I'm using straight unoptimized C so far.

There is another complication. Every engine cycle will update a state sequencing algorithm which will choose the next angular and temporal waveform tables. We run optical research engines that may only be fired once every ten cycles. So 10 cycles may have no fuel injection, then the 10th fires. But all the while a 10Hz YAG laser must be kept triggered, and some other junk. Actually, the state sequences will be more complicated than this, and I plan to allow digital inputs to affect the state transistions as well.

Plus, we want to be able to change the timing of angular events on the fly, so a comm interface will allow loading new data tables, and the state sequencer will link them in at a safe time. That's why I've been working on a RS232 SCI driver, which is now done! Later we will upgrade to USB, but RS232 is simple and just fast enough so that's where we'll start.

Did I mention the plot thickens?

The thing must also be able to run in a simulate mode, in which it outputs instead of inputs QEP encoder signals, while still responding to the synthesized QEP encoder to generate all the output waveforms. This is so the experiment can be exercised with "live" engine signals without running the engine (an optical engine requires a lot of effort to run) and to keep the laser happy when not running. There is another gadget I built with a PIC (one of my first and only meaningful PIC projects before adopting AVR) that syncs the laser when transitioning between simulate and run modes, so the laser always sees triggers within it's acceptable frequency range.

Some labs may also have an absolute encoder to check against the QEP to ensure no slippage, which can have catastrophic results.

I plan to integrate a medium FPGA or large CPLD into the system which will do the QEP simulation, and may allow me to experiment with alternative implementations of the temporal pulse generation (I might need a second channel of this which is too much for the F2812), as well as a bunch of other little glue needed for triggering our DAQ system, etc. There is also a wierdness about the F2812's handling of the QEP index pulse that will be fixed up by the PLD.

So there is quite a bit to do, and my evaluation of the F2812 about 3 years ago indicated that my time pulse generation and angular waveform generation work as I had expected. So there is no need to consider completely different platforms. But I don't like to be painted into a corner so the FPGA will give me a way out if some future requests by the scientists exceed the capability of the F2812. Actually the 1us pulse timing resolution is overkill and sort of a challenge that I made for myself. But it can be done on the F2812.

Notice that this really utilizes the EVM capabilities almost to the max. I think I have a use for the 4th timer as well. Oh, I think it generates the QEP sim clock.

It is the EVM that drove me to the F2812. But even another chip like the Freescale's with TPUs but without 150MHz CPU might not be enough.

I also built a dual encoder simulator out of the F2812 that fully simulates the QEP and absolute encoders that will be on the engines, with all sorts of neat adjustments via a terminal interface and ability to single step, slip, etc. so I can develop and test this code in greater detail than by using a mechanical encoder turning jig. That was my first F2812 practice project.

Good day! ____________________________________ CRC crobcREMOVETHIS@BOGUSsbcglobal.net NOTE, delete texts: "REMOVETHIS" and "BOGUS" from email address to reply.

Vote

C

Chris Carlen 18 years ago

Thanks for the reply Jon.

The F2812 pipeline definitely induces jitter in interrupt latency.

I explained my scheme in detail in response to Jim Granville.

Good day! ____________________________________ CRC crobcREMOVETHIS@BOGUSsbcglobal.net NOTE, delete texts: "REMOVETHIS" and "BOGUS" from email address to reply.

Vote

C

Chris Carlen 18 years ago

Interesting ideas. Thanks. I have explained my scheme in detail in a response to Jim Granville.

Good day! ____________________________________ CRC crobcREMOVETHIS@BOGUSsbcglobal.net NOTE, delete texts: "REMOVETHIS" and "BOGUS" from email address to reply.

Vote

C

Chris Carlen 18 years ago

Hmm. I hadn't though of the possibility that a DMA could do this, and do multiple channels as well.

You agreeing with something is equally impressive! ;-)

Even compared to a dsPIC? C2000 does have MACs and the ability to read and index pointers in one cycle, multiple busses for read, all that DSP stuff, you know.

And an outside contractor to stick it and X-ray.

Thanks.

It is actually more complicated than just pulse generation, as I explained in detail in a response to Jim Granville if you're interested.

Yes, we could do eveything with LabView and NI stuff. But we have to do

8 labs. Racks of NI hardware and likely LabVIEW RT/FPGA would add up to $$$. We have a strange situation with costs, where sometimes it is more desireable to build it and spend the money on labor, since I am already here. Whereas capital equipment is an added expense. Since this doesn't have to be done in a hurry, this works.

Plus, our main computer programmer hates LabVIEW, and I'm not too fond of it myeslf, liking to forage around in the peripheral registers of MCUs. He's adopting Python for our DAQ code where possible, and C where not. He'll provide a GUI and command language parser for this thing that I'm building.

Good day! ____________________________________ CRC crobcREMOVETHIS@BOGUSsbcglobal.net NOTE, delete texts: "REMOVETHIS" and "BOGUS" from email address to reply.

Vote

D

dalai lamah 18 years ago

Un bel giorno Chris Carlen digitò:

[F28xx]

Not really, they've already moved on with TMS320F283xx family (300 MFLOPS) and TMS320F282xx family.

F28xx is a DSP, therefore you need to think about DSP-centric tasks. Take for example the typical multiply-and-accumulate algorithms like a digital filter or a FFT: a CPU that maximizes the memory bandwidth and executes one complex instruction for each memory access (thanks to pipelining) can reach the same (or higher) MMAC performance than a CPU with a higher clock, but that needs more cycles to execute instructions and has a clogged memory bus.

They already did it, see for example Blackfin or DSP56xxx, like someone already suggested. Or use a FPGA.

emboliaschizoide.splinder.com

Vote

D

dalai lamah 18 years ago

Un bel giorno Vladimir Vassilevsky digitò:

DSPIC is a toy compared to F28xx, it's a 16-bit device and its top speed is

40 MIPS. DSP56xxx is more likely, if you compare parts with similar prices (e.g. DSP56311 vs TMS320F2812).

F281x family has several weakpoints (no DMA, no auxiliary clock inputs, very poor support for external interrupts, etc...), but not certainly the performances/price ratio. A 14$ microcontroller with 150 MMACs doesn't look so bad to me.

emboliaschizoide.splinder.com

Vote

V

Vladimir Vassilevsky 18 years ago

TMS 28xx is lousy DSP, or not a DSP at all.

I've done the DSP work in TMS28xx assembly, and I can tell some numbers: the 32-bit type I biquad filter with saturation and noise shaping takes the 76 instructions per iteration not including the loop and the entry/exit overhead. On the normal DSP like BlackFin, it takes 18 commands.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

Vote

C

Chris Carlen 18 years ago

Blackfin is media oriented (and hence low power), C2000 and the 56xxx stuff is power conversion oriented. That's the real key difference.

If one can do a task with a MCU vs. an FPGA then is it pretty clear which one's easier and therefore cheaper.

For me Blackfin+FPGA is 10x more difficult than F2812. It is the EVMs which make it possible to do what I want without an FPGA.

Good day! ____________________________________ CRC crobcREMOVETHIS@BOGUSsbcglobal.net NOTE, delete texts: "REMOVETHIS" and "BOGUS" from email address to reply.

Vote

Faster than TI C2000?

Join the Discussion

Didn't find your answer?