I don't recall the TI designator, but they make some DSP parts that have peripherals like MCUs. I know that some time back, ARM made a push into DSP territory by adding some DSPish instructions to, I believe, the CM3 devices, or maybe the CM4.
Anyone here use these crossover devices? What sort of apps? Why did you pick that device over others?
You are maybe thinking of the TMS320F family of DSP/MCU's from TI. These have a traditional DSP-style processor core - 16-bit "char" (no 8-bit byte access at all), gruesome assembly where each instruction does several different things in a single cycle, multiple memory buses for simultaneous accesses, hardware support for cyclic buffers, FFT twiddling, etc. It lets you make very efficient DSP-style algorithms but is a pain for more microcontroller-style control code. The chips have typical microcontroller-style peripherals such as timers, UARTs, CAN controllers, etc.
So they are a hybrid. They are popular for high-temperature electronics, as they are one of the few families of microcontrollers that are available for 175 °C and above.
These days, true DSP's are much less common. On the one side, once FPGA's started having multiplier blocks they could outcompete DSP's in parallel and pipelined MAC-based algorithms, and have much more flexibility for memory and operand organisation. On the other side, microcontrollers and processors gained single-cycle MAC instructions and SIMD instructions, giving them similar performance to DSP's for many algorithms while being far easier to use in other situations. True DSP's are now usually found only in very specialised systems, or so deeply embedded that you never see their programmability (i.e., you buy a "video converter" chip and don't care how its insides work).
The Cortex-M4 is basically a Cortex-M3 with DSP instructions added - MACs in various formats, saturating arithmetic, and 8-bit and 16-bit SIMD instructions (within 32-bit registers). They don't have all the features of DSP's, but they have enough to make common DSP algorithms quite efficient, and ARM provides optimised libraries. The latest Cortex-M55 core has additional vector/SIMD instructions, but I don't know if any microcontrollers are available yet.
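As a rough illustration of "16-bit SIMD within 32-bit registers" combined with saturating arithmetic, here is a portable C model of a dual 16-bit saturating add. The names `sat16` and `qadd16_model` are made up for this sketch. On a Cortex-M4 the same operation is a single instruction, which I believe CMSIS exposes as the `__QADD16` intrinsic, but treat that mapping as an assumption and check your toolchain's headers.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the int16_t range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Portable model of a dual 16-bit saturating add: two independent
   signed 16-bit lanes packed into one 32-bit word.
   Note: converting an out-of-range value to int16_t is
   implementation-defined in C; this assumes the usual
   two's-complement wraparound that gcc and clang provide. */
static uint32_t qadd16_model(uint32_t a, uint32_t b)
{
    int16_t a_lo = (int16_t)(a & 0xFFFFu);
    int16_t a_hi = (int16_t)(a >> 16);
    int16_t b_lo = (int16_t)(b & 0xFFFFu);
    int16_t b_hi = (int16_t)(b >> 16);

    int16_t lo = sat16((int32_t)a_lo + b_lo);
    int16_t hi = sat16((int32_t)a_hi + b_hi);

    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}
```

For example, adding 0x7FFF0001 and 0x00010001 clamps the upper lane at 0x7FFF while the lower lane adds normally to 0x0002.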
As for anyone using them, I think you'll have a very hard job finding anyone who does embedded development with microcontrollers that has /not/ used Cortex-M4 devices. They are everywhere.
And as for why I pick a given device for a given project, it will depend entirely on the project - as well as other projects I have done and other projects other colleagues have done. There are thousands of Cortex-M4 devices available, not including variations of memory sizes, chip packages, or speeds. The common reasons are the same as for any other type of chip - price, support, familiarity, peripherals, package, etc.
The biggest reason for any choice these days, however, is availability - many designs start off by asking what microcontrollers our suppliers have in stock with the given minimum requirements, because we rarely have time to wait for 52 week lead times.
IIRC, branches were also delayed. The later 320's (C30/C40 and on) were all 32-bit (in C: char, int, long int, float, double were all "one byte" which contained 32-bits). And the floating point format wasn't IEEE.
That combination made supporting byte-oriented serial protocols that used IEEE FP extra fun.
The dev tools from TI were a bit clunky, but worked OK and were available for Solaris (including the in-circuit emulators).
But, compared to what else was available 20+ years ago, they were damn fast (especially for the price).
Just a word of caution for Rick re this portion. Make sure that a 32 bit accumulator will be enough for what you are doing; it can easily fall short in many cases. "Normal" DSPs have 40 or so bits for this reason; or, you can pick some processor with 64 bit FPU MAC ability, 32 bit FPU will fall a lot shorter even than the 32 bit integer regs David is mentioning. David said it all, I am just cautioning because this is the kind of "oh shit" factor which comes at the end of the project (a friend once told me of that "oh shit", you either say it at the beginning or at the end :).
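To make the accumulator-width point concrete, here is a small sketch comparing a 64-bit accumulator with a 32-bit one for a dot product of Q15 samples. The function names are hypothetical, and the 32-bit version deliberately models wraparound with unsigned arithmetic so that the overflow is well-defined C rather than undefined behaviour. Each Q15 product can be close to 2^30 in magnitude, so in the worst case a 32-bit accumulator overflows after only a couple of terms; the extra guard bits of a wider accumulator are what buy you longer sums.

```c
#include <stdint.h>
#include <stddef.h>

/* Dot product of Q15 samples with a 64-bit accumulator (no overflow
   for any realistic length). */
static int64_t dot_q15_acc64(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];
    return acc;
}

/* Same sum, but the accumulator is only 32 bits wide and wraps.
   Unsigned arithmetic is used so the wraparound is well-defined;
   the final conversion back to int32_t assumes two's complement. */
static int32_t dot_q15_acc32(const int16_t *x, const int16_t *h, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (uint32_t)((int32_t)x[i] * h[i]);
    return (int32_t)acc;
}
```

With four taps all equal to 32767 and four samples all equal to 32767, the true sum is 4294705156, which the 64-bit version returns, while the 32-bit version wraps to -262140.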
I'm not selecting a DSP part. I typically use FPGAs for what I do. Not because they are required for speed, but because they work well and have complete flexibility. I used a $10 FPGA in a product I designed in 2008 and have to refresh the design for a couple of parts that are not made anymore. The new design will still use an FPGA. If I need an MCU in the design, it will be a custom design in the FPGA. I have one I've been pushing around in my head that would have one CPU, pipelined to work like 8 CPUs. Interrupt response of 1 clock cycle and no need to save registers, because all context is switched with the interrupt. ~600 LUTs for 8 processors running at 20 MIPS each. Not bad.
I was just curious about what people have used for DSP applications, but in particular if anyone had used one of the "crossover" parts. So far, the answer has been "no".
I have used a "real" DSP just once, 20+ years ago: the TI 5420, on which I did our first DSP based MCA module. The 5420 had two cores clocked at 100 MHz, some dual access RAM (meaning an address can be accessed twice in one clock cycle) and multiple serial interfaces, *very* flexible ones, which allowed me to push an (almost) 10Msps 16 bit wide ADC stream serially by using 3 of them in sequence (one had just 1/3 the speed I needed). A CPLD was doing the serialization, and the 3 streams were getting into the DSP memory in a large FIFO, in the correct sequence; all this could be just programmed into their serial interfaces. Then one core had just one job, to detect an event and pass it to the other core, which would do the filtering etc. processing; there was a nice FIFO connecting the two cores on chip.
A decade or so later I did the same - with some more sophistication though - using a 400 MHz Power Architecture part with DDRAM, single core. The sampling rate was half that of the former version (which had been somewhat overkill) and it was all done by the processor using 64 bit FP for the filtering (2 cycles per MAC; it was hard to get there, but that is another story, and it did work once I figured out how to do it). And this uses up to half the CPU resources under real load, so it still manages to maintain the user interface, support VNC over tcp/ip etc.
Like David said, with processors getting faster the need for a "real" DSP goes down and down. As for those other, mixed sort of TI DSP/MCU parts, I have no experience; I never even needed to consider any of them.
If you say so - I don't remember. (Delayed branches are not uncommon in processors designed for single-cycle instruction throughput - they are also found in several RISC architectures.)
I did not know they were part of the TMS320F family, though I know Texas Instruments made other DSP's with 32-bit "char".
I had enough fun with a byte-oriented UART protocol on a 16-bit TMS320 with very little ram (so little that I could not afford to waste it on unpacked buffers). Combine that with a UART peripheral that didn't actually work correctly (the "receive" flag was never set) and a toolchain with plenty of "undocumented features" (and some barely documented critical non-conformances). I did not pick the device for any other projects.
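For anyone who has not met this problem: on a part whose smallest addressable unit is 16 bits, a byte-oriented protocol buffer has to be packed two octets per word by hand if you cannot afford one word per octet. A minimal sketch of the idea follows. The helper names and the low-octet-first ordering are choices made for this example, not anything from TI's tools. Octets are passed as `unsigned` because such targets have no 8-bit type.

```c
#include <stdint.h>
#include <stddef.h>

/* Store octet number i into a buffer of 16-bit words, two octets per
   word: even octets in the low half, odd octets in the high half.
   Only the low 8 bits of v are used. */
static void put_octet(uint16_t *buf, size_t i, unsigned v)
{
    size_t w = i >> 1;
    if (i & 1u)
        buf[w] = (uint16_t)((buf[w] & 0x00FFu) | ((v & 0xFFu) << 8));
    else
        buf[w] = (uint16_t)((buf[w] & 0xFF00u) | (v & 0xFFu));
}

/* Fetch octet number i from the packed buffer. */
static unsigned get_octet(const uint16_t *buf, size_t i)
{
    uint16_t w = buf[i >> 1];
    return (i & 1u) ? (unsigned)(w >> 8) : (unsigned)(w & 0xFFu);
}
```

Every access costs a shift and a mask, which is part of why this kind of code is tedious on such parts, but the buffer takes half the RAM of the unpacked version.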
I don't know exactly how you are defining a "crossover" part. But if it is "a DSP with microcontroller features", then the answer so far is "yes". Both Grant and I have used TMS320F parts - but I would not choose to use one again if I could avoid it. (I can't answer for Grant there.) I have also used a "DSP with microcontroller features" from Freescale (from the MC56000 family, IIRC) - though I hadn't mentioned that at all.
And if you mean "a microcontroller with DSP features", then as I said almost everyone who works with embedded software has used Cortex-M4 devices. I have lost count of the number of different ones I have used (plus Cortex-M7, ColdFire, and PPC based microcontrollers that had DSP features).
So I don't quite see how you could have interpreted the posts as "no".
Please read the first post in this thread for that.
I was looking for some insight into their experiences with such devices for DSP work, and I'm counting both DSP like MCUs and MCU like DSPs. I don't see in your post that you talk about any particular experience, rather offer a 10,000 foot overview of the state of the market. Thanks for that, but this is not new to me. So your post was pretty much a "no", to me.
I guess I was not quite explicit enough in my initial post. I was asking about specific experiences where a crossover part was chosen for a project with a significant DSP content, which would have required a DSP chip, if these devices were not available.
I am fully aware that MCUs are getting faster and more capable, but that doesn't mean DSPs are not needed. It simply means they are used in other applications that require more horsepower. Sometimes, it's not even the horsepower, but the performance to power consumption ratio. There are application specific DSPs for hearing aids that run on very low power, much better than any MCU could do.
Years ago DSP split into two categories based on the cell phone market. The high performance devices needed their own power plants, but cranked out some serious MIPS/MFLOPS. The much smaller, lower power, fixed point devices gained in speed, without sucking all the juice from mobile batteries, while serving in hand sets. Now the hand sets have dedicated CPU chips with built in DSP sections for the front end processing of cell phones, rather than separate DSP chips.
There's no shortage of DSP cores in the world, we just don't see all of them because they are part of system chips.
Ah, I overlooked the "F" in your original post. I don't remember any F parts. Interestingly the Wikipedia page on TMS320 doesn't mention the F parts at all. I did find this page about the TMS320F28335, but it's a 32-bit part also:
I've done some audio stuff on ordinary CPU's, that in an embedded system would probably go on something like a Cortex M4, if that's what you call a crossover part. The next thing after that is probably a GPU or FPGA, either of which contains a stupendous amount of parallel MAC's. As others have said, dedicated DSP's are now pretty niche.
FPGA's may have displaced general purpose processors for some realtime applications as well, since you get low latency and deterministic timing without having to go crazy worrying about caches and interrupts.
I didn't personally work on it, but spent a while studying a cryptography app that ran on the now ancient Motorola DSP 56000 series. The model number came from the architecture's 24 bit words and 56 bit MAC accumulator. The app wasn't particularly connected with realtime or with signal processing. Rather, the 24*24->56 MAC came in handy for high precision arithmetic used by the crypto algorithm.
At that time there were generally 16 bit fixed point DSPs and 32 bit floating point DSPs. Neither was appropriate for audio work. 16 bits is not enough resolution for high quality audio and 32 bit floating point was overkill, using extra power and burning extra dollars. Motorola came out with 24 bit devices as the sweet spot for high quality audio work.
I did. That's why I said I don't know exactly how you are defining your personal meaning of "crossover part". But I see you've given more information below, so maybe people can give you more helpful feedback (or at least say that they don't have the relevant experience).
Of course it is an overview. Do you want detailed information about everything I have done for the past 15 years or so since Cortex-M devices took over the embedded world?
I can give a bit more insight into my experience with the TMS320F24x device. That was over 20 years ago, and lots will have changed since then. The device was horrible to use. The assembly was impenetrable, and extremely complicated to do well. The C compiler was hopelessly inefficient, meaning you /had/ to use assembly for critical parts. The hardware debugging tools were absurdly overpriced (some $1500 for what was basically a couple of 74-series logic chips), and broke easily. The software tools had annoying quirks. But the sensorless BLDC motor control worked well in the end.
I would not willingly choose to do development on these parts again - there are simply too many alternatives that are vastly easier to work with for most purposes. But I know TI sell various pre-programmed parts as dedicated motor control peripherals, and I'd be quite happy to consider them.
As I said, the great majority of embedded microcontroller work is now done with Cortex-M microcontrollers - they dominate the industry. At the low end you have Cortex-M0 and M0+ devices for the very cheapest, but the most popular are M3 or M4 parts (and the M7 at the high end). The M4 is like an M3 but with added "DSP" instructions - MAC's of various types, simple SIMD, saturating arithmetic. In reality, relatively few people actually do anything that could be called "DSP" work - it's usually more general control code. And when you want a digital filter or FFT, you typically use ARM's optimised libraries. Your code runs the same whether the device has DSP optimisation instructions or not - only the speed is different.
So when you ask about "experience using these devices", you are really asking "experience doing microcontroller development".
That is a different question, and more specific.
I've only done quite limited DSP algorithms (such as simple filters) in my own code, and these devices are absolutely fine for that. As always, you have to be careful about your scalings when working with fixed-point numbers.
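As an example of the scaling care needed, here is a sketch of a Q15 multiply (values in the range -1.0 to just under +1.0 stored in an `int16_t`). The function name is hypothetical. The product of two Q15 numbers is Q30, so it has to be shifted back down, rounded if you care about bias, and clamped for the one case that does not fit (-1.0 times -1.0). The right shift of a negative value is implementation-defined in C; this assumes the arithmetic shift that gcc and clang use.

```c
#include <stdint.h>

/* Multiply two Q15 values, with round-to-nearest and saturation. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;   /* Q30 product, fits in 32 bits */
    p += 1 << 14;                 /* add half an LSB before truncating */
    p >>= 15;                     /* back to Q15 (arithmetic shift assumed) */
    if (p > INT16_MAX)            /* only happens for -1.0 * -1.0 */
        p = INT16_MAX;
    return (int16_t)p;
}
```

So 0.5 times 0.5 (16384 times 16384) gives 8192, which is 0.25, and -32768 times -32768 clamps to 32767 instead of wrapping to -32768.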
If you want floating point, some Cortex-M4 have single-precision floating point (Cortex-M4F). You need to be careful to avoid accidentally using double precision operations in your C code - there are gcc flags to help warn you about this. If you want double precision, it's worth going for an M7 microcontroller like an NXP RT10xx device (ironically called a "crossover microcontroller" by NXP), since these have double precision floating point in hardware.
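A typical way to pick up double precision by accident is an unsuffixed floating point literal. The two hypothetical functions below look almost identical, but in the first the literal `0.1` has type `double`, so the multiplication is done in double precision (software emulation on a single-precision FPU) and the result converted back. The gcc warning I have in mind here is `-Wdouble-promotion`, and `-fsingle-precision-constant` is another option some people use, though it changes the language semantics and is not something I would enable blindly.

```c
/* The literal 0.1 is a double, so x is promoted and the multiply
   is done in double precision before converting back to float. */
static float scale_promoted(float x)
{
    return x * 0.1;
}

/* The f suffix keeps the whole expression in single precision. */
static float scale_single(float x)
{
    return x * 0.1f;
}
```

You can see the difference without running anything: `sizeof(x * 0.1)` is the size of a `double`, while `sizeof(x * 0.1f)` is the size of a `float`.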
I have been involved in a project that was more relevant, using wavelet transformations, but I did not work directly on the wavelet code. I did help out on some of the optimising and translation from the original code (from a PC). Working that way is not optimal, but it was good enough - we required a certain amount of transformations per second, and got that from the chip we had on the board, and did not see any point in going further.
There is no doubt that dedicated DSP cores have instruction types and features that can make a significant difference to the efficiency of some algorithms. A good DSP can do "x += *p++ * *q++;" in a single operation, once per cycle. They generally support cyclic buffers directly, which can save a fair bit of code. And they have the specialised bit manipulation instructions useful for FFT's.
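For comparison, this is roughly what the same thing looks like when written out in plain C on a general-purpose core: a FIR filter step with a cyclic history buffer and a multiply-accumulate inner loop. It is only a sketch. `FIR_TAPS`, `fir_state` and `fir_step` are names invented here, the tap count is assumed to be a power of two so the index can wrap with a mask, and coefficients and samples are assumed to be Q15. On a DSP the index wrapping, the two loads and the MAC would typically all be folded into one instruction per tap.

```c
#include <stdint.h>
#include <stddef.h>

#define FIR_TAPS 4u   /* hypothetical; must be a power of two for the mask */

typedef struct {
    int16_t hist[FIR_TAPS];  /* cyclic buffer of past input samples */
    size_t  head;            /* index of the newest sample */
} fir_state;

/* Push one Q15 sample and return one filtered output sample.
   coef[0] applies to the newest sample, coef[1] to the one before, etc. */
static int32_t fir_step(fir_state *s, const int16_t *coef, int16_t in)
{
    /* Advance the write position, wrapping with a mask. */
    s->head = (s->head + 1u) & (FIR_TAPS - 1u);
    s->hist[s->head] = in;

    int64_t acc = 0;              /* wide accumulator for headroom */
    size_t idx = s->head;
    for (size_t k = 0; k < FIR_TAPS; k++) {
        acc += (int32_t)coef[k] * s->hist[idx];
        idx = (idx - 1u) & (FIR_TAPS - 1u);   /* step back in time, wrapping */
    }
    return (int32_t)(acc >> 15);  /* Q30 sum back to Q15 */
}
```

With coefficients {16384, 16384, 0, 0} (a two-point average) and a zeroed state, feeding 1000 and then 3000 returns 500 and then 2000.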
However, it is all about getting the results out in the time (and power and cost budget) you need. And if your code runs fast enough on the device you have, it really doesn't matter if a different device could do it faster.
A lot of the choice will, as so often, come down to experience and familiarity. Getting decent DSP algorithm performance from an M4 is not too hard if you are already a good embedded programmer. It comes down to knowing your toolchain, knowing how to write efficient code, and knowing how to work with vendor's libraries. And since you have good toolchains, easy and cheap debugging (usually), and peripherals such as serial ports, USB, and Ethernet, you often have a much nicer development environment. If you develop appropriately, the same code will also compile directly on a PC making simulation and testing vastly easier.
On a DSP, getting optimal performance is very difficult - there is a /lot/ you need to track, and you are often making use of so many compiler extensions, intrinsics, etc., that you are really programming in assembly. Getting the same code running on a PC for testing is hugely harder. Accidentally getting significantly poorer efficiency is very easy - you might find that writing "while (--n)" gives you extremely fast specialised loop modes while "while (n--)" gives you explicit decrements, comparisons and jumps. Toolchains are often poor quality and very expensive (that is not universal, however). And non-DSP code is much harder than in a microcontroller - you often don't have access to 8-bit bytes, and portability between the DSP and other processors is poor.
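It is worth noting that those two loop forms are not interchangeable in the first place, which adds to the fun: for the same starting count they run a different number of times. A small sketch with hypothetical counting functions is below. Which form (if any) a particular DSP compiler turns into a hardware loop is toolchain-specific, and nothing here claims to show that.

```c
#include <assert.h>

/* Pre-decrement test: the body runs n-1 times for n >= 1. */
static unsigned count_predec(unsigned n)
{
    unsigned iters = 0;
    assert(n >= 1u);      /* n == 0 would wrap around to UINT_MAX */
    while (--n)
        iters++;
    return iters;
}

/* Post-decrement test: the body runs n times, including n == 0. */
static unsigned count_postdec(unsigned n)
{
    unsigned iters = 0;
    while (n--)
        iters++;
    return iters;
}

/* Down-counting do/while: runs exactly n times for n >= 1, with the
   decrement-and-test at the bottom of the loop. */
static unsigned count_dowhile(unsigned n)
{
    unsigned iters = 0;
    assert(n >= 1u);
    do {
        iters++;
    } while (--n);
    return iters;
}
```

For n = 5 the pre-decrement loop runs 4 times, while the post-decrement and do/while forms run 5 times.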
We haven't talked much about peripherals or hardware, but DSP's usually have fewer "general" peripherals, and their interfaces can be more specialised.
Yes, that is correct.
DSP's are still very much an important technology, but they are getting more niche. There are few people that develop with them - the majority of companies that have a DSP on their boards will buy the code ready made, often just as a binary blob or pre-programmed. In many cases, the code is written by the companies that develop the DSP.
This is not just because getting maximal efficiency from a DSP is technically hard and requires knowledge and experience (and if you don't need maximal efficiency, why are you bothering with the DSP in the first place?). IP and patent licensing is a nightmare in many of the applications where DSPs really shine, such as in audio and video codecs. If you are Sony or Sonos, you can afford a big development team and an even bigger lawyer team and make your own audio codecs. For most companies, it is a fraction of the overall price if you buy your DSP's with licenses for codec binary blobs all in one.
Standalone DSP chips are also getting rarer - it is more common to see them as accelerators alongside a "host" processor that handles the non-DSP functionality, all within the same die.
Most (in terms of numerical quantities) are probably generated specifically for the ASIC or dedicated chip they are used in. There are parametrized DSP cores available that are often used with 24-bit or 18-bit "bytes" - TMS320's with 16-bit or 32-bit "char" are programmer-friendly in comparison. And sometimes it is not easy to draw the line between hardware filters with very programmable state machines, and limited DSPs.
But a lot is changing. At the high end, processors with SIMD are able to do many of the tasks that DSP's used to do. Other kinds of accelerators such as found in graphics card cores can do a better job than traditional DSPs, while also being easier to work with. At the lower end, normal microcontrollers, possibly augmented with a few DSP-friendly instructions, can do a better job. For your hearing aids, when you have a Cortex-M device that takes less power than the leakage current of the smallest battery while doing all the filtering fast enough, the DSP has lost its advantage.
Yes. There are many manufacturers of 24-bit DSPs, and they almost all have a background in audio.
Motorola (then Freescale, now NXP) also had a peripheral they called the TPU (Timer Processing Unit), found in microcontrollers aimed at engine control and advanced motor control usage. The original version was 16-bit and programmed in a weird kind of assembly, but the later versions were 24-bit and had a specialised C compiler. It turns out that 16 bits is often not quite enough for many high resolution timing tasks, and again 32-bit would have been overkill.
(Now, of course, you just use the 32-bit - the millidollar difference in hardware costs is worth it for the added convenience.)
I routinely use the TMS320F28xxx family (Piccolo, Delfino etc.) to implement control algorithms for power electronics. I use them mostly because I'm a longtime TI fanboy and I'm comfortable with their tools, documentation and support. They still bring performance advantages for complex applications; for example, a TMS320F2837xD processor can reach 800 MIPS/MFLOPS. However, these days every Cortex-M4 processor also has high clock speeds, an FPU and hardware support for MAC, CORDIC etc., and therefore they are enough for many control applications.
Basically the F28xxx "niche" is people that need a lot of room for processing power, and don't want to switch to FPGA.