Any idea roughly what clock frequency I would be looking at in order to have an ARM Cortex M4 do IIR filter calculations? I need maybe 3E6 multiply/accumulates per second, so ideally a part capable of 10E6 per second at 100% usage.
The M4 core offers a 32 bit multiply/64 bit accumulate in 1 cycle...
This is to run four decimation filters in parallel. I don't want to use an FPGA.
You have given the performance numbers. What are you asking? Are you just looking for confirmation?
If you want 10 MMACS at 1 clock per MAC, a 100 MHz device should give you plenty of headroom. Parts at this speed are not uncommon. You might want to run the code out of memory, but since this is likely a tight loop I expect cache performance would suffice. What else does this device need to do at the same time?
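The arithmetic behind that headroom estimate can be sketched in a few lines of C. This is only a back-of-envelope helper: the function name cpu_load_percent and the cycles_per_mac parameter are made up for illustration, and the real cycles-per-MAC figure (loads, stores, loop overhead) has to come from measurement.

```c
#include <assert.h>

/* Estimated CPU load in percent for a given MAC rate.
 * cycles_per_mac is an assumption: 1.0 is the ideal single-cycle MAC,
 * larger values stand in for memory accesses and loop overhead. */
double cpu_load_percent(double macs_per_sec, double cycles_per_mac, double clock_hz)
{
    assert(clock_hz > 0.0);
    return 100.0 * macs_per_sec * cycles_per_mac / clock_hz;
}
```

With the numbers from this thread, cpu_load_percent(10e6, 1.0, 100e6) comes out at about 10%, and even a pessimistic 3 cycles per MAC at the 3E6 rate stays under 10% of a 100 MHz core.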
Other than coding up a filter and running it, I can't think of a way to estimate performance beyond this. I know at a seminar once when the CM4 was just out they were touting the DSC (Digital Signal Controller) aspects of the design. They said it would approach 1 MMACS/MHz including the memory accesses. I don't know all the assumptions they made to get there.
Why don't you ask this in the DSP group? I expect someone there has done something similar. Also, I think Joerge did a design using the CM4 for filtering not too long ago.
The classic approach is to combine the filter and decimation in a CIC (Cascaded Integrator-Comb) structure, so you do not need any multiplications at all.
If the decimation is done in steps, the computing power requirement gets still smaller.
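For anyone who has not seen one, a CIC decimator can be sketched roughly as below. This is a minimal illustration, not tuned code: three stages and a differential delay of 1 are assumed, and the names cic_t, cic_init and cic_push are hypothetical. The accumulators are unsigned so that wrap-around is well defined in C; the result is correct as long as the input width plus CIC_STAGES * log2(ratio) bits fits in 32 bits.

```c
#include <stdint.h>
#include <assert.h>

#define CIC_STAGES 3            /* assumed filter order, for illustration */

typedef struct {
    uint32_t integ[CIC_STAGES]; /* integrator accumulators (input rate) */
    uint32_t comb[CIC_STAGES];  /* previous comb inputs (output rate) */
    uint32_t phase;             /* input samples since last output */
    uint32_t ratio;             /* decimation ratio R */
} cic_t;

void cic_init(cic_t *c, uint32_t ratio)
{
    assert(ratio > 0);
    for (int i = 0; i < CIC_STAGES; i++) {
        c->integ[i] = 0;
        c->comb[i] = 0;
    }
    c->phase = 0;
    c->ratio = ratio;
}

/* Push one input sample. Returns 1 and writes *out when a decimated
 * sample is ready, otherwise returns 0. Only adds and subtracts are
 * used. The DC gain is ratio^CIC_STAGES, so the caller scales the output. */
int cic_push(cic_t *c, int32_t in, int32_t *out)
{
    uint32_t v = (uint32_t)in;

    /* Integrators run at the input rate. */
    for (int i = 0; i < CIC_STAGES; i++) {
        c->integ[i] += v;
        v = c->integ[i];
    }

    if (++c->phase < c->ratio)
        return 0;
    c->phase = 0;

    /* Combs run at the decimated rate: y = x - previous x. */
    for (int i = 0; i < CIC_STAGES; i++) {
        uint32_t prev = c->comb[i];
        c->comb[i] = v;
        v -= prev;
    }

    *out = (int32_t)v;
    return 1;
}
```

Feeding a constant 1 with ratio 4 settles to 64 (4 cubed) after the first few outputs, which is a quick sanity check on the gain. A CIC has a drooping passband, so in practice it is usually followed by a short compensating filter at the low rate.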
For an M4, my guess is that you're going to use more of the CPU time hauling data points and coefficients than for the actual MACs.
I recently asked an optimization question on comp.arch.embedded and someone mentioned in passing that to get full performance from the M4 hardware you need to use lots of registers, because of pipelining. That was for plain old MAC-ing, not for IIR filters that are going to challenge the optimizer by needing the filter states from step n-1 to calculate the filter states for step n.
Particularly if your filter is of low order you may find the most joy with (a) hand coding the thing, (b) doing batches of input samples if you can (to avoid lots of context switching), and (c) running more than one filter in parallel (to take advantage of the pipeline). If you're really lucky the compiler will understand the MAC process and do the optimization for you.
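As a starting point for points (b) and (c), here is a rough C sketch of one biquad section run over a block of samples for four interleaved channels. Everything in it is an assumption for illustration: the names biquad4_block, biquad_coef_t and biquad_state_t, the Q2.30 coefficient format, the Direct Form I structure, and the choice to share one coefficient set across channels. It relies on arithmetic right shift of negative values, which gcc provides but the C standard leaves implementation-defined.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define NCH 4   /* four filters side by side, as in the original question */

/* Coefficients in Q2.30 (1.0 == 1 << 30). Sign convention:
 * y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] */
typedef struct { int32_t b0, b1, b2, a1, a2; } biquad_coef_t;
typedef struct { int32_t x1, x2, y1, y2; } biquad_state_t;

/* Process nframes frames of interleaved samples: in[n * NCH + ch].
 * The inner loop walks the channels, so four independent recurrences
 * are interleaved and no MAC has to wait on the previous channel. */
void biquad4_block(const biquad_coef_t *k, biquad_state_t st[NCH],
                   const int32_t *in, int32_t *out, size_t nframes)
{
    assert(k && st && in && out);

    for (size_t n = 0; n < nframes; n++) {
        for (int ch = 0; ch < NCH; ch++) {
            biquad_state_t *s = &st[ch];
            int32_t x = in[n * NCH + ch];

            /* Five 32x32 multiplies accumulated into 64 bits. */
            int64_t acc = (int64_t)k->b0 * x;
            acc += (int64_t)k->b1 * s->x1;
            acc += (int64_t)k->b2 * s->x2;
            acc -= (int64_t)k->a1 * s->y1;
            acc -= (int64_t)k->a2 * s->y2;

            int32_t y = (int32_t)(acc >> 30);   /* back from Q2.30 */

            s->x2 = s->x1;  s->x1 = x;
            s->y2 = s->y1;  s->y1 = y;
            out[n * NCH + ch] = y;
        }
    }
}
```

Whether the compiler actually turns the accumulate lines into the M4's long multiply-accumulate instructions is worth checking in the disassembly. If it does not, that is where hand coding would come in.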
I've got several M4 boards lying around, email me if you want me to do some prototyping and measuring for you.
I remember that conversation and the guy saying you needed to use CPU registers to do pipelining was full of s***. Pipelining would be handled internally to the CPU if needed. I seem to recall the guy refused to look at a TRM for the ARM.
The link I posted listed some good info on the whole thing including straight C code, "optimized" C code and hand coded assembly language. I believe the IIR filter didn't gain a lot by assembly while the IIR filter did, but don't take my word for it. My memory sucks these days.
M7's are not very common yet - there are only a few that are actually available. But the core handles twice the instructions per clock, IIRC, and can run at a faster clock - it will definitely give a lot more headroom for operations like this.
Perhaps not everyone is selling them yet but they are available (I've been using one for a year). Yes, they're much faster and more efficient when used for DSP applications. The cost is right there, too.
I think the first thing I'd do is just code the damned thing up in C with a gnu compiler, set it up to load into fast RAM (the ST parts, at least, have a section of RAM that has fast instruction-bus access), and benchmark.
IIR filter code seems to benefit greatly from being written for the purpose. And I generally don't trust code that comes from processor company apps engineers, having been burned by it before.
Reuse of code is not uncommon. I have had a lot of practice working with others' code. I am comfortable working with code supplied by a vendor... well, as I said, if I understand what it is doing. I remember trying to read Microsoft's idea of a serial port interface for Windows 3.1, iirc. It seemed to be a rather bizarre arrangement of back and forth for no apparent reason. I never tried to change any of it. I think we needed to control the enable for an RS-485 driver.
But for IIR code, it is simple enough to read and understand, at least as a starting point.