ARM Cortex M4 filter calculations

Hi,

Any idea roughly what clock frequency I would need for an ARM Cortex M4 to do IIR filter calculations? I need maybe 3E6 multiply/accumulates per second, so maybe a part capable of 10E6 per second at 100% usage.

The M4 core offers a 32 bit multiply/64 bit accumulate in 1 cycle...

This is to run four decimation filters in parallel. I don't want to use an FPGA.

--sp

--
Best regards,  
Spehro Pefhany 
Spehro Pefhany

On Wednesday, 16 December 2015 at 19:23:00 UTC+1, Spehro Pefhany wrote:

filters are part of the CMSIS DSP library so there must be some performance numbers somewhere.

why IIR for decimation?

-Lasse

Lasse Langwadt Christensen

You have given the performance numbers. What are you asking? Are you just looking for confirmation?

If you want 10 MMACS at 1 clock per MAC, a 100 MHz device should give you plenty of headroom. Parts at this speed are not uncommon. You might want to run the code out of memory, but since this is likely a tight loop I expect cache performance would suffice. What else does this device need to do at the same time?

Other than coding up a filter and running it, I can't think of a way to estimate performance beyond this. I know at a seminar once when the CM4 was just out they were touting the DSC (Digital Signal Controller) aspects of the design. They said it would approach 1 MMACS/MHz including the memory accesses. I don't know all the assumptions they made to get there.
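The headroom estimate above can be written out as a small calculation. This is only a sketch: the function name `cpu_load` and the `cycles_per_mac` parameter are made up for illustration, and the cycles-per-MAC figure (1 in the ideal case, more once loads and stores are counted) is an assumption that has to be measured on real hardware.

```c
#include <assert.h>

/* Fraction of the CPU consumed by the filter arithmetic.
 * macs_per_sec   : required multiply/accumulates per second
 * cycles_per_mac : assumed average cost of one MAC, including
 *                  load/store overhead (hypothetical, measure it)
 * clock_hz       : core clock frequency in Hz
 */
double cpu_load(double macs_per_sec, double cycles_per_mac, double clock_hz)
{
    assert(clock_hz > 0.0);
    return macs_per_sec * cycles_per_mac / clock_hz;
}
```

With these assumptions, `cpu_load(10e6, 1.0, 100e6)` comes out at about 0.10 (10% of a 100 MHz core), and a pessimistic 4 cycles per MAC would put the same workload at about 0.40.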

--

Rick
rickman

Why don't you ask this in the DSP group? I expect someone there has done something similar. Also, I think Joerge did a design using the CM4 for filtering not too long ago.

--

Rick
rickman

The classic approach is to combine the filter and decimation in a CIC (Cascaded Integrator-Comb) structure, so you do not need any multiplications at all.

If the decimation is done in steps, the computing power requirement gets still smaller.

For an M4, my guess is that you're going to use more of the CPU time hauling data points and coefficients than for the actual MACs.
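A CIC decimator of the kind described above could look roughly like this in C. It is a minimal sketch under stated assumptions, not a tuned implementation: the names `cic_t`, `cic_init`, `cic_push`, the three-stage order and the 16-bit input width are all picked for illustration. The accumulators are unsigned so that the integrators can wrap around safely; the result is correct as long as the input width plus N*log2(R) bits of growth fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define CIC_STAGES 3   /* hypothetical filter order N */

typedef struct {
    uint32_t integ[CIC_STAGES];  /* integrator states, input rate  */
    uint32_t comb[CIC_STAGES];   /* comb delay states, output rate */
    unsigned ratio;              /* decimation ratio R             */
    unsigned phase;              /* input samples since last output */
} cic_t;

void cic_init(cic_t *c, unsigned ratio)
{
    assert(c != 0 && ratio > 0);
    for (int i = 0; i < CIC_STAGES; i++) {
        c->integ[i] = 0;
        c->comb[i] = 0;
    }
    c->ratio = ratio;
    c->phase = 0;
}

/* Feed one input sample. Returns 1 and writes *out when a decimated
 * output is available, otherwise returns 0. Only adds and subtracts
 * are used, no multiplies. DC gain is R^N. */
int cic_push(cic_t *c, int16_t x, int32_t *out)
{
    uint32_t v = (uint32_t)(int32_t)x;

    /* Integrator cascade runs at the input rate (modulo 2^32). */
    for (int i = 0; i < CIC_STAGES; i++) {
        c->integ[i] += v;
        v = c->integ[i];
    }

    if (++c->phase < c->ratio)
        return 0;
    c->phase = 0;

    /* Comb cascade runs at the output rate, differential delay 1. */
    for (int i = 0; i < CIC_STAGES; i++) {
        uint32_t prev = c->comb[i];
        c->comb[i] = v;
        v -= prev;
    }

    /* Conversion back to signed is implementation-defined in C but
     * behaves as two's complement on the usual ARM compilers. */
    *out = (int32_t)v;
    return 1;
}
```

With R = 4 and three stages, a constant input of 1 settles to an output of 64 (4^3), so the output usually needs scaling down by R^N. A CIC also has a drooping passband, which is commonly corrected by a short compensating filter at the low output rate.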

--

-TV
Tauno Voipio

I did just a little digging and found this paper with examples.

formatting link

--

Rick
rickman

< snip >

That's what I'd do.

I recently asked an optimization question on comp.arch.embedded and someone mentioned in passing that to get full performance from the M4 hardware you need to use lots of registers, because of pipelining. That was for plain old MAC-ing, not for IIR filters that are going to challenge the optimizer by needing the filter states from step n-1 to calculate the filter states for step n.

Particularly if your filter is of low order you may find the most joy with (a) hand coding the thing, (b) doing batches of input samples if you can (to avoid lots of context switching), and (c) running more than one filter in parallel (to take advantage of the pipeline). If you're really lucky the compiler will understand the MAC process and do the optimization for you.
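Points (b) and (c) can be sketched in C along these lines. Everything specific here is an assumption made for illustration: the names `biquad4_t` and `biquad4_block`, the Q2.30 coefficient format, a single coefficient set shared by all four channels, and the interleaved sample layout. It uses a Direct Form I biquad with 32-bit data and a 64-bit accumulator, and it does not saturate, so inputs must be scaled with enough headroom that the five-term sum cannot overflow 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCH    4    /* four filters, as in the original question  */
#define QSHIFT 30   /* coefficients in Q2.30 (hypothetical choice) */

typedef struct {
    int32_t b0, b1, b2, a1, a2;   /* shared coefficients, Q2.30 */
    int32_t x1[NCH], x2[NCH];     /* input history per channel  */
    int32_t y1[NCH], y2[NCH];     /* output history per channel */
} biquad4_t;

/* Process a block of nframes frames. Samples are interleaved:
 * in[n * NCH + ch] is channel ch of frame n. Working on a whole
 * block per call amortises call and interrupt overhead; the four
 * channels are independent, which leaves the compiler free to
 * overlap their multiply/accumulate chains. */
void biquad4_block(biquad4_t *f, const int32_t *in, int32_t *out,
                   size_t nframes)
{
    assert(f != 0 && in != 0 && out != 0);

    for (size_t n = 0; n < nframes; n++) {
        for (int ch = 0; ch < NCH; ch++) {
            int32_t x = in[n * NCH + ch];

            /* 32x32 -> 64-bit multiply/accumulates. */
            int64_t acc = (int64_t)f->b0 * x;
            acc += (int64_t)f->b1 * f->x1[ch];
            acc += (int64_t)f->b2 * f->x2[ch];
            acc -= (int64_t)f->a1 * f->y1[ch];
            acc -= (int64_t)f->a2 * f->y2[ch];

            /* Right shift of a negative value is implementation-
             * defined in C; it is arithmetic on the usual ARM
             * compilers. No rounding or saturation is applied. */
            int32_t y = (int32_t)(acc >> QSHIFT);

            f->x2[ch] = f->x1[ch];
            f->x1[ch] = x;
            f->y2[ch] = f->y1[ch];
            f->y1[ch] = y;
            out[n * NCH + ch] = y;
        }
    }
}
```

Whether the compiler actually turns each product into a single long multiply/accumulate instruction (SMLAL on the M4) and keeps the states in registers has to be checked in the disassembly; if it does not, this is the loop to hand code.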

I've got several M4 boards lying around; email me if you want me to do some prototyping and measuring for you.

--
Tim Wescott 
Wescott Design Services 
Tim Wescott

I remember that conversation and the guy saying you needed to use CPU registers to do pipelining was full of s***. Pipelining would be handled internally to the CPU if needed. I seem to recall the guy refused to look at a TRM for the ARM.

The link I posted listed some good info on the whole thing including straight C code, "optimized" C code and hand coded assembly language. I believe the IIR filter didn't gain a lot by assembly while the IIR filter did, but don't take my word for it. My memory sucks these days.

--

Rick
rickman

Have you looked at an M7, instead? They're about the same price and come in the same sorts of packages.

krw

thought I'd cross-post this to get some cross-pollination.

--

Rick
rickman

M7s are not very common yet; only a few are actually available. But the core handles twice the instructions per clock, IIRC, and can run at a faster clock, so it will definitely give a lot more headroom for operations like this.

David Brown

On Thu, 17 Dec 2015 14:07:03 +0100, David Brown Gave us:

The developer's dream platform (for the price).

formatting link

DecadentLinuxUserNumeroUno

On Thu, 17 Dec 2015 08:32:59 -0500, DecadentLinuxUserNumeroUno Gave us:

Don't like armada? Try freescale

formatting link

DecadentLinuxUserNumeroUno

Perhaps not everyone is selling them yet but they are available (I've been using one for a year). Yes, they're much faster and more efficient when used for DSP applications. The cost is right there, too.

krw

On Thursday, 17 December 2015 at 14:36:23 UTC+1, DecadentLinuxUserNumeroUno wrote:

How are those monster A9s in any way relevant to a single micro like a Cortex M4/M7?

-Lasse

Lasse Langwadt Christensen

I think the first thing I'd do is just code the damned thing up in C with a gnu compiler, set it up to load into fast RAM (the ST parts, at least, have a section of RAM that has fast instruction-bus access), and benchmark.

Then I'd go from there.

--
Tim Wescott 
Wescott Design Services 
Tim Wescott

Looks pretty decent, thanks. Offerings from Atmel and ST, and not too needy a chip: non-BGA packages available, etc.

The $36 eval board from Atmel is nice too, not loaded up with all kinds of crap- just an Ethernet port, USB and headers.

--sp

--
Best regards,  
Spehro Pefhany 
Spehro Pefhany

Why reinvent the wheel if you can start with the work of others?

--

Rick
rickman

IIR filter code seems to benefit greatly from being written for the purpose. And I generally don't trust code that comes from processor company apps engineers, having been burned by it before.

--
www.wescottdesign.com
Tim Wescott

Reuse of code is not uncommon. I have had a lot of practice working with others' code. I am comfortable working with code supplied by a vendor... well, as I said, if I understand what it is doing. I remember trying to read Microsoft's idea of a serial port interface for Windows 3.1, iirc. It seemed to be a rather bizarre arrangement of back and forth for no apparent reason. I never tried to change any of it. I think we needed to control the enable for an RS-485 driver.

But for IIR code, it is simple enough to read and understand, at least as a starting point.

--

Rick
rickman
