Cortex M4 Floating Point Size

Are FP-DSP processors really doing hardware MAC instructions internally using a 32-bit FP representation? I very much doubt that.

When doing MACs in software using integer instructions, why would one use FP for the MAC processing? Use some big 32/64-bit integer/fixed-point accumulator and only convert the final result to floating point for further processing.
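To make the accumulator idea concrete, here is a minimal sketch, assuming Q15 samples and coefficients. The function name `mac_q15` and the Q15 format are my own choices for illustration, not something from a particular DSP library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of Q15 samples and Q15 coefficients.
   Each product is Q30; summing in int64_t leaves plenty of headroom. */
float mac_q15(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;                           /* wide integer accumulator */
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * (int32_t)h[i];  /* Q15 * Q15 = Q30 */

    /* One conversion at the very end: scale Q30 back to a real value. */
    return (float)((double)acc / 1073741824.0);  /* divide by 2^30 */
}
```

The only rounding happens in the final cast to float; everything inside the loop is exact integer arithmetic.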

FP add/sub are nasty, since they may require normalization of the result: first you must determine how many bits to shift, and then shift the mantissa left by that many bits. Without some hardware support (a find-first-bit-set style instruction), this is quite time-consuming and causes variable latency.
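The normalization step can be sketched roughly as below, assuming a 32-bit mantissa that should end up with bit 31 set. The names `clz32_soft` and `normalize` are hypothetical; the bit-by-bit loop is where the variable latency comes from when there is no count-leading-zeros instruction.

```c
#include <stdint.h>

/* Portable bit-by-bit count of leading zeros. The number of loop
   iterations depends on the data, hence the variable latency. */
static int clz32_soft(uint32_t v)
{
    int n = 0;
    if (v == 0)
        return 32;
    while ((v & 0x80000000u) == 0) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: shift the mantissa left until bit 31 is set and
   lower the exponent to match. Returns the shift count; a zero mantissa
   is left untouched and reports a shift of 0. */
int normalize(uint32_t *mant, int *exp)
{
    if (*mant == 0)
        return 0;
    int s = clz32_soft(*mant);
    *mant <<= s;
    *exp -= s;
    return s;
}
```

With a hardware CLZ or find-first-set instruction, `clz32_soft` collapses to a single fixed-latency operation and the rest is one shift and one subtract.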

Thus, when calculating higher-degree polynomials, the intermediate results should be kept in integer/fixed-point format, and only the final result rounded/truncated to the required representation.
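As a rough illustration for the polynomial case, here is a Horner evaluation sketch, assuming Q16.16 coefficients and argument held in a 64-bit intermediate. The name `poly_q16` and the Q16.16 format are assumptions made for the example. Note that each multiply still truncates back to Q16.16; only the conversion to floating point is deferred to the end.

```c
#include <stdint.h>

/* Hypothetical helper: evaluate c[0] + c[1]*x + ... + c[degree]*x^degree
   with Horner's rule. Coefficients and x are Q16.16; the running value
   lives in an int64_t so the Q32.32 product cannot overflow for
   moderately sized inputs. */
float poly_q16(const int32_t *c, int degree, int32_t x)
{
    int64_t acc = c[degree];
    for (int i = degree - 1; i >= 0; i--) {
        /* Division (not >>) keeps the behaviour well defined for
           negative values; it truncates toward zero. */
        acc = (acc * x) / 65536 + c[i];
    }
    /* Single conversion to floating point at the end. */
    return (float)((double)acc / 65536.0);
}
```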

Reply to
upsidedown

Don't know about specialized FP DSPs, never used one. I have been doing a lot of DSP-ing on a Power (PPC) FPU (mostly on an MPC5200B). It has 32 64-bit FPU regs and can do MAC at both 32- and 64-bit precision. It takes 1 cycle per 32-bit MAC and 2 cycles per 64-bit MAC. Reaching that is not as straightforward as on a DSP though; there are data dependencies to take into account. OTOH, having 32 registers can save a lot of loads and stores during the filter loop; I managed the 2 cycles/MAC in a loop at about 10% load/store etc. overhead penalty. Here is how I did it (VPA macros, self-explanatory enough though):

formatting link

Without going through that, instead of 5 ns/MAC I was getting 30 ns/MAC in a plain loop, which is to be expected really, as the pipeline is 6 stages IIRC.
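The linked VPA code is not reproduced here, so the following is only a generic C sketch of the usual way around such data dependencies: keep several independent accumulators so that consecutive MACs do not wait on each other's results. The function name `dot_unrolled` and the choice of four accumulators are assumptions; a deeper pipeline may want more.

```c
#include <stddef.h>

/* Hypothetical helper: dot product of 32-bit float data accumulated in
   64-bit precision. Four independent partial sums break the dependency
   chain that stalls a single-accumulator loop. */
double dot_unrolled(const float *x, const float *h, size_t n)
{
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        a0 += (double)x[i]     * h[i];      /* float widened to double */
        a1 += (double)x[i + 1] * h[i + 1];
        a2 += (double)x[i + 2] * h[i + 2];
        a3 += (double)x[i + 3] * h[i + 3];
    }
    for (; i < n; i++)                       /* leftover taps */
        a0 += (double)x[i] * h[i];

    return (a0 + a1) + (a2 + a3);
}
```

Whether the compiler actually keeps the partial sums in registers and schedules the MACs back to back depends on the target and optimization level, so the generated code is worth checking.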

Well of course, the thing with "normal" 32-bit processors is that they do not have 64-bit accumulators, and 32 bits is nowhere near sufficient. 64-bit FP, OTOH, is quite handy, especially on the Power architecture FPU, where one can read 32-bit FP data and have it expanded to 64 bits in a single cycle.

Last time I had the fun of doing this was on a CPU32 (on the 68340), quite a while ago :-). But the hardware FPUs on the Power architecture processors are really good at this; somehow they manage add/sub/mul within a single cycle.

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

------------------------------------------------------

Reply to
dp

I know you don't really need the details but:

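Most 32-bit FPUs use 8 bits for the exponent (stored biased rather than as a signed value, in IEEE 754 single precision) and one bit for the sign, and this leaves only 23 bits for the mantissa. Not 25. (There is also the hidden bit, of course, which gives 24 bits of effective precision.)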
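For anyone who wants to see the field split directly, here is a small sketch that pulls the three fields out of a float. It assumes `float` is IEEE 754 single precision on the target; the struct and function names (`f32_fields`, `f32_decode`) are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a 32-bit float. */
typedef struct {
    unsigned sign;      /* 1 bit */
    int      exponent;  /* 8 stored bits, bias of 127 removed */
    uint32_t fraction;  /* 23 stored bits, hidden bit not included */
} f32_fields;

/* Assumes float is IEEE 754 single precision. */
f32_fields f32_decode(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined type punning */

    f32_fields r;
    r.sign     = bits >> 31;
    r.exponent = (int)((bits >> 23) & 0xFFu) - 127;
    r.fraction = bits & 0x7FFFFFu;
    return r;
}
```

The unbiased exponent is only meaningful for normal numbers; zeros, denormals, infinities and NaNs use the all-zeros and all-ones exponent codes.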

Just being pedantic.

Jon

Reply to
Jon Kirwan
