Are FP DSP processors really executing hardware MAC instructions internally in a 32-bit FP representation? I very much doubt that.
When doing MACs in software using integer instructions, why would one use FP for the MAC processing? Use a wide 32/64-bit integer/fixed-point accumulator and convert only the final result to floating point for further processing.
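A minimal sketch of that idea, assuming Q15 input samples and coefficients (my choice for illustration; the function names are hypothetical). Each 16x16 product is Q30 and is summed exactly in a 64-bit accumulator, and the only conversion to floating point happens once, after the loop:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply-accumulate over Q15 data (hypothetical helper name).
   Each product is Q30 and fits in 32 bits; the 64-bit accumulator
   leaves plenty of headroom, so no per-step rounding is needed. */
int64_t mac_q15(const int16_t *x, const int16_t *h, size_t n)
{
    assert(n == 0 || (x != NULL && h != NULL));
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * (int32_t)h[i];
    return acc;
}

/* Convert the Q30 accumulator to floating point once, at the very end. */
double q30_to_double(int64_t acc)
{
    return (double)acc / 1073741824.0; /* divide by 2^30 */
}
```

With x = {0.5, 0.25} and h = {0.5, 0.5} in Q15, the accumulator holds 0.375 in Q30, and only that single value is converted to a double.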
FP add/sub are nasty, since they may require normalization of the result: first you must determine how many bit positions to shift, and then shift the mantissa left by that amount. Without some hardware support (a find-first-bit-set style instruction), this is quite time consuming and causes variable latency.
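To illustrate the normalization cost, here is a rough sketch of what software has to do without a find-first-set instruction. The mantissa/exponent layout (32-bit mantissa normalized to bit 31, separate integer exponent) is an assumption for illustration, not IEEE 754, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Count the left shifts needed to bring the top set bit to bit 31,
   one bit at a time. The iteration count depends on the data, which
   is where the variable latency comes from. */
int normalize_shift_count(uint32_t mant)
{
    assert(mant != 0); /* zero has no leading one to find */
    int shift = 0;
    while ((mant & 0x80000000u) == 0) {
        mant <<= 1;
        shift++;
    }
    return shift;
}

/* Normalize a mantissa and adjust its exponent to compensate. */
uint32_t normalize(uint32_t mant, int *exp)
{
    int shift = normalize_shift_count(mant);
    *exp -= shift;
    return mant << shift;
}
```

A result that is already normalized costs zero loop iterations, while a result with only bit 0 set costs 31. A hardware count-leading-zeros instruction replaces the whole loop with a single operation.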
Thus, when evaluating higher-degree polynomials, the intermediate results should be kept in integer/fixed-point format, and only the final result should be rounded/truncated to the required representation.
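As a sketch of the polynomial case, here is Horner evaluation done entirely in fixed point. The Q16.16 format, the function name, and the coefficient order are my assumptions for illustration; it also assumes the values stay within range and that right-shifting a negative signed value is arithmetic, which is implementation-defined in C but true on common compilers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Horner evaluation in Q16.16 (hypothetical helper).
   coeff[0] is the highest-degree coefficient. The product is formed
   in 64 bits and shifted back to Q16.16 at each step, so no floating
   point add or normalization is involved. */
int32_t poly_q16(const int32_t *coeff, size_t n, int32_t x)
{
    assert(n > 0 && coeff != NULL);
    int64_t acc = coeff[0];
    for (size_t i = 1; i < n; i++)
        acc = ((acc * x) >> 16) + coeff[i]; /* Q32.32 product back to Q16.16 */
    return (int32_t)acc; /* assumes the result fits in Q16.16 */
}
```

For 2x^2 + 3x + 1 at x = 0.5 this gives 3.0 in Q16.16 (3 * 65536). If a floating point value is needed afterwards, a single `(double)result / 65536.0` at the end does the conversion.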