I've finally gotten around to considering a project that will use either a Cortex-M3 or a PIC32 processor, and I have a technical question unrelated to any "business issues" between these options -- the divide instruction. Both of these cores include one, but I'm interested in any notable technical differences between them, including cycle counts but not limited to that (load-store time is fair game.)
From what I've been able to garner from skimming the docs, the Cortex-M3's hardware divider executes an SDIV or UDIV in anywhere from 2 to 12 clock cycles, with a comment suggesting that it takes less time when the operand sizes are similar. That doesn't tell me what the typical time may be. Also, it's been a bit of a pain searching for good assembler docs on the Cortex-M3. But I've only been at it for about an hour or so, so it's likely I am just slow and ignorant -- not that there aren't good caches of documentation out there I should have found.

On the PIC32, the docs are clearer. It's "one bit per clock," and it includes "early detection" of sign/zero bits in the upper bytes to help goose that along: 7, 15, or 23 bits' worth might be skipped. Worst case, it says, is 35 clocks. It also stalls the 5-stage pipe if another division is issued before the earlier one completes.
I am wondering if anyone has direct experience with either of these in the area of writing floating-point libraries, has had a chance to compare their relative utility for that purpose, and might comment on any significant details from that effort -- speed being the main question here.
At first blush, I'd say