Division Algorithms

G

gnuarm.deletethisbit 5 years ago

I am looking for an algorithm to calculate a floating point divide. There a re a number of options, but the one that is most clear to me and easiest to implement on the hardware I am designing seems to be the Newton?Ra phson iterative method. I'm trying to understand how many iterations it wil l take. The formula for that in the Wikipedia article assumes an initial es timate for the reciprocal of 48/17 - D * 32/17. This would require 2 iterat ions to gain 16 bits of accuracy or 3 iterations for IEEE 32 bit floating p oint numbers. My needs are for something between that range, but would like to minimize the number of iterations... ideally one.

I believe most implementations use a table lookup for the seed value of the reciprocal of the divisor. In the FPGA I am using, a convenient size would be 2k entries (11 bit address) of 18 bits. I'm wondering if this would be sufficiently more accurate than the above estimate so that one iteration wo uld be sufficient. I'm not clear on how to calculate this. I know in genera l the formula converges rapidly with the error being squared, so I'm thinki ng an initial value that is good to some 11 bits would produce more than 20 bits for one turn of the Newton-Raphson crank (20 bits being a level of ac curacy called "good enough" for this application). But I'd like to have som ething that verifies this rather than using a rule of thumb. At some point I will probably have to justify the correctness of the calculations.

Because of the hardware facilities available the calculations have intermed iate mantissas of 18 or 36 bits with 28 bits stored when in memory/stack ot her than ToS which is the register in the ALU (>36 bits). I don't think 11 bits is quite as good as required, but I'm thinking one crank of the NR mac hine and I'm "good". However, I need to be able to show how good. Anyone k now of good documentation on this matter?

Oh, I guess I should mention later on I might be repeating this process for a square root calculation. A measurement using a sensor that is at least t emporarily deprecated requires a square root. I managed to show the sensor was not capable of providing an accuracy at the low end that would produce meaningful results. But if they find a better sensor they may want to add t hat sensor as an option and the square root will be back. No! The HORROR!

Whatever... It's just an algorithm...

Rick C. - Get 1,000 miles of free Supercharging - Tesla referral code - https://ts.la/richard11209

Vote

K

Kevin Neilson 5 years ago

are a number of options, but the one that is most clear to me and easiest to implement on the hardware I am designing seems to be the Newton? Raphson iterative method. I'm trying to understand how many iterations it w ill take. The formula for that in the Wikipedia article assumes an initial estimate for the reciprocal of 48/17 - D * 32/17. This would require 2 iter ations to gain 16 bits of accuracy or 3 iterations for IEEE 32 bit floating point numbers. My needs are for something between that range, but would li ke to minimize the number of iterations... ideally one.

he reciprocal of the divisor. In the FPGA I am using, a convenient size wou ld be 2k entries (11 bit address) of 18 bits. I'm wondering if this would b e sufficiently more accurate than the above estimate so that one iteration would be sufficient. I'm not clear on how to calculate this. I know in gene ral the formula converges rapidly with the error being squared, so I'm thin king an initial value that is good to some 11 bits would produce more than

20 bits for one turn of the Newton-Raphson crank (20 bits being a level of accuracy called "good enough" for this application). But I'd like to have s omething that verifies this rather than using a rule of thumb. At some poin t I will probably have to justify the correctness of the calculations.

ediate mantissas of 18 or 36 bits with 28 bits stored when in memory/stack other than ToS which is the register in the ALU (>36 bits). I don't think 1

1 bits is quite as good as required, but I'm thinking one crank of the NR m achine and I'm "good". However, I need to be able to show how good. Anyone know of good documentation on this matter?

or a square root calculation. A measurement using a sensor that is at least temporarily deprecated requires a square root. I managed to show the senso r was not capable of providing an accuracy at the low end that would produc e meaningful results. But if they find a better sensor they may want to add that sensor as an option and the square root will be back. No! The HORROR!

You don't mention throughput or latency requirements, just that you want so mething non-iterative. A resource I've used before is the Altera Advanced Synthesis Cookbook. You can find it online and it has algorithms/source fo r both "approximate" floating-point divider and root modules. These both s acrifice a bit of precision for low gate count. The divider is pipelined a nd has a latency of about 5 cycles. Maybe it could be helpful.

Vote

G

gnuarm.deletethisbit 5 years ago

re are a number of options, but the one that is most clear to me and easies t to implement on the hardware I am designing seems to be the Newton? ?Raphson iterative method. I'm trying to understand how many iterations i t will take. The formula for that in the Wikipedia article assumes an initi al estimate for the reciprocal of 48/17 - D * 32/17. This would require 2 i terations to gain 16 bits of accuracy or 3 iterations for IEEE 32 bit float ing point numbers. My needs are for something between that range, but would like to minimize the number of iterations... ideally one.

the reciprocal of the divisor. In the FPGA I am using, a convenient size w ould be 2k entries (11 bit address) of 18 bits. I'm wondering if this would be sufficiently more accurate than the above estimate so that one iteratio n would be sufficient. I'm not clear on how to calculate this. I know in ge neral the formula converges rapidly with the error being squared, so I'm th inking an initial value that is good to some 11 bits would produce more tha n 20 bits for one turn of the Newton-Raphson crank (20 bits being a level o f accuracy called "good enough" for this application). But I'd like to have something that verifies this rather than using a rule of thumb. At some po int I will probably have to justify the correctness of the calculations.

rmediate mantissas of 18 or 36 bits with 28 bits stored when in memory/stac k other than ToS which is the register in the ALU (>36 bits). I don't think 11 bits is quite as good as required, but I'm thinking one crank of the NR machine and I'm "good". However, I need to be able to show how good. Anyon e know of good documentation on this matter?

for a square root calculation. A measurement using a sensor that is at lea st temporarily deprecated requires a square root. I managed to show the sen sor was not capable of providing an accuracy at the low end that would prod uce meaningful results. But if they find a better sensor they may want to a dd that sensor as an option and the square root will be back. No! The HORRO R!

something non-iterative. A resource I've used before is the Altera Advanced Synthesis Cookbook. You can find it online and it has algorithms/source fo r both "approximate" floating-point divider and root modules. These both sa crifice a bit of precision for low gate count. The divider is pipelined and has a latency of about 5 cycles. Maybe it could be helpful.

I'm not looking for pipelined solutions because speed is not an issue. I d on't want to bother with shift and subtract not because it is slow, but bec ause each step in this design would be controlled by a ROM and while I don' t expect to worry about the ROM size, if it uses 18 locations for each divi de in the process I might need to focus on that.

The floating point NR approach seems to work ok. After considering the loo k up table in detail it seems like it will work fine with one pass of the N R algorithm, about 5 steps in all. Not bad.

Rick C. + Get 1,000 miles of free Supercharging + Tesla referral code - https://ts.la/richard11209

Vote

Division Algorithms

Join the Discussion

Didn't find your answer?