Do you have a question? Post it now! No Registration Necessary
- Subject
- Posted on
- Division Algorithms
- 01-16-2021
- gnuarm.del...
January 16, 2021, 9:31 pm
I am looking for an algorithm to calculate a floating point divide. There a
re a number of options, but the one that is most clear to me and easiest to
implement on the hardware I am designing seems to be the Newton?Ra
phson iterative method. I'm trying to understand how many iterations it wil
l take. The formula for that in the Wikipedia article assumes an initial es
timate for the reciprocal of 48/17 - D * 32/17. This would require 2 iterat
ions to gain 16 bits of accuracy or 3 iterations for IEEE 32 bit floating p
oint numbers. My needs are for something between that range, but would like
to minimize the number of iterations... ideally one.
I believe most implementations use a table lookup for the seed value of the
reciprocal of the divisor. In the FPGA I am using, a convenient size would
be 2k entries (11 bit address) of 18 bits. I'm wondering if this would be
sufficiently more accurate than the above estimate so that one iteration wo
uld be sufficient. I'm not clear on how to calculate this. I know in genera
l the formula converges rapidly with the error being squared, so I'm thinki
ng an initial value that is good to some 11 bits would produce more than 20
bits for one turn of the Newton-Raphson crank (20 bits being a level of ac
curacy called "good enough" for this application). But I'd like to have som
ething that verifies this rather than using a rule of thumb. At some point
I will probably have to justify the correctness of the calculations.
Because of the hardware facilities available the calculations have intermed
iate mantissas of 18 or 36 bits with 28 bits stored when in memory/stack ot
her than ToS which is the register in the ALU (>36 bits). I don't think 11
bits is quite as good as required, but I'm thinking one crank of the NR mac
hine and I'm "good". However, I need to be able to show how good. Anyone k
now of good documentation on this matter?
Oh, I guess I should mention later on I might be repeating this process for
a square root calculation. A measurement using a sensor that is at least t
emporarily deprecated requires a square root. I managed to show the sensor
was not capable of providing an accuracy at the low end that would produce
meaningful results. But if they find a better sensor they may want to add t
hat sensor as an option and the square root will be back. No! The HORROR!
Whatever... It's just an algorithm...
re a number of options, but the one that is most clear to me and easiest to
implement on the hardware I am designing seems to be the Newton?Ra
phson iterative method. I'm trying to understand how many iterations it wil
l take. The formula for that in the Wikipedia article assumes an initial es
timate for the reciprocal of 48/17 - D * 32/17. This would require 2 iterat
ions to gain 16 bits of accuracy or 3 iterations for IEEE 32 bit floating p
oint numbers. My needs are for something between that range, but would like
to minimize the number of iterations... ideally one.
I believe most implementations use a table lookup for the seed value of the
reciprocal of the divisor. In the FPGA I am using, a convenient size would
be 2k entries (11 bit address) of 18 bits. I'm wondering if this would be
sufficiently more accurate than the above estimate so that one iteration wo
uld be sufficient. I'm not clear on how to calculate this. I know in genera
l the formula converges rapidly with the error being squared, so I'm thinki
ng an initial value that is good to some 11 bits would produce more than 20
bits for one turn of the Newton-Raphson crank (20 bits being a level of ac
curacy called "good enough" for this application). But I'd like to have som
ething that verifies this rather than using a rule of thumb. At some point
I will probably have to justify the correctness of the calculations.
Because of the hardware facilities available the calculations have intermed
iate mantissas of 18 or 36 bits with 28 bits stored when in memory/stack ot
her than ToS which is the register in the ALU (>36 bits). I don't think 11
bits is quite as good as required, but I'm thinking one crank of the NR mac
hine and I'm "good". However, I need to be able to show how good. Anyone k
now of good documentation on this matter?
Oh, I guess I should mention later on I might be repeating this process for
a square root calculation. A measurement using a sensor that is at least t
emporarily deprecated requires a square root. I managed to show the sensor
was not capable of providing an accuracy at the low end that would produce
meaningful results. But if they find a better sensor they may want to add t
hat sensor as an option and the square root will be back. No! The HORROR!
Whatever... It's just an algorithm...
--
Rick C.
- Get 1,000 miles of free Supercharging
Rick C.
- Get 1,000 miles of free Supercharging
We've slightly trimmed the long signature. Click to see the full one.
Re: Division Algorithms
wrote:
are a number of options, but the one that is most clear to me and easiest
to implement on the hardware I am designing seems to be the Newton?
Raphson iterative method. I'm trying to understand how many iterations it w
ill take. The formula for that in the Wikipedia article assumes an initial
estimate for the reciprocal of 48/17 - D * 32/17. This would require 2 iter
ations to gain 16 bits of accuracy or 3 iterations for IEEE 32 bit floating
point numbers. My needs are for something between that range, but would li
ke to minimize the number of iterations... ideally one.
he reciprocal of the divisor. In the FPGA I am using, a convenient size wou
ld be 2k entries (11 bit address) of 18 bits. I'm wondering if this would b
e sufficiently more accurate than the above estimate so that one iteration
would be sufficient. I'm not clear on how to calculate this. I know in gene
ral the formula converges rapidly with the error being squared, so I'm thin
king an initial value that is good to some 11 bits would produce more than
20 bits for one turn of the Newton-Raphson crank (20 bits being a level of
accuracy called "good enough" for this application). But I'd like to have s
omething that verifies this rather than using a rule of thumb. At some poin
t I will probably have to justify the correctness of the calculations.
ediate mantissas of 18 or 36 bits with 28 bits stored when in memory/stack
other than ToS which is the register in the ALU (>36 bits). I don't think 1
1 bits is quite as good as required, but I'm thinking one crank of the NR m
achine and I'm "good". However, I need to be able to show how good. Anyone
know of good documentation on this matter?
or a square root calculation. A measurement using a sensor that is at least
temporarily deprecated requires a square root. I managed to show the senso
r was not capable of providing an accuracy at the low end that would produc
e meaningful results. But if they find a better sensor they may want to add
that sensor as an option and the square root will be back. No! The HORROR!
You don't mention throughput or latency requirements, just that you want so
mething non-iterative. A resource I've used before is the Altera Advanced
Synthesis Cookbook. You can find it online and it has algorithms/source fo
r both "approximate" floating-point divider and root modules. These both s
acrifice a bit of precision for low gate count. The divider is pipelined a
nd has a latency of about 5 cycles. Maybe it could be helpful.
Re: Division Algorithms
On Monday, January 18, 2021 at 8:13:02 PM UTC-5, Kevin Neilson wrote:
m wrote:
re are a number of options, but the one that is most clear to me and easies
t to implement on the hardware I am designing seems to be the Newton?
?Raphson iterative method. I'm trying to understand how many iterations i
t will take. The formula for that in the Wikipedia article assumes an initi
al estimate for the reciprocal of 48/17 - D * 32/17. This would require 2 i
terations to gain 16 bits of accuracy or 3 iterations for IEEE 32 bit float
ing point numbers. My needs are for something between that range, but would
like to minimize the number of iterations... ideally one.
the reciprocal of the divisor. In the FPGA I am using, a convenient size w
ould be 2k entries (11 bit address) of 18 bits. I'm wondering if this would
be sufficiently more accurate than the above estimate so that one iteratio
n would be sufficient. I'm not clear on how to calculate this. I know in ge
neral the formula converges rapidly with the error being squared, so I'm th
inking an initial value that is good to some 11 bits would produce more tha
n 20 bits for one turn of the Newton-Raphson crank (20 bits being a level o
f accuracy called "good enough" for this application). But I'd like to have
something that verifies this rather than using a rule of thumb. At some po
int I will probably have to justify the correctness of the calculations.
rmediate mantissas of 18 or 36 bits with 28 bits stored when in memory/stac
k other than ToS which is the register in the ALU (>36 bits). I don't think
11 bits is quite as good as required, but I'm thinking one crank of the NR
machine and I'm "good". However, I need to be able to show how good. Anyon
e know of good documentation on this matter?
for a square root calculation. A measurement using a sensor that is at lea
st temporarily deprecated requires a square root. I managed to show the sen
sor was not capable of providing an accuracy at the low end that would prod
uce meaningful results. But if they find a better sensor they may want to a
dd that sensor as an option and the square root will be back. No! The HORRO
R!
something non-iterative. A resource I've used before is the Altera Advanced
Synthesis Cookbook. You can find it online and it has algorithms/source fo
r both "approximate" floating-point divider and root modules. These both sa
crifice a bit of precision for low gate count. The divider is pipelined and
has a latency of about 5 cycles. Maybe it could be helpful.
I'm not looking for pipelined solutions because speed is not an issue. I d
on't want to bother with shift and subtract not because it is slow, but bec
ause each step in this design would be controlled by a ROM and while I don'
t expect to worry about the ROM size, if it uses 18 locations for each divi
de in the process I might need to focus on that.
The floating point NR approach seems to work ok. After considering the loo
k up table in detail it seems like it will work fine with one pass of the N
R algorithm, about 5 steps in all. Not bad.
m wrote:
re are a number of options, but the one that is most clear to me and easies
t to implement on the hardware I am designing seems to be the Newton?
?Raphson iterative method. I'm trying to understand how many iterations i
t will take. The formula for that in the Wikipedia article assumes an initi
al estimate for the reciprocal of 48/17 - D * 32/17. This would require 2 i
terations to gain 16 bits of accuracy or 3 iterations for IEEE 32 bit float
ing point numbers. My needs are for something between that range, but would
like to minimize the number of iterations... ideally one.
the reciprocal of the divisor. In the FPGA I am using, a convenient size w
ould be 2k entries (11 bit address) of 18 bits. I'm wondering if this would
be sufficiently more accurate than the above estimate so that one iteratio
n would be sufficient. I'm not clear on how to calculate this. I know in ge
neral the formula converges rapidly with the error being squared, so I'm th
inking an initial value that is good to some 11 bits would produce more tha
n 20 bits for one turn of the Newton-Raphson crank (20 bits being a level o
f accuracy called "good enough" for this application). But I'd like to have
something that verifies this rather than using a rule of thumb. At some po
int I will probably have to justify the correctness of the calculations.
rmediate mantissas of 18 or 36 bits with 28 bits stored when in memory/stac
k other than ToS which is the register in the ALU (>36 bits). I don't think
11 bits is quite as good as required, but I'm thinking one crank of the NR
machine and I'm "good". However, I need to be able to show how good. Anyon
e know of good documentation on this matter?
for a square root calculation. A measurement using a sensor that is at lea
st temporarily deprecated requires a square root. I managed to show the sen
sor was not capable of providing an accuracy at the low end that would prod
uce meaningful results. But if they find a better sensor they may want to a
dd that sensor as an option and the square root will be back. No! The HORRO
R!
something non-iterative. A resource I've used before is the Altera Advanced
Synthesis Cookbook. You can find it online and it has algorithms/source fo
r both "approximate" floating-point divider and root modules. These both sa
crifice a bit of precision for low gate count. The divider is pipelined and
has a latency of about 5 cycles. Maybe it could be helpful.
I'm not looking for pipelined solutions because speed is not an issue. I d
on't want to bother with shift and subtract not because it is slow, but bec
ause each step in this design would be controlled by a ROM and while I don'
t expect to worry about the ROM size, if it uses 18 locations for each divi
de in the process I might need to focus on that.
The floating point NR approach seems to work ok. After considering the loo
k up table in detail it seems like it will work fine with one pass of the N
R algorithm, about 5 steps in all. Not bad.
--
Rick C.
+ Get 1,000 miles of free Supercharging
Rick C.
+ Get 1,000 miles of free Supercharging
We've slightly trimmed the long signature. Click to see the full one.
Site Timeline
- » Communist Chinese Military Companies
- — Next thread in » Field-Programmable Gate Arrays
- » Achronix Semiconductor in Talks for Merger
- — Previous thread in » Field-Programmable Gate Arrays
- » Fully Comitted to LVDS as Comparitors
- — Newest thread in » Field-Programmable Gate Arrays
- » TLV431 obudowa przewlekana TO-92, poszukiwane
- — The site's Newest Thread. Posted in » Electronics (Polish)
- » RecherchÃ© : SchÃ©ma du circuit du Metrix MX 528
- — The site's Last Updated Thread. Posted in » Electronics (French)