Fastest way to compute floating point log and exp

I'm looking for the fastest way to compute floating-point (32-bit) log and exp on an FPGA (Stratix II or Virtex-4). It must be faster than a 3.6 GHz Xeon, which does it in 18.5 clock cycles per value on average, measured on a 1000-float vector (using SSE3).

So far I've found that with a few tables and floating-point MUL and ADD blocks I should be able to do it at 160 MHz on a Stratix II (not tested yet). That's still slower than the Xeon, but at least it's close. I've looked at CORDIC briefly, but it looks either slower or, if fully unrolled, bigger.
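A rough throughput comparison, assuming (this is not stated above) that the FPGA pipeline accepts one new operand every clock:

$$\frac{3.6\times10^{9}\ \text{cycles/s}}{18.5\ \text{cycles/result}} \approx 1.95\times10^{8}\ \text{results/s}$$

$$160\times10^{6}\ \text{cycles/s}\times 1\ \text{result/cycle} = 1.6\times10^{8}\ \text{results/s}$$

So a single 160 MHz pipeline would reach about 82% of the Xeon's rate (1.6/1.95 ≈ 0.82), which is what "slower, but close" means here.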

Any ideas or pointers on a better way to do that?

Thanks

Marc

Marc Battyani

You didn't mention how much precision you need, nor for that matter what the log base is to be. Log is actually easier to compute for a floating-point argument: the significand is presumably already normalized, which limits its range and so reduces the complexity of computing the log. For fixed point, the usual approach is to normalize it first, essentially converting to floating point. Log base 2 is a little easier than other bases because the floating-point exponent is the integer part of the log. The fractional part can be computed with a small (4-input) lookup to get you to about 1/2 dB. If you need more precision, there are iterative techniques that can get you to whatever precision you need.

Exp is the reverse: you use a lookup to exponentiate the fractional part, then use the integer part to control how far the exponentiated fraction is shifted. This can be pipelined and run at over 200 MS/s in a single thread in Spartan-3 FPGAs, and faster in premium devices.
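A minimal Python model of the coarse log2/exp2 scheme described above, assuming a positive, normal IEEE 754 single-precision input. FRAC_BITS, LOG2_TABLE, EXP2_TABLE, coarse_log2 and coarse_exp2 are hypothetical names for illustration, and math.ldexp stands in for the hardware shift by the integer part. With a 16-entry table the worst-case log2 error is log2(17/16) ≈ 0.087, which is roughly 0.5 dB in 20·log10 terms.

```python
import math
import struct

FRAC_BITS = 4  # hypothetical: top 4 fraction bits address the table ("4 input" lookup)

# log2(1 + k/16) for each 4-bit fraction prefix k
LOG2_TABLE = [math.log2(1 + k / 2**FRAC_BITS) for k in range(2**FRAC_BITS)]
# 2**(k/16) for each 4-bit fractional prefix k
EXP2_TABLE = [2.0 ** (k / 2**FRAC_BITS) for k in range(2**FRAC_BITS)]


def float_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def coarse_log2(x):
    """Coarse log2 of a positive, normal float32: exponent + table lookup."""
    bits = float_bits(x)
    exponent = ((bits >> 23) & 0xFF) - 127                   # integer part of log2
    index = (bits >> (23 - FRAC_BITS)) & (2**FRAC_BITS - 1)  # top fraction bits
    return exponent + LOG2_TABLE[index]


def coarse_exp2(y):
    """Coarse 2**y: table lookup on the fraction, then scale by the integer part."""
    n = math.floor(y)                 # integer part selects the shift
    frac = y - n                      # fractional part in [0, 1)
    # truncate to 4 bits; guard against frac rounding up to 1.0
    index = min(int(frac * 2**FRAC_BITS), 2**FRAC_BITS - 1)
    return math.ldexp(EXP2_TABLE[index], n)  # shift by n (exponent adjust)
```

For example, coarse_log2(8.0) returns exactly 3.0 and coarse_exp2(3.0) returns exactly 8.0, since both land on table entries; values between entries are truncated to the entry below.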

Do you need the log in floating point too, or is it fixed point log notation? Most importantly, what is the precision requirement?

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759
Ray Andraka

Some time back, a company called LogPoint was offering an alternative to floating-point arithmetic based on logarithms, which depended on a fairly efficient float-to-log conversion. (I had discussions with them at least 8 years ago.) I seem to remember that the log wizard was a guy by the name of Lester Pickett.

formatting link

Unfortunately, this seems to be dead:

formatting link
(as in, a placeholder)

Have a look at patent 5197024 at

formatting link
for all the details.

Philip

Philip Freidin Fliptronics

Philip Freidin

Also check out CORDIC algorithms.

George

GMM50

Marc Battyani wrote:

Why would you want to unroll it?

Kolja Sulimma

Kolja Sulimma

I need the log as an IEEE 754 32-bit floating-point value with full precision, from an input also in IEEE 754 32-bit floating-point format. (This may change to extended floating point, 40 bits.) So far I take the mantissa, use its first bits to select the coefficients of a small polynomial stored in a LUT, evaluate the polynomial on the mantissa, and add the exponent multiplied by a coefficient. The base of the log is not very important, as it is used in a more general computation that can be modified if needed. So using log2 is probably a good idea (no coefficient needed for the exponent).
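The table-plus-polynomial approach described above can be modelled in Python as follows. This is a sketch under assumptions, not the poster's actual design: SEG_BITS = 6 (64 segments), quadratic polynomials, and coefficients obtained by simple interpolation at each segment's start, middle and end (a real design would more likely use minimax coefficients). SEG_BITS, COEFFS and log2_poly are hypothetical names. Because the base is 2, the exponent is added with no coefficient.

```python
import math
import struct

SEG_BITS = 6                 # hypothetical: top 6 mantissa bits select the segment
SEGMENTS = 1 << SEG_BITS
H = 1.0 / SEGMENTS           # width of each segment of the fraction u in [0, 1)


def _build_table():
    """Quadratic coefficients (c0, c1, c2) per segment, from interpolating
    log2(1 + u) at the start, middle and end of the segment."""
    table = []
    for j in range(SEGMENTS):
        u0 = j * H
        y0 = math.log2(1 + u0)
        y1 = math.log2(1 + u0 + H / 2)
        y2 = math.log2(1 + u0 + H)
        d1 = (y1 - y0) / (H / 2)              # first divided difference
        d2 = ((y2 - y1) / (H / 2) - d1) / H   # second divided difference
        # Newton form y0 + d1*t + d2*t*(t - H/2), expanded into powers of t
        table.append((y0, d1 - d2 * H / 2, d2))
    return table


COEFFS = _build_table()      # would be a small ROM/LUT in hardware


def log2_poly(x):
    """log2 of a positive, normal float32: exponent + per-segment quadratic."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = ((bits >> 23) & 0xFF) - 127          # integer part, no coefficient
    mantissa = bits & 0x7FFFFF                      # 23 fraction bits
    seg = mantissa >> (23 - SEG_BITS)               # LUT address
    t = (mantissa & ((1 << (23 - SEG_BITS)) - 1)) / (1 << 23)  # offset in segment
    c0, c1, c2 = COEFFS[seg]
    return exponent + (c0 + t * (c1 + t * c2))      # Horner: 2 MULs, 2 ADDs
```

In this model the absolute error stays below about 1e-7 (the interpolation bound for these settings is roughly 9e-8). Near x = 1 the result itself is close to zero, so an absolute error of that size is not full float32 relative precision there; that region needs more segments, a higher degree, or special handling.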

Marc

Marc Battyani

Thanks, that's interesting, but here I need to process IEEE 754 32-bit floating-point values.

Marc

Marc Battyani

Actually, CORDIC is not as efficient as other iterative algorithms for log, although it is a good start and does provide the benefit of arbitrary precision.
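The post doesn't name the other iterative algorithms. One well-known example (not necessarily what is meant here) is multiplicative normalization, a shift-and-add logarithm: the argument is driven down to 1 by factors of the form (1 - 2^-k), each applied with a shift and a subtract, while a small table of ln(1 - 2^-k) accumulates the result. A minimal Python model, assuming a significand m in [1, 2); N, LN_TABLE and ln_shift_add are hypothetical names, and floating point stands in for the fixed-point datapath.

```python
import math

N = 32  # hypothetical number of steps; the result gains roughly one bit per step

# Constants ln(1 - 2**-k); index 0 is unused. In hardware this is a small ROM.
LN_TABLE = [0.0] + [math.log1p(-2.0 ** -k) for k in range(1, N + 1)]


def ln_shift_add(m):
    """ln(m) for m in [1, 2) by multiplicative normalization: multiply m by
    factors (1 - 2**-k) until it reaches 1, accumulating -ln(1 - 2**-k)
    for every factor applied."""
    assert 1.0 <= m < 2.0
    x, y = m, 0.0
    for k in range(1, N + 1):
        # x * (1 - 2**-k) is a shift and a subtract in fixed point
        while x - math.ldexp(x, -k) >= 1.0:
            x -= math.ldexp(x, -k)
            y -= LN_TABLE[k]
    # Now 1 <= x < 1 / (1 - 2**-N), so the dropped term ln(x) is below ~2**-N.
    return y
```

For k >= 2 each factor is applied at most twice (k = 1 is never applied for m < 2), so the loop maps onto a fixed, bounded number of shift/subtract stages.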

Ray Andraka

Kolja Sulimma wrote:

Higher data throughput. It's more efficient to unroll it than to instantiate multiple copies because the stages of the unrolled version use fixed shifts (just wiring) rather than barrel shifters.
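To illustrate the point about fixed shifts, here is a Python model of an unrolled hyperbolic CORDIC in vectoring mode computing ln(m) = 2·atanh((m - 1)/(m + 1)) for m in [1, 2). Each entry of STAGES stands for one pipeline stage whose shift amount i is a constant, so in hardware "x >> i" would be plain wiring; a rolled, single-stage implementation would need a barrel shifter because i changes every iteration. FRAC, N, STAGES and ln_cordic are hypothetical names; the repeated stages at i = 4 and 13 are required for hyperbolic CORDIC to converge.

```python
import math

FRAC = 32                  # hypothetical fixed-point fraction width
ONE = 1 << FRAC
N = 32                     # hypothetical number of distinct shift amounts

# One (shift, atanh constant) pair per unrolled stage. Hyperbolic CORDIC
# must repeat stages 4, 13, 40, ... to converge; only 4 and 13 fall below N.
STAGES = []
for i in range(1, N + 1):
    reps = 2 if i in (4, 13) else 1
    for _ in range(reps):
        STAGES.append((i, round(math.atanh(2.0 ** -i) * ONE)))


def ln_cordic(m):
    """ln(m) for m in [1, 2) via vectoring-mode hyperbolic CORDIC."""
    assert 1.0 <= m < 2.0
    x = round((m + 1.0) * ONE)
    y = round((m - 1.0) * ONE)
    z = 0
    for i, atanh_i in STAGES:       # i is a constant within each stage
        if y >= 0:                  # vectoring mode: rotate y toward zero
            x, y, z = x - (y >> i), y - (x >> i), z + atanh_i
        else:
            x, y, z = x + (y >> i), y + (x >> i), z - atanh_i
    return 2.0 * z / ONE            # ln(m) = 2 * atanh((m - 1) / (m + 1))
```

The 32-bit fraction and 32 shift amounts here are arbitrary; a real design would size both to the required output precision and register between stages.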

Eric Smith