I have recently been working on a floating point unit for a Virtex-4 SX35. I have a floating point adder and a floating point multiplier. The adder has 6 pipeline stages and the multiplier has 3 stages.
The idea behind this project is to find out the kind of floating point performance that is possible in a modern FPGA.
Our floating point format uses up to 15 bits of mantissa (with an implicit one) and up to 10 bits of exponent. We have managed to get a complex butterfly running at up to 250 MHz. My question is whether these numbers are reasonable, or whether anyone knows of a reference to a faster FPU.
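To give a feel for the precision of such a format, here is a rough Python sketch that rounds a value to the mantissa width described above. It is only an illustration under assumptions: it assumes the 15 mantissa bits are stored bits and the implicit one is extra (16 significant bits in total), it ignores the exponent range, and the names `MANT_BITS` and `quantize` are made up for this example.

```python
import math

# Assumption: 15 stored mantissa bits plus an implicit leading one,
# i.e. 16 significant bits. The exponent range is not modelled here.
MANT_BITS = 15


def quantize(x: float) -> float:
    """Round x to the nearest value with MANT_BITS + 1 significant bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << (MANT_BITS + 1)    # 2**16 steps across the mantissa
    q = round(m * scale) / scale    # round the mantissa to 16 bits
    return math.ldexp(q, e)


if __name__ == "__main__":
    for value in (1.0, math.pi, 1.0 / 3.0):
        print(value, quantize(value), abs(value - quantize(value)))
```

Under these assumptions the relative rounding error stays below 2^-16, which is roughly 4 to 5 decimal digits.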
We have already used some tricks to improve performance, for example manually instantiating LUTs so that we can build an adder with a 2-to-1 MUX on one of the operands using only one LUT per bit.
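The one-LUT-per-bit idea can be modelled roughly as follows. This is a behavioural Python sketch, not the actual HDL: it assumes each 4-input LUT computes the carry-propagate signal `(sel ? a1 : a0) XOR b`, and that the slice's dedicated carry logic (a MUXCY/XORCY style chain, with `b` fed to the carry mux data input) produces the sum and carry. The function names and the 16-bit default width are hypothetical.

```python
def lut4_propagate(a0: int, a1: int, sel: int, b: int) -> int:
    """Model of one 4-input LUT: 2-to-1 mux on one operand, then XOR with b."""
    muxed = a1 if sel else a0
    return muxed ^ b


def mux_add(a0: int, a1: int, sel: int, b: int, width: int = 16, carry_in: int = 0):
    """Compute (sel ? a1 : a0) + b bit by bit, one LUT per bit plus carry chain.

    Returns (sum modulo 2**width, carry_out).
    """
    carry = carry_in
    result = 0
    for i in range(width):
        a0_i = (a0 >> i) & 1
        a1_i = (a1 >> i) & 1
        b_i = (b >> i) & 1
        p = lut4_propagate(a0_i, a1_i, sel, b_i)
        s = p ^ carry                  # sum bit (XOR in the carry chain)
        # Carry mux: if p is 1 the carry passes through; if p is 0 the
        # selected operand bit equals b_i, so the carry out is b_i.
        carry = carry if p else b_i
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(mux_add(5, 9, 0, 3, width=8))   # adds 5 + 3
    print(mux_add(5, 9, 1, 3, width=8))   # adds 9 + 3
```

The point of the trick is that the mux and the XOR share one LUT, so the operand select does not add a logic level in front of the carry chain.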
We have also tried to build up the design using RLOC'ed modules. This did not improve performance compared with a design without RLOCs. This could change once we start to fill up the device, though. At the moment we are only utilizing about 20% of the FPGA.