I am writing VHDL for a Xilinx Virtex-II Pro FPGA application, and software for its embedded PowerPC. The PowerPC doesn't have hardware floating point, and floating point emulation in software is too slow for my purpose. So, I'm looking for a way to add floating point in the FPGA fabric through VHDL or black boxes. I need IEEE floating point operations such as add, subtract, multiply, and divide. I also need some math functions like sine, cosine, square root, etc.
Is there a (free) standard package in VHDL that I can use for this? Third party support?
I'm a newbie on a budget, so I'm looking for the cheapest, quickest solution.
If I could get 100 nanosecond performance for a floating point multiply or divide, I think that would be good enough. The math that my application needs to perform isn't quite nailed down yet, but it involves signal processing that will need complex multiplication, complex division, sine, cosine, square root, etc. The performance goal of the system is to make measurements at a 100 MHz rate, but it can be slower if cost becomes prohibitive.
The PowerPC is rated for about 300 MHz, but the development board I am currently using is running at only 100 MHz, and it takes a single cycle (10 nanoseconds) for an integer operation. At 100 MHz, an emulated floating point addition operation takes about 48 microseconds, which is way too slow to meet my performance specs.
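To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. It assumes the 100 ns target would apply to addition as well as multiply and divide, and the constant names are made up for illustration.

```python
# Figures quoted above, in integer nanoseconds to keep the arithmetic exact.
CLOCK_NS = 10             # one cycle at 100 MHz
EMULATED_ADD_NS = 48_000  # measured software-emulated floating point add
TARGET_NS = 100           # desired time per floating point operation (assumed to cover add too)

emulated_cycles = EMULATED_ADD_NS // CLOCK_NS   # cycles spent per emulated add
target_cycles = TARGET_NS // CLOCK_NS           # cycle budget per operation
speedup_needed = EMULATED_ADD_NS // TARGET_NS   # how much faster hardware must be

print(emulated_cycles, target_cycles, speedup_needed)
```

So the emulation costs thousands of cycles per add, while the budget is about ten cycles per operation at the current clock.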
Are there any high-performance "floating point coprocessors" out there, like the ones that were once used? The production quantity for our system will be very low, and cannot justify a large expense for third-party IP, so we are willing to consider an extra chip or two to do special work. For example, rather than invest in an Ethernet IP core for our FPGA, we mated it to a commercially available Ethernet MAC/PHY chip, and saved a lot on IP costs, NRE, and FPGA internal real estate.
In many applications, it doesn't make much sense to carry 32+ bits of mantissa when the original data was digitized by a 10- to 12-bit ADC. When you have a processor with that word width built in, it makes no difference. But when you are allocating resources on an FPGA, it may make a huge difference in the real estate consumed by the functions. As for sine, cosine, etc., you may be able to use a look-up table (LUT) for these. A 100 ns multiply is doable, but a 100 ns sine or sqrt may be a lot tougher. If the resolution is not too high, a LUT is fast and can be implemented in a fast RAM or ROM.
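As a rough illustration of the look-up table idea, here is a small Python model of a fixed-point sine table. It is only a sketch under assumptions of my own: a 1024-entry table (10 address bits) holding signed 12-bit values, with the phase given as an integer where the full range is one turn. The names ADDR_BITS, DATA_BITS, lut_sin and lut_cos are hypothetical, and a real design would load the same table contents into a block RAM or ROM from VHDL.

```python
import math

ADDR_BITS = 10                       # assumed table depth: 2**10 = 1024 entries per turn
DATA_BITS = 12                       # assumed output width, matched to a 12-bit ADC
DEPTH = 1 << ADDR_BITS
SCALE = (1 << (DATA_BITS - 1)) - 1   # 2047, the largest signed 12-bit value

# Table contents, computed once ahead of time. In hardware these values
# would initialise a RAM or ROM; nothing is computed at run time.
SINE_LUT = [round(SCALE * math.sin(2 * math.pi * i / DEPTH)) for i in range(DEPTH)]

def lut_sin(phase):
    """Sine as a signed integer scaled by SCALE.

    phase is an unsigned integer; 0 .. DEPTH-1 covers 0 .. 2*pi,
    and larger values wrap around like a hardware phase counter.
    """
    return SINE_LUT[phase & (DEPTH - 1)]

def lut_cos(phase):
    """Cosine from the same table, using a quarter-turn phase offset."""
    return lut_sin(phase + DEPTH // 4)

# Worst-case error at the table points, as a fraction of full scale.
worst_error = max(
    abs(SINE_LUT[i] / SCALE - math.sin(2 * math.pi * i / DEPTH))
    for i in range(DEPTH)
)

table_bits = DEPTH * DATA_BITS       # storage needed for the table
print(lut_sin(0), lut_sin(DEPTH // 4), lut_cos(0), table_bits)
```

With these assumed sizes the table needs 1024 x 12 = 12,288 bits, and each lookup is a single memory read. The error at the table points is bounded by half an output step; phases that fall between table entries add further error unless the table is made deeper or interpolation is used.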
You might want to look at all the DSP cores and other materials and resources that Xilinx has available.