I'm looking at different options to use FPGAs as coprocessors for algorithmic acceleration. Between the Xilinx Virtex 6 (LXT or SXT), or the Altera Stratix IV (360, 530 or 820), what would be my best option? The Xilinx Spartan 6 may also be a possibility.
There are a number of different families from each company, many of which will work well for algorithmic acceleration.
If there is a reason to use one over the other, it will likely depend on details of the problem at hand.
For many such designs, the product of the number of CLBs and the achievable clock speed is what matters most, in addition to the cost per chip.
What algorithm do you want to implement with it? If you don't need many parallel calculations or hard real-time behaviour, a fast PC with a GPU and something like CUDA or OpenCL is usually more cost-effective, and much easier to program.
--
Frank Buss, http://www.frank-buss.de
electronics and more: http://www.youtube.com/user/frankbuss
Things to consider when choosing an FPGA for algorithm acceleration, presumably off-loading a CPU, include: what types of communications/memory interfaces would be required to transfer the required data to/from the CPU or system, and how well does each candidate support those interfaces?
Also, what types of internal memory (multi-port, different read/write data widths, ECC protected, etc), and how much, would be required to support the algorithm? How well would each candidate support that?
I think it has already been said that it depends on what you are doing, but I will make some general comments.
Some applications, like video data manipulation, need lots of memory, so easy access to DDR memory might be a point to look for. As an example, the Spartan-6 has a hardened memory controller, which is good, but only if its two to four potential 16-bit interfaces offer enough bandwidth and capacity. Other FPGAs can offer DDR2/3 interfaces that are harder to implement but bigger and faster.
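As a rough back-of-the-envelope sketch of the bandwidth question: peak DDR bandwidth is just bus width times transfer rate. The 800 MT/s figure below is an assumed example, not a datasheet value for any particular part or speed grade.

```python
# Back-of-the-envelope peak DDR bandwidth estimate for a 16-bit
# interface. The 800 MT/s transfer rate is an assumed example --
# check the datasheet for your part and speed grade.
def ddr_bandwidth_gbps(bus_width_bits, transfers_per_sec):
    """Peak theoretical bandwidth in gigabytes per second."""
    return bus_width_bits / 8 * transfers_per_sec / 1e9

# One 16-bit interface at an assumed 800 MT/s:
one_if = ddr_bandwidth_gbps(16, 800e6)
# Two such interfaces in parallel:
two_if = 2 * one_if
print(one_if, two_if)  # 1.6 GB/s and 3.2 GB/s peak
```

Real sustained bandwidth will be lower once refresh, row activation, and non-sequential access patterns are accounted for, so this only tells you whether an interface is even in the right ballpark.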
More expensive FPGAs, e.g. Virtex and Stratix, tend to offer more internal SRAM and DSP blocks, so this may be a reason to go this way. I will counter that by saying that an array approach like our Merrick3/4/6 boards might be a lower-power, cheaper alternative.
You might find that you will need a higher performance PCIe interface to handle your data flow into a host PC. Here the more expensive FPGAs tend to be better but there are other ways that might be worth consideration.
These are all general statements, and the real way to do this is to look at it at the system design level. If you want more specific comments, contact me through the Enterpoint contact page.
I think this is the best suggestion so far. Some manufacturers even offer GPU cards without connectors for a monitor. The computational power of a GPU is huge! It will be very hard to beat with an FPGA.
--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
There are some dynamic programming algorithms that need many eight bit add/subtract/compares. I can fit hundreds of cells, each with about five such operations, in a Spartan 3E.
Multiply and divide are much harder in an FPGA, as is floating point, but small fixed point add/subtract is easy and fast.
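To make that concrete, here is a toy software model of one such cell. The recurrence is an assumed Smith-Waterman-style example chosen to show the flavour of the arithmetic (a handful of 8-bit adds, subtracts, and compares, no multiply, no floating point), not the poster's actual algorithm.

```python
# Hypothetical dynamic-programming cell of the kind described above:
# about five small fixed-point operations, which map cheaply onto
# FPGA fabric. The recurrence is an assumed Smith-Waterman-style
# example, not any poster's actual design.
def dp_cell(diag, up, left, match, gap=2):
    """One cell update: adds/subtracts plus compares, clamped to 8 bits."""
    a = diag + match         # add
    b = up - gap             # subtract
    c = left - gap           # subtract
    best = max(a, b, c, 0)   # compares
    return best & 0xFF       # keep the result in 8-bit range

print(dp_cell(diag=10, up=5, left=7, match=3))  # -> 13
```

Hundreds of these cells updating in parallel every clock is exactly the kind of workload where an FPGA's fine-grained parallelism pays off.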
A single GPU offers about 250 GFLOPS of computational power. Maybe you don't need the floating point, but even then it might be faster than a Spartan 3E doing fixed-point operations. And don't forget the data has to be fetched and stored somewhere. Another piece of cake for a GPU.
--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
For a linear systolic array, it is pretty easy to get the data in, which goes at a fairly slow rate. Coming out depends on the actual data, and can be high or low. Also, a linear systolic array can be extended by adding more chips fairly easily. You do need to power and cool them, but otherwise it is a linear array of as many chips as you can afford.
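A minimal software model of that data movement, assuming an FIR-style computation (chosen only to illustrate the structure, not any poster's design): data enters one item per step at the left end, and each cell does one local multiply-accumulate per step.

```python
# Toy model of a linear systolic array: data enters one sample per
# step at the left, shifts right one cell per step, and every cell
# does one multiply-accumulate. The FIR computation is an assumed
# example used only to show the streaming structure.
def systolic_fir(samples, coeffs):
    """Stream `samples` through a linear array of len(coeffs) cells."""
    n = len(coeffs)
    regs = [0] * n                       # one pipeline register per cell
    out = []
    for x in samples + [0] * (n - 1):    # extra zeros flush the pipeline
        for i in range(n - 1, 0, -1):    # shift data right by one cell
            regs[i] = regs[i - 1]
        regs[0] = x                      # new sample enters at the left
        out.append(sum(r * c for r, c in zip(regs, coeffs)))
    return out

print(systolic_fir([1, 2, 3], [1, 1]))  # -> [1, 3, 5, 3]
```

The input side needs only one sample per clock regardless of array length, which is why feeding such an array (even one spanning several chips) is the easy part.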
The reduced development effort of GPU vs. FPGA should not be underestimated. It's essentially software design vs. hardware design; this comes with all the advantages (development cycles, portability, and tightly integrated, mature-ish tools are the big ones).
Last I heard, the conventional wisdom here was that GPUs can get you 10x with a few weeks of effort for most problems. FPGAs may get you 100x for some specific problems, but at 10x the development effort (even more if you're not targeting COTS hardware).
Ultimately, they are both just means to an end, but it's usually better to let the problem dictate the solution, as opposed to the other way around.