Best FPGA for algorithmic acceleration

Hello,

I'm looking at different options to use FPGAs as coprocessors for algorithmic acceleration. Between the Xilinx Virtex 6 (LXT or SXT), or the Altera Stratix IV (360, 530 or 820), what would be my best option? The Xilinx Spartan 6 may also be a possibility.

Thanks, Jordan

Reply to
Jordan Fix

There are a number of different families from each company, many of which will work well for algorithmic acceleration.

If there is a reason to use one over the other, it will likely depend on details of the problem at hand.

For many such designs, the product of the number of CLBs and the clock speed that you can run it at is most important, in addition to the cost per chip.
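A rough sketch of that figure of merit, in Python. All part names, cell counts, clock rates, and prices below are hypothetical placeholders, not datasheet values; substitute real numbers from the vendor datasheets and price lists.

```python
# Figure of merit suggested above: (number of logic cells x achievable
# clock), weighed against cost per chip. Numbers are illustrative only.
def merit(logic_cells, clock_mhz, unit_cost):
    """Parallel ops per second per dollar, assuming one op per cell per cycle."""
    return logic_cells * clock_mhz * 1e6 / unit_cost

# Hypothetical candidates -- replace with real datasheet/pricing figures.
candidates = {
    "part_A": merit(74_000, 250, 800.0),
    "part_B": merit(147_000, 200, 1500.0),
}
best = max(candidates, key=candidates.get)
```

The "one op per cell per cycle" assumption is optimistic; routing and the actual operation width will derate it, but the comparison between parts is still informative.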

-- glen

Reply to
glen herrmannsfeldt

What algorithm do you want to implement with it? If you don't need many parallel calculations or hard realtime, usually a fast PC with a GPU and something like CUDA or OpenCL is more cost-effective, and much easier to program.

--
Frank Buss, http://www.frank-buss.de
electronics and more: http://www.youtube.com/user/frankbuss
Reply to
Frank Buss

Things to consider when choosing an FPGA for algorithm acceleration (presumably off-loading a CPU) include: what types of communications/memory interfaces would be required to transfer the necessary data to/from the CPU or system, and how well does each candidate support those interfaces?

Also, what types of internal memory (multi-port, different read/write data widths, ECC protected, etc), and how much, would be required to support the algorithm? How well would each candidate support that?

Andy

Reply to
Andy

I think it has already been said that it depends on what you are doing, but I will make some general comments.

Some applications, like video data manipulation, need lots of memory, so easy access to DDR memory might be a point to look for. As an example, the Spartan-6 has a hardened memory controller, which is good, but only if the two to four potential 16-bit interfaces offer enough bandwidth and capacity. Other FPGAs offer DDR2/3 interfaces that are harder to implement but bigger and faster.
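A quick back-of-envelope check of whether interfaces like those would be enough. The transfer rate and efficiency factor here are illustrative assumptions, not datasheet figures:

```python
# Rough DDR bandwidth estimate for n memory interfaces of a given width.
# rate_mtps is in megatransfers/s (DDR already counts both clock edges);
# efficiency derates for refresh, turnaround, and command overhead.
def ddr_bandwidth_gbytes(width_bits, rate_mtps, n_interfaces, efficiency=0.7):
    """Approximate usable transfer rate in GB/s."""
    return width_bits / 8 * rate_mtps * 1e6 * n_interfaces * efficiency / 1e9

# e.g. two 16-bit ports at an assumed 800 MT/s:
bw = ddr_bandwidth_gbytes(16, 800, 2)
```

If the algorithm's working-set streaming rate exceeds this kind of number, that alone can rule a part out.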

More expensive FPGAs, e.g. Virtex and Stratix, tend to offer more internal SRAM and DSP blocks, so this may be a reason to go this way. I will counter that by saying an array approach like our Merrick3/4/6 boards might be a lower-power, cheaper alternative.

You might find that you will need a higher-performance PCIe interface to handle your data flow into a host PC. Here the more expensive FPGAs tend to be better, but there are other ways that might be worth considering.

These are all general statements; the real way to do this is to look at the system design level. If you want more specific comments, contact me through the Enterpoint contact page and I will be happy to discuss this in more detail.

John Adair, Enterpoint

Reply to
John Adair

I think this is the best suggestion so far. Some manufacturers even offer GPU cards without connectors for a monitor. The computational power of a GPU is huge! It will be very hard to beat with an FPGA.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
Reply to
Nico Coesel

(snip, someone wrote)

In some cases, it is easy to beat with an FPGA.

There are some dynamic programming algorithms that need many eight bit add/subtract/compares. I can fit hundreds of cells, each with about five such operations, in a Spartan 3E.

Multiply and divide are much harder in an FPGA, as is floating point, but small fixed point add/subtract is easy and fast.
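A software model of the kind of cell being described: a dynamic-programming update built entirely from small add/subtract/compare operations (a Smith-Waterman-style local-alignment cell is one common example; the score values here are illustrative assumptions).

```python
# One systolic cell of a sequence-alignment dynamic program. Each update
# is a handful of narrow adds and compares -- exactly the operations that
# map cheaply onto FPGA fabric, with no multiplies or floating point.
MATCH, MISMATCH, GAP = 2, -1, -2

def dp_cell(diag, up, left, a, b):
    """Compute one DP cell from its three neighbors and two symbols."""
    score = diag + (MATCH if a == b else MISMATCH)
    return max(0, score, up + GAP, left + GAP)
```

In hardware, hundreds of such cells run every clock cycle in parallel, which is where the speedup over a sequential processor comes from.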

-- glen

Reply to
glen herrmannsfeldt

A single GPU offers about 250Gflops of computational power. Maybe you don't need the floating point but even then it might be faster than a Spartan 3E doing fixed point operations. And don't forget the data has to be fetched and stored somewhere. Another piece of cake for a GPU.

Reply to
Nico Coesel

(snip, I wrote)

For a linear systolic array, it is pretty easy to get the data in, which goes at a fairly slow rate. Coming out depends on the actual data, and can be high or low. Also, a linear systolic array can be extended by adding more chips fairly easily. You do need to power and cool them, but otherwise it is a linear array of as many chips as you can afford.
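A functional sketch of what such a linear array computes. This models only the result, not the clock-by-clock dataflow; the sliding dot product is a placeholder cell function chosen for illustration:

```python
# Functional model of a linear systolic array: each input element flows
# past a chain of cells, one step per clock, and each cell contributes
# one term. The net effect here is a sliding dot product (convolution).
def systolic_pass(stream, weights):
    """Result of streaming `stream` through a chain of len(weights) cells."""
    n = len(weights)
    out = []
    for i in range(len(stream) - n + 1):
        acc = 0
        for j in range(n):
            acc += stream[i + j] * weights[j]
        out.append(acc)
    return out
```

Because each cell only talks to its neighbors, extending the array across multiple chips costs nothing in routing complexity, which is the point made above.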

-- glen

Reply to
glen herrmannsfeldt

The reduced development effort of GPU vs. FPGA should not be underestimated. It's essentially software design vs. hardware design, which comes with all the advantages (development cycles, portability, and tightly-integrated/mature-ish tools are the big ones).

Last I heard, the conventional wisdom here was that GPUs can get you 10x with a few weeks of effort for most problems. FPGAs may get you 100x for some specific problems, but at 10x the development effort (even more if you're not targeting COTS).

Ultimately, they are both just means to an end, but it's usually better to let the problem dictate the solution, as opposed to the other way around.

Reply to
MBodnar
