FPGA acceleration vs. GPU acceleration

I used to be an FPGA engineer, and I believed that FPGA-based high-performance computing had a bright future. However, through my recent projects I have found that a GPU is often more appropriate when there is a need for acceleration.

In an embedded system, the FPGA co-processing option would be the Intel E6x5C, and the GPU co-processing option an AMD APU (with OpenCL support).

In a desktop system, the FPGA co-processing option would be a full custom design, most likely attached over PCIe, and the GPU co-processing option an NVIDIA card with CUDA (and basic OpenCV support).

If I choose FPGA co-processing, the algorithm has to be specifically optimized and the R&D time will be considerable. If I choose the GPU option, migrating the algorithm costs little time (even when the original is Matlab code), and the acceleration performance is also quite good.

In conclusion, FPGA acceleration only suits certain fixed applications. In the real world, however, many projects and algorithms are uncertain and change frequently. At the same power consumption, the GPU option may give better results. For a concrete project, I would consider a GPU or DSP first, and an FPGA last.

Does everybody agree?

Reply to
vcar

If what you need is a computation off-load engine for a standard CPU, with that CPU handling all the I/O tasks, then using a GPU would probably be the most appropriate implementation methodology.

However, the phrase "horses for courses" always applies.


Reply to
RCIngham

I like the way you are looking at this issue, that is, from a cost-efficiency point of view.

Again, here it boils down to cost efficiency and how you choose to measure it. Can we devote the time to optimize the algorithm for the FPGA? Can we afford to make it massively parallel to reduce latency? Does it even matter if the latency is worse than the GPU? Is the cost performance ratio better than that of the GPU? And so on...

The way you use "suit" and "certain" is extremely subjective. As the system architect, I may see that even though my system is not optimized for latency, for example, it may still be acceptable to use depending on the system requirements and cost efficiency. Rarely does one get a system spec that states "Run as fast as possible."

Cost efficiency rules here. How can you measure it? Power, latency, area, throughput, NRE, etc. Simply saying that FPGAs can't implement a function as well as a GPU because of fabric differences is not the best way to say "it is worse than a GPU." In short, the system specs and ultimately the cost efficiency say whether to use the GPU or FPGA or CPU... and they will tell you which is better for *your* application.


Reply to
fpga_me

Someone sent on September 13th, 2011:

"[..] In conclusion, FPGA acceleration only suits certain fixed applications. In the real world, however, many projects and algorithms are uncertain and change frequently. At the same power consumption, the GPU option may give better results. For a concrete project, I would consider a GPU or DSP first, and an FPGA last.

Does everybody agree?"

GPUs can outperform CPUs, but CPUs can outperform GPUs. It depends.

Reply to
Paul Colin Gloster

It depends on the application -- a lot, and the company a little.

I've done a lot of work around (and sometimes even on) a system that does a lot of per-pixel video processing. The actual algorithm is quite simple, but it needs to happen at video pixel rates, and the power dissipation needs to be low. For that app, an FPGA doing the pixel-level work made lots of sense.

For the version that I worked on, having a processor working hand-in-hand with the FPGA handling management tasks at the video line rate also made oodles of technical sense -- but ran afoul of some company political decisions (mostly a decision to maintain the illusion that a software guy who could handle "big box" GUI and communications interface stuff was the right guy to work on software that implemented a PLL at the video line rate).

For decisions that are even close to even-steven, being able to hire and manage a crew that can do the work becomes an important part of the mix -- which means that if you're trying to do this sort of thing in an all-software company, a GPU solution may make oodles more sense than an FPGA solution, even if the FPGA solution is technically better. Similarly, if the hard part of the algorithm needs to have a lot of interaction with the hardware, and if management is composed of circuit designers, then an FPGA solution may be a better choice even if the better technical solution would have been to use a GPU.

--
www.wescottdesign.com
Reply to
Tim Wescott


FPGAs work best when you need to do a huge number of small fixed-point operations, especially add/subtract/compare and some, but not a huge number, of multiplies and divides. The shifter needed for floating-point addition and subtraction is big, and limits the use of FPGAs for floating-point work.

GPUs traditionally are designed to do a lot of single-precision floating point. The use of the GPU for numerical processing takes advantage of the economy of scale of building them for display use.

I have heard that there is discussion about building GPUs to do double precision, just for this purpose, though.

-- glen

Reply to
glen herrmannsfeldt

As an exercise I have recently written a USB-interfaced, FPGA-based accelerator for a simple scientific algorithm (binning) - the original code was written in APL, and for large data sets (5 million) the FPGA did better than APL or compiled C. It also needs testing with more of the complete algorithm moved into the FPGA, so there is less USB overhead. I have not yet done a comparison with a GPU. There is a video of this working and being tested on YouTube:

formatting link

and a fuller description of the FPGA techniques used on the Farnell Element 14 site: FPGA Modular Firmware Skeleton for multiple instruments - Morph-IC-II, YouTube videos.

formatting link
Reply to
Dr. Beau Webber

I'm sure such a binning algorithm can be many times faster with a GPU, because it is easy to parallelize: e.g. partition the input data into as many blocks as you have parallel processing units, each with its own result sum array, and finally sum all the result arrays.
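A minimal CUDA sketch of that scheme (the kernel name, bin count, and parameters here are made up for illustration, not taken from the actual accelerator): each thread block accumulates a partial histogram in shared memory, and the partial histograms are merged into the global result at the end.

#define NUM_BINS 256   /* assumed bin count for this example */

__global__ void binKernel(const float *data, int n,
                          float minVal, float binWidth,
                          unsigned int *globalHist)
{
    __shared__ unsigned int localHist[NUM_BINS];

    /* Clear this block's partial histogram. */
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        localHist[i] = 0;
    __syncthreads();

    /* Grid-stride loop: each thread bins a subset of the input. */
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
    {
        int bin = (int)((data[i] - minVal) / binWidth);
        if (bin < 0) bin = 0;
        if (bin >= NUM_BINS) bin = NUM_BINS - 1;
        atomicAdd(&localHist[bin], 1u);   /* shared-memory atomic */
    }
    __syncthreads();

    /* Merge the partial histogram into the global result. */
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        atomicAdd(&globalHist[i], localHist[i]);
}

The shared-memory atomics keep the contention local to each block; only the final merge touches global memory, which is essentially the "sum all the result arrays" step.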

I've used CUDA for magnetic field calculation and it was at least 20 times faster (depending on the graphics card):

formatting link

--
Frank Buss, http://www.frank-buss.de
electronics and more: http://www.youtube.com/user/frankbuss
Reply to
Frank Buss
