Considerations for FPGA-Based Acceleration in Biomedical Simulations/Computational Biology

What are some of the considerations when migrating from a purely software-based biomedical simulation to a hardware-based simulation/acceleration solution? Insights on both the business and technical considerations would be appreciated.

Thanks Anand

Reply to
anand

What are you doing?

Reply to
Brad Smallridge

Brad,

Thanks for the response. I will try to outline what I am doing with some specific concerns I have:

Basically, I am working on behalf of a company that develops compute-intensive algorithms for biological applications in a software programming language like C/C++. That company is trying to get a performance boost by mapping the same algorithms onto a hardware platform such as an FPGA, an ASIC, or anything in between. The main idea is to see if we can get a minimum of 10X-20X speedup versus a software implementation.

Here are some of my basic concerns:

(1) Many of the end customers now use laptops as their only computers, sometimes with a docking station and/or external keyboard and monitor when they're at their desks. How would an end customer implement such a hardware solution? Would it come as a plug-in card for a card slot (unusable on a laptop), or in some format that would enable using it on a laptop? How about a standalone box?

(2) Assuming one is able to connect the HW implementation to a laptop, how would the end customer feed in the input files? Note that in some apps the input file is ASCII text, while in other apps it may be binary files in a proprietary format. How would the output of the simulation be collected? Would it be redirected to an ASCII text file?

(3) What happens when the algorithm needs to be updated? Is there a way to "update" the hardware (such as an FPGA), or does it mean the hardware becomes obsolete and must be replaced (if so, at what kind of cost to an end user)?

(4) Hardware/Software Partitioning: Can various "core" functions be programmed into the hardware while still allowing other functions to be in software in order to provide flexibility in the mathematical models? If so, is the potential speed advantage still high?

(5) Can you shed some light on how one can translate existing code from C/C++ to a HW platform? What tools would be used, how would the design be verified, and how long does it take to get a working demo version?

(6) What if the existing code is in a proprietary language other than C/C++? Is it possible to translate it into a HW mapping in that case?

(7) Finally, to get a demo/working prototype, what do you recommend, FPGA, or ASIC or something in between and why? If you had to take a stab at guessing the cost for developing such a prototype, what would it be? Assume about 100,000 lines of existing code in C/C++.

Thanks Anand

Reply to
anand

Systolic array processors. Especially if the problem is dynamic programming, but many others work well as systolic arrays, too.
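As a rough illustration of why dynamic programming suits a systolic array, here is a minimal Python sketch. Edit distance is only an example chosen for illustration, not an algorithm named in this thread, and the function name is hypothetical. The point is that every cell on an anti-diagonal depends only on the two previous anti-diagonals, so hardware could give each cell its own processing element and compute a whole diagonal per clock step.

```python
def edit_distance_wavefront(a: str, b: str) -> int:
    """Levenshtein distance, filled in one anti-diagonal at a time."""
    n, m = len(a), len(b)
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    # Anti-diagonal k holds every interior cell with i + j == k.
    for k in range(2, n + m + 1):
        # The cells visited by this inner loop are independent of each other:
        # each one reads only diagonals k-1 and k-2. A systolic array would
        # evaluate them in parallel instead of one after another.
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match or substitution
    return d[n][m]
```

In software this still visits all n*m cells one by one. The parallel version would need only n+m-1 diagonal steps, which is where the hardware gain would come from.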

There are PCMCIA cards with FPGAs in them. For real accelerators, you want a large box (maybe desktop PC size), but for 10x or 20x a PCMCIA card might do.

This isn't really applicable to the discussion.

FPGA is a good choice. Not only can it be updated when the algorithm needs to change, but it can be updated while running. It might be that you make one pass through the data with one configuration, then reprogram and process it with another configuration.

That would be the usual way, yes. Find the parts of the algorithm where the most intense computation is done. It might be a few lines of C inside nested for loops.
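One way to locate those few lines is to profile the existing software first. The sketch below uses Python's standard `cProfile` and `pstats` modules on a toy program. The functions `parse_input`, `inner_kernel` and `run_simulation` are hypothetical stand-ins, and a C/C++ code base would use its own profiler instead.

```python
import cProfile
import pstats


def parse_input(n):
    """Cheap setup work, standing in for file parsing."""
    return list(range(n))


def inner_kernel(values):
    """Nested loop standing in for the compute-intensive core."""
    total = 0
    for a in values:
        for b in values:
            total += (a * b) & 0xFF
    return total


def run_simulation():
    return inner_kernel(parse_input(300))


def hottest_function(func):
    """Profile func() and return the name of the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) to a tuple whose third entry
    # is the time spent inside that function itself (tottime).
    key = max(stats.stats, key=lambda k: stats.stats[k][2])
    return key[2]
```

Calling `hottest_function(run_simulation)` should point at `inner_kernel`, which would then be the candidate for moving into hardware.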

The design process is completely different from serial C programming.

You want to go from the description of the algorithm, not C or C++ code. In that case, a proprietary language is probably better.

It is the wrong way to think about it if you have 100,000 lines of C code. Consider an algorithm based on an FFT. The total may be 100,000 lines but the FFT (or at least DFT) can be written in just a few lines. You want to find those few lines that are executed 1e12 times, program those into the FPGA, write code to do the I/O to the FPGA, and leave the rest of the 100,000 lines as they are.
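Amdahl's law gives a quick estimate of what accelerating only the hot loop can buy for the program as a whole. The sketch below is illustrative only. The function names are hypothetical, and any fractions or kernel speedups fed into it are assumptions, not measurements from real code.

```python
def overall_speedup(hot_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only hot_fraction of the
    original run time is accelerated by a factor of kernel_speedup."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / kernel_speedup)


def required_hot_fraction(target_speedup: float) -> float:
    """Smallest fraction of run time that must be offloaded to reach
    target_speedup, even if the hardware kernel took no time at all."""
    if target_speedup < 1.0:
        raise ValueError("target_speedup must be at least 1")
    return 1.0 - 1.0 / target_speedup
```

For instance, `required_hot_fraction(10.0)` shows that a 10X overall target needs at least 90% of the run time to sit in the offloaded part. With exactly 90% offloaded, even a 100X faster kernel gives only about 9.2X overall.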

-- glen

Reply to
glen herrmannsfeldt

Fantastic response Glen! I do appreciate it! Also, do you consult on projects of this nature? If so, how much do you charge? You could email me in private if you wish at my Yahoo ID icebreaker74

Thanks Anand

Reply to
anand

My slant on this (without knowing the scale of what's needed) would be a (little?) box which plugs into a USB port. It would contain a USB2 interface and FPGA(s) and probably RAM. (I don't know anything about your algorithms; maybe it needs to be a big box with lots of RAM!)

The configuration for the FPGA would be sent over the USB at startup and anytime the algorithm needed to change.

An application or library on the PC would deal with all the talking to USB and file transfers, etc.

You can send data up or down the USB at about 40-50 MBytes/second. I couldn't possibly predict the potential speedup of your algorithms. I'm somewhat skeptical that you're going to get a 10-20 times speedup over a modern PC, but you might.
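A back-of-the-envelope way to see whether the link or the computation dominates, using the 40-50 MBytes/second figure above. Everything else here is a placeholder assumption: the function names, the data volume, the software run time and the kernel speedup. The sketch also assumes transfer and computation do not overlap.

```python
def offload_time_s(bytes_moved: float, link_bytes_per_s: float,
                   sw_compute_s: float, kernel_speedup: float) -> float:
    """Run time with offload: move the data, then compute in hardware."""
    transfer_s = bytes_moved / link_bytes_per_s
    return transfer_s + sw_compute_s / kernel_speedup


def effective_speedup(bytes_moved: float, link_bytes_per_s: float,
                      sw_compute_s: float, kernel_speedup: float) -> float:
    """Speedup seen by the user once transfer time is included."""
    return sw_compute_s / offload_time_s(
        bytes_moved, link_bytes_per_s, sw_compute_s, kernel_speedup)
```

With 400 MBytes moved at 40 MBytes/second, the transfer alone takes 10 seconds. A job that takes 600 seconds in software and runs 20 times faster in hardware would then see about 15X overall. A job that takes only 60 seconds in software would see under 5X, because the link becomes the bottleneck.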

I'd probably be rewriting my inner loops in assembler on the PC first. Without knowing anything about the calibre of the guys that write your existing software, I tend to find that there's some pretty low-hanging fruit around when it comes to speeding up other people's code.

Will

(will.dean_at_indcomp.co.uk if you want to talk about the project professionally...)

Reply to
Will Dean

You could use PCI Express x1 on newer laptops, which transfers 2.5 Gbit/s. Other than that, there's Ethernet and USB2. (With Ethernet you could easily share the unit between hosts.)

It's a matter of converting data between your application and the protocol you define in the FPGA.

It can be a simple matter of "recompiling", to use C/C++ language terms. Then you use the same communication channel as for sending data to load the new algorithm(s).

Partitioning is a balance between communication demands and speed benefits. Other than that, partitioning can be done in any manner. There's also the option of reprogramming the FPGA "in-flight" to accelerate different algorithms as needed.

Maybe SystemC could be of use. It probably needs to be done at the algorithmic level, on those parts (functions) that will benefit from it. The main issue is whether the algorithm and data can be parallelized and pipelined in such a way that it will actually be faster. A function call would essentially take parameters, send them to the FPGA, await the answer from the FPGA, and return the data.
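A minimal sketch of that call pattern, with a software stand-in for the hardware so it runs anywhere. The `FakeFpgaLink` class, the message layout (a little-endian count followed by 32-bit integers) and the dot-product kernel are all assumptions made up for illustration. A real design would replace the link with a USB, Ethernet or PCI Express driver and define its own protocol.

```python
import struct


class FakeFpgaLink:
    """Stand-in for the real transport; it computes the reply in software."""

    def __init__(self):
        self._reply = b""

    def send(self, payload: bytes) -> None:
        # Request layout (assumed): uint32 n, then n int32 for x, n int32 for y.
        (n,) = struct.unpack_from("<I", payload, 0)
        values = struct.unpack_from(f"<{2 * n}i", payload, 4)
        x, y = values[:n], values[n:]
        result = sum(a * b for a, b in zip(x, y))
        self._reply = struct.pack("<q", result)  # reply: one int64

    def receive(self) -> bytes:
        reply, self._reply = self._reply, b""
        return reply


def dot_product_offloaded(link, x, y) -> int:
    """Looks like an ordinary function to the caller: pack the parameters,
    send them to the device, wait for the answer, return the data."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    request = struct.pack(f"<I{2 * n}i", n, *x, *y)
    link.send(request)
    (result,) = struct.unpack("<q", link.receive())
    return result
```

The rest of the application only ever calls `dot_product_offloaded`, so the transport behind it can change without touching the calling code.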

Can be done.

FPGA in this case, because the requirements are not 100% clear. An FPGA allows easy changes compared to an ASIC. An FPGA can also be reused in a way an ASIC can't. An ASIC can be used at a later stage, when the algorithms are more settled, for a massive computing cluster or similar.

I think the cost is quite dependent on the particular setup.

Reply to
pbgbbrsh

Hello Anand, Our company is a leading engineering services firm with a seasoned team of algorithm acceleration specialists. There are numerous ways to achieve your goal depending on your algorithm, budget, and tool comfort. For example, there are new FPGA solutions that can plug into quad-Opteron sockets and co-process over a very high-speed, low-latency bus such as HyperTransport. These FPGA daughtercards can plug and play, but your algorithm will require a nontrivial effort for setup, integration, analysis, and optimization. Knowing C and HDL is important, along with the associated tool flows (MATLAB, SysGen or DSP Builder, ModelSim, Quartus or ISE, etc). You might also consider some of the hybrid FPGA-MCU devices from IPFlex, Stretch, and others. And there is always the embedded systems approach of using FPGAs, DSPs, MCUs -- or a combination of such devices to achieve your cost-performance goals. Acceleration of 40x is not uncommon.

If you are interested in engaging a professional firm, I am confident that our team can architect a solution and help optimize your algorithms to the fabric. Feel free to contact us if we can be of service.

Kind regards, Chris Hallahan

Reply to
chris.hallahan
