Estimating number of FPGAs needed for an application

Hi all

I'm absolutely new to FPGAs; in fact, my work is much more related to SW than to HW, so I've had to take on a problem that ideally wasn't meant for me.

The issue is this: I have to estimate (roughly) the number of FPGAs needed to support a typical signal processing algorithm. The steps are as follows, always in single precision:

1. 16k complex samples FFT
2. 16k complex vector multiplication
3. 16k complex samples IFFT
4. 16k complex vector multiplication
5. 16k complex vector sum

The idea is to know how many FPGAs will cover this kind of processing in a given time, to compare against different types of processors. For the latter it is really easy, just counting the number of operations in GFLOPs, but with hardware devices I am having a lot of trouble, since I don't have a clear understanding of what I should count.
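For reference, here is the kind of count I mean, as a rough Python sketch (assuming the usual 5*N*log2(N) real-FLOP estimate for a radix-2 complex FFT, 6 flops per complex multiply, and 2 per complex add):

from math import log2

N = 16384                  # 16k complex samples
fft  = 5 * N * log2(N)     # step 1: FFT (rule-of-thumb estimate)
cmul = 6 * N               # steps 2 and 4: complex vector multiply
ifft = fft                 # step 3: IFFT costs the same as the FFT
cadd = 2 * N               # step 5: complex vector sum

total = fft + 2 * cmul + ifft + cadd
print(f"~{total / 1e6:.1f} MFLOP per 16k data set")   # ~2.5 MFLOP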

Please, give me a hand!

Ruben

Reply to
rbblasco

On Mar 12, 1:35 pm, snipped-for-privacy@gmail.com wrote: [...]

With a hardware implementation you will need to specify the time in which you want this algorithm to be processed. It will make a difference in the implementation: the faster you want to go, the more you need to implement in parallel, and the more resources you will need.
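As a back-of-the-envelope illustration of that tradeoff (the clock rate and deadline below are made-up placeholders, not recommendations):

from math import ceil, log2

N      = 16384      # samples per data set
f_clk  = 200e6      # Hz, an assumed achievable FPGA clock
budget = 100e-6     # s, an assumed deadline per data set

# Complex multiplies: (N/2)*log2(N) per FFT/IFFT pass, plus N for
# each of the two vector multiplies.
cmuls = 2 * (N // 2) * int(log2(N)) + 2 * N

cycles = f_clk * budget
print(f"~{ceil(cmuls / cycles)} complex multipliers in parallel")

In this simple model, halving the budget doubles the multiplier count; real designs scale less cleanly than that.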

For the FFT you can request a design fit from here:

formatting link

But that is a specific design fit for their FFT. So you might find other vendors that get you a different fit.

Cheers,

Guenter

Reply to
Guenter

There is additional information needed for this evaluation:

- how often do you need a result (throughput and latency)?

- what is the data type (integer? float? precision?)

Unlike CPUs, FPGAs have no native data types. For cryptographic applications you might want to run an FFT on vectors of single bits. For DNA matching you might have 2-bit or 4-bit data types. For DSP, 18-bit or 36-bit integers are a common choice for Xilinx FPGAs.

The algorithm that you describe, implemented serially on 1-bit data, would use 1% of a small FPGA and run for several hundred thousand clock cycles.

On a large FPGA, OTOH, you can perform a few hundred 18-bit x 18-bit multiplications per cycle.
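To make those two extremes concrete, a quick sketch (the multiplier counts here are illustrative, and remember one complex multiply costs 3-4 real 18x18 multiplies):

from math import ceil, log2

N = 16384
# complex multiplies in the two transforms plus two vector multiplies
cmuls = 2 * (N // 2) * int(log2(N)) + 2 * N   # ~262,000

for units in (1, 4, 64, 400):                 # 1 = fully serial
    print(f"{units:4d} multipliers -> ~{ceil(cmuls / units):,} cycles")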

Kolja Sulimma

Reply to
comp.arch.fpga

You left out a key piece of information: how fast do you need to compute these 5 steps? A processor that can do all 5 can fit on a single FPGA provided there is a reasonable amount of time between data sets, and that there is enough memory available to buffer the input (if needed), store intermediate results, and buffer the output. The wide swath ocean altimeter design featured in the gallery on my website (

formatting link
), for example, does everything on your list, in the same order and more, in under 250 usec for a 4K-point data set using very old (original Virtex) technology, which has comparatively little on-chip memory and no embedded multipliers. About 2/3 of the area is dedicated to storage buffers using SRL16s (the large cyan block in the middle right, the magenta/green block below it, and the yellow/green blocks at the bottom are all buffers). The FPGA size is small, features are sparse, and speed is slow by today's standards. Implementation size depends heavily on the FFT implementation, of course. My FFT kernel has the smallest size-performance footprint, so using others will result in a bigger design for a given speed.
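To put rough numbers on the buffering for the 16k single-precision case in the original post (a minimal estimate, before any double buffering):

N = 16384
bytes_per_sample = 2 * 4            # re + im, 32 bits each
one_buffer = N * bytes_per_sample   # 128 KiB per 16k complex vector
# input buffer + intermediate results + output buffer:
print(f"{3 * one_buffer / 1024:.0f} KiB minimum on-chip storage")
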
Reply to
Ray Andraka

comp.arch.fpga wrote: (snip)

For dynamic programming algorithms, the favorite way to do DNA matching, it is usual to use 16-bit fixed-point arithmetic.

-- glen

Reply to
glen herrmannsfeldt

First, floating point tends to be a lot bigger on FPGAs than fixed point, especially floating point addition. If you can get away with fixed point, even if the actual width is somewhat larger, it is probably worth doing.
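As a toy sanity check of whether fixed point would hold up for your data (the Q-format scaling here is an assumption; pick it to match your actual signal range):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16384).astype(np.float32)  # stand-in data

frac_bits = 15                  # 18-bit word: sign + 2 int + 15 frac
q = np.round(x * 2**frac_bits) / 2**frac_bits      # quantize, dequantize

err = x - q
snr_db = 10 * np.log10(np.mean(x**2) / np.mean(err**2))
print(f"~{snr_db:.0f} dB SNR after 18-bit quantization")

If that SNR exceeds what your algorithm actually needs, the fixed-point route is probably the cheaper one.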

Also, you can't just count 'FPGAs'; you have to take into account the sizes of the different FPGAs, even within the same product family.

I like systolic array processors, which usually work well for this type of problem. The thought process for hardware implementations, especially good pipelined ones, is somewhat different than for software implementations. Usually hardware implementations are used when software isn't fast enough, so you need to know how fast it has to go.

There is a tradeoff between time and size, but it isn't linear enough to quote without more details.

-- glen

Reply to
glen herrmannsfeldt

With everyone else's previously mentioned comments in mind as well, I would recommend downloading Xilinx's WebPACK tool. Open their "Core Generator" software and run the FFT core from there. You can enter things like processing frequency, sample frequency, etc., and it will give you a resource utilization. You can also pull this information from the datasheet for their radix-2 FFT core.

I wish you the best of luck, but you may want to recommend that your boss consult a hardware engineer. With all due respect to software engineers (I can't write decent C code to save my life), and despite what management likes to believe, FPGA design is hardware design, not software design. Without a good deal of background experience in digital design, you're going to find it difficult to make this kind of estimate accurately. Again, nothing against software folks; it's just a different set of training and experience that's required.

Reply to
Paul
