fft size in fpga

Do you have a question? Post it now! No Registration Necessary

Translate This Thread From English to

Threaded View

I am implementing a 128 point real Radix-2 fft, data and coefficient
widths are 16 bit.
I am synthezising it for use in an FPGA. However, it is taking a very
long time to synthesize. (approx 3 days using Leonardo on a 2 GHz
machine with 512 MByte RAM) I am using a 20K1000 Altera FPGA. The ram
required by the fft will be internal to the FPGA

Will this design take up all the space on the device. From past
experience, can someone give me an indication of what area of the
device the fft will occupy.

Surely if it takes up most of the device, then it will be too big, as
I have other features to implement in the FPGA also !!

Thank you

Re: fft size in fpga
It very heavily depends on your implementation.  The fact that it is
radix2 and
not radix 4 or higher tells me already that your implementation is not a
efficient one.  A radix 4 kernel is very little added complexity over a
radix 2
kernel and cuts the processing time and area considerable.  Depending on
speed requirements and your design prowess, you can certainly fit a 128
real-only FFT in a much smaller part than a 1M gate part.  As your speed
requirements decrease, you can take advantage of iterative or shared
to reduce the gate count considerably.  I did a 4096 point design in a
XCV1000 that does the complex transform in 68 us (the floorplan and a
description are on the gallery page on my website).  That one only
about 40% of the FPGA and includes some floating point stuff as well as
windowing multipliers and some other goodies.  It takes intimate
of the FPGA structure and of algorithms to achieve that level of density
(the customer called the design "a work of art"), but even a novice can
achieve a density/performance level of half that design with some
thought out design.

PJ wrote:

Quoted text here. Click to load it

--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
We've slightly trimmed the long signature. Click to see the full one.
Re: fft size in fpga
Quoted text here. Click to load it


I have noted with some synthesizers that if it cannot infer your RAM
"correctly" it will try to build it out of registers (as opposed to
the dedicated RAM on the FPGA).  This can end up taking a _very_ long
time for only reasonably large RAMs.

You might consider synthesizing each module in your design separately
until you find the part that causes the problem.  If it isn't already
structured thus it might be wise to do so as this may also help.



Re: fft size in fpga
Several of the textbooks on FFTs such as Smith & Smith "Handbook of realtime
describe higher radix FFTs.  I believe the Xilinx FFT core data sheet goes into
a fair
amount of detail on a radix 4 kernel.  The easiest way to do hardware sharing is
to use
one kernel and run the data through it in multiple passes.  You'll need to
change the
twiddle factors (which are really a phase rotation) and the data ordering.  The
can be handled by changing the input to a multiplier using a table.  The data
is doable by permuting the bits of your address counter.

DA is really a technique for doing multiply-accumulate, not just a multiply.  
There are
many ways to do the multiplications, for example if speed is not an issue you
can use a
scaling accumulator for a fairly compact multiplier.  See the page about
multiplication on
my website.  Flip-flop count is not the whole story, you also have to account
for the
combinatorial logic between flip-flops.  You could do a totally combinatorial
with no flip=flops, for example (although it would be quite slow if it is

PJ wrote:

Quoted text here. Click to load it

--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
We've slightly trimmed the long signature. Click to see the full one.

Site Timeline