fpga space estimate

Hello all,

I would like some feedback:

I am planning to make a design in FPGA that has 4 2nd-order cascaded
IIR filters.
Now the question/feedback/advice which I am seeking is the following:

To what resolution can I have the input and output databuses of the
IIRs ?
Assume there is nothing else but the IIRs in the FPGA

P.S. The FPGA is a Spartan-3 (400k gates)

I made a rough estimate. I would need:
~800-1000 FFs (there is a total of 8k)
~14 16-bit adders (I do not know the total available)
~8 18x18 dedicated multipliers (there is a total of 16)
and a whole bunch of muxes; I estimate ~2000 4:1 muxes/demuxes

The above bunch of logic is for
4 2nd order IIRs
16 bit input databus for each IIR
16 bit output databus for each IIR
64-bit feedforward and feedback coefficients for each IIR
An input DC gain of 2^12 for each IIR
One, and only one, 96-bit adder responsible for all the sums
One, and only one, 27x64-bit multiplier responsible for all the
multiplications
The adder and the multiplier will run at a much higher frequency
than the sample rate, which lets them perform all the operations
for all the IIRs.
The sample rate is 1 MHz. I am assuming that the sample rate can be
multiplied up by a factor of at least 50, which would give at LEAST
1 cycle/operation. There are 20 sums and 20 multiplications to be done
per sample period.
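
The cycle budget in the post can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted above (1 MHz sample rate, a clock multiplier of 50, 20 sums and 20 multiplications per sample). The variable names are made up for illustration, and the "parallel" case assumes the adder and the multiplier can work at the same time, which the post does not state.

```python
# Figures taken from the post; names are illustrative only.
SAMPLE_RATE_HZ = 1_000_000   # 1 MHz sample rate
CLOCK_MULTIPLIER = 50        # assumed minimum clock multiplication
SUMS_PER_SAMPLE = 20
MULTS_PER_SAMPLE = 20

fast_clock_hz = SAMPLE_RATE_HZ * CLOCK_MULTIPLIER
cycles_per_sample = CLOCK_MULTIPLIER

# Worst case: every sum and every multiply takes its own slot in sequence.
cycles_per_op_sequential = cycles_per_sample / (SUMS_PER_SAMPLE + MULTS_PER_SAMPLE)

# Assumption: adder and multiplier run concurrently, so each unit
# only has to fit its own 20 operations into the sample period.
cycles_per_op_parallel = cycles_per_sample / max(SUMS_PER_SAMPLE, MULTS_PER_SAMPLE)

print(f"fast clock: {fast_clock_hz / 1e6:.0f} MHz")
print(f"cycles per operation (sequential): {cycles_per_op_sequential}")
print(f"cycles per operation (parallel units): {cycles_per_op_parallel}")
```

Even in the sequential case the budget stays above one cycle per operation, which matches the "at LEAST 1 cycle/operation" claim.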

Hence, I arrived at the conclusion that such a digital filter design
will take ~25% of the space of the FPGA. Does this sound accurate?
However, I do not know how to account for routing overhead.
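
One way to sanity-check the multiplier count in the estimate: a wide signed product can be built from 18x18 signed multipliers by splitting each operand into 17-bit unsigned limbs plus a signed top limb. The sketch below is a software model of that decomposition, not a description of how the poster arrived at his number; the function names are hypothetical. For a 27x64-bit product it needs 2 x 4 = 8 partial products, which appears consistent with the "~8 18x18 dedicated multipliers" figure.

```python
LIMB_BITS = 17  # unsigned bits per lower limb; the 18th bit of the multiplier is the sign

def limb_count(width):
    """Limbs needed for a signed operand: n limbs cover 17*n + 1 bits."""
    return -(-(width - 1) // LIMB_BITS)  # ceiling division

def split(x, n):
    """Split signed x into n-1 unsigned 17-bit limbs and one signed top limb."""
    limbs = []
    for _ in range(n - 1):
        limbs.append(x & ((1 << LIMB_BITS) - 1))
        x >>= LIMB_BITS  # arithmetic shift keeps the sign in the top limb
    limbs.append(x)
    return limbs

def wide_multiply(a, a_width, b, b_width):
    """Return (product, number of 18x18 partial products used)."""
    total, used = 0, 0
    for i, al in enumerate(split(a, limb_count(a_width))):
        for j, bl in enumerate(split(b, limb_count(b_width))):
            # Each limb must fit a signed 18-bit multiplier input.
            assert -(1 << 17) <= al < (1 << 17)
            assert -(1 << 17) <= bl < (1 << 17)
            total += (al * bl) << (LIMB_BITS * (i + j))
            used += 1
    return total, used

a = -(1 << 26) + 12345        # arbitrary 27-bit signed test value
b = (1 << 63) - 1             # largest 64-bit signed value
product, partials = wide_multiply(a, 27, b, 64)
print(product == a * b, partials)
```

A time-multiplexed design could instead reuse fewer multipliers over several cycles, at the cost of eating into the cycle budget.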

I would appreciate citations of previous projects and what % of the
FPGA they occupied.

Thx in advance
-Roger


Re: fpga space estimate

You could reduce your resource requirements significantly by implementing a
multi-channel, multi-stage mechanism that manipulates your data and
coefficients through one BlockRAM - eliminating most of the multiplexers -
and pipelines some of the operations such as the multiply to use fewer
resources overall.

For these kinds of things, a little pseudocode and a spreadsheet can help to
visualize how to break up the problem and verify the solution.

Are you looking specifically for a tiny solution?



Re: fpga space estimate


I am looking for a solution that fits in the FPGA. Tiny?, not really,
as long as every thing fits.

BlockRAM. Great idea! I checked the timing specs of the BlockRAM
module, and it seems pretty fast: 1 clock cycle to write and 1 clock
cycle to read, with a max frequency of ~160 MHz. No need for a complex
multiplexing network. In fact, there is no need for delay elements
(FFs) altogether!

However, I have never used RAM on an FPGA (that is the reason I did not
initially lean towards that solution). Is there some obvious, flagrant,
blatant drawback to using RAM instead of FFs? Especially since there
are 36 times more RAM bits than available FFs (288K vs 8K). And in
RAM, ALL the bits can be used!
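
To put the RAM-versus-FF comparison in numbers, here is a rough storage budget for the filter. The 64-bit coefficient width and the 288K RAM figure come from the thread; five coefficients per 2nd-order section is standard. The four state words per section (a direct form I layout) and the 96-bit state width are assumptions chosen for illustration, not something the thread specifies.

```python
SECTIONS = 4
COEFFS_PER_SECTION = 5        # b0, b1, b2, a1, a2 for a 2nd-order section
COEFF_BITS = 64               # from the thread
STATE_WORDS_PER_SECTION = 4   # assumption: direct form I (x[n-1], x[n-2], y[n-1], y[n-2])
STATE_BITS = 96               # assumption: state kept at the adder width
TOTAL_RAM_BITS = 288 * 1024   # "288K" from the thread

coeff_bits = SECTIONS * COEFFS_PER_SECTION * COEFF_BITS
state_bits = SECTIONS * STATE_WORDS_PER_SECTION * STATE_BITS
total_bits = coeff_bits + state_bits
fraction = total_bits / TOTAL_RAM_BITS

print(f"coefficients: {coeff_bits} bits, state: {state_bits} bits")
print(f"total: {total_bits} bits = {fraction:.2%} of the block RAM")
```

Under these assumptions the whole working set is a small fraction of the available RAM bits, so capacity is unlikely to be the limiting factor; port width and access scheduling matter more.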

According to the timing waveform in the specs, it only requires 1 cycle
for read and 1 cycle for write, so I do not think loss of cycles
between data transfers will be an issue, especially if the data rate is
~150 times slower than the fastest clock available. The module that
performs the multiplication can thus be time-multiplexed.

It sounds like working on a DSP rather than an FPGA, if one
forgoes the use of FFs... :-)

-Roger


Re: fpga space estimate
Roger,

Depending on speed, using an FPGA can be exactly like using a DSP.

The use of the BRAM basically means you are building a custom DSP
machine, which will 'execute' a fixed program (based on a FSM),
manipulating the BRAM contents much like a DSP would bring operands in
and out of the ALU, to and from memory.

Personally, if something this slow would work, to make your life easier,
you might consider Microblaze as the processor, and execute both program
and data from BRAMs.  That way the program (which may already be in c
code) could remain in c code.

Or, alternatively, use a "real" DSP processor, as (let's be honest) the
FPGA may be extreme overkill for what you may be doing.

If the speed there is just not fast enough, there may be hardened FFT
filter structures that are serial, rather than parallel, which still may
be fast enough (faster than a DSP), and yet use fewer resources (than a
full parallel one).  The SRL16s are particularly good at this, as you
have up to 16 FFs for the SLICEs with SRLs/LUTRAM.

Remember that a parallel multiply may not be needed, and a serial
multiplier may be a lot less hungry (for resources, overall).
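
As a rough illustration of the trade-off, here is a software model of a bit-serial shift-and-add multiplier: it handles one multiplier bit per clock, so it needs only an adder and a shifter instead of a full parallel array, but takes as many cycles as the multiplier has bits. This is a minimal sketch for unsigned operands with a hypothetical function name; a real design would also handle sign and pipelining.

```python
def serial_multiply(a, b, b_bits):
    """Unsigned shift-and-add multiply, consuming one bit of b per cycle.

    Returns (product, cycles). Each loop iteration models one clock.
    """
    assert 0 <= b < (1 << b_bits), "b must fit in b_bits"
    acc = 0
    cycles = 0
    for bit in range(b_bits):
        if (b >> bit) & 1:
            acc += a << bit   # add the shifted multiplicand when the bit is set
        cycles += 1
    return acc, cycles

product, cycles = serial_multiply(1234, 5678, 16)
print(product, cycles)
```

With a 64-bit coefficient as the serial operand, each product would take 64 cycles in this form, so the cycle budget per sample has to be checked against the number of multiplies.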

Many extreme audio applications (see NAB conference) use serial
processing of many audio streams at once on a single FPGA for a superb
cost/performance point.

http://biz.yahoo.com/prnews/060419/sfw113.html?.v30 %

Finally, if the problem can be partitioned in time into more than one
piece, I have seen people calculate part 1, store results in an external
SRAM, reconfigure, and then read in last part and calculate part 2,
store results in external SRAM, etc...

Austin

Re: fpga space estimate
<snip>

If you want a dedicated port to a controller to allow on-the-fly update of
coefficient values, a dual-port RAM lets you put the controller on one
port and the data I/O on the other.  If you have a fixed configuration, you
can dedicate one port for read, one for write, and your data can flow at the
full 320 MHz BlockRAM rate.  Dual-ports are great.

Initializing BlockRAM contents always seems a little tough with the
synthesis and simulation tools never quite making it practical to get
everything flowing just right.  If you look into the help or app notes from
the various tools, you could have pre-initialized BlockRAMs for fixed
coefficients to make life simpler.

For your application, this really *is* best implemented in a DSP mindset;
you can keep your resources low (1 MAC) and maintain the values in a
register file with limited I/O in your algorithm.  Since you have 100x+ the
sample rate to do your processing, the system flows beautifully.  The only
question for me would be how complex the state machine or microcode would
need to be to have the system work beautifully without adding a generic
processor like the MicroBlaze or similar.  This is where prototyping with
pseudo-code and an Excel spreadsheet gets me to my results with a simple
implementation.

For me, these kinds of tasks are great fun.



Re: fpga space estimate


pseudo-code ???
What exactly do you mean by pseudo-code?

-Roger


Re: fpga space estimate
Just writing down what steps you'd take to implement the code in your data
path.  It's helpful to "see" the data pipeline by looking at the steps and
the loops to manipulate the data.
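
For example, the steps for the filter discussed in this thread might be written down as the loop below: four cascaded 2nd-order sections, one shared multiply-accumulate unit, and coefficients and state held in small arrays standing in for BlockRAM. This is only a sketch under assumptions: the direct form I layout, the 14 fractional bits, and all names (`SharedMac`, `process_sample`, `coeff_ram`, `state_ram`) are made up for illustration and are not from the thread.

```python
FRAC_BITS = 14  # assumed fixed-point scaling of the coefficients

class SharedMac:
    """Models the single MAC unit and counts how often it is used."""
    def __init__(self):
        self.uses = 0

    def __call__(self, acc, a, b):
        self.uses += 1
        return acc + a * b

def process_sample(x, coeff_ram, state_ram, mac):
    """Run one input sample through all cascaded sections."""
    for s in range(len(coeff_ram)):
        b0, b1, b2, a1, a2 = coeff_ram[s]     # read coefficients from "RAM"
        x1, x2, y1, y2 = state_ram[s]         # read delayed samples from "RAM"
        acc = 0
        acc = mac(acc, b0, x)
        acc = mac(acc, b1, x1)
        acc = mac(acc, b2, x2)
        acc = mac(acc, -a1, y1)
        acc = mac(acc, -a2, y2)
        y = acc >> FRAC_BITS                  # rescale back to sample format
        state_ram[s] = [x, x1, y, y1]         # write updated delay line back
        x = y                                 # output feeds the next section
    return x

ONE = 1 << FRAC_BITS
coeff_ram = [[ONE, 0, 0, 0, 0] for _ in range(4)]   # four pass-through sections
state_ram = [[0, 0, 0, 0] for _ in range(4)]
mac = SharedMac()
out = process_sample(1000, coeff_ram, state_ram, mac)
print(out, mac.uses)
```

Counting the `mac` calls gives 20 multiplies per sample for four sections, the same figure quoted earlier in the thread, and the loop structure maps directly onto a state machine stepping through RAM addresses.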



Re: fpga space estimate
That seems reasonable in terms of HW resources, but I would throw in a
guard of at least another 50% until you have done an actual synthesis
with P/R. For most data paths, even hand-placed, I usually see that 1/3 of
the resources can't be used due to placement conflicts etc. So for N
known flops used, add at least another 20% which can't be used. For your
really wide 96-bit adders and 64-bit mult you want to pipeline those, and
that adds many flops. YMMV
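
Applied to the numbers from the original post, those rules of thumb work out as follows. This is just the arithmetic, using the ~25% area estimate and the 800-1000 FF estimate quoted earlier; extra pipeline flops for the wide adder and multiplier are not included because the thread gives no figure for them.

```python
AREA_ESTIMATE_PCT = 25          # original poster's estimate
GUARD_PCT = 50                  # "at least another 50%" before real P/R
FLOPS_LOW, FLOPS_HIGH = 800, 1000
UNUSABLE_FLOP_PCT = 20          # "add at least another 20%"

area_with_guard_pct = AREA_ESTIMATE_PCT * (100 + GUARD_PCT) / 100
flops_low = FLOPS_LOW * (100 + UNUSABLE_FLOP_PCT) // 100
flops_high = FLOPS_HIGH * (100 + UNUSABLE_FLOP_PCT) // 100

print(f"area with guard band: {area_with_guard_pct}% of the FPGA")
print(f"flops to budget: {flops_low}-{flops_high}")
```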

John Jakson
transputer guy

