**posted on**

- Roger Bourne

April 20, 2006, 6:35 pm

Hello all,

I would like some feedback:

I am planning to make a design in an FPGA that has 4 cascaded 2nd-order IIR filters.

Now the question/feedback/advice which I am seeking is the following:

To what resolution can I have the input and output databuses of the IIRs?

Assume there is nothing else but the IIRs in the FPGA.

P.S. The FPGA is a Spartan-3 (400k gates).

I made a rough estimate. I would need:

- ~800-1000 FFs (there is a total of 8k)
- ~14 16-bit adders (do not know the total)
- ~8 18x18 dedicated multipliers (there is a total of 16)
- a whole bunch of muxes; I estimate ~2000 4:1 muxes/demuxes

The above logic is for:

- 4 2nd-order IIRs
- 16-bit input databus for each IIR
- 16-bit output databus for each IIR
- 64-bit feedforward and feedback coefficients for each IIR
- An input DC gain of 2^12 for each IIR
- One, and only one, 96-bit adder responsible for all the sums
- One, and only one, 27x64-bit multiplier responsible for all the multiplications

The adder and the multiplier will run at a much higher frequency than the sample rate, which lets them do all the operations for all the IIRs. The sample rate is 1 MHz. I am assuming that the sample rate can be multiplied up by a factor of at least 50, which would give at LEAST 1 cycle/operation. There are 20 sums and 20 multiplications to be done per sample period.

Hence, I arrived at the conclusion that such a digital filter design will take ~25% of the space of the FPGA. Does this sound accurate? However, I do not know how to account for routing overhead.

I would appreciate citations of previous projects and what percentage of the FPGA they occupied.

Thx in advance

-Roger
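
The arithmetic behind the estimate above can be checked with a few lines of Python. This is only a sketch: the tiling rule in `mult18_blocks` (one signed 18-bit chunk plus 17-bit unsigned chunks per operand) is a common rule of thumb assumed here, not something stated in the post, and the function names are made up for illustration.

```python
import math

SAMPLE_HZ = 1_000_000        # sample rate from the post
CLOCK_HZ = 50 * SAMPLE_HZ    # the post's assumed 50x clock multiplication
MULTS_PER_SAMPLE = 20        # figures quoted in the post
SUMS_PER_SAMPLE = 20

def cycles_per_sample(clock_hz, sample_hz):
    """Clock cycles available between two consecutive input samples."""
    return clock_hz // sample_hz

def mult18_blocks(a_bits, b_bits):
    """Rough count of 18x18 signed multipliers needed for an a_bits x b_bits
    signed product (assumed rule: top chunk 18 bits, remaining chunks 17 bits)."""
    def chunks(n_bits):
        return 1 + math.ceil(max(0, n_bits - 18) / 17)
    return chunks(a_bits) * chunks(b_bits)

budget = cycles_per_sample(CLOCK_HZ, SAMPLE_HZ)
# Worst case: the adder and multiplier never overlap, so 40 operations share the budget.
cycles_per_op = budget / (MULTS_PER_SAMPLE + SUMS_PER_SAMPLE)
blocks = mult18_blocks(27, 64)

print(budget, cycles_per_op, blocks)  # 50 1.25 8
```

Under these assumptions the 27x64 multiplier comes out at 8 dedicated multipliers, which agrees with the ~8 in the estimate, and 50 cycles per sample leaves 1.25 cycles per operation even if sums and multiplications never overlap.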

Re: fpga space estimate

You could reduce your resource requirements significantly by implementing a multi-channel, multi-stage mechanism that manipulates your data and coefficients through one BlockRAM, eliminating most of the multiplexers, and pipelines some of the operations, such as the multiply, to use fewer resources overall.

For these kinds of things, a little pseudocode and a spreadsheet can help to visualize how to break up the problem and verify the solution.

Are you looking specifically for a tiny solution?
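
As an illustration of the kind of pseudocode meant here, the following Python model runs four cascaded 2nd-order sections through a single multiply-accumulate, with coefficients and delay-line values kept in one flat list standing in for the BlockRAM. It is a behavioural sketch only: the Q2.14 coefficient format, the memory layout, the Direct Form I structure and all names are assumptions made for the example, not details from the thread.

```python
FRAC_BITS = 14     # assumed coefficient format (Q2.14), chosen for illustration
N_SECTIONS = 4     # four cascaded 2nd-order sections, as in the original post
TAPS = 5           # b0, b1, b2, -a1, -a2 per section
STATES = 4         # x[n-1], x[n-2], y[n-1], y[n-2] per section

def make_memory(coeffs):
    """Build the flat 'RAM': for each section, 5 coefficients then 4 state words."""
    mem = []
    for section in coeffs:
        mem.extend(section)          # feedback coefficients stored already negated
        mem.extend([0] * STATES)     # delay line starts cleared
    return mem

def process_sample(mem, x):
    """Push one sample through all sections, one MAC per 'cycle'.
    Returns (output sample, number of MAC cycles used)."""
    cycles = 0
    for s in range(N_SECTIONS):
        base = s * (TAPS + STATES)
        st = base + TAPS
        operands = [x, mem[st], mem[st + 1], mem[st + 2], mem[st + 3]]
        acc = 0
        for k in range(TAPS):
            acc += mem[base + k] * operands[k]   # the single shared MAC
            cycles += 1
        y = acc >> FRAC_BITS                     # drop the fractional bits
        # Advance the delay line for this section.
        mem[st + 1] = mem[st]
        mem[st] = x
        mem[st + 3] = mem[st + 2]
        mem[st + 2] = y
        x = y                                    # feed the next section
    return x, cycles

ONE = 1 << FRAC_BITS
unity = [ONE, 0, 0, 0, 0]      # pass-through section
delay = [0, ONE, 0, 0, 0]      # one-sample delay section
mem = make_memory([delay, unity, unity, unity])
outputs = [process_sample(mem, v)[0] for v in (5, 7, 9)]
print(outputs)
```

Each sample costs 20 MAC cycles in this model, which matches the 20 multiplications per sample period counted in the original post. The same table of addresses and operands is what a spreadsheet would lay out row by row.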

Re: fpga space estimate

I am looking for a solution that fits in the FPGA. Tiny? Not really, as long as everything fits.

BlockRAM. Great idea! I checked the timing specs of the BlockRAM module, and it seems pretty fast: 1 clock cycle to write and 1 clock cycle to read, with a max frequency of ~160 MHz. No need for a complex multiplexing network. In fact, there is no need for delay elements (FFs) altogether!

However, I have never used RAM on an FPGA (that is the reason I did not initially lean towards that solution). Is there some obvious, flagrant, blatant drawback when using RAM instead of FFs? Especially since there are 36 times more RAM bits than available FFs (288K vs 8K). And in RAM, ALL the bits can be used!

According to the timing waveform in the specs, it only requires 1 cycle for a read and 1 cycle for a write, so I do not think loss of cycles between data transfers will be an issue, especially if the data rate is ~150 times slower than the fastest clock available. The module that performs the multiplication can thus be time-multiplexed.

It sounds like working on a DSP rather than an FPGA, if one foregoes the use of FFs... :-)

-Roger
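
To put rough numbers on the RAM-versus-FF comparison, here is a small Python sketch of how many bits the filter would actually need to store. The 96-bit state width (borrowed from the adder width in the first post), the five coefficients and four state words per section, and the 36-bit port width are all assumptions for illustration; the real figures depend on the filter structure and the BlockRAM configuration chosen.

```python
import math

TOTAL_BRAM_BITS = 288 * 1024   # the "288K" quoted above
PORT_BITS = 36                 # assumed widest data width of a single RAM port

def storage_bits(sections, coeff_bits, state_bits, coeffs=5, states=4):
    """Bits needed to hold every coefficient and delay-line word."""
    return sections * (coeffs * coeff_bits + states * state_bits)

def reads_per_word(word_bits, port_bits=PORT_BITS):
    """Port accesses needed to fetch one word that is wider than the port."""
    return math.ceil(word_bits / port_bits)

needed = storage_bits(sections=4, coeff_bits=64, state_bits=96)
share = needed / TOTAL_BRAM_BITS

print(needed, round(100 * share, 2), reads_per_word(64), reads_per_word(96))
```

With these assumptions the storage is a tiny fraction of the available RAM bits. The more likely cost is access width: a word wider than the port takes several reads, or several RAMs side by side, so the cycle budget or the block count grows accordingly.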

Re: fpga space estimate

Depending on speed, using an FPGA can be exactly like using a DSP.

The use of the BRAM basically means you are building a custom DSP machine, which will 'execute' a fixed program (based on an FSM), manipulating the BRAM contents much like a DSP would bring operands in and out of the ALU, to and from memory.

Personally, if something this slow would work, to make your life easier, you might consider MicroBlaze as the processor, and execute both program and data from BRAMs. That way the program (which may already be in C code) could remain in C code.

Or, alternatively, use a "real" DSP processor, as (let's be honest) the FPGA may be extreme overkill for what you may be doing.

If the speed there is just not fast enough, there may be hardened FFT filter structures that are serial, rather than parallel, which still may be fast enough (faster than a DSP), and yet use fewer resources (than a fully parallel one). The SRL16s are particularly good at this, as you have up to 16 FFs for the SLICEs with SRLs/LUTRAM.

Remember that a parallel multiply may not be needed, and a serial multiplier may be a lot less hungry (for resources, overall).

Many extreme audio applications (see NAB conference) use serial processing of many audio streams at once on a single FPGA for a superb cost/performance point.

http://biz.yahoo.com/prnews/060419/sfw113.html?.v30 %

Finally, if the problem can be partitioned in time into more than one piece, I have seen people calculate part 1, store results in an external SRAM, reconfigure, and then read in the last part and calculate part 2, store results in external SRAM, etc...

Austin
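
The serial-multiplier idea mentioned above can be modelled in a few lines of Python. This is a plain shift-and-add sketch, one conditional add per multiplier bit, with the sign handled separately; it is one possible serial scheme chosen for illustration and not necessarily the structure the post had in mind. The function name and the cycle counting are assumptions.

```python
def serial_multiply(a, b, b_bits):
    """Shift-and-add product of two signed integers.
    Uses one 'cycle' per bit of the multiplier b (b_bits must cover abs(b)).
    Returns (product, cycles)."""
    negative = (a < 0) != (b < 0)
    multiplicand, multiplier = abs(a), abs(b)
    acc = 0
    cycles = 0
    for bit in range(b_bits):
        if (multiplier >> bit) & 1:
            acc += multiplicand << bit   # add the shifted multiplicand
        cycles += 1                      # one adder pass per multiplier bit
    return (-acc if negative else acc), cycles

product, cycles = serial_multiply(-12345, 6789, b_bits=64)
print(product == -12345 * 6789, cycles)
```

The trade-off shows up in the cycle count: with the thread's numbers, 20 multiplications at 64 cycles each would be 1280 cycles per sample, far more than a 50x clock provides, so a scheme like this would need several units working side by side or a shorter multiplier word.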

Re: fpga space estimate

<snip>

If you want a dedicated port to a controller to allow on-the-fly update of coefficient values, a dual-port RAM would implement the controller on one port and the data I/O on the other. If you have a fixed configuration, you can dedicate one port for read, one for write, and your data can flow at the full 320 MHz BlockRAM rate. Dual-ports are great.

Initializing BlockRAM contents always seems a little tough with the synthesis and simulation tools never quite making it practical to get everything flowing just right. If you look into the help or app notes from the various tools, you could have pre-initialized BlockRAMs for fixed coefficients to make life simpler.

For your application, this really ***is*** best implemented in a DSP mindset; you can keep your resources low (1 MAC) and maintain the values in a register file with limited I/O in your algorithm. Since you have 100x+ the sample rate to do your processing, the system flows beautifully. The only question for me would be how complex the state machine or microcode would need to be to have the system work beautifully without adding a generic processor like the MicroBlaze or similar. This is where prototyping with pseudo-code and an Excel spreadsheet gets me to my results with a simple implementation.

For me, these kinds of tasks are great fun.

Re: fpga space estimate

That seems reasonable in terms of HW resources, but I would throw in a guard of at least another 50% until you have done an actual synthesis with P/R. For most data paths, even hand-placed, I usually see that 1/3 of the resources can't be used because of placement conflicts etc. So for N known flops used, add at least another 20% which can't be used. For your really wide 96-bit adders and 64-bit mult you want to pipeline those, and that adds many flops. YMMV

John Jakson

transputer guy
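
The guard-band advice is easy to apply to the numbers from the first post. The sketch below pads an estimate by a chosen margin and reports the resulting share of the device; the 1000-FF estimate and the 8k total come from the thread, while the function name and the choice to apply the 50% guard to flip-flops only are assumptions for illustration.

```python
def with_margin(estimate, total, guard=0.5):
    """Pad a resource estimate by `guard` (0.5 = 50%) and return
    (padded count, padded count as a fraction of the device total)."""
    padded = estimate * (1 + guard)
    return padded, padded / total

# Upper end of the FF estimate from the first post against the quoted 8k total.
padded_ffs, ff_share = with_margin(1000, 8000, guard=0.5)
print(padded_ffs, ff_share)  # 1500.0 0.1875
```

Pipelining the wide adder and multiplier, as suggested above, would raise the starting estimate itself before any margin is applied.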
