#### Fixed vs Float ?

**posted by** Roger Bourne on March 20, 2006, 9:56 pm

Hello all,

Concerning digital filters, particularly IIR filters, is there a preferred approach to implementation? Are fixed-point calculations preferred over floating-point? I would be tempted to say yes, but my Google search results leave me baffled, for it seems that floating-point computations can be just as fast as fixed-point.

Furthermore, assuming that fixed-point IS the preferred choice, the

following question crops up:

If the input to the digital filter is 8 bits wide and the coefficients are 16 bits wide, then it would stand to reason that the products between the coefficients and the digital filter's intermediate data values will be 24 bits wide. However, when this 24-bit value is fed back into the delay-element network (which is only 8 bits wide), some (understatement) resolution will be lost. How is this resolution loss dealt with? Won't it lead to an erroneous filter?

-Roger


Re: Fixed vs Float ?

This is a simple question with a long answer.

Floating point calculations are always easier to code than fixed-point,

if for no other reason than you don't have to scale your results to fit

the format.

On a Pentium in 'normal' mode floating point is just about as fast as

fixed point math; with the overhead of scaling floating point is

probably faster -- but I suspect that fixed point is faster in MMX mode

(someone will have to tell me). On a 'floating point' DSP chip you can

also expect floating point to be as fast as fixed.

On many, many cost-effective processors -- including CISC, RISC, and fixed-point DSP chips -- fixed-point math is significantly faster than floating point. If you don't have a ton of money, and/or if your system needs to be small or power-efficient, fixed point is mandatory.

In addition to cost constraints, floating point representations use up a

significant number of bits for the exponent. For most filtering

applications these are wasted bits. For many calculations using 16-bit

input data the difference between 32 significant bits and 25 significant

bits is the difference between meeting specifications and not.
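That bit-count argument is easy to demonstrate in software. The sketch below is mine, not from the thread; it uses Python's `struct` module to round a double to IEEE-754 single precision and back:

```python
import struct

def to_f32(x):
    # Round a Python double to IEEE-754 single precision and back
    return struct.unpack('f', struct.pack('f', x))[0]

# float32 carries 24 significant bits (23 stored plus 1 implicit),
# so near 1.0 its smallest representable step is 2**-23 ...
assert to_f32(1.0 + 2**-23) != 1.0

# ... and anything well below half that step is absorbed entirely.
assert to_f32(1.0 + 2**-25) == 1.0
```

A 32-bit fixed-point word in 1.31 format, by contrast, resolves steps of 2**-31 near full scale; the exponent field is what the post above calls wasted bits for filtering work.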

For *any* digital filtering application you should know how the data path size affects the calculation. Even though I've been doing this for a long time I don't trust my intuition -- I always do the analysis, and sometimes I'm still surprised.

In general for an IIR filter you *must* use significantly more bits for the intermediate data than the incoming data. Just how much depends on the filtering you're trying to do -- for a 1st-order filter you usually need to do better than the fraction of the sampling rate you're trying to filter; for a 2nd-order filter you need to go down to that fraction squared*. So if you're trying to implement a 1st-order low-pass filter with a cutoff at 1/16th of the sample rate you need to carry more than four extra bits; if you wanted to use a 2nd-order filter you'd need to carry more than 8 extra bits.
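The extra-bits rule shows up clearly in a toy model. This is a hypothetical Python sketch (the function name and values are mine): a 1st-order low-pass `y += (x - y)/16` run entirely in integers, with and without extra fractional bits in the state:

```python
def settle(x, extra_bits, steps=1000):
    """One-pole low-pass y += (x - y)/16 in pure integer arithmetic,
    holding the state at `extra_bits` extra fractional bits."""
    scale = 1 << extra_bits
    y = 0                          # state, in units of 1/scale
    for _ in range(steps):
        y += (x * scale - y) >> 4  # a = 1/16; >> models an arithmetic shift
    return y // scale              # back to input units

# With the state no wider than the 8-bit input, the update underflows
# to zero once (x - y) < 16, and a step input of 100 sticks at 85.
assert settle(100, extra_bits=0) == 85

# Four extra state bits (the "more than four" figure for a 1/16 cutoff):
# the filter now settles to within one input LSB of the target.
assert settle(100, extra_bits=4) == 99
```

The residual one-LSB offset comes from truncation in the update; rounding the increment, or one more state bit, closes it.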

Usually my knee-jerk reaction to filtering is to either use

double-precision floating point or to use 32-bit fixed point in 1r31

format. There are some less critical applications where one can use

single-precision floating point or 16-bit fractional numbers to

advantage, but they are rare.

* There are some special filter topologies that avoid this, but if

you're going to use a direct-form filter out of a book you need fraction^2.

--

Tim Wescott

Wescott Design Services


Re: Fixed vs Float ?

Oops -- thought I was responding on the dsp newsgroup.

Everything I said is valid, but if you're contemplating doing this on an

FPGA the impact of floating point vs. fixed is in logic area and speed

(which is why fast floating point chips are big, hot and expensive).

Implementing an IEEE compliant floating point engine takes a heck of a

lot of logic, mostly to handle the exceptions. Even if you're willing

to give up compliance for the sake of speed you still have some

significant extra steps you need to take with the data to deal with that

pesky exponent. I'm sure there are various forms of floating point IP

out there that you could try on for size to get a comparison with

fixed-point math.

--

Tim Wescott

Wescott Design Services


Re: Fixed vs Float ?


I think that the added complexity of floating point in an FPGA will

probably be enough to rule it out.

Fixed-point implementations are often better than floating-point implementations. This comparison tends to hold when the result of a multiplication is twice the width of the inputs in the fixed-point case, while a floating-point result is the same size as its inputs. This is usually the case in a DSP processor. It also assumes that you use a filter structure that takes advantage of the long result.

Most IIR filters are constructed as cascaded biquads (and sometimes one

first order section). The choice of the biquad structure has a

significant impact on performance. If we restrict our choices to one of

the direct forms, then usually the direct form I (DF I) structure is best

for fixed point implementations. This assumes that we have a double wide

accumulator. If this is not the case, the DF I is not a particularly good

structure. Floating-point filters are usually implemented as DF II or the slightly better transposed DF II.
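As a concrete illustration of the DF I / double-wide-accumulator point, here is a hypothetical Python model (the Q1.15 scaling and all names are my assumptions, not from the post). The products accumulate at full width and are quantized back to the short word only once per output sample:

```python
def df1_biquad(x, b, a, frac=15):
    """Direct Form I biquad on Q1.15 integers: b = (b0, b1, b2),
    a = (a1, a2) with a0 normalized to 1, all scaled by 2**frac."""
    x1 = x2 = y1 = y2 = 0
    out = []
    for xn in x:
        # double-wide accumulator: sum of full 16x16-bit products
        acc = (b[0] * xn + b[1] * x1 + b[2] * x2
               - a[0] * y1 - a[1] * y2)
        yn = acc >> frac           # one quantization per output sample
        x2, x1 = x1, xn            # shift the delay lines
        y2, y1 = y1, yn
        out.append(yn)
    return out

one = 1 << 15                      # 1.0 in Q1.15
# b = (1, 0, 0), a1 = a2 = 0 reduces to a pass-through
assert df1_biquad([1000, -2000, 3], (one, 0, 0), (0, 0)) == [1000, -2000, 3]
```

The single `>> frac` at the output is exactly the property that makes DF I attractive when a double-wide accumulator is available; a structure that quantized after every multiply would inject error at five points per sample instead of one.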

You can also improve the performance of a fixed-point DF I by adding error shaping. This is relatively cheap from a resource point of view in this structure.
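Error shaping here means feeding each sample's quantization error back into the accumulator on the next sample (first-order error feedback). A minimal hypothetical sketch, with names and word sizes of my choosing:

```python
def quantize_error_feedback(acc_stream, frac=4):
    """First-order error shaping: add the previous truncation error
    back into the accumulator before shortening the word."""
    err, out = 0, []
    for acc in acc_stream:
        acc += err                 # fold in last sample's error
        y = acc >> frac            # truncate to the short word
        err = acc - (y << frac)    # error committed this sample
        out.append(y)
    return out

# A DC accumulator value of 24 (1.5 output LSBs at frac=4) comes out
# as alternating 1s and 2s, so the average value is preserved.
assert quantize_error_feedback([24] * 10) == [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
```

Plain truncation would output 1 every sample, a persistent DC bias; the feedback pushes that bias out as high-frequency noise, which is why it pairs so cheaply with a DF I accumulator.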

As Tim pointed out, you have to pay attention to scaling with fixed point

implementations.

Like every design problem, you need to examine the performance

requirements carefully. I would look at the pole-zero placement on the

unit circle. If you need a high-Q filter at some low frequency compared to the sampling rate, the math precision is going to be

critical. The poles might not be on the unit circle, but they will be

very close. If the precision is poor, the filter is likely to blow up. In

other situations, just about anything will work.

Here is a good link describing biquad structures:

http://www.earlevel.com/Digital%20Audio/Biquads.html

--

Al Clark

Danville Signal Processing, Inc.


Re: Fixed vs Float ?

Hello,

(newbie) Question:

At the intermediate nodes, between biquad-structures in a cascaded

biquad structure IIR filter design approach (employing the fixed point

approach), the resolution of the (extended) accumulator (of the output)

must be scaled down to the width of the data bus, right?

(Internal musing: that would require a fixed point divider, I wonder

how many cycles division takes?)

The scaling-down to the original databus width is required because the

next biquad filter in the cascaded structure is expecting an input of n

bits, n being the number of bits in the data bus and m the number of bits in the coefficients. Correct?

Would/Can that cause problems? (Ones that perhaps are not obvious to me right now.)

Are there any tools (freeware) that permit cascading structures?


Re: Fixed vs Float ?

Yes, this is usually the case.

Why a division? For example, if I have a number in 1.63 format (1 sign bit, 63 fractional bits), I can either round or truncate the result to 1.31 format.
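The two options differ only in whether half an output LSB is added before the low bits are dropped. A small illustrative sketch (the function names and example value are mine):

```python
def truncate_1_63_to_1_31(acc):
    # Drop the low 32 bits; Python's >> floors, like an arithmetic shift
    return acc >> 32

def round_1_63_to_1_31(acc):
    # Add half an output LSB, then drop the low 32 bits
    return (acc + (1 << 31)) >> 32

acc = 3 << 31            # exactly 1.5 output LSBs' worth of accumulator
assert truncate_1_63_to_1_31(acc) == 1
assert round_1_63_to_1_31(acc) == 2
```

Neither form involves a divide; rounding costs one extra add before the shift.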

Yes

The quantizer (the process that shortens the fixed-point word) is going to have a very small effect.

Almost all filter programs assume cascade structures. I don't know much

about the free ones. I use QEDesign 1000 (www.mds.com) which is very

good. One of the advantages of QED is that the programmer is a very good

DSP guy. This is not generally the case.

Matlab is also very popular for filter design.

--

Al Clark

Danville Signal Processing, Inc.


Re: Fixed vs Float ?

I compared an [8th-order Chebyshev lowpass filter, 16-bit fixed point] with [a 2nd-order Chebyshev lowpass filter, 16-bit fixed point, whose frequency response (in dB) I multiplied by 4 so as to emulate a cascade of four 2nd-order sections].

I used WinFilter freeware.

(I do not yet know if all 2nd-order IIR filters can be called biquads. Have to look into that...)

Anyway, based on the attenuation evaluated from the frequency response of both filters (8th and 4x2nd), the 8th-order filter clearly was the better filter. Its attenuation was stronger and faster (rolloff rate greater). The 4x2nd structure did eventually outperform the attenuation of the 8th-order filter, but only because the 8th-order filter had reached its 16-bit attenuation floor.

The 4x2nd-order filter structure's frequency response (attenuation) was most definitely NOT sharp!

Stability? Based on the pole placements of the 2nd-order filter, the 2nd-order filter is FAR more stable than the 8th-order filter. Its poles are nowhere near the unit circle's circumference. On the other hand, the 8th-order filter's poles are located nearer the unit circle's circumference (than the 2nd-order's poles), but I would not say that the poles are shadowing the unit circle's circumference, except for 2 of the 8 poles -- they are located near +j and -j. Nonetheless, the poles were evaluated using 16-bit limited precision and consequently were displaced (at least I assume they were) from their infinite-precision theoretical locations. Thus, since all the poles were found inside the unit circle, the IIR filter should be stable.

(I have a feeling I am leaving myself wide open for a finger-waggling

session)

Thus, my question is:

Why are cascaded biquad structures preferred over non-composite higher-order filters, since the attenuation pays such a high price? For IIR filters, of course.

Thx in advance

-Roger

Re: Fixed vs Float ?

You misunderstood what was said.

If you want to implement that 8th-order Chebyshev filter you can choose

several methods. You might think that the most sensible thing to do

would be to implement it as an 8th-order direct form filter. If you did

you would be wrong. Why? Because the pole locations of a filter are

sensitive to the accuracy of the coefficients, and this sensitivity

increases sharply as the filter order goes up.

For a 1st-order filter the pole sensitivity is roughly equal to the

precision of the coefficient, so a 1r15 coefficient will give you a pole

that is no more than 2^-15 off from target. For a 2nd-order filter the

pole sensitivity is roughly equal to the square root of the precision of

the coefficient, so a 1r15 coefficient will give you a pole that could

be off by as much as 0.006. Note that in some systems this amount of

variation could make or break the system performance. Extend this to an

8th-order system and your 1r15 coefficient gives you poles that will

wander by as much as 0.27 -- that's going to be a pretty useless filter!
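The sensitivity claim is easy to check numerically for a single 2nd-order section. In the hypothetical sketch below (the pole radius and angle are my example values; "1r15" means 15 fractional bits), the a2 coefficient of z^2 + a1*z + a2 is perturbed by one 1r15 LSB and the resulting pole displacement is measured. Note that a1 here exceeds the 1r15 range, a detail real designs handle by scaling:

```python
import cmath
import math

def upper_pole(a1, a2):
    # Upper-half-plane root of z**2 + a1*z + a2
    return (-a1 + cmath.sqrt(complex(a1 * a1 - 4 * a2))) / 2

# Complex pole pair at radius 0.999, angle 0.02 rad: a low-frequency,
# high-Q design of the sort discussed in this thread
r, th = 0.999, 0.02
a1, a2 = -2 * r * math.cos(th), r * r

# Move a2 by exactly one 1r15 LSB and measure the pole displacement
shift = abs(upper_pole(a1, a2 + 2**-15) - upper_pole(a1, a2))
assert shift > 10 * 2**-15   # the pole moves far more than the LSB step
assert shift < 0.01
```

Even for a single biquad the pole moves an order of magnitude more than the coefficient step, and the effect compounds rapidly with direct-form order.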

For pretty much the same reasons the accuracy requirements of your arithmetic go up with filter order.

So what you do is you take your filter and you break it into sections of

no more than 2nd-order each. You implement each one of these

individually, and cascade them. The transfer function of the cascade is

the product of the individual transfer functions so you get the response

that you need, but the accuracy requirements are no more than for

2nd-order sections, so you don't need to use an infinite number of bits

to do your work.
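The cascade itself is just composition: run the signal through each 2nd-order section in turn, and the overall response is the product of the section responses. A hypothetical floating-point sketch using transposed DF II sections (all names mine):

```python
def biquad_tdf2(x, b, a):
    """Transposed Direct Form II section; b = (b0, b1, b2),
    a = (a1, a2) with a0 normalized to 1."""
    z1 = z2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + z1
        z1 = b[1] * xn - a[0] * yn + z2   # update the two state registers
        z2 = b[2] * xn - a[1] * yn
        y.append(yn)
    return y

def cascade(x, sections):
    # Feed each section's output into the next one
    for b, a in sections:
        x = biquad_tdf2(x, b, a)
    return x

# Two pure one-sample delays (b = (0, 1, 0)) cascade to a two-sample delay
delay = ((0.0, 1.0, 0.0), (0.0, 0.0))
assert cascade([1.0, 2.0, 3.0], [delay, delay]) == [0.0, 0.0, 1.0]
```

An 8th-order design factored into four such sections needs only 2nd-order coefficient accuracy in each stage, which is the whole point of the paragraph above.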

I have a pair of suggestions for you:

First, hie thee down to a bookstore and get a copy of "Understanding

Digital Signal Processing" by Richard G Lyons. It's a good book, and

it's written for people who need to know the stuff without experiencing

a lot of pain. This link will get you a copy:

http://www.powells.com/partner/30696/biblio/0-13-108989-7 .

Second, think of posting (or cross-posting) questions like this to

comp.dsp. Al and I both frequent that group; there are others there (including Rick Lyons) who may have useful input.

--

Tim Wescott

Wescott Design Services


Re: Fixed vs Float ?

Right. Or, since this *is* an FPGA group, the data bus must be scaled up to match the resolution of the accumulator.

Lopping off bits doesn't necessarily mean scaling the numbers down

numerically. If you view your numbers as integers then throwing away

the least significant 16 and keeping the most significant 24 could be

seen as a divide operation -- but if you view your numbers as fractional

it's just disregarding some bits.

In either case it doesn't require a divider -- you're simply wiring up

the most significant bits which may or may not involve changing the

apparent amplitude of the result by integer factors of two.
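In other words, the word-shortening step is bit selection, not arithmetic. A tiny sketch with example widths of my choosing:

```python
# A 40-bit accumulator value; keeping the top 24 bits is just wiring
acc = 0x12_3456_789A
top24 = acc >> 16        # same bits as connecting acc[39:16]
assert top24 == 0x12_3456

# Viewed as integers this is floor-division by 2**16; viewed as
# fractional values it merely discards low-order precision.
assert top24 == acc // (1 << 16)
```

In HDL this is literally a slice of the accumulator bus, with zero logic cost.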

Should you wish to hold the input of the next filter to n bits then yes,

you have to do something with that extra-wide data bus coming out of

preceding filters.

It can/will cause problems with precision, but if you can analyze what

happens inside a filter section you can analyze what happens in between

sections.

Tim Wescott

Wescott Design Services


Re: Fixed vs Float ?

Since together with the decision to have float or

fixed, the next question which is at least as

important is how many bits you need. while the

float part takes over the exponent adjustment,

speak the shifting to the left or right, the

number of bits in the mantissa or as fixed

determine the dynamic range of your result.

When the pressure to save some macrocells is

there, then you should have a closer look what

happens when you omit how many bits at what

operation.

Rene

--

Ing.Buero R.Tschaggelar - http://www.ibrtses.com

& commercial newsgroups - http://www.talkto.net

