How do I optimize filter coefficient bit length and signal bit length?

Hello all

I have made an 8-channel 500 kHz low-pass IIR filter in VHDL. The filter uses 32 bits for its coefficients and 32 bits for its internal signals.

The filter doesn't give the same DC gain for small vs. large input signals. I suspect the internal truncation of the intermediate sums and states causes this.

But today I thought about increasing the number of bits for the signal and decreasing the number of bits for the coefficients. I tried it out and the filter gave a more consistent gain across different input signal levels.

Now I wonder how I should optimize the coefficient and signal bit lengths to get the best result?

From Sweden

32 bits oughta be enough for nearly any application. a quantization error of 1 part outa 4 billion (2^32 is about 4.3 billion)? i mean, holy crap!!!

the consequences of quantizing coefficients are different from those of quantizing the signal (or some internal intermediate signal).

quantizing coefficients means that the filter you get is not precisely the one that you designed. the poles and zeros don't land exactly where you wanted them to go. but with 32 bits it should easily be close enough.

how the coefs map to the poles and zeros depends on the filter topology. what topology are you using? Direct Form 1 or Direct Form 2 or Lattice or Normalized Ladder or some other? (i think there is a Gold-Rader form. there's a bunch of them, some of which have an internal all-pass filter that the rest of the thing is built around. i am a partisan for the Direct Form 1 in fixed-point applications.)

what you do is solve for the pole and zero locations as a function of the coefs (that get quantized) and see what effect the coef quantization has on those locations. but dividing each of the two dimensions of the unit circle up into about 4 billion (2^32) slices should be more than good enough.
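a rough sketch of that check, in python rather than VHDL just so it is easy to run. it assumes a single second-order section with denominator 1 + a1*z^-1 + a2*z^-2 and a made-up pole at radius 0.99 and angle 0.05 rad (not the poster's actual filter). the names quantize, upper_pole and pole_shift are hypothetical helpers, and rounding to a fixed number of fractional bits is only one possible coefficient format.

```python
import cmath
import math


def quantize(value, frac_bits):
    """Round value to a grid of 2**-frac_bits (assumed fixed-point format)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def upper_pole(a1, a2):
    """One root of z**2 + a1*z + a2 (the '+' branch of the quadratic formula)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0


def pole_shift(radius, angle, frac_bits):
    """Distance the pole moves when a1 and a2 are quantized."""
    # Direct-form denominator coefficients for a pole pair at radius*exp(+/-j*angle).
    a1 = -2.0 * radius * math.cos(angle)
    a2 = radius * radius
    ideal = upper_pole(a1, a2)
    actual = upper_pole(quantize(a1, frac_bits), quantize(a2, frac_bits))
    return abs(actual - ideal)


# Hypothetical pole location, chosen close to z = 1 like a narrow low-pass.
RADIUS, ANGLE = 0.99, 0.05

shifts = {bits: pole_shift(RADIUS, ANGLE, bits) for bits in (8, 16, 30)}
for bits, shift in shifts.items():
    print(f"{bits:2d} fractional bits: pole moves by {shift:.3e}")
```

the pole movement shrinks quickly as fractional bits are added, which is the reason to expect that coefficients well short of 32 bits can still put the poles close enough. a different topology maps the coefs to the poles differently, so the same numbers would not carry over.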

consequences of quantizing the signal can range from an additive noise model (if the signal amplitude is much, much larger than the quantization level) to all sorts of nasties (harmonic distortion, limit cycles). triangular PDF additive dither of 2 LSB amplitude is sufficient to get rid of that stuff. i would think that at 32 bits, simple 1st-order noise shaping (with a zero at DC) would suffice (no dither necessary). this particular error or noise shaping requires one extra state in the DF1 and has been called "fraction saving", and Randy Yates has written about it recently in the IEEE Signal Processing Magazine.
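a minimal sketch of the fraction-saving idea, again in python with plain integers standing in for the hardware registers. it assumes a one-pole low-pass y[n] = (B0*x[n] + A1*y[n-1]) >> FRAC_BITS with unity DC gain. the constants and the function run_dc are made up for illustration and are not the poster's design. the only difference between the two modes is whether the bits dropped by the shift are added back into the next accumulation.

```python
FRAC_BITS = 15                 # assumed number of fractional coefficient bits
ONE = 1 << FRAC_BITS           # fixed-point representation of 1.0
B0 = 128                       # feedforward coefficient (128/32768)
A1 = ONE - B0                  # feedback coefficient, so DC gain B0/(ONE - A1) is 1


def run_dc(x, steps, save_fraction):
    """Feed a constant input x for `steps` samples and return the final output."""
    y = 0       # previous output (the filter state)
    frac = 0    # bits discarded by the last truncation (the one extra state)
    for _ in range(steps):
        acc = B0 * x + A1 * y
        if save_fraction:
            acc += frac              # 1st-order error feedback, zero at DC
        y = acc >> FRAC_BITS         # truncate back to the signal word
        frac = acc & (ONE - 1)       # remember what was thrown away
    return y


STEPS = 20000
for x in (10, 100000):
    truncated = run_dc(x, STEPS, save_fraction=False)
    saved = run_dc(x, STEPS, save_fraction=True)
    print(f"input {x}: truncation gives {truncated}, fraction saving gives {saved}")
```

with these numbers the truncating version settles at 0 for an input of 10 and at 99745 for an input of 100000, so the DC gain depends on the signal level. the fraction-saving version settles at exactly 10 and 100000. that is the same kind of level-dependent gain described in the original question, and it goes away without widening the coefficients.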

really, 32 bit words oughta be good enough.

r b-j

robert bristow-johnson
