Analysis of vendor's example FIR subroutine

Greetings:

Below are listings of two versions of a 20-tap FIR filter: the first is the vendor's example and the second is my rewrite. I have tried in vain to make sense of the vendor's version and would appreciate your inspection and comments. My version works correctly on the target (with the attached coefficients it implements a low-pass filter about 3 dB down at 5 kHz; sorry, I haven't worked backward from the coefficients to get the specs).

Target: Mwave CPU, 16-bit registers, 32-bit mult and 32-bit-wide accumulators (wr2, wr6), no guard bits, rounding supported. Base/index registers are r0 and r4; r0-r3 are in register bank 1 and r4-r7 in register bank 2; RP is the 32-bit product register. ADCs/DACs are 16-bit.

Inputs to the filter subroutine are in a large circular buffer mod 256 stereo samples (512 16-bit words).
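For reference, here is a minimal Python sketch of what a 20-tap stereo FIR over such a buffer would be expected to compute. It is only a model under stated assumptions: the words are assumed interleaved L, R, L, R (as the left/right comments in my listing suggest), the index is assumed to wrap modulo 512 words, and `coef[0]` is assumed to pair with the oldest frame. The names `fir_stereo`, `BUF_WORDS` and `TAPS` are made up for this illustration and do not come from the vendor.

```python
BUF_WORDS = 512  # 256 stereo samples, assumed interleaved as L, R, L, R, ...
TAPS = 20


def fir_stereo(buf, newest, coef):
    """Reference sums for one stereo frame.

    buf:    list of BUF_WORDS 16-bit sample words
    newest: word index of the left word of the newest frame
    coef:   TAPS coefficients; coef[0] is applied to the oldest frame (assumed)
    Returns (left_sum, right_sum) as unrounded integers.
    """
    left = right = 0
    for k in range(TAPS):
        # Step back (TAPS - 1 - k) frames, wrapping inside the circular buffer
        idx = (newest - 2 * (TAPS - 1 - k)) % BUF_WORDS
        left += coef[k] * buf[idx]
        right += coef[k] * buf[(idx + 1) % BUF_WORDS]
    return left, right
```

Each output needs 20 left words and 20 right words, i.e. 40 consecutive buffer words, and the two channels never share a sum.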

Assembler syntax and arch desc:

Note that 'N' can be an arith expression evaluated by the assembler:

#(N)(r0) : immediate offset added to base register r0
N(r0)    : relative offset added to base register r0; treat as index+offset addressing

compound instruction, e.g.:

        r1=2*1(r0)   wr6=wr6+RP  r1*r5

This fetches from one register bank, accumulates in another register bank, and multiplies between register banks. It uses the three-stage pipeline and is executed in one cycle. The accumulate is from the previous mult. and the mult is from two previous fetches.

If accumulating, a mult. is always required in the third slot so dummies are sometimes inserted.

Branches execute the one instruction following the branch. Subroutines are called by storing the return addr in r7; returns happen by putting r7 on the CDB (common data bus) and branching relative to it.

Comments in the code describe anything omitted here.

Following the logic in the vendor's example, I get cross-accumulations between channels, not all of the samples are processed, and only half of the coefficients are fetched (although for a symmetrical filter this is understandable).

Am I hallucinating?

---------------------------------------------------------------------

Vendor's code:

; 20 tap FIR filter
;
; Inputs:
;   R0 = pointer to filter coefficients
;   R4 = pointer to input samples from FIRBUF
;
; Outputs:
;   R2 = filter output (L)
;   R6 = filter output (R)

filter  section code
        entry   filter
        extrn   coef,coefptr
        sysref  SYSRTN,SYSDSPTR,SYSTMP0

filt1   SYSTMP0=r4                      ; save input buf ptr

        r4=#(-19*2)(r4) tnop            ;Move the input buf pointer back
        nop
                                        ; clear r2 and r6
        %wr2 %wr6
        r1=2*0(r0)                      ;First coefficient
        r5=2*0(r4)                      ;Oldest sample
        r5=2*1(r4)               r1*r5  ;Next sample
        r1=2*1(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*2(r4)   wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*2(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*3(r4)   wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*3(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*4(r4)   wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*4(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*5(r4)   wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*5(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*6(r4)   wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*6(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*7(r4)   wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*7(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*8(r4)   wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*8(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*9(r4)   wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*9(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*10(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
                     wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*11(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*8(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*12(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*7(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*13(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*6(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*14(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*5(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*15(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*4(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*16(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*3(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*17(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*2(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*18(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*1(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*19(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*0(r0)   wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*20(r4)  wr2=wr2+RP  r1*r5  ;Accumulate second output
                     wr6=wr6+RP  r1*r5  ;Accumulate first output
                     wr2=wr2+RP         ;Accumulate second output

        r6=r6+rd                        ;Round the 17th bit into a
        r2=r2+rd                        ;a 16 bit result

        r7                              ; return to calling program
        b 0(cdb)
        r4=SYSTMP0                      ; restore input buf ptr
        END
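
One way to check a reading of the vendor's listing by hand is a symbolic trace. The Python sketch below is only that, a sketch: it assumes every slot of a compound instruction sees the register state left by the previous instruction (so a multiply never uses a value fetched on the same line, and an accumulate adds the product of the previous line), which is how the pipeline description above reads. The names `trace`, `vendor_program`, `c<k>` (coefficient word k) and `s<k>` (sample word k after the pointer move-back) are invented for this illustration.

```python
def trace(program):
    """Symbolically run (fetch, acc, mult) instructions.

    fetch: None or (register, symbol); acc: None, "wr2" or "wr6"; mult: bool.
    ASSUMPTION: all slots read the state left by the previous instruction,
    so a multiply never sees a value fetched in the same instruction.
    """
    regs = {"r1": None, "r5": None}
    rp = None                      # product register RP
    sums = {"wr2": [], "wr6": []}  # products added to each accumulator
    for fetch, acc, mult in program:
        next_rp = (regs["r1"], regs["r5"]) if mult else rp
        if acc is not None:
            sums[acc].append(rp)   # accumulate the previous product
        if fetch is not None:
            regs[fetch[0]] = fetch[1]
        rp = next_rp
    return sums


def vendor_program():
    """Transcription of the vendor's listing into (fetch, acc, mult) tuples."""
    prog = [(("r1", "c0"), None, False),
            (("r5", "s0"), None, False),
            (("r5", "s1"), None, True)]
    # Coefficient fetches in the listing: 1..9, one line with no fetch, 8..0
    coef_fetches = list(range(1, 10)) + [None] + list(range(8, -1, -1))
    for j, c in enumerate(coef_fetches):
        fetch = ("r1", f"c{c}") if c is not None else None
        prog.append((fetch, "wr6", True))
        prog.append((("r5", f"s{j + 2}"), "wr2", True))
    prog.append((None, "wr6", True))
    prog.append((None, "wr2", False))
    return prog


sums = trace(vendor_program())
for name in ("wr6", "wr2"):
    print(name, "=", " + ".join(f"{c}*{s}" for c, s in sums[name]))
```

Under that assumption, wr6 collects coefficient k times sample word k and wr2 collects coefficient k times sample word k+1, with the coefficients running c0..c9 and then c9..c0, over 21 consecutive words in total. If the buffer words are interleaved L/R, both sums would mix the two channels; the same pattern would also be two successive outputs of a filter over a non-interleaved buffer, which may be what "first output" and "second output" mean, but that is a guess.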

----------------------------------------------------------------------------

My code:

; 20 tap FIR filter
;
; Inputs:
;   R0 = pointer to filter coefficients
;   R4 = pointer to input samples from FIRBUF
;
; Outputs:
;   R2 = filter output (L)
;   R6 = filter output (R)

filter  section code
        entry   filt1
        extrn   coef,coefptr
        sysref  SYSRTN,SYSDSPTR,SYSTMP0

filt1   SYSTMP0=r4                      ; save input buf ptr

        r4=#(-19*4)(r4) tnop            ;Move the input buf ptr back 20 - 1
                                        ; stereo samples
                                        ;clear wr2 and wr6
        %wr2 %wr6
        r1=2*0(r0)                      ;coef0
        r5=2*1(r4)                      ;right0
        r5=2*0(r4)               r1*r5  ;left0, coef0*right0
        r1=2*1(r0)   wr6=wr6+RP  r1*r5  ;coef1, wr6+=coef0*right0, coef0*left0
        r5=2*3(r4)   wr2=wr2+RP  r1*r5  ;right1, wr2+=coef0*left0, dummy mult
        r5=2*2(r4)               r1*r5  ;left1, coef1*right1
        r1=2*2(r0)   wr6=wr6+RP  r1*r5  ;coef2, wr6+=coef1*right1, coef1*left1
        r5=2*5(r4)   wr2=wr2+RP  r1*r5  ;right2, wr2+=coef1*left1, dummy mult
        r5=2*4(r4)               r1*r5  ;
        r1=2*3(r0)   wr6=wr6+RP  r1*r5  ; ...
        r5=2*7(r4)   wr2=wr2+RP  r1*r5  ;3
        r5=2*6(r4)               r1*r5  ;
        r1=2*4(r0)   wr6=wr6+RP  r1*r5  ;
        r5=2*9(r4)   wr2=wr2+RP  r1*r5  ;4
        r5=2*8(r4)               r1*r5  ;
        r1=2*5(r0)   wr6=wr6+RP  r1*r5  ;
        r5=2*11(r4)  wr2=wr2+RP  r1*r5  ;5
        r5=2*10(r4)              r1*r5  ;
        r1=2*6(r0)   wr6=wr6+RP  r1*r5  ;
        r5=2*13(r4)  wr2=wr2+RP  r1*r5  ;6
        r5=2*12(r4)              r1*r5  ;
        r1=2*7(r0)   wr6=wr6+RP  r1*r5  ;
        r5=2*15(r4)  wr2=wr2+RP  r1*r5  ;7
        r5=2*14(r4)              r1*r5  ;
        r1=2*8(r0)   wr6=wr6+RP  r1*r5  ;
        r5=2*17(r4)  wr2=wr2+RP  r1*r5  ;8
        r5=2*16(r4)              r1*r5  ;
        r1=2*9(r0)   wr6=wr6+RP  r1*r5  ;
        r5=2*19(r4)  wr2=wr2+RP  r1*r5  ;9
        r5=2*18(r4)              r1*r5  ;
        r1=2*10(r0)  wr6=wr6+RP  r1*r5  ;
        r5=2*21(r4)  wr2=wr2+RP  r1*r5  ;10
        r5=2*20(r4)              r1*r5  ;
        r1=2*11(r0)  wr6=wr6+RP  r1*r5  ;
        r5=2*23(r4)  wr2=wr2+RP  r1*r5  ;11
        r5=2*22(r4)              r1*r5  ;
        r1=2*12(r0)  wr6=wr6+RP  r1*r5  ;
        r5=2*25(r4)  wr2=wr2+RP  r1*r5  ;12
        r5=2*24(r4)              r1*r5  ;
        r1=2*13(r0)  wr6=wr6+RP  r1*r5  ;
        r5=2*27(r4)  wr2=wr2+RP  r1*r5  ;13
        r5=2*26(r4)              r1*r5  ;
        r1=2*14(r0)  wr6=wr6+RP  r1*r5  ;
        r5=2*29(r4)  wr2=wr2+RP  r1*r5  ;14
        r5=2*28(r4)              r1*r5  ;
        r1=2*15(r0)  wr6=wr6+RP  r1*r5  ;
        r5=2*31(r4)  wr2=wr2+RP  r1*r5  ;15
        r5=2*30(r4)              r1*r5  ;
        r1=2*16(r0)  wr6=wr6+RP  r1*r5  ;
        r5=2*33(r4)  wr2=wr2+RP  r1*r5  ;16
        r5=2*32(r4)              r1*r5  ;
        r1=2*17(r0)  wr6=wr6+RP  r1*r5  ;
        r5=2*35(r4)  wr2=wr2+RP  r1*r5  ;17
        r5=2*34(r4)              r1*r5  ;
        r1=2*18(r0)  wr6=wr6+RP  r1*r5  ;
        r5=2*37(r4)  wr2=wr2+RP  r1*r5  ;18
        r5=2*36(r4)              r1*r5  ;
        r1=2*19(r0)  wr6=wr6+RP  r1*r5  ;
        r5=2*39(r4)  wr2=wr2+RP  r1*r5  ;right19
        r5=2*38(r4)              r1*r5  ;left19, coef19*right19
                     wr6=wr6+RP  r1*r5  ;wr6+=coef19*right19, coef19*left19
                     wr2=wr2+RP         ;wr2+=coef19*left19

        r6=r6+rd                        ;Round the 17th bit into a
        r2=r2+rd                        ;a 16 bit result

        r7                              ; return to calling program
        b 0(cdb)
        r4=SYSTMP0                      ; restore input buf ptr
        END

----------------------------------------------------------------------------------

Coefficients:

fir20d  section data
        entry   coef

; coefficients for 20-tap FIR filter
coef    equ     *
        dd w    -21
        dd w    -112
        dd w    -276
        dd w    -402
        dd w    -258
        dd w    416
        dd w    1733
        dd w    3502
        dd w    5226
        dd w    6295
        dd w    6295
        dd w    5226
        dd w    3502
        dd w    1733
        dd w    416
        dd w    -258
        dd w    -402
        dd w    -276
        dd w    -112
        dd w    -21
        end
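
As a quick sanity check on the table, the short Python sketch below confirms that the coefficients are symmetric (which is why fetching only ten of them can be enough) and computes the DC gain. Treating the values as Q15 fractions (scaled by 32768) is an assumption made here for illustration; the helper names are invented too.

```python
COEF = [-21, -112, -276, -402, -258, 416, 1733, 3502, 5226, 6295,
        6295, 5226, 3502, 1733, 416, -258, -402, -276, -112, -21]


def is_symmetric(c):
    """True if c[k] == c[N-1-k] for every k, i.e. a linear-phase FIR."""
    return all(c[k] == c[len(c) - 1 - k] for k in range(len(c) // 2))


def dc_gain_q15(c):
    """Gain at 0 Hz, assuming the coefficients are Q15 fractions."""
    return sum(c) / 32768.0


print(is_symmetric(COEF), sum(COEF), round(dc_gain_q15(COEF), 4))
```

The coefficients sum to 32206, so under the Q15 assumption the DC gain is about 0.983, which is consistent with a low-pass filter of roughly unity passband gain.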

---------------------------------------------------------------------

Regards,

Michael
