Greetings:
Below are listings of two versions of a 20-tap FIR filter. The first is the vendor's example and the second is my rewrite. I have tried in vain to make sense of the vendor's version and would appreciate your inspection and comments. My version works correctly on the target: with the attached coefficients it implements a low-pass filter about 3 dB down at 5 kHz. (Sorry, I haven't worked backward from the coefficients to get the full specs.)
Target: Mwave CPU with 16-bit registers, a 32-bit multiplier and 32-bit-wide accumulators (wr2, wr6), no guard bits, rounding supported. The base/index registers are r0 and r4; r0-r3 are in register bank 1 and r4-r7 in register bank 2. RP is the 32-bit product register. The ADCs/DACs are 16-bit.
Inputs to the filter subroutine are in a large circular buffer of 256 stereo samples (modulo 256, i.e. 512 16-bit words).
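To make the buffer layout concrete, here is a minimal Python sketch of the word indices a 20-tap filter touches. It assumes, from the `left0`/`right0` comments in my rewrite, that each stereo sample occupies two consecutive words with left first, and it works in word indices. The `2*N` offsets and the `#(-19*4)` back-off in the listings suggest the assembler's addresses are byte-scaled, but that is an inference. `tap_word_indices` is a hypothetical helper, not anything from the vendor's tools.

```python
# Hypothetical model of the circular input buffer: 256 stereo frames stored as
# 512 interleaved 16-bit words, left in the even word, right in the odd word
# (layout assumed from the left0/right0 comments in the rewritten filter).
FRAMES = 256
WORDS = 2 * FRAMES


def tap_word_indices(newest_frame, taps=20):
    """(left_word, right_word) for the `taps` most recent frames, oldest first.

    Frame numbers wrap modulo FRAMES, so word indices always stay in 0..WORDS-1.
    """
    pairs = []
    for k in range(taps):
        frame = (newest_frame - (taps - 1) + k) % FRAMES
        pairs.append((2 * frame, 2 * frame + 1))
    return pairs


window = tap_word_indices(newest_frame=5)
print(window[0], window[-1])  # (484, 485) (10, 11): the oldest frame wrapped
```

Near the start of the buffer the oldest taps come from the end, which is why the hardware modulo addressing on r4 matters.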
Assembler syntax and architecture description (note that 'N' can be an arithmetic expression evaluated by the assembler):

#(N)(r0) : immediate offset added to base register r0
N(r0)    : relative offset added to base register r0; treat as index+offset addressing

Compound instruction, e.g.: r1=2*1(r0) wr6=wr6+RP r1*r5
1. Fetch from one register bank, accumulate in another register bank, and multiply between register banks. This uses the three-stage pipeline and executes in one cycle. The accumulate adds the product of the previous multiply, and the multiply uses operands loaded by previous fetches (not by the fetch in the same instruction).
2. If accumulating, a multiply is always required in the third slot, so dummy multiplies are sometimes inserted.

Branches also execute the one instruction that follows the branch (a delay slot). Subroutines are called by storing the return address in r7; returns happen by putting r7 on the CDB (common data bus) and branching relative to it.
Comments in the code describe anything omitted here.
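When tracing the listings it helps to have the compound-instruction rules written down as something executable. The Python sketch below encodes my reading of the description above, which is an assumption and not Mwave documentation. In each instruction, the accumulate adds the RP left by the previous multiply, the multiply uses r1/r5 as they were before this instruction's fetch, and the fetch lands last. `run` and the tuple encoding are made up for illustration.

```python
def run(program, coef, samples):
    """Hypothetical one-cycle-per-instruction model of the r1/r5/RP pipeline.

    Each instruction is (fetch, accumulate, mult):
      fetch:      ('r1', k) loads coef[k], ('r5', j) loads samples[j], or None
      accumulate: 'wr2' or 'wr6' adds the RP from the previous multiply, or None
      mult:       True computes RP = r1*r5 from values loaded by earlier fetches
    """
    regs = {"r1": 0, "r5": 0}
    acc = {"wr2": 0, "wr6": 0}
    rp = 0
    for fetch, accumulate, mult in program:
        r1, r5 = regs["r1"], regs["r5"]  # operands come from earlier fetches
        if accumulate:
            acc[accumulate] += rp  # product from the previous multiply
        if mult:
            rp = r1 * r5
        if fetch:
            reg, index = fetch
            regs[reg] = coef[index] if reg == "r1" else samples[index]
    return acc


# First four vendor instructions plus a closing wr2=wr2+RP, on toy data.
opening = [
    (("r1", 0), None, False),   # r1=2*0(r0)
    (("r5", 0), None, False),   # r5=2*0(r4)
    (("r5", 1), None, True),    # r5=2*1(r4) r1*r5            RP = 3*7
    (("r1", 1), "wr6", True),   # r1=2*1(r0) wr6=wr6+RP r1*r5 wr6 += 21, RP = 3*11
    (None, "wr2", False),       # wr2=wr2+RP                  wr2 += 33
]
print(run(opening, coef=[3, 5], samples=[7, 11]))  # {'wr2': 33, 'wr6': 21}
```

The toy run shows the key timing point. wr6 receives coef0*sample0, while the very next multiply, still using coef0, already forms coef0*sample1 for wr2.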
Following the logic of the vendor's example, I get cross-accumulation between channels, not all of the samples are processed, and only half of the coefficients are used (although for a symmetrical filter the last point is understandable).
Am I hallucinating?
---------------------------------------------------------------------
Vendor's code:
; 20 tap FIR filter
;
; Inputs:
;   R0 = pointer to filter coefficients
;   R4 = pointer to input samples from FIRBUF
;
; Outputs:
;   R2 = filter output (L)
;   R6 = filter output (R)
filter  section code
        entry   filter
        extrn   coef,coefptr
        sysref  SYSRTN,SYSDSPTR,SYSTMP0
filt1 SYSTMP0=r4 ; save input buf ptr
        r4=#(-19*2)(r4) tnop          ;Move the input buf pointer back
        nop
                                      ; clear r2 and r6
        %wr2
        %wr6
        r1=2*0(r0)                    ;First coefficient
        r5=2*0(r4)                    ;Oldest sample
        r5=2*1(r4)  r1*r5             ;Next sample
        r1=2*1(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*2(r4)  wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*2(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*3(r4)  wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*3(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*4(r4)  wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*4(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*5(r4)  wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*5(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*6(r4)  wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*6(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*7(r4)  wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*7(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*8(r4)  wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*8(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*9(r4)  wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*9(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*10(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
                    wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*11(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*8(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*12(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*7(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*13(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*6(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*14(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*5(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*15(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*4(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*16(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*3(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*17(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*2(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*18(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*1(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*19(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
        r1=2*0(r0)  wr6=wr6+RP r1*r5  ;Accumulate first output
        r5=2*20(r4) wr2=wr2+RP r1*r5  ;Accumulate second output
                    wr6=wr6+RP r1*r5  ;Accumulate first output
                    wr2=wr2+RP        ;Accumulate second output
        r6=r6+rd                      ;Round the 17th bit into
        r2=r2+rd                      ;a 16 bit result

        r7                            ; return to calling program
        b 0(cdb)
        r4=SYSTMP0                    ; restore input buf ptr
        END
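To check my reading, I transcribed the vendor's instruction sequence above into a hypothetical Python model. The model assumes the pipeline rules from the syntax notes: the accumulate uses the RP from the previous multiply, the multiply uses r1/r5 from earlier fetches, and the fetch lands last. It also treats each `2*k` offset as word k. The tuple encoding and function names are mine, not the vendor's. If the model is right, the listing computes wr6 = sum of h[t]*w[t] and wr2 = sum of h[t]*w[t+1]. Both sums run over 21 consecutive 16-bit words w[0..20], with h the 20-tap symmetric expansion of coef[0..9]. That is two successive outputs of a single stream, which matches the "first output"/"second output" comments. Fed interleaved stereo, each accumulator alternates between left and right words.

```python
import random


# Assumed semantics: accumulate uses RP from the previous multiply,
# multiply uses r1/r5 from earlier fetches, fetch lands last.
def run(program, coef, words):
    regs, acc, rp = {"r1": 0, "r5": 0}, {"wr2": 0, "wr6": 0}, 0
    for fetch, accumulate, mult in program:
        r1, r5 = regs["r1"], regs["r5"]
        if accumulate:
            acc[accumulate] += rp
        if mult:
            rp = r1 * r5
        if fetch:
            reg, index = fetch
            regs[reg] = coef[index] if reg == "r1" else words[index]
    return acc["wr6"], acc["wr2"]


def vendor_program():
    """The vendor listing as (fetch, accumulate, mult) tuples; offsets as word indices."""
    prog = [(("r1", 0), None, False), (("r5", 0), None, False), (("r5", 1), None, True)]
    for k in range(1, 10):                      # r1=2*k(r0), then r5=2*(k+1)(r4)
        prog += [(("r1", k), "wr6", True), (("r5", k + 1), "wr2", True)]
    prog += [(None, "wr6", True), (("r5", 11), "wr2", True)]
    for m in range(8, -1, -1):                  # coefficients reused in reverse
        prog += [(("r1", m), "wr6", True), (("r5", 20 - m), "wr2", True)]
    prog += [(None, "wr6", True), (None, "wr2", False)]
    return prog


COEF_HALF = [-21, -112, -276, -402, -258, 416, 1733, 3502, 5226, 6295]
h = COEF_HALF + COEF_HALF[::-1]                 # full symmetric 20-tap response
prog = vendor_program()

# Random single-stream data: the result is two consecutive FIR outputs.
rng = random.Random(1)
w = [rng.randint(-32768, 32767) for _ in range(21)]
wr6, wr2 = run(prog, COEF_HALF, w)
assert wr6 == sum(h[t] * w[t] for t in range(20))
assert wr2 == sum(h[t] * w[t + 1] for t in range(20))

# Interleaved stereo with left = 1, right = 0 (left in even words).
stereo = [1 if j % 2 == 0 else 0 for j in range(21)]
print(run(prog, COEF_HALF, stereo))  # (16103, 16103)
```

Suppose R2/R6 really are left/right, as the header says. A silent right channel should then leave 0 in wr6 and sum(h) = 32206 in wr2. The model instead puts 16103 in each, which is the cross-channel accumulation described above. The 21 words read also cover only about 10 stereo frames.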
----------------------------------------------------------------------------
My code:
; 20 tap FIR filter
;
; Inputs:
;   R0 = pointer to filter coefficients
;   R4 = pointer to input samples from FIRBUF
;
; Outputs:
;   R2 = filter output (L)
;   R6 = filter output (R)
filter  section code
        entry   filt1
        extrn   coef,coefptr
        sysref  SYSRTN,SYSDSPTR,SYSTMP0
filt1 SYSTMP0=r4 ; save input buf ptr
        r4=#(-19*4)(r4) tnop          ;Move the input buf ptr back 20 - 1 stereo samples
                                      ;clear wr2 and wr6
        %wr2
        %wr6
        r1=2*0(r0)                    ;coef0
        r5=2*1(r4)                    ;right0
        r5=2*0(r4)  r1*r5             ;left0, coef0*right0
        r1=2*1(r0)  wr6=wr6+RP r1*r5  ;coef1, wr6+=coef0*right0, coef0*left0
        r5=2*3(r4)  wr2=wr2+RP r1*r5  ;right1, wr2+=coef0*left0, dummy mult
        r5=2*2(r4)  r1*r5             ;left1, coef1*right1
        r1=2*2(r0)  wr6=wr6+RP r1*r5  ;coef2, wr6+=coef1*right1, coef1*left1
        r5=2*5(r4)  wr2=wr2+RP r1*r5  ;right2, wr2+=coef1*left1, dummy mult
        r5=2*4(r4)  r1*r5
        r1=2*3(r0)  wr6=wr6+RP r1*r5  ; ...
        r5=2*7(r4)  wr2=wr2+RP r1*r5  ;3
        r5=2*6(r4)  r1*r5
        r1=2*4(r0)  wr6=wr6+RP r1*r5
        r5=2*9(r4)  wr2=wr2+RP r1*r5  ;4
        r5=2*8(r4)  r1*r5
        r1=2*5(r0)  wr6=wr6+RP r1*r5
        r5=2*11(r4) wr2=wr2+RP r1*r5  ;5
        r5=2*10(r4) r1*r5
        r1=2*6(r0)  wr6=wr6+RP r1*r5
        r5=2*13(r4) wr2=wr2+RP r1*r5  ;6
        r5=2*12(r4) r1*r5
        r1=2*7(r0)  wr6=wr6+RP r1*r5
        r5=2*15(r4) wr2=wr2+RP r1*r5  ;7
        r5=2*14(r4) r1*r5
        r1=2*8(r0)  wr6=wr6+RP r1*r5
        r5=2*17(r4) wr2=wr2+RP r1*r5  ;8
        r5=2*16(r4) r1*r5
        r1=2*9(r0)  wr6=wr6+RP r1*r5
        r5=2*19(r4) wr2=wr2+RP r1*r5  ;9
        r5=2*18(r4) r1*r5
        r1=2*10(r0) wr6=wr6+RP r1*r5
        r5=2*21(r4) wr2=wr2+RP r1*r5  ;10
        r5=2*20(r4) r1*r5
        r1=2*11(r0) wr6=wr6+RP r1*r5
        r5=2*23(r4) wr2=wr2+RP r1*r5  ;11
        r5=2*22(r4) r1*r5
        r1=2*12(r0) wr6=wr6+RP r1*r5
        r5=2*25(r4) wr2=wr2+RP r1*r5  ;12
        r5=2*24(r4) r1*r5
        r1=2*13(r0) wr6=wr6+RP r1*r5
        r5=2*27(r4) wr2=wr2+RP r1*r5  ;13
        r5=2*26(r4) r1*r5
        r1=2*14(r0) wr6=wr6+RP r1*r5
        r5=2*29(r4) wr2=wr2+RP r1*r5  ;14
        r5=2*28(r4) r1*r5
        r1=2*15(r0) wr6=wr6+RP r1*r5
        r5=2*31(r4) wr2=wr2+RP r1*r5  ;15
        r5=2*30(r4) r1*r5
        r1=2*16(r0) wr6=wr6+RP r1*r5
        r5=2*33(r4) wr2=wr2+RP r1*r5  ;16
        r5=2*32(r4) r1*r5
        r1=2*17(r0) wr6=wr6+RP r1*r5
        r5=2*35(r4) wr2=wr2+RP r1*r5  ;17
        r5=2*34(r4) r1*r5
        r1=2*18(r0) wr6=wr6+RP r1*r5
        r5=2*37(r4) wr2=wr2+RP r1*r5  ;18
        r5=2*36(r4) r1*r5
        r1=2*19(r0) wr6=wr6+RP r1*r5
        r5=2*39(r4) wr2=wr2+RP r1*r5  ;right19
        r5=2*38(r4) r1*r5             ;left19, coef19*right19
                    wr6=wr6+RP r1*r5  ;wr6+=coef19*right19, coef19*left19
                    wr2=wr2+RP        ;wr2+=coef19*left19
        r6=r6+rd                      ;Round the 17th bit into
        r2=r2+rd                      ;a 16 bit result

        r7                            ; return to calling program
        b 0(cdb)
        r4=SYSTMP0                    ; restore input buf ptr
        END
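Here is the same kind of hypothetical Python model applied to my rewrite. It uses the same assumed pipeline rules, and the names and tuple encoding are only illustrative. Left words sit at even offsets and right words at odd offsets. The model gives wr6 = sum of coef[k]*right[k] and wr2 = sum of coef[k]*left[k] over 20 stereo frames, and every coefficient is fetched. The dummy multiply in each tap is overwritten by the next multiply before anything accumulates it.

```python
import random


def run(program, coef, words):
    # Assumed semantics: accumulate uses RP from the previous multiply,
    # multiply uses r1/r5 from earlier fetches, fetch lands last.
    regs, acc, rp = {"r1": 0, "r5": 0}, {"wr2": 0, "wr6": 0}, 0
    for fetch, accumulate, mult in program:
        r1, r5 = regs["r1"], regs["r5"]
        if accumulate:
            acc[accumulate] += rp
        if mult:
            rp = r1 * r5
        if fetch:
            reg, index = fetch
            regs[reg] = coef[index] if reg == "r1" else words[index]
    return acc["wr6"], acc["wr2"]


def rewrite_program(taps=20):
    """The rewritten listing as (fetch, accumulate, mult) tuples; offsets as word indices."""
    prog = [(("r1", 0), None, False),               # coef0
            (("r5", 1), None, False),               # right0
            (("r5", 0), None, True)]                # left0, coef0*right0
    for k in range(1, taps):
        prog += [(("r1", k), "wr6", True),          # coef k, wr6 += right term, mult left
                 (("r5", 2 * k + 1), "wr2", True),  # right k, wr2 += left term, dummy mult
                 (("r5", 2 * k), None, True)]       # left k, coef k * right k
    prog += [(None, "wr6", True), (None, "wr2", False)]
    return prog


COEF = [-21, -112, -276, -402, -258, 416, 1733, 3502, 5226, 6295,
        6295, 5226, 3502, 1733, 416, -258, -402, -276, -112, -21]
prog = rewrite_program()

rng = random.Random(2)
words = [rng.randint(-32768, 32767) for _ in range(40)]
wr6, wr2 = run(prog, COEF, words)
assert wr6 == sum(COEF[k] * words[2 * k + 1] for k in range(20))  # right channel
assert wr2 == sum(COEF[k] * words[2 * k] for k in range(20))      # left channel

stereo = [1 if j % 2 == 0 else 0 for j in range(40)]  # left = 1, right = 0
print(run(prog, COEF, stereo))  # (0, 32206)
```

In this model, keeping the channels apart costs three instructions per tap. The filter body comes to 62 instructions, against 43 for the vendor's sequence, and the extra instruction per tap is the dummy multiply the architecture notes mention.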
----------------------------------------------------------------------------------
Coefficients:
fir20d  section data
        entry   coef

; coefficients for 20-tap FIR filter
coef    equ     *
        dd w    -21
        dd w    -112
        dd w    -276
        dd w    -402
        dd w    -258
        dd w    416
        dd w    1733
        dd w    3502
        dd w    5226
        dd w    6295
        dd w    6295
        dd w    5226
        dd w    3502
        dd w    1733
        dd w    416
        dd w    -258
        dd w    -402
        dd w    -276
        dd w    -112
        dd w    -21
        end
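Here is a quick Python check on the attached table. It assumes the values are Q15 (scaled by 2**15) and that 16-bit samples satisfy |x| <= 32768. The results:
- The table is symmetric, which is what lets the vendor reuse coef0..coef9.
- The DC gain is about 0.98.
- The worst-case sum of products fits in a 32-bit accumulator, which matters because wr2/wr6 have no guard bits.

`gain` and `minus_3db_point` are hypothetical helpers. The post does not state the sample rate, so the 3 dB point comes out as a fraction of fs.

```python
import cmath
import math

# Coefficient table from the listing above, assumed to be Q15 (value / 2**15).
COEF = [-21, -112, -276, -402, -258, 416, 1733, 3502, 5226, 6295,
        6295, 5226, 3502, 1733, 416, -258, -402, -276, -112, -21]

symmetric = COEF == COEF[::-1]
dc_gain = sum(COEF) / 2**15
# Largest possible |accumulator| for 16-bit inputs, where |sample| <= 32768.
worst_case = sum(abs(c) for c in COEF) * 32768
fits_in_32_bits = worst_case <= 2**31 - 1


def gain(f_over_fs):
    """|H(f)| (linear, Q15 scaling removed) at a frequency given as a fraction of fs."""
    w = 2 * math.pi * f_over_fs
    return abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(COEF))) / 2**15


def minus_3db_point(steps=5000):
    """First frequency (fraction of fs) where |H| is 3 dB below its DC value."""
    target = gain(0.0) / math.sqrt(2)
    for i in range(1, steps + 1):
        f = 0.5 * i / steps
        if gain(f) <= target:
            return f
    return None


print(symmetric, round(dc_gain, 4), worst_case, fits_in_32_bits)  # True 0.9828 1195442176 True
print(minus_3db_point())  # multiply by the sample rate to get Hz
```

The filter has an even length and symmetric taps, so its response is exactly zero at fs/2; `gain(0.5)` evaluates to rounding noise.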
---------------------------------------------------------------------
Regards,
Michael