Analysis of vendor's example FIR subroutine

Greetings:

Below are listings of two versions of a 20-tap FIR filter: the first is the vendor's example, and the second is my rewrite. I have tried in vain to make sense of the vendor's version and would appreciate your inspection and comments. My version works correctly on the target (with the attached coefficients it implements a low-pass filter about 3 dB down at 5 kHz; sorry, I haven't worked backward from the coefficients to get the specs).

Target: Mwave CPU, 16-bit registers, 32-bit multiply and 32-bit-wide accumulators (wr2, wr6), no guard bits, rounding supported. Base/index registers are r0 and r4; r0-r3 are in register bank 1, r4-r7 in register bank 2. RP is the 32-bit product register. ADC/DACs are 16-bit.

Inputs to the filter subroutine are in a large circular buffer mod 256 stereo samples (512 16-bit words).

Assembler syntax and arch desc:

Note that 'N' can be an arithmetic expression evaluated by the assembler:

  #(N)(r0) : immediate offset added to base register r0
  N(r0)    : relative offset added to base register r0; treat as index+offset addressing

Compound instruction, e.g.:

  r1=2*1(r0)  wr6=wr6+RP  r1*r5

  1. Fetch from one register bank, accumulate in another register bank, multiply between register banks. This uses the three-stage pipeline and is executed in one cycle. The accumulate is from the previous mult. and the mult. is from the two previous fetches.

  2. If accumulating, a mult. is always required in the third slot, so dummies are sometimes inserted.

Branches execute one instruction following the branch. Subroutines are called by storing the return address in r7; returns happen by putting r7 on the CDB (common data bus) and branching relative to it.

Comments in the code describe anything omitted here.

Following the logic in the vendor's example, I find cross-accumulation between channels, not all of the samples being processed, and only half of the coefficients being used (although for a symmetrical filter that last point is understandable).

Am I hallucinating?

---------------------------------------------------------------------

Vendor's code:

; 20 tap FIR filter
;
; Inputs:
;   R0 = pointer to filter coefficients
;   R4 = pointer to input samples from FIRBUF
;
; Outputs:
;   R2 = filter output (L)
;   R6 = filter output (R)

filter  section code
        entry   filter
        extrn   coef,coefptr
        sysref  SYSRTN,SYSDSPTR,SYSTMP0

filt1   SYSTMP0=r4                         ; save input buf ptr

        r4=#(-19*2)(r4) tnop               ;Move the input buf pointer back
        nop
                                           ; clear r2 and r6
        %wr2
        %wr6
        r1=2*0(r0)                         ;First coefficient
        r5=2*0(r4)                         ;Oldest sample
        r5=2*1(r4)                  r1*r5  ;Next sample
        r1=2*1(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*2(r4)      wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*2(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*3(r4)      wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*3(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*4(r4)      wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*4(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*5(r4)      wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*5(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*6(r4)      wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*6(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*7(r4)      wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*7(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*8(r4)      wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*8(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*9(r4)      wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*9(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*10(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
                        wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*11(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*8(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*12(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*7(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*13(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*6(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*14(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*5(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*15(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*4(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*16(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*3(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*17(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*2(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*18(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*1(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*19(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
        r1=2*0(r0)      wr6=wr6+RP  r1*r5  ;Accumulate first output
        r5=2*20(r4)     wr2=wr2+RP  r1*r5  ;Accumulate second output
                        wr6=wr6+RP  r1*r5  ;Accumulate first output
                        wr2=wr2+RP         ;Accumulate second output

        r6=r6+rd                           ;Round the 17th bit into a
        r2=r2+rd                           ;a 16 bit result

        r7                                 ; return to calling program
        b 0(cdb)
        r4=SYSTMP0                         ; restore input buf ptr
        END
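
For anyone who wants to follow the vendor's listing above without hand-tracing it, here is a minimal sketch of a symbolic trace in Python. It is not Mwave code, and the pipeline model is an assumption taken from the description earlier in this post: within one compound instruction the accumulate adds whatever the previous instruction left in RP, the multiply uses r1 and r5 as they were before this instruction's fetch, and the fetch lands last. The names `run` and `vendor_program` are made up for the illustration. Coefficients and samples are tracked as word indices relative to r0 and to the adjusted r4.

```python
def run(program):
    """Trace a list of (fetch, accumulate, multiply) compound instructions.

    Returns, per accumulator, the (coef_index, sample_index) products added.
    Assumed model: accumulate uses the previous RP, multiply uses the register
    values from before this instruction's fetch, and the fetch happens last.
    """
    regs = {"r1": None, "r5": None}
    rp = None
    acc = {"wr2": [], "wr6": []}
    for fetch, accumulate, multiply in program:
        if accumulate:
            acc[accumulate].append(rp)
        if multiply:
            rp = (regs["r1"], regs["r5"])
        if fetch:
            reg, index = fetch
            regs[reg] = index
    return acc


def vendor_program():
    """Rebuild the vendor's instruction sequence as word indices."""
    prog = [
        (("r1", 0), None, False),   # r1=2*0(r0)
        (("r5", 0), None, False),   # r5=2*0(r4)
        (("r5", 1), None, True),    # r5=2*1(r4)  r1*r5
    ]
    # Coefficient fetches on the wr6 lines: 1..9, one line with no fetch,
    # then 8 back down to 0.
    coef_fetches = list(range(1, 10)) + [None] + list(range(8, -1, -1))
    for i, c in enumerate(coef_fetches):
        fetch = ("r1", c) if c is not None else None
        prog.append((fetch, "wr6", True))
        prog.append((("r5", i + 2), "wr2", True))
    prog.append((None, "wr6", True))    # wr6=wr6+RP  r1*r5
    prog.append((None, "wr2", False))   # wr2=wr2+RP
    return prog


trace = run(vendor_program())
for name in ("wr6", "wr2"):
    print(name, "coefs:  ", [c for c, _ in trace[name]])
    print(name, "samples:", [s for _, s in trace[name]])
```

Under that model wr6 sums over consecutive words 0 to 19 and wr2 over words 1 to 20, each with the mirrored coefficient sequence 0..9, 9..0. That looks like two successive outputs of a mono filter; on an interleaved stereo buffer every accumulator would take words from both channels.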

----------------------------------------------------------------------------

My code:

; 20 tap FIR filter
;
; Inputs:
;   R0 = pointer to filter coefficients
;   R4 = pointer to input samples from FIRBUF
;
; Outputs:
;   R2 = filter output (L)
;   R6 = filter output (R)

filter  section code
        entry   filt1
        extrn   coef,coefptr
        sysref  SYSRTN,SYSDSPTR,SYSTMP0

filt1   SYSTMP0=r4                         ; save input buf ptr

        r4=#(-19*4)(r4) tnop               ;Move the input buf ptr back 20 - 1
                                           ; stereo samples
                                           ;clear wr2 and wr6
        %wr2
        %wr6
        r1=2*0(r0)                         ;coef0
        r5=2*1(r4)                         ;right0
        r5=2*0(r4)                  r1*r5  ;left0, coef0*right0
        r1=2*1(r0)      wr6=wr6+RP  r1*r5  ;coef1, wr6+=coef0*right0, coef0*left0
        r5=2*3(r4)      wr2=wr2+RP  r1*r5  ;right1, wr2+=coef0*left0, dummy mult
        r5=2*2(r4)                  r1*r5  ;left1, coef1*right1
        r1=2*2(r0)      wr6=wr6+RP  r1*r5  ;coef2, wr6+=coef1*right1, coef1*left1
        r5=2*5(r4)      wr2=wr2+RP  r1*r5  ;right2, wr2+=coef1*left1, dummy mult
        r5=2*4(r4)                  r1*r5  ;
        r1=2*3(r0)      wr6=wr6+RP  r1*r5  ; ...
        r5=2*7(r4)      wr2=wr2+RP  r1*r5  ;3
        r5=2*6(r4)                  r1*r5  ;
        r1=2*4(r0)      wr6=wr6+RP  r1*r5  ;
        r5=2*9(r4)      wr2=wr2+RP  r1*r5  ;4
        r5=2*8(r4)                  r1*r5  ;
        r1=2*5(r0)      wr6=wr6+RP  r1*r5  ;
        r5=2*11(r4)     wr2=wr2+RP  r1*r5  ;5
        r5=2*10(r4)                 r1*r5  ;
        r1=2*6(r0)      wr6=wr6+RP  r1*r5  ;
        r5=2*13(r4)     wr2=wr2+RP  r1*r5  ;6
        r5=2*12(r4)                 r1*r5  ;
        r1=2*7(r0)      wr6=wr6+RP  r1*r5  ;
        r5=2*15(r4)     wr2=wr2+RP  r1*r5  ;7
        r5=2*14(r4)                 r1*r5  ;
        r1=2*8(r0)      wr6=wr6+RP  r1*r5  ;
        r5=2*17(r4)     wr2=wr2+RP  r1*r5  ;8
        r5=2*16(r4)                 r1*r5  ;
        r1=2*9(r0)      wr6=wr6+RP  r1*r5  ;
        r5=2*19(r4)     wr2=wr2+RP  r1*r5  ;9
        r5=2*18(r4)                 r1*r5  ;
        r1=2*10(r0)     wr6=wr6+RP  r1*r5  ;
        r5=2*21(r4)     wr2=wr2+RP  r1*r5  ;10
        r5=2*20(r4)                 r1*r5  ;
        r1=2*11(r0)     wr6=wr6+RP  r1*r5  ;
        r5=2*23(r4)     wr2=wr2+RP  r1*r5  ;11
        r5=2*22(r4)                 r1*r5  ;
        r1=2*12(r0)     wr6=wr6+RP  r1*r5  ;
        r5=2*25(r4)     wr2=wr2+RP  r1*r5  ;12
        r5=2*24(r4)                 r1*r5  ;
        r1=2*13(r0)     wr6=wr6+RP  r1*r5  ;
        r5=2*27(r4)     wr2=wr2+RP  r1*r5  ;13
        r5=2*26(r4)                 r1*r5  ;
        r1=2*14(r0)     wr6=wr6+RP  r1*r5  ;
        r5=2*29(r4)     wr2=wr2+RP  r1*r5  ;14
        r5=2*28(r4)                 r1*r5  ;
        r1=2*15(r0)     wr6=wr6+RP  r1*r5  ;
        r5=2*31(r4)     wr2=wr2+RP  r1*r5  ;15
        r5=2*30(r4)                 r1*r5  ;
        r1=2*16(r0)     wr6=wr6+RP  r1*r5  ;
        r5=2*33(r4)     wr2=wr2+RP  r1*r5  ;16
        r5=2*32(r4)                 r1*r5  ;
        r1=2*17(r0)     wr6=wr6+RP  r1*r5  ;
        r5=2*35(r4)     wr2=wr2+RP  r1*r5  ;17
        r5=2*34(r4)                 r1*r5  ;
        r1=2*18(r0)     wr6=wr6+RP  r1*r5  ;
        r5=2*37(r4)     wr2=wr2+RP  r1*r5  ;18
        r5=2*36(r4)                 r1*r5  ;
        r1=2*19(r0)     wr6=wr6+RP  r1*r5  ;
        r5=2*39(r4)     wr2=wr2+RP  r1*r5  ;right19
        r5=2*38(r4)                 r1*r5  ;left19, coef19*right19
                        wr6=wr6+RP  r1*r5  ;wr6+=coef19*right19, coef19*left19
                        wr2=wr2+RP         ;wr2+=coef19*left19

        r6=r6+rd                           ;Round the 17th bit into a
        r2=r2+rd                           ;a 16 bit result

        r7                                 ; return to calling program
        b 0(cdb)
        r4=SYSTMP0                         ; restore input buf ptr
        END
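
As a cross-check on the rewrite above, here is a rough numeric model in Python (again, not Mwave code). It runs the same instruction sequence on made-up sample words and compares the two accumulators with plain per-channel dot products. The pipeline semantics are the same assumption as in the description earlier in the post: accumulate uses the previous product, multiply uses the register values from before the current fetch. The helper names, the random test data and the use of unbounded Python integers (no 32-bit overflow, no rounding step) are all choices made only for this sketch.

```python
import random

# Coefficient table copied from the listing in this post.
COEF = [-21, -112, -276, -402, -258, 416, 1733, 3502, 5226, 6295,
        6295, 5226, 3502, 1733, 416, -258, -402, -276, -112, -21]


def rewrite_program():
    """The rewritten routine as (fetch, accumulate, multiply) tuples."""
    prog = [
        (("r1", 0), None, False),   # coef0
        (("r5", 1), None, False),   # right0
        (("r5", 0), None, True),    # left0, coef0*right0
    ]
    for k in range(1, 20):
        prog.append((("r1", k), "wr6", True))          # next coefficient
        prog.append((("r5", 2 * k + 1), "wr2", True))  # right k, dummy mult
        prog.append((("r5", 2 * k), None, True))       # left k
    prog.append((None, "wr6", True))
    prog.append((None, "wr2", False))
    return prog


def run(program, coef, words):
    """Execute the program numerically under the assumed pipeline model."""
    source = {"r1": coef, "r5": words}
    regs = {"r1": 0, "r5": 0}
    rp = 0
    acc = {"wr2": 0, "wr6": 0}
    for fetch, accumulate, multiply in program:
        if accumulate:
            acc[accumulate] += rp            # product of the previous instruction
        if multiply:
            rp = regs["r1"] * regs["r5"]     # registers before this fetch
        if fetch:
            reg, index = fetch
            regs[reg] = source[reg][index]
    return acc


random.seed(0)
words = [random.randint(-32768, 32767) for _ in range(40)]  # 20 stereo frames
acc = run(rewrite_program(), COEF, words)

even = sum(c * words[2 * k] for k, c in enumerate(COEF))      # 2*0, 2*2, ... (left)
odd = sum(c * words[2 * k + 1] for k, c in enumerate(COEF))   # 2*1, 2*3, ... (right)
print("wr2 matches even-word sum:", acc["wr2"] == even)
print("wr6 matches odd-word sum: ", acc["wr6"] == odd)
```

If the model is right, each accumulator only ever sees one channel: wr2 takes the even words and wr6 the odd words, with all 20 coefficients applied in order.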

----------------------------------------------------------------------------------

Coefficients:

fir20d  section data
        entry   coef

; coefficients for 20-tap FIR filter
coef    equ     *
        dd w    -21
        dd w    -112
        dd w    -276
        dd w    -402
        dd w    -258
        dd w    416
        dd w    1733
        dd w    3502
        dd w    5226
        dd w    6295
        dd w    6295
        dd w    5226
        dd w    3502
        dd w    1733
        dd w    416
        dd w    -258
        dd w    -402
        dd w    -276
        dd w    -112
        dd w    -21
        end

---------------------------------------------------------------------

Regards,

Michael


FWIW, I forgot the sampling rate: 44.1 kHz.
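
Since the specs were never worked backward from the coefficients, here is a minimal sketch of how that could be done in Python by evaluating the filter's frequency response directly. It assumes a sampling rate of 44.1 kHz and that the 16-bit coefficients are Q15 fractions (divide by 32768); the Q15 scaling is an assumption, not something the vendor's listing states. `gain_db` is just a helper name for this example.

```python
import cmath
import math

# Coefficient table copied from the listing in this post.
COEF = [-21, -112, -276, -402, -258, 416, 1733, 3502, 5226, 6295,
        6295, 5226, 3502, 1733, 416, -258, -402, -276, -112, -21]
FS = 44100.0       # sampling rate in Hz (assumed)
Q15_ONE = 32768.0  # assumed fixed-point scaling of the coefficients


def gain_db(freq_hz, coef=COEF, fs=FS):
    """Magnitude of H(f) = sum(h[n] * exp(-j*2*pi*f*n/fs)) in dB."""
    w = 2.0 * math.pi * freq_hz / fs
    h = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(coef))
    magnitude = max(abs(h) / Q15_ONE, 1e-12)  # avoid log10(0) at exact nulls
    return 20.0 * math.log10(magnitude)


for f in (0, 1000, 2000, 3000, 4000, 5000, 8000, 12000):
    print(f"{f:6d} Hz  {gain_db(f):8.2f} dB")
```

Comparing the printed gain at 5 kHz with the gain at 0 Hz gives the attenuation at that frequency relative to the passband, which is the number to hold against the "about 3 dB down" observation on the target.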

Regards,

Michael
