I don't know the Spartan devices; I only work with Virtex-II and Virtex-II Pro. I had a look at the Spartan datasheet, and given your requirements I don't see how you can do it unless the decimation ratio is greater than 4 (or you increase the clock rate up to (decimation filter input data rate)*32). I've seen that you can use up to 4 multipliers. Why do you want to use only one?
Distributed Arithmetic (DA) implementations of filters don't need the hardware multipliers provided by the FPGA. With your sample rate being much lower than the FPGA clock rate, a DA filter may be what you want. A DA filter processes each sample 1 (or more) bits per clock cycle. For minimum FPGA logic use you would process 1 bit of the sample word per clock cycle. In your case that means 16 clock cycles per sample, so you would need a clock rate of 8.8 x 16 = 140.8 MHz, i.e. about 141 MHz. (You can use the Spartan-3 DCM frequency synthesizer to get a clock rate around this.)
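To make the clock arithmetic concrete, here is a small Python sketch. The function name and its parameters are made up for illustration; the 8.8 MHz sample rate and 16-bit word are the figures used in this thread.

```python
import math

def required_clock_mhz(sample_rate_mhz, word_bits, bits_per_clock=1):
    """Clock rate needed to consume one full sample word per sample period.

    Hypothetical helper: a bit-serial DA filter that handles
    `bits_per_clock` bits each cycle needs ceil(word_bits / bits_per_clock)
    cycles per input sample.
    """
    cycles_per_sample = math.ceil(word_bits / bits_per_clock)
    return sample_rate_mhz * cycles_per_sample

# 1 bit per clock (smallest logic) versus 2 bits per clock (half the clock rate)
print(required_clock_mhz(8.8, 16))
print(required_clock_mhz(8.8, 16, bits_per_clock=2))
```

Processing 2 bits per clock halves the required clock rate, at the cost of roughly doubling the lookup logic.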
In the DA filter there are LUTs that supply the partial products for the multiplies. I have seen newsgroup posts describing how one can use SRL16s (Xilinx) instead of the LUTs. This allows the table contents to be reloaded serially if you want to change the coefficients.
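For anyone who hasn't seen DA before, the following is a rough software model of the bit-serial scheme, not HDL and not any particular vendor core. It assumes integer coefficients and signed two's complement samples that fit in `word_bits`; all function names are my own. The table is addressed by one bit from each tap, and each inner loop pass stands for one clock cycle.

```python
def build_da_lut(coeffs):
    """LUT[addr] = sum of the coefficients whose tap bit is set in addr."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]

def da_dot(lut, taps, word_bits):
    """Bit-serial dot product of the tap values with the LUT's coefficients."""
    acc = 0
    for j in range(word_bits):              # one pass = one clock cycle
        addr = 0
        for k, x in enumerate(taps):
            addr |= ((x >> j) & 1) << k     # bit j of tap k forms address bit k
        partial = lut[addr] << j            # weight the table output by 2**j
        # In two's complement the MSB carries negative weight.
        acc += -partial if j == word_bits - 1 else partial
    return acc

def da_fir(coeffs, signal, word_bits=16):
    """FIR filter built on the DA dot product (no multiplies on samples)."""
    lut = build_da_lut(coeffs)
    taps = [0] * len(coeffs)
    out = []
    for x in signal:
        taps = [x] + taps[:-1]              # shift the new sample into the delay line
        out.append(da_dot(lut, taps, word_bits))
    return out

coeffs = [3, -1, 4, 2]
signal = [100, -200, 32767, -32768, 5, 0, -1]
print(da_fir(coeffs, signal))
```

Note the table has 2**N entries for N taps, so a 4-tap group fits a single 4-input LUT per output bit; longer filters are normally split into groups of taps whose table outputs are added together. Changing coefficients means rewriting the table contents, which is where the reloadable SRL16 idea comes in.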
Disclaimer: I have implemented a DA filter before, but I admit I have not done a decimation filter, so I don't know for sure that DA techniques are appropriate for decimation filters. Most probably there are further logic savings you can achieve, because the post-decimation sample rate is lower than the incoming sample rate.
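On the decimation point, the one thing that can be said for certain is that an FIR decimator only has to evaluate the outputs it keeps. The Python sketch below (function names are hypothetical, and it uses plain multiplies rather than DA) shows that computing every m-th output directly gives the same result as filtering at the full rate and then discarding samples. Whether that saving maps neatly onto a DA structure is still an open question as far as I'm concerned.

```python
def fir_full(coeffs, signal):
    """Direct-form FIR evaluated at every input sample."""
    return [sum(c * signal[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(signal))]

def fir_decimate(coeffs, signal, m):
    """Evaluate the FIR only at the outputs that survive decimation by m."""
    return [sum(c * signal[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(0, len(signal), m)]

coeffs = [1, 2, 3, 2, 1]
signal = [4, -7, 0, 12, 5, -3, 8, 1, -9, 6]

# Filtering then throwing away samples wastes (m - 1) out of every m results.
assert fir_decimate(coeffs, signal, 4) == fir_full(coeffs, signal)[::4]
print(fir_decimate(coeffs, signal, 4))
```

So with a decimation ratio of m there are m input sample periods available to produce each output, which is where the extra logic sharing would come from.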
Search the newsgroup for distributed arithmetic and you will get heaps of good hits.