First, the coefficients: if they are not all the same, you will have 600 of them, and you need to decide on their precision. That, along with your inputs and outputs, will determine the memory required. Next, the number of adders and multipliers depends on how much parallelism you put into the filtering, which also determines your I/O requirements. I don't know the logic resources of this particular device, but given your low sampling rate, you may be fine with a pipelined serial implementation.
Given you're using an Actel chip, you may not have looked on xilinx.com, which has an appnote (I forget which one) describing transposed FIR filters; these can easily be modified to reduce device utilization by exploiting filter symmetry (if any). But anyway, since you have a 10 MHz clock and a 1 kHz data rate, you could get away with a single multiplier and 4 accumulators. In that case, the resource utilization will be dominated by the multiplier. A distributed-arithmetic filter would use resources more efficiently, but its resource utilization is not as straightforward to compute.
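To illustrate the symmetry remark, here is a minimal Python sketch (a software model, not HDL, and not taken from the appnote mentioned above). It assumes the coefficients are symmetric, i.e. `coeffs[i] == coeffs[n-1-i]`; the function names are made up for this example. Adding the two samples that share a coefficient before multiplying roughly halves the number of multiplies per output.

```python
def fir_direct(coeffs, window):
    """Reference: one multiply per tap."""
    return sum(c * x for c, x in zip(coeffs, window))


def fir_folded(coeffs, window):
    """Symmetric FIR: pre-add sample pairs that share a coefficient.

    Assumes coeffs[i] == coeffs[n-1-i]; only about n/2 multiplies are used.
    """
    n = len(coeffs)
    half = n // 2
    acc = 0
    for i in range(half):
        # One multiply covers taps i and n-1-i.
        acc += coeffs[i] * (window[i] + window[n - 1 - i])
    if n % 2:
        # Odd length: the centre tap has no partner.
        acc += coeffs[half] * window[half]
    return acc
```

In hardware the pre-add costs an extra adder and one more bit of width going into the multiplier, which is usually cheaper than a second multiply.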
Make the filter a state machine that acts like a small computer. It fetches the data from a RAM delay line and the coefficient from a ROM and then multiplies them and adds the result to an accumulator, and then goes on like that. That way you reuse the multiplier.
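As a rough behavioural model of that idea (Python rather than HDL; the class and attribute names are invented for this sketch), the loop below plays the role of the state machine: each iteration stands for one clock in which a sample is fetched from the RAM delay line, a coefficient from the ROM, and the product is added to the accumulator.

```python
class SerialFir:
    """Behavioural model of a single-multiplier (serial MAC) FIR filter."""

    def __init__(self, coeffs):
        self.rom = list(coeffs)        # coefficient ROM
        self.ram = [0] * len(coeffs)   # sample RAM used as a circular delay line
        self.wr = 0                    # address where the next sample is written

    def step(self, sample):
        """Accept one input sample and return one output sample."""
        n = len(self.rom)
        self.ram[self.wr] = sample     # overwrite the oldest sample
        acc = 0
        addr = self.wr                 # start at the newest sample
        for k in range(n):             # one "clock" per tap
            acc += self.rom[k] * self.ram[addr]   # the single shared multiplier
            addr = (addr - 1) % n      # walk back through older samples
        self.wr = (self.wr + 1) % n
        return acc
```

Feeding an impulse (`1, 0, 0, ...`) through `step` returns the coefficients one at a time, which is a quick sanity check for the addressing.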
"Mook Johnson" wrote in message news:GW8Yd.62440$ email@example.com...
You have 10,000 clocks per sample to work with, which gives you 66 clocks per tap for one filter, or 16 clocks per tap per filter for all four, to compute the tap products. This means you don't need to construct a full parallel multiplier. Instead, your filter consists simply of a pair of accumulators. The first one is a scaling accumulator multiplier (see
), which performs a 16xN multiply every 16 clocks. The second one accumulates the products. You have four sums of products to compute for each incoming sample period. The sample data will need to be serialized. The resulting logic is quite small (about 50 FFs), and will fit in all but the smallest of the MX devices. Arithmetic is awkward in the MX family, but at a 10 MHz clock it should not be a problem. The sticking point is going to be memory: you need 600x16 memory for the data delay plus
150x? for each unique set of coefficients. Since there are two unique sets of coefficients, you'll need 1Kx16 of memory. The MX family does not have internal memory, nor do any of the devices have a sufficient number of flip-flops to provide 16K bits of storage. Hopefully, you can add an external SRAM to the design.
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
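For readers unfamiliar with the serial multiplier mentioned in the post above, the following Python sketch shows one common shift-and-add formulation in the spirit of a scaling accumulator. It is an assumption about the general technique, not a transcription of the referenced design: it handles unsigned operands only, and the function name is invented here. One bit of the multiplier is consumed per clock, so a 16-bit operand takes 16 clocks.

```python
def scaling_accumulator_multiply(a, b, bits=16):
    """Unsigned a * b using one adder and a right-shifting accumulator.

    Each loop iteration models one clock: the current bit of b (LSB first)
    gates the addition of a, aligned at the top of the accumulator, and the
    accumulator is then scaled by 1/2 (shifted right one place).
    """
    acc = 0
    for i in range(bits):
        if (b >> i) & 1:
            acc += a << bits   # add the multiplicand at the top of the accumulator
        acc >>= 1              # scale: earlier partial products move down one bit
    return acc
```

After `bits` iterations the accumulator holds the full product; no low-order bits are lost because the accumulator is modelled at full width.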
Have you heard about the FPSLIC? It combines an FPGA and an AVR microcontroller. The smallest part has 20 kB of SRAM and 5k gates. Parts with 36 kB + 10k gates or 36 kB + 40k gates are available as well. The SRAM is 8 bits wide only.
You have 150 coefficients and 150 samples. Each is 16 bits, so you need (150 + 150) * 2 = 600 bytes per filter, for a total of 2400 bytes for the coefficients and the samples of all four filters.
You can allocate 4 KB of the SRAM for the FPGA portion
16 or 32 kB allocated for the
You have to do 150 multiply-and-adds per millisecond per filter, so 600 multiplies in 10,000 clock cycles. This means 16 clocks are available per multiply.
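The budget figures quoted in this thread can be re-derived with a few lines of Python. The constants come from the posts (10 MHz clock, 1 kHz sample rate, 150 taps, four filters, 16-bit words); the variable names are just for this sketch.

```python
CLOCK_HZ = 10_000_000   # 10 MHz system clock
SAMPLE_HZ = 1_000       # 1 kHz data rate
TAPS = 150              # taps per filter
FILTERS = 4             # number of filters
BYTES_PER_WORD = 2      # 16-bit samples and coefficients

clocks_per_sample = CLOCK_HZ // SAMPLE_HZ               # 10000
macs_per_sample = TAPS * FILTERS                        # 600
clocks_per_mac = clocks_per_sample // macs_per_sample   # 16 (remainder dropped)

bytes_per_filter = (TAPS + TAPS) * BYTES_PER_WORD       # samples + coefficients = 600
total_bytes = bytes_per_filter * FILTERS                # 2400
```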
For the smallest FPSLIC part you get 12 x (32 x 4 DPRAMs). There are 4 more, but they are ROM only. You use 10 of them to implement a 40-bit dual-port register file.
Implement an 8-bit multiplier in the FPGA and run 4 cycles per multiply.
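One way to read the "4 cycles per multiply" suggestion is that a 16x16 product is assembled from four 8x8 partial products, one per cycle. That reading is an assumption, and the sketch below (unsigned operands only, hypothetical function names) simply shows that the arithmetic works out.

```python
def mul8(a, b):
    """Stand-in for the 8x8 hardware multiplier (unsigned)."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b


def mul16_via_8x8(a, b):
    """Unsigned 16x16 multiply built from four passes through mul8."""
    a_lo, a_hi = a & 0xFF, a >> 8
    b_lo, b_hi = b & 0xFF, b >> 8
    acc = 0
    # Each tuple is one cycle: two byte operands and the weight of the product.
    for x, y, shift in (
        (a_lo, b_lo, 0),
        (a_hi, b_lo, 8),
        (a_lo, b_hi, 8),
        (a_hi, b_hi, 16),
    ):
        acc += mul8(x, y) << shift
    return acc
```

Signed samples or coefficients would need sign handling on the high bytes, which this sketch leaves out.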
The AVR will handle the interrupt and the ADC. It can probably read the ADC by bit-banging to save FPGA real estate. The AVR can write to the FPGA RAM without any problems. The FPGA never writes to the RAM, so it will only use the read port.
I think this should be a really nice application for the FPSLIC.