Jeorg's question on sci.electronics.design about an under-$2 DSP chip got me thinking:
How are 1-cycle multipliers implemented in silicon? My understanding is that when you go buy a DSP chip, a good part of the real estate is taken up by the multiplier, and that this is a good part of the reason DSPs cost so much. I can't see it being a big gawdawful batch of combinational logic that has the multiply rippling through 16 32-bit adders, so I assume there's a big table lookup involved, but that's as far as my knowledge extends.
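To make the "rippling through 16 adders" picture concrete, here is a rough Python sketch of the shift-and-add scheme I have in mind for a 16x16 multiply. The function name and the unsigned 16-bit operands are my own assumptions for illustration; this is a model of the arithmetic, not a claim about how any real DSP builds its multiplier.

```python
def shift_add_multiply(a, b, n=16):
    """Multiply two unsigned n-bit integers with n conditional adds.

    Hypothetical illustration: each loop pass stands for one adder stage
    that adds a shifted copy of `a` when the matching bit of `b` is set.
    """
    mask = (1 << n) - 1
    a &= mask  # assume unsigned n-bit operands
    b &= mask
    acc = 0    # the product needs up to 2n bits (32 bits for n = 16)
    for i in range(n):
        if (b >> i) & 1:
            acc += a << i  # partial product i, weighted by 2**i
    return acc


if __name__ == "__main__":
    print(shift_add_multiply(1234, 5678))
```

Done combinationally, all 16 adds have to settle inside one clock period; done as an n-stage pipeline, each pass of the loop would be one stage with a register after it.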
Yet the reason that you go shell out all the $$ for a DSP chip is to get a 1-cycle MAC that you have to bury in a few (or several) tens of cycles worth of housekeeping code to set up the pointers, counters, modes &c -- so you never get to multiply numbers in one cycle, really.
How much less silicon would you use if an n-bit multiplier were implemented as an n-stage pipelined device? If I wanted to implement a 128-tap FIR filter and could live with 160 ticks instead of 140, would the chip be much smaller?
Or is the space consumed by the separate data spaces and buses needed to move all the data to and from the MAC? If you pipelined the multiplier _and_ made it a two- or three-cycle MAC (to allow time to shove data around), could you reduce the chip cost much? Would the amount of area savings you get allow you to push the clock up enough to still do audio applications for less money?
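For the 128-tap FIR numbers, a back-of-the-envelope model in Python looks something like the sketch below. Everything here is an assumption on my part: the function names are made up, the 12 cycles of overhead is simply what makes a 1-cycle MAC come out at 140 ticks, and the model assumes the pipelined multiplier still accepts one new multiply per cycle so that only the pipeline fill costs extra.

```python
def fir_output(coeffs, samples):
    """One FIR output sample: a sum of products, i.e. one MAC per tap."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += c * x  # the multiply-accumulate the DSP does in hardware
    return acc


def fir_cycles(taps, overhead, mult_latency=1):
    """Rough cycle count for one output sample.

    Assumes one multiply is issued per cycle, so a multiplier with
    `mult_latency` stages adds only (mult_latency - 1) cycles of fill.
    `overhead` is a guessed figure for pointer and counter setup.
    """
    return taps + overhead + (mult_latency - 1)


if __name__ == "__main__":
    print(fir_cycles(128, 12))      # 1-cycle multiplier
    print(fir_cycles(128, 12, 16))  # 16-stage pipelined multiplier
```

Under those assumptions the 1-cycle case is 140 ticks and a 16-stage pipeline is 155, which is inside the 160 I said I could live with. The model says nothing about area, which is the part I'm actually asking about.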
Obviously any answers will be useless unless somebody wants to run out and start a chip company, but I'm still curious about it.