The 18-by-18 multiplier is two's-complement. You can use it for unsigned multiplication by setting the MSB of the multiplier and multiplicand inputs to 0. This works as long as the unsigned numbers are 17 bits or less. Beyond that, it's going to cost extra.
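A minimal sketch of the zero-MSB trick described above, assuming numeric_std and a registered output. The entity name `umult17` and its port names are made up for illustration, and whether a given tool maps this to exactly one hardware multiplier is not something the code itself guarantees.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: 17-bit unsigned multiply carried out on a signed 18x18 multiply.
entity umult17 is
  port (
    clk : in  std_logic;
    a   : in  unsigned(16 downto 0);
    b   : in  unsigned(16 downto 0);
    p   : out unsigned(33 downto 0)
  );
end entity umult17;

architecture rtl of umult17 is
begin
  process (clk)
    variable a_s, b_s : signed(17 downto 0);
    variable p_s      : signed(35 downto 0);
  begin
    if rising_edge(clk) then
      -- Zero-extend to 18 bits so the sign bit (MSB) is always 0.
      a_s := signed(resize(a, 18));
      b_s := signed(resize(b, 18));
      -- Signed 18x18 multiply; the result is 36 bits wide.
      p_s := a_s * b_s;
      -- Both operands are non-negative, so the low 34 bits hold the full unsigned product.
      p <= unsigned(p_s(33 downto 0));
    end if;
  end process;
end architecture rtl;
```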
What's the deal with context free pointless posts? Sheesh, is there anything more useful they could have done with internet bandwidth? (and spell checked "area")
From your emails and some tests that I have done, I can conclude that, because the 18x18 multipliers are signed multipliers, some care is needed to get the 'right' number of multipliers synthesized.
The following pairs of packages and types work as expected:
use ieee.std_logic_signed.all; type signed
use ieee.std_logic_signed.all; type std_logic_vector
use ieee.numeric_std.all; type signed
implementing the multiplication of two 18-bit-wide operands in just one 18x18 multiplier.
However, the following combinations
use ieee.numeric_std.all; type unsigned
use ieee.std_logic_unsigned.all; type std_logic_vector
are implemented in 3 18x18 multipliers when the operands are 18 bits wide.
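For reference, a test case of the kind described above might look like the following sketch. It uses the numeric_std / signed combination from the first list; the entity name `smult18` and the port names are assumptions, not the original poster's code. Changing the three port types to `unsigned` gives the variant that was reported to need three multipliers on the Spartan III.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical test entity: registered 18x18 signed multiply.
entity smult18 is
  port (
    clk : in  std_logic;
    a   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    p   : out signed(35 downto 0)
  );
end entity smult18;

architecture rtl of smult18 is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- numeric_std "*" returns a'length + b'length = 36 bits.
      p <= a * b;
    end if;
  end process;
end architecture rtl;
```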
I assume synthesis tools are smarter than I am about these things, but I'd have thought an 18x18 unsigned multiplier was a 17x17 unsigned multiplier, 35 AND gates, some wire and a three-input 35-bit-wide adder:
(a*2^17 + b) * (c*2^17 + d) = a*c*2^34 + (a*d + c*b)*2^17 + b*d
where b and d are the low 17 bits of each operand, and a and c are the top bits, which can only take on the values 0 and 1, so you can multiply by them using an AND gate.
Though the DSP blocks on Xilinx chips seem to have some extra outputs and inputs beyond the multiplier alone, so it may be that cascading DSP blocks, using two of them as things much less complicated than multipliers, saves time if there are no resource constraints.
Does the synthesis do something different if you instantiate N+1 18x18 unsigned multipliers on a chip with 3N hardware multipliers?
Wrap the following up as an entity, and the ugliness of the architecture doesn't matter.
You can make your own multiplier, and break it down into 3 separate multiplications yourself, 17x17, 17x1** and 18x1, onto 3 intermediate signals, and sum them yourself. Just a couple of lines of VHDL (e.g. in a clocked process)
The tools will still infer three multiplication blocks. HOWEVER by attaching attributes to the internal signals:
attribute mult_style: string;
attribute mult_style of big_one:signal is "block";
attribute mult_style of little_one:signal is "lut";
you have fine control of multiplier block or gate usage for any multiplier size you care for. Works in Webpack XST, I haven't tried in other tools...
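A sketch of how the three-way split and the attributes could fit together, assuming numeric_std. The signal names `big_one` and `little_one` come from the attribute lines above; `little_two`, the entity name `umult18_split` and the port names are made up for illustration. The split used is A*B = A_lo*B_lo + (a17*B_lo)*2^17 + (b17*A)*2^17, where a17 and b17 are the top bits of the operands. How a particular tool maps each product is up to the tool and the attributes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: 18x18 unsigned multiply split into 17x17, 17x1 and 18x1 products.
entity umult18_split is
  port (
    clk : in  std_logic;
    a   : in  unsigned(17 downto 0);
    b   : in  unsigned(17 downto 0);
    p   : out unsigned(35 downto 0)
  );
end entity umult18_split;

architecture rtl of umult18_split is
  signal big_one    : unsigned(33 downto 0);  -- 17x17: A_lo * B_lo
  signal little_one : unsigned(17 downto 0);  -- 17x1:  B_lo * a17
  signal little_two : unsigned(18 downto 0);  -- 18x1:  A * b17

  -- XST-style attributes as quoted in the post above.
  attribute mult_style : string;
  attribute mult_style of big_one    : signal is "block";
  attribute mult_style of little_one : signal is "lut";
  attribute mult_style of little_two : signal is "lut";
begin
  big_one    <= a(16 downto 0) * b(16 downto 0);
  little_one <= b(16 downto 0) * a(17 downto 17);
  little_two <= a * b(17 downto 17);

  process (clk)
  begin
    if rising_edge(clk) then
      -- Both small products carry a weight of 2^17.
      p <= resize(big_one, 36)
           + shift_left(resize(little_one, 36), 17)
           + shift_left(resize(little_two, 36), 17);
    end if;
  end process;
end architecture rtl;
```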
Making this a parameterisable n*m multiplier with n, m generics is left as an exercise for the reader...
** Yes, use a 17x1 multiplier! The tools are certainly pretty smart about packing multiplication into LUTs when they are told to. I don't know about a n*1 mult, but I couldn't match its performance on a n*2 mult with anything simple.
Hi Cristian, Just a note that the dedicated multipliers in Cyclone-II, Stratix and Stratix-II (i.e. Embedded multipliers or DSP blocks) can implement full width 18x18 signed *and* unsigned multipliers. Unsigned multipliers are supported explicitly, and are not just a special case of signed multipliers. So for the sample code you gave, it will be implemented in one 18x18 dedicated multiplier.
Thanks for your information. Actually, when I was doing my tests I was comparing the new Lattice ECP with the Spartan III. I found that the same code needed one 18x18 multiplier in the Lattice, whereas it needed three 18x18 multipliers in the Spartan III. That was the reason for my question. The sysDSP blocks in the ECP are very similar to the DSP blocks available in the Stratix, but in a very low-cost device.