Before abandoning this thread, I thought I would cobble together a 72*72->144 multiplier using the 3s500e's built-in 18*18->36 bit hardware multipliers. The result is shown below. As you can see, this version needs more slices and LUTs than the same-width multiplier written in pure Verilog, *but* it also runs at a higher frequency.
Device utilization summary:
---------------------------

Selected Device : 3s500epq208-4

 Number of Slices:                817  out of   4656   17%
 Number of Slice Flip Flops:      669  out of   9312    7%
 Number of 4 input LUTs:         1094  out of   9312   11%
 Number of bonded IOBs:            18  out of    158   11%
 Number of MULT18X18s:             16  out of     20   80%
 Number of GCLKs:                   1  out of     24    4%
Timing Summary:
---------------
Speed Grade: -4

   Minimum period: 12.982ns (Maximum Frequency: 77.033MHz)
   Minimum input arrival time before clock: 10.583ns
   Maximum output required time after clock: 8.062ns
   Maximum combinational path delay: No path found
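As a rough sanity check on the MULT18X18 count (assuming each 18-bit chunk pair maps onto exactly one hardware multiplier, which the report is consistent with): splitting each 72-bit operand into four 18-bit chunks gives 4 × 4 = 16 chunk products, matching the 16 of 20 MULT18X18s used. Equivalently, each of the four MX_36 instances in m72.v would account for 2 × 2 = 4 of them.

```latex
a = \sum_{i=0}^{3} a_i \, 2^{18i}, \qquad
b = \sum_{j=0}^{3} b_j \, 2^{18j}, \qquad 0 \le a_i,\, b_j < 2^{18}

a \cdot b = \sum_{i=0}^{3} \sum_{j=0}^{3} a_i \, b_j \, 2^{18(i+j)}
\quad\Rightarrow\quad 4 \times 4 = 16 \text{ products of size } 18 \times 18 \to 36
```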
It's a little hard to believe that the design using the built-in multipliers actually needs more FPGA resources than doing the whole thing in Verilog, so I've pasted the Verilog source I wrote to test the hardware multipliers below. As you can see, the synthesizer automatically uses the built-in MULT18X18s simply because I specified an 18*18-bit multiply in the Verilog code (see m36.v); I didn't have to instantiate the MULT18X18 multipliers explicitly by name.
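For readers who haven't tried this, here is a minimal sketch of the kind of code described above: a registered 18x18->36 bit multiply written with the plain `*` operator and no named MULT18X18 instance. The module name `mult18_inferred` and its ports are hypothetical; this is not the m36.v from this post, and whether a given tool maps it onto a hardware multiplier is up to the synthesizer.

```verilog
// Hypothetical example module (not Ron's m36.v): a registered
// 18x18 -> 36 bit unsigned multiply written with the plain '*' operator.
// No MULT18X18 is instantiated by name; mapping it onto a hardware
// multiplier is left to the synthesis tool.
module mult18_inferred (
  input  wire        clock,
  input  wire [17:0] x,
  input  wire [17:0] y,
  output reg  [35:0] p
);
  always @(posedge clock)
    p <= x * y;   // operands are widened to the 36-bit LHS width, so no bits are lost
endmodule
```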
There are three modules. They are:
- m36.v
- m72.v
- main.v
"main.v" is a simple I/O interface for MX_72 which is the main multiplier. The only reason for having "main" is so that the synthesizer won't complain about running out of I/O pins. Let me know if you're able to cut down on the gate count significantly. I already know I could probably eliminate one or two temporary registers I used in MX_72, but doubt it would make much difference.
Regards,
Ron
//================= m72.v =================
// m72.v, (c) May 9, 2006, Ron Dotson

module MX_72 (reset,clock,a,b, r,finished);
  parameter N=72;  // Bus Width of input
  parameter L=36;  // Half the Bus Width
  parameter S0=0,S1=1,S2=2,S3=3,S4=4,S5=5,S6=6,S7=7;
  input reset, clock;
  input [N-1:0] a,b;
  output reg [2*N-1:0] r;
  output reg finished;

  reg resetMX;
  reg [1:0] state;
  reg [L-1:0] a1,b1, a2,b2;              // 36-bit halves of the operands
  reg [2*N-1:0] t1,t2,t3;                // temporary registers
  wire [N-1:0] r1,r2,r3,r4;              // 72-bit partial products from the MX_36 units
  wire finished1,finished2,finished3,finished4;

  // Four 36x36 sub-multipliers (MX_36 from m36.v), one per pair of halves
  MX_36 m1 (resetMX,clock,a1,b1, r1,finished1);
  MX_36 m2 (resetMX,clock,a1,b2, r2,finished2);
  MX_36 m3 (resetMX,clock,a2,b1, r3,finished3);
  MX_36 m4 (resetMX,clock,a2,b2, r4,finished4);

  always @(posedge clock)
    if (reset==1) begin
      finished