Trying to get 4 LUTs, MUXF5, MUXF6 in Spartan-3

- Artenz
posted Thu, Dec 9, 2004 9:30 PM

I am trying to combine a 4-bit logic function with a 4-to-1 mux on the result. For example:

wire [3:0] a;
wire [3:0] b;
wire f;
wire [1:0] s;
wire out;
wire [3:0] t;

assign t = f ? (a | b) : (a & b);
assign out = t[s];

Here a and b are 4-bit inputs, f selects a function on them, and t[s] selects one of the 4 result bits.

Looking at the CLB diagram, I was thinking this would fit in 4 LUTs for the logic, followed by a MUXF5 and a MUXF6 for the selection.

Instead, I end up with 6 LUTs and a MUXF5 (using XST 6.2). I've also tried instantiating the MUXF5/MUXF6 manually. That gives me the proper muxes, but they are fed by 1-input LUTs, and the logic is computed somewhere else.

Does anybody know a way to convince XST to fit this in 4 LUTs/MUXF5/MUXF6?

- Antti Lukats
posted Fri, Dec 10, 2004 8:31 AM

One word: think! See below.

---------------------
module lut_test(a,b,c,f);
input f;
input [3:0] a;
input [3:0] b;
output [3:0] c;

wire [3:0] a;
wire [3:0] b;
wire f;
wire [1:0] s;
wire out;
wire [3:0] t;

//assign t = f ? (a | b) : (a & b);
//assign c = t[s];
assign c[0] = (f & (a[0] | b[0])) | (!f & (a[0] & b[0]));
assign c[1] = (f & (a[1] | b[1])) | (!f & (a[1] & b[1]));
assign c[2] = (f & (a[2] | b[2])) | (!f & (a[2] & b[2]));
assign c[3] = (f & (a[3] | b[3])) | (!f & (a[3] & b[3]));

endmodule
-----------------------

Cell Usage :
# BELS                   : 4
#      LUT3              : 4
# IO Buffers             : 13
#      IBUF              : 9
#      OBUF              : 4
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 3s1000ft256-4

 Number of Slices:          2 out of  7680   0%
 Number of 4 input LUTs:    4 out of 15360   0%
 Number of bonded IOBs:    13 out of   173   7%

=========================================================================

- usenet+5
posted Fri, Dec 10, 2004 6:37 PM

Your code doesn't include the 4-1 mux on the result. Without the mux, even my original code will fit in 4 LUT3s. The problem is that XST doesn't see that the mux can be implemented using MUXF5/MUXF6 elements, and instead uses more LUTs.

- Antti Lukats
posted Fri, Dec 10, 2004 7:07 PM

Oops! You're right, I simplified your code and removed the mux :(

Well, your code as posted does give:

Cell Usage :
# BELS                   : 7
#      LUT3              : 6
#      MUXF5             : 1

Hmm... but this is what I would expect it to be? Entering think mode again...

-----------------------------------------------
module lut_test2(a,b,c,f,s);
input f;
input [1:0] s;
input [3:0] a;
input [3:0] b;
output c;

wire [3:0] a;
wire [3:0] b;
wire f;
wire [1:0] s;
wire [1:0] x;

wire out;
wire [3:0] t;

//assign t = f ? (a | b) : (a & b);
//assign c = t[s];

// each t[i] is gated by the value of s[0] that selects bit i
assign t[0] = ((f & (a[0] | b[0])) | (!f & (a[0] & b[0]))) & !s[0];
assign t[1] = ((f & (a[1] | b[1])) | (!f & (a[1] & b[1]))) & s[0];
assign t[2] = ((f & (a[2] | b[2])) | (!f & (a[2] & b[2]))) & !s[0];
assign t[3] = ((f & (a[3] | b[3])) | (!f & (a[3] & b[3]))) & s[0];

assign c = s[1] ? (t[3] | t[2]) : (t[1] | t[0]);

endmodule
-------------------------------------
Is the above functionally the same as yours?

Cell Usage :
# BELS                   : 6
#      LUT4              : 5
#      MUXF5             : 1

5 LUTs and 1 MUXF5 is better already :)

antti

- Antti Lukats
posted Fri, Dec 10, 2004 8:47 PM

it does, see below:

----------------
module lut_mux(a,b,c,f,s);
input f;
input [1:0] s;
input [3:0] a;
input [3:0] b;
output c;

wire [3:0] a;
wire [3:0] b;
wire f;
wire [1:0] s;
wire [1:0] x;

wire out;
wire [3:0] t;

assign t[0] = (f & (a[0] | b[0])) | (!f & (a[0] & b[0]));
assign t[1] = (f & (a[1] | b[1])) | (!f & (a[1] & b[1]));
assign t[2] = (f & (a[2] | b[2])) | (!f & (a[2] & b[2]));
assign t[3] = (f & (a[3] | b[3])) | (!f & (a[3] & b[3]));

// I0 is selected when S is 0, so the lower-numbered bit goes on I0
MUXF5 XLXI_1a (.I0(t[0]), .I1(t[1]), .S(s[0]), .O(x[0]));
MUXF5 XLXI_1b (.I0(t[2]), .I1(t[3]), .S(s[0]), .O(x[1]));
MUXF6 XLXI_2 (.I0(x[0]), .I1(x[1]), .S(s[1]), .O(c));

endmodule
----------------
Cell Usage :
# BELS                   : 7
#      LUT3              : 4
#      MUXF5             : 2
#      MUXF6             : 1
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 3s1000ft256-4

 Number of Slices:          2 out of  7680   0%

:)

Hm... the F5/F6 muxed version seems to be both smaller and faster, but no matter the synthesis options, XST refuses to use that solution unless the F5/F6 muxes are directly instantiated!

Good thing to know!

Antti
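Since a hand-instantiated mux tree is easy to wire with the inputs swapped, an exhaustive simulation against the original `t[s]` description is a cheap sanity check: there are only 2^11 input combinations. The following is a minimal sketch and not part of the original posts. `muxf5_model` and `muxf6_model` are hypothetical behavioral stand-ins for the Xilinx primitives (assumed here to behave as a plain 2:1 mux, O = S ? I1 : I0), and the module and testbench names are made up for illustration.

```verilog
// Hypothetical behavioral stand-ins for the Xilinx MUXF5/MUXF6 primitives.
// Assumption: both behave as a plain 2:1 mux, O = S ? I1 : I0.
module muxf5_model(input I0, input I1, input S, output O);
  assign O = S ? I1 : I0;
endmodule

module muxf6_model(input I0, input I1, input S, output O);
  assign O = S ? I1 : I0;
endmodule

// Structural version: four LUT-sized functions followed by a two-level mux tree.
module lut_mux_struct(input [3:0] a, input [3:0] b, input f,
                      input [1:0] s, output c);
  wire [3:0] t = f ? (a | b) : (a & b);
  wire x0, x1;
  muxf5_model m_1a(.I0(t[0]), .I1(t[1]), .S(s[0]), .O(x0));
  muxf5_model m_1b(.I0(t[2]), .I1(t[3]), .S(s[0]), .O(x1));
  muxf6_model m_2 (.I0(x0),   .I1(x1),   .S(s[1]), .O(c));
endmodule

// Exhaustive comparison against the behavioral description from the first post.
module lut_mux_tb;
  reg  [3:0] a, b;
  reg        f;
  reg  [1:0] s;
  wire       c;
  wire [3:0] t_ref = f ? (a | b) : (a & b);
  wire       c_ref = t_ref[s];
  integer    i, errors;

  lut_mux_struct dut(.a(a), .b(b), .f(f), .s(s), .c(c));

  initial begin
    errors = 0;
    for (i = 0; i < 2048; i = i + 1) begin
      {s, f, b, a} = i[10:0];   // walk through all 11 input bits
      #1;
      if (c !== c_ref) begin
        errors = errors + 1;
        $display("MISMATCH a=%b b=%b f=%b s=%b c=%b ref=%b", a, b, f, s, c, c_ref);
      end
    end
    if (errors == 0) $display("PASS: all 2048 combinations match");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

When simulating against the real Xilinx simulation library, the stand-in modules would be dropped and the instances renamed to MUXF5/MUXF6; the check itself stays the same.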

- Artenz
posted Fri, Dec 10, 2004 9:52 PM

Thanks!

You can also do even more complicated things and have it fit in the same 4 LUTs, and even make it a little bit more readable.

module lut_test( a, b, c, f, s );
input [3:0] a, b;
input [1:0] s;
input [1:0] f;
output c;

wire [3:0] t;
wire x0, x1;

wire [3:0] m0 = (f == 0) ? ~0 : 0;
wire [3:0] m1 = (f == 1) ? ~0 : 0;
wire [3:0] m2 = (f == 2) ? ~0 : 0;
wire [3:0] m3 = (f == 3) ? ~0 : 0;

assign t = (m0 & (a & b)) | (m1 & (a | b)) | (m2 & (a ^ b)) | (m3 & (a & ~b));

MUXF5 m_1a( .I0(t[0]), .I1(t[1]), .S(s[0]), .O(x0) );
MUXF5 m_1b( .I0(t[2]), .I1(t[3]), .S(s[0]), .O(x1) );
MUXF6 m_2 ( .I0(x0), .I1(x1), .S(s[1]), .O(c) );

endmodule

However, if you try to write vector 't' above with a case or if-else (which I had tried before), then you get 4 more LUT1s. Apparently you have to spell things out really carefully for XST to have it find the optimal solution, especially if you want the MUXF6 in there. Which is too bad, because it tends to make the code much harder to read.

- Chris Ebeling
posted Fri, Dec 10, 2004 10:09 PM

Antti,

You're correct, this can be done. But your coding style isn't exactly conducive to the synthesis of a mux. Case statements are the preferred coding style if you want the MUXF5/MUXF6/MUXF# resources to be used. The dedicated muxes also roughly correlate to a particular mux width: MUXF5 == 4:1, MUXF6 == 8:1, etc.

So if you want to get a MUXF6, a three-bit case statement is appropriate:

module lut_test2(a,b,c,f,s);
input f;
input [1:0] s;
input [3:0] a;
input [3:0] b;
output c;

//Implicit wires exist
//wire [3:0] a;
//wire [3:0] b;
//wire f;
//wire [1:0] s;

reg caseout;
//wire [2:0] sel = {s,f};

always @(s or f or a or b)
  case ({s,f})
    //f is 0
    3'b000 : caseout = (a[0] & b[0]);
    3'b010 : caseout = (a[1] & b[1]);
    3'b100 : caseout = (a[2] & b[2]);
    3'b110 : caseout = (a[3] & b[3]);
    //f is 1
    3'b001 : caseout = (a[0] | b[0]);
    3'b011 : caseout = (a[1] | b[1]);
    3'b101 : caseout = (a[2] | b[2]);
    3'b111 : caseout = (a[3] | b[3]);
  endcase

assign c = caseout;

endmodule

Note: This was targeted to V-II, but that shouldn't affect the results.

=========================================================================
Macro Statistics :
# Multiplexers                     : 1
#      1-bit 8-to-1 multiplexer    : 1

Cell Usage :
# BELS                   : 7
#      LUT3              : 4
#      MUXF5             : 2
#      MUXF6             : 1
# IO Buffers             : 12
#      IBUF              : 11
#      OBUF              : 1
=========================================================================

Because the particular configuration you are looking for requires that the f input be a "select" in the first stage (LUT) as opposed to a MUXF5/MUXF6 select, the order of the select bits in the case statement does matter.

Additionally, there are some connectivity restrictions for MUXF#, but I see you already worked that out.

Chris

- Artenz
posted Fri, Dec 10, 2004 10:37 PM

Actually, you can use a case(f) for the 't' vector if you disable automatic mux extraction, resulting in something fairly readable:

module lut_test( a, b, c, f, s );
input [3:0] a, b;
input [1:0] s;
input [1:0] f;
output c;

reg [3:0] t;
wire x0, x1;

// synthesis attribute mux_extract of lut_test is false;

always @(a or b or f )
  case(f)
    2'd0: t

- Artenz
posted Fri, Dec 10, 2004 11:02 PM

Chris,

Thanks for the info. I tried the same method on my slightly more complicated design:

module lut_test( a, b, c, f, s );
input [3:0] a, b;
input [1:0] s;
input [1:0] f;
output c;

reg c;

always @(a or b or f or s)
  case({s, f})
    4'b0000: c = a[0] & b[0];
    4'b0100: c = a[1] & b[1];
    4'b1000: c = a[2] & b[2];
    4'b1100: c = a[3] & b[3];

    4'b0001: c = a[0] | b[0];
    4'b0101: c = a[1] | b[1];
    4'b1001: c = a[2] | b[2];
    4'b1101: c = a[3] | b[3];

    4'b0010: c = a[0] ^ b[0];
    4'b0110: c = a[1] ^ b[1];
    4'b1010: c = a[2] ^ b[2];
    4'b1110: c = a[3] ^ b[3];

    4'b0011: c = a[0] & ~b[0];
    4'b0111: c = a[1] & ~b[1];
    4'b1011: c = a[2] & ~b[2];
    4'b1111: c = a[3] & ~b[3];
  endcase

endmodule

But this resulted in 8 LUTs, 4xMUXF5, 2xMUXF6, and 1xMUXF7. Is there a way to write this so it'll fit in 4 LUTs without resorting to instantiating the MUXF5/MUXF6 manually?

- Chris Ebeling
posted Sat, Dec 11, 2004 1:07 AM

Nope,

Quote: But this resulted in 8 LUTs, 4xMUXF5, 2xMUXF6, and 1xMUXF7.

This is exactly what you should expect for a 4-bit select vector: MUXF6 = 8:1, MUXF7 = 16:1.

If you run my example through the tools, you can review the results and see that each of the 4 LUTs is already fully utilized with 4 inputs (f, s(0), A#, B#). So you can't get anything more complex without using more logic.

Your example will underutilize the LUT4s, which have 3 inputs (A#, B#, and f), with the select bits driving MUXF5/MUXF6/MUXF7. So you could make it still more complicated (add a mask bit, for example) and not use any additional resources.

Chris
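To make the "mask bit" remark concrete, here is one way it might look when applied to the 16-entry case statement from the question. This is only a sketch: the input `m` and the module name `lut_test_masked` are hypothetical, and whether XST really absorbs `m` into the existing LUTs without extra resources is the expectation stated above, not something verified here.

```verilog
// Sketch only: the 16-entry case statement with one extra input folded in.
// `m` is a hypothetical mask bit; it is not part of the original design.
module lut_test_masked(a, b, c, f, s, m);
  input  [3:0] a, b;
  input  [1:0] s;
  input  [1:0] f;
  input        m;
  output       c;
  reg          c;

  always @(a or b or f or s or m)
    case ({s, f})
      // f == 0: AND
      4'b0000: c = m & (a[0] & b[0]);
      4'b0100: c = m & (a[1] & b[1]);
      4'b1000: c = m & (a[2] & b[2]);
      4'b1100: c = m & (a[3] & b[3]);
      // f == 1: OR
      4'b0001: c = m & (a[0] | b[0]);
      4'b0101: c = m & (a[1] | b[1]);
      4'b1001: c = m & (a[2] | b[2]);
      4'b1101: c = m & (a[3] | b[3]);
      // f == 2: XOR
      4'b0010: c = m & (a[0] ^ b[0]);
      4'b0110: c = m & (a[1] ^ b[1]);
      4'b1010: c = m & (a[2] ^ b[2]);
      4'b1110: c = m & (a[3] ^ b[3]);
      // f == 3: AND-NOT
      4'b0011: c = m & (a[0] & ~b[0]);
      4'b0111: c = m & (a[1] & ~b[1]);
      4'b1011: c = m & (a[2] & ~b[2]);
      4'b1111: c = m & (a[3] & ~b[3]);
    endcase
endmodule
```

Each arm still depends on only one bit of `a`, one bit of `b`, and `m`, which is why the extra input would be expected to fit in LUTs that were previously using three of their four inputs.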

- Antti Lukats
posted Sat, Dec 11, 2004 7:57 AM

Conducive :) Thanks. Well, that coding style isn't mine; it came from the original poster. It did surprise me: I have never used it and possibly never will. What is surprising is that it yields correct synthesis without using the MUXF primitives, while the other styles will use them.

I think more in low-level terms. I started in 1979 (or even before) with 7400 "things", and today I try to use only a minimal subset of VHDL/Verilog.

antti

- Artenz
posted Sat, Dec 11, 2004 8:22 AM

You can fit it in 4 LUTs, if you reorder things a bit.

Feed each of the 4 LUTs with (f[0], f[1], A#, B#), and then select the output bit with the MUXF5, MUXF6, feeding them with s[0] and s[1] respectively.

What I originally started with is the following:

module lut_test( a, b, c, f, s );
input [3:0] a, b;
input [1:0] s;
input [1:0] f;
output c;

reg [3:0] lut_out;
reg c;

always @(a or b or f)
  case(f)
    2'b00: lut_out

- Chris Ebeling
posted Mon, Dec 13, 2004 7:37 PM

If you want a specific implementation, there are usually things about how you code a given function that can help guide the tools to your intended solution. In this case, without the specific coding style, the results are not optimal from an area (resource) standpoint. I will take this up with the synthesis folks.

Back to the question of how to get MUXF5/MUXF6: this implies eight-to-one multiplexing, so use a three-bit select in the case statement.

The following will produce the LUT4/MUXF5/MUXF6 logic:

module lut_test8( a, b, c, f, s );
input [3:0] a, b;
input [1:0] s;
input [1:0] f;
output c;

reg c;

always @(a or b or f or s)
  case({s, f[1]})   // three-bit select; f[0] is resolved inside each LUT
    3'b000: c = !f[0]? (a[0] & b[0]) : (a[0] | b[0]);
    3'b010: c = !f[0]? (a[1] & b[1]) : (a[1] | b[1]);
    3'b100: c = !f[0]? (a[2] & b[2]) : (a[2] | b[2]);
    3'b110: c = !f[0]? (a[3] & b[3]) : (a[3] | b[3]);

    // 4'b0001: c = a[0] | b[0];
    // 4'b0101: c = a[1] | b[1];
    // 4'b1001: c = a[2] | b[2];
    // 4'b1101: c = a[3] | b[3];

    3'b001: c = !f[0]? (a[0] ^ b[0]) : (a[0] & ~b[0]);
    3'b011: c = !f[0]? (a[1] ^ b[1]) : (a[1] & ~b[1]);
    3'b101: c = !f[0]? (a[2] ^ b[2]) : (a[2] & ~b[2]);
    3'b111: c = !f[0]? (a[3] ^ b[3]) : (a[3] & ~b[3]);

    // 4'b0011: c = a[0] & ~b[0];
    // 4'b0111: c = a[1] & ~b[1];
    // 4'b1011: c = a[2] & ~b[2];
    // 4'b1111: c = a[3] & ~b[3];
  endcase

endmodule

- Artenz
posted Tue, Dec 14, 2004 6:44 PM

Thanks. Restructuring the code the way you suggest may not always be a viable option. For instance, if the logic is in one module and the 4-1 mux in another (assuming this division makes sense from a design standpoint), then you'd really want the tools to optimize this, rather than having to rewrite the code in a way that makes it hard to maintain and understand.

I may not always want to rewrite the code like this, but at least it's good to know how it can be done, and perhaps apply this where it doesn't hurt the readability too much and/or if performance is critical.

Thanks for your help.
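For illustration, the kind of split described above (logic in one module, 4-1 mux in another) might look like the following. This is a sketch with hypothetical module names (`logic4`, `mux4`, `top_split`); it only shows the source structure, and how XST maps it, or whether it optimizes across the module boundary at all, is not verified here and presumably depends on the hierarchy-related synthesis settings.

```verilog
// Hypothetical split: the per-bit logic lives in one module ...
module logic4(a, b, f, t);
  input  [3:0] a, b;
  input        f;
  output [3:0] t;
  assign t = f ? (a | b) : (a & b);
endmodule

// ... and the 4-to-1 selection in another.
module mux4(t, s, out);
  input  [3:0] t;
  input  [1:0] s;
  output       out;
  assign out = t[s];
endmodule

// Top level that simply wires the two together.
module top_split(a, b, f, s, out);
  input  [3:0] a, b;
  input        f;
  input  [1:0] s;
  output       out;
  wire   [3:0] t;

  logic4 u_logic(.a(a), .b(b), .f(f), .t(t));
  mux4   u_mux  (.t(t), .s(s), .out(out));
endmodule
```

Functionally this is the same as the two `assign` statements from the first post; the case-statement rewrites discussed in this thread would require merging both modules into one, which is exactly the maintainability cost in question.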