assign t = f ? (a | b) : (a & b);
assign out = t[s];
Where a and b are 4-bit inputs, f selects a function on them, and t[s] selects one of the 4 result bits.
Looking at the CLB diagram, I was thinking this would fit in 4 LUTs for the logic, followed by a MUXF5 and a MUXF6 for the selection.
Somehow, I end up with 6 LUTs and a MUXF5 (using XST 6.2). Now, I've tried using manual instantiations of the MUXF5/MUXF6. This way I end up with the proper muxes, but they are fed by 1-input LUTs, and the logic is calculated somewhere else.
Does anybody know a way to convince XST to fit this in 4 LUTs/MUXF5/MUXF6 ?
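For reference, the target structure can be written out with primitives. This is only a sketch, not the instantiation tried in the post. It assumes the Xilinx unisim primitives LUT3, MUXF5 and MUXF6 with their usual ports (O, I0, I1, I2 / O, I0, I1, S) and a 1-bit f; the module name and the wires lo and hi are hypothetical. Four LUTs need two MUXF5s feeding one MUXF6.

```verilog
// Sketch only: hand-instantiated version of
//   assign t = f ? (a | b) : (a & b);  assign out = t[s];
// Needs the Xilinx unisim library to simulate or synthesize.
module lut_mux_prim( a, b, f, s, out );
  input  [3:0] a, b;
  input        f;
  input  [1:0] s;
  output       out;

  wire [3:0] t;
  wire       lo, hi;

  // INIT = 8'hE8: with I0 = a, I1 = b, I2 = f the LUT outputs
  // a & b when f = 0 and a | b when f = 1.
  LUT3 #(.INIT(8'hE8)) lut0 ( .O(t[0]), .I0(a[0]), .I1(b[0]), .I2(f) );
  LUT3 #(.INIT(8'hE8)) lut1 ( .O(t[1]), .I0(a[1]), .I1(b[1]), .I2(f) );
  LUT3 #(.INIT(8'hE8)) lut2 ( .O(t[2]), .I0(a[2]), .I1(b[2]), .I2(f) );
  LUT3 #(.INIT(8'hE8)) lut3 ( .O(t[3]), .I0(a[3]), .I1(b[3]), .I2(f) );

  // s[0] picks within each LUT pair, s[1] picks between the pairs.
  MUXF5 mux_lo  ( .O(lo),  .I0(t[0]), .I1(t[1]), .S(s[0]) );
  MUXF5 mux_hi  ( .O(hi),  .I0(t[2]), .I1(t[3]), .S(s[0]) );
  MUXF6 mux_out ( .O(out), .I0(lo),   .I1(hi),   .S(s[1]) );
endmodule
```

Whether a given XST version accepts the #(.INIT(...)) override or needs a defparam or attribute instead would have to be checked against its documentation.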
Your code doesn't include the 4-1 mux on the result. Without the mux, even my original code will fit in 4 LUT3's. The problem is XST doesn't see that the mux can be implemented using MUXF5/MUXF6 elements, and instead uses more LUTs.
Oops! You're right, I simplified your code and removed the mux :(
Well, your code as posted does get:
Cell Usage :
# BELS : 7
# LUT3 : 6
# MUXF5 : 1
Hmm.. but this is what I would expect it to be? Entering think mode again...
---------------------------
Selected Device : 3s1000ft256-4
Number of Slices: 2 out of 7680 0%
:)
Hm.. the F5/F6 muxed version seems to be both smaller and faster, but no matter the synthesis options, XST refuses to use that solution unless the F5/F6 muxes are directly instantiated!
However, if you try to write vector 't' above with a case or if-else (which I had tried before), then you get 4 more LUT1s. Apparently you have to spell things out really carefully for XST to have it find the optimal solution, especially if you want the MUXF6 in there. Which is too bad, because it tends to make the code much harder to read.
You're correct, this can be done. But your coding style isn't exactly conducive to the synthesis of a mux. Case statements are the preferred coding style if you want the MUXF5/MUXF6/MUXF# resources to be used. The dedicated muxes also roughly correlate to a particular mux width: MUXF5 == 4:1, MUXF6 == 8:1, etc.
So if you want to get a MUXF6, a three-bit case statement is appropriate:
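A minimal sketch of what such a three-bit case could look like for the function in question. This is an illustration, not the code that was run for the results quoted here. It assumes a 1-bit f placed as the least significant select bit, so that f can be absorbed into the LUTs while s drives the dedicated muxes; the module name lut_mux8 is hypothetical.

```verilog
// Sketch only: 8-to-1 case with {s, f} as the select vector.
// Adjacent entries differ only in f, the least significant bit.
module lut_mux8( a, b, f, s, out );
  input  [3:0] a, b;
  input        f;
  input  [1:0] s;
  output       out;

  reg out;

  always @(a or b or f or s)
    case ({s, f})
      3'b000: out = a[0] & b[0];
      3'b001: out = a[0] | b[0];
      3'b010: out = a[1] & b[1];
      3'b011: out = a[1] | b[1];
      3'b100: out = a[2] & b[2];
      3'b101: out = a[2] | b[2];
      3'b110: out = a[3] & b[3];
      3'b111: out = a[3] | b[3];
    endcase
endmodule
```

Functionally this matches f ? (a | b) : (a & b) followed by t[s]; how the tool maps it still has to be confirmed in the synthesis report.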
Note: This was targeted to V-II, but that shouldn't affect the results.
=========================================================================
Macro Statistics :
# Multiplexers : 1
# 1-bit 8-to-1 multiplexer : 1
Because the particular configuration you are looking for requires that the f input be a "select" in the first stage (LUT) as opposed to a MUXF5/MUXF6 select, the order of the select bits in the case statement does matter.
Additionally, there are some connectivity restrictions for MUXF#, but I see you already worked that out.
Thanks for the info. I tried the same method on my slightly more complicated design:
module lut_test( a, b, c, f, s );
  input [3:0] a, b;
  input [1:0] s;
  input [1:0] f;
  output c;

  reg c;

  always @(a or b or f or s)
    case({s, f})
      4'b0000: c = a[0] & b[0];
      4'b0100: c = a[1] & b[1];
      4'b1000: c = a[2] & b[2];
      4'b1100: c = a[3] & b[3];

      4'b0001: c = a[0] | b[0];
      4'b0101: c = a[1] | b[1];
      4'b1001: c = a[2] | b[2];
      4'b1101: c = a[3] | b[3];

      4'b0010: c = a[0] ^ b[0];
      4'b0110: c = a[1] ^ b[1];
      4'b1010: c = a[2] ^ b[2];
      4'b1110: c = a[3] ^ b[3];

      4'b0011: c = a[0] & ~b[0];
      4'b0111: c = a[1] & ~b[1];
      4'b1011: c = a[2] & ~b[2];
      4'b1111: c = a[3] & ~b[3];
    endcase

endmodule
But this resulted in 8 LUTs, 4xMUXF5, 2xMUXF6, and 1xMUXF7. Is there a way to write this so it'll fit in 4 LUTs without resorting to instantiating the MUXF5/MUXF6 manually?
Quote: But this resulted in 8 LUTs, 4xMUXF5, 2xMUXF6, and 1xMUXF7.
This is exactly what you should expect for a 4-bit select vector: MUXF6 = 8:1, MUXF7 = 16:1.
If you run my example through the tools you can review the results and see that each of the 4 LUTs is already fully utilized with 4 inputs (f, s(0), A#, B#). So you can't get anything more complex without using more logic.
Your example will underutilize the LUT4s: 3 inputs (A#, B#, and f), with the select bits driving MUXF5/MUXF6/MUXF7. So you could make it still more complicated (add a mask bit, for example) and not use any additional resources.
Conducive :) Thanks. Well, that coding style isn't mine, it was from the original poster. It did surprise me; I have never used it and possibly never will. What is surprising is that it synthesizes correctly but without using the MUXF elements, while the other styles do use them.
I think more in low-level terms. I started in 1979 (or even before) with 7400 "things", and today I try to use only a minimal subset of VHDL/Verilog.
If you want a specific implementation, there are usually things about how you code a given function that can help guide the tools to your intended solution. In this case, without the specific coding style the results are not optimal from an area (resource) standpoint. I will take this up with the synthesis folks.
Back to how do I get MUXF5/MUXF6. This implies eight to one multiplexing, so use a three bit select in the case statement.
The following will produce the LUT4/MUXF5/MUXF6 logic:
module lut_test8( a, b, c, f, s );
  input [3:0] a, b;
  input [1:0] s;
  input [1:0] f;
  output c;

  reg c;

  always @(a or b or f or s)
    case({s, f[1]})
      3'b000: c = !f[0]? (a[0] & b[0]) : (a[0] | b[0]);
      3'b010: c = !f[0]? (a[1] & b[1]) : (a[1] | b[1]);
      3'b100: c = !f[0]? (a[2] & b[2]) : (a[2] | b[2]);
      3'b110: c = !f[0]? (a[3] & b[3]) : (a[3] | b[3]);

      // 4'b0001: c = a[0] | b[0];
      // 4'b0101: c = a[1] | b[1];
      // 4'b1001: c = a[2] | b[2];
      // 4'b1101: c = a[3] | b[3];
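The listing above stops after the f[1] = 0 branches. One possible completion is sketched below. It assumes the remaining four branches fold the XOR and AND-NOT cases of lut_test in the same way, with f[0] choosing between them inside the LUT. The names lut_test8_full and lut_test8_full_tb are hypothetical, and the testbench is added here only to check the function against the lut_test truth table.

```verilog
// Sketch only: the four f[1] = 1 branches are an assumption,
// inferred from the XOR and AND-NOT cases of lut_test.
module lut_test8_full( a, b, c, f, s );
  input  [3:0] a, b;
  input  [1:0] s;
  input  [1:0] f;
  output       c;

  reg c;

  always @(a or b or f or s)
    case ({s, f[1]})
      // f[1] = 0: f[0] picks AND or OR
      3'b000: c = !f[0] ? (a[0] & b[0]) : (a[0] | b[0]);
      3'b010: c = !f[0] ? (a[1] & b[1]) : (a[1] | b[1]);
      3'b100: c = !f[0] ? (a[2] & b[2]) : (a[2] | b[2]);
      3'b110: c = !f[0] ? (a[3] & b[3]) : (a[3] | b[3]);
      // f[1] = 1 (assumed): f[0] picks XOR or AND-NOT
      3'b001: c = !f[0] ? (a[0] ^ b[0]) : (a[0] & ~b[0]);
      3'b011: c = !f[0] ? (a[1] ^ b[1]) : (a[1] & ~b[1]);
      3'b101: c = !f[0] ? (a[2] ^ b[2]) : (a[2] & ~b[2]);
      3'b111: c = !f[0] ? (a[3] ^ b[3]) : (a[3] & ~b[3]);
    endcase
endmodule

// Self-checking testbench: walks all 4096 input combinations and
// compares c with the function table used in lut_test.
module lut_test8_full_tb;
  reg  [3:0] a, b;
  reg  [1:0] s, f;
  wire       c;
  reg        expected;
  integer    i, errors;

  lut_test8_full dut( .a(a), .b(b), .c(c), .f(f), .s(s) );

  initial begin
    errors = 0;
    for (i = 0; i < 4096; i = i + 1) begin
      {a, b, s, f} = i;          // low 12 bits of i drive the inputs
      #1;
      case (f)
        2'b00: expected = a[s] & b[s];
        2'b01: expected = a[s] | b[s];
        2'b10: expected = a[s] ^ b[s];
        2'b11: expected = a[s] & ~b[s];
      endcase
      if (c !== expected) errors = errors + 1;
    end
    if (errors == 0) $display("all 4096 combinations match");
    else             $display("%0d mismatches", errors);
    $finish;
  end
endmodule
```

The testbench only confirms functional equivalence in simulation. Whether the result lands in 4 LUTs plus MUXF5/MUXF6 still has to be read from the synthesis report.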
Thanks. Restructuring the code like you suggest below may not always be a viable option. For instance, if the logic is in one module, and the 4-1 mux in another (assuming this division makes sense from a design standpoint), then you'd really want the tools to optimize this, rather than rewrite the code in a way that makes it hard to maintain and understand.
I may not always want to rewrite the code like this, but at least it's good to know how it can be done, and perhaps apply this where it doesn't hurt the readability too much and/or if performance is critical.