How do I minimize the logic in this function?

I have an asynchronous function with six bits in and eight bits out (listed below). I need to minimize the logic usage (in a Virtex-II) for this function. It appears that most K-map tools support six bits in but only one bit out. Does anyone have a tool or method they would recommend for this problem?

I currently do it with two three-bit subtracters (in LUTs, not the carry chain) and a 6-bit x 16-element mux (using MUXF primitives) acting as a lookup table. The output of one adder drives the mux S input. It just takes too much time and space.

 In  Out     In  Out     In  Out     In  Out
  0    0     16   68     32  138     48    0
  1   66     17   69     33  136     49   66
  2  128     18   65     34  160     50  128
  3    0     19   68     35  138     51    0
  4   64     20   80     36  130     52   64
  5   65     21   81     37  128     53   65
  6   66     22   69     38  136     54   66
  7   64     23   80     39  130     55   64
  8  130     24   64     40  162     56  130
  9  128     25   65     41  160     57  128
 10  136     26   66     42  168     58  136
 11  130     27   64     43  162     59  130
 12    0     28   68     44  138     60    0
 13   66     29   69     45  136     61   66
 14  128     30   65     46  160     62  128
 15    0     31   68     47  138     63    0

Thanks for your time.

Brannon

Brannon,

How about a BRAM? Simple ROM look up table, 6 bits in, 8 bits out.

Austin

Austin Lesea

Alex Freed

Okay: let's suppose I use eight ROM64x1 primitives. Is that really fewer resources and/or comparable in speed to a four-bit subtracter plugged into the 6-bit x 16-element mux with constants on all its data inputs? What resources does a ROM64x1 use?

Brannon

Brannon,

I am proposing you load up a BRAM with the values you need for the addresses you have.

A simple large table lookup. Done in one cycle. It uses one BRAM block, but do you use them anyway? Do you have a spare one?

Sure it is more real estate than just about every other method, but it is fast, and simple. And if you have any unused BRAMs lying about, it is done.

Austin

Austin Lesea

"one cycle" is the whole issue. I don't have any spare cycles. This has to be done asynchronously.

Brannon

So split it into eight functions that each take six bits in and produce one bit out. Then minimize each one separately.

Better yet, just write it in an HDL and let the synthesis tools take care of it. For all but the most timing-critical cases, you'll wind up with perfectly acceptable results.

Eric Smith
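A minimal sketch of the split suggested above, for one output bit only. It treats out_val[7] as its own six-in, one-out function by packing that column of the posted table into a 64-bit constant, where bit i of the constant is the output for in_val == i. The module name out_bit7 and the signal name bit7_table are hypothetical, and the constant was derived by hand from the In/Out list in the first post, so check it against the table before relying on it.

```verilog
// Hypothetical single-output slice of the posted table: out_val[7] only.
// Bit i of bit7_table is the value of out_val[7] when in_val == i.
module out_bit7 (
    input  wire [5:0] in_val,
    output wire       bit7
);
    // Inputs 2, 8-11, 14 (and the same pattern at 50, 56-59, 62) give 1;
    // inputs 16-31 give 0; inputs 32-47 all give 1.
    wire [63:0] bit7_table = 64'h4F04_FFFF_0000_4F04;

    assign bit7 = bit7_table[in_val];
endmodule
```

Each of the other seven bits would get its own 64-bit column in the same way, and each column is then a single-output truth table that a K-map tool (or the synthesizer) can minimize on its own.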

The ROM64x1 uses four 16-bit LUTs, two MUXF5s and a MUXF6. You would have 6 bits of address with fanouts of 6-24, and your delay would be one LUT followed by a MUXF5 and a MUXF6. The total resources for eight of them: 16 slices.

John_H
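For reference, a hedged sketch of what one of the eight ROM64x1 instances might look like. It assumes the Xilinx unisim primitive ROM64X1 with a 64-bit INIT parameter, address ports A0 through A5 and output O, where INIT bit i is the output for address i. The wrapper name rom_bit0 is hypothetical, and the INIT value for out_val[0] was worked out by hand from the posted table (the output is odd only for inputs 5, 17, 18, 21, 22, 25, 29, 30 and 53), so verify it before use.

```verilog
// Hypothetical wrapper: out_val[0] of the posted table in one ROM64X1.
// Assumes the Xilinx unisim ROM64X1 primitive is available to the tools.
module rom_bit0 (
    input  wire [5:0] in_val,
    output wire       bit0
);
    // INIT bit i = out_val[0] for in_val == i
    ROM64X1 #(
        .INIT(64'h0020_0000_6266_0020)
    ) rom_bit0_inst (
        .O  (bit0),
        .A0 (in_val[0]),
        .A1 (in_val[1]),
        .A2 (in_val[2]),
        .A3 (in_val[3]),
        .A4 (in_val[4]),
        .A5 (in_val[5])
    );
endmodule
```

Seven more instances with their own INIT values would cover the remaining output bits. This form fixes the four-LUT-plus-mux structure per bit, whereas inferring the table from HDL leaves the synthesizer free to find something smaller.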

Brannon,

Ah. I see. No clock.

Seems strange that somewhere in this whole design there is no clock that tells you when the data is valid, but then, it isn't something I am working on.

Even a strobe that tells you the address is valid could be used to clock the BRAM....

But, if you are doing something totally asynchronous, I will bow out immediately.

Austin

austin

Hi -

It's easy to try out. Here's an inelegantly-written Verilog module:

module comb_function (
    // Outputs
    out_val,
    // Inputs
    in_val
);

//-----FPGA I/O

output [7:0] out_val;
input  [5:0] in_val;

reg [7:0] out_val;

// Purely combinational lookup; all 64 input codes are covered,
// so no latch is inferred.
always @(in_val)
    case (in_val)
         0: out_val = 0;
         1: out_val = 66;
         2: out_val = 128;
         3: out_val = 0;
         4: out_val = 64;
         5: out_val = 65;
         6: out_val = 66;
         7: out_val = 64;
         8: out_val = 130;
         9: out_val = 128;
        10: out_val = 136;
        11: out_val = 130;
        12: out_val = 0;
        13: out_val = 66;
        14: out_val = 128;
        15: out_val = 0;
        16: out_val = 68;
        17: out_val = 69;
        18: out_val = 65;
        19: out_val = 68;
        20: out_val = 80;
        21: out_val = 81;
        22: out_val = 69;
        23: out_val = 80;
        24: out_val = 64;
        25: out_val = 65;
        26: out_val = 66;
        27: out_val = 64;
        28: out_val = 68;
        29: out_val = 69;
        30: out_val = 65;
        31: out_val = 68;
        32: out_val = 138;
        33: out_val = 136;
        34: out_val = 160;
        35: out_val = 138;
        36: out_val = 130;
        37: out_val = 128;
        38: out_val = 136;
        39: out_val = 130;
        40: out_val = 162;
        41: out_val = 160;
        42: out_val = 168;
        43: out_val = 162;
        44: out_val = 138;
        45: out_val = 136;
        46: out_val = 160;
        47: out_val = 138;
        48: out_val = 0;
        49: out_val = 66;
        50: out_val = 128;
        51: out_val = 0;
        52: out_val = 64;
        53: out_val = 65;
        54: out_val = 66;
        55: out_val = 64;
        56: out_val = 130;
        57: out_val = 128;
        58: out_val = 136;
        59: out_val = 130;
        60: out_val = 0;
        61: out_val = 66;
        62: out_val = 128;
        63: out_val = 0;
    endcase

endmodule

The resource usage is:

Mapping to part: xc2v40fg256-4
LUT2     6 uses
LUT3     3 uses
LUT4    11 uses

Synplify estimates an in-to-out delay of 2.865ns, NOT including I/O buffers. Note, too, that I used -4, which should be the lowest speed grade.

Seems pretty fast and cheap.

Bob Perlman
Cambrian Design Works
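Before trusting the resource numbers, it may be worth confirming in simulation that the comb_function module above reproduces the posted table. The following is only a sketch: the testbench name comb_function_tb is hypothetical, and it simply sweeps all 64 input codes and prints each input/output pair so the result can be compared with the In/Out list by eye or with a diff.

```verilog
`timescale 1ns/1ps

// Hypothetical testbench: sweeps every input code of comb_function
// and prints "in out" pairs in decimal, one pair per line.
module comb_function_tb;
    reg  [5:0] in_val;
    wire [7:0] out_val;
    integer    i;

    comb_function dut (
        .out_val (out_val),
        .in_val  (in_val)
    );

    initial begin
        for (i = 0; i < 64; i = i + 1) begin
            in_val = i;   // truncated to the low 6 bits
            #1;           // let the combinational output settle
            $display("%0d %0d", in_val, out_val);
        end
        $finish;
    end
endmodule
```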

On 13 Jan 2006 14:22:20 -0800, "Brannon" wrote:

Bob Perlman

Whatever you do, don't ever feed async inputs into a Virtex-II BRAM.

And never clock BRAMs, while enabled, from an unlocked DCM.

See Answer Record 21870 aka "How do I randomly clobber bits in my read only BRAM function"

or the recent thread starting here:

formatting link

Brian

Brian Davis
