Combinatorial logic delay plus routing delay exceeds clock period

Generally speaking, it is not good if a module's combinatorial logic delay plus its routing delay exceeds the clock period.

A certain module calculates a 100-bit accumulation in approximately 5ns (in a Virtex 5LXT-1). However, the design clock period is 4ns.

I could break the sum logic into two pieces: register the high 50 bits, add the low 50 bits into a register, add the registered high 50 bits with the carry from the low 50 bits, and finally concatenate the high 50-bit sum with the registered low 50-bit sum. That would satisfy the timing requirements. However, the extra 100 bits of register resources do not seem to offer a benefit, and they increase the likelihood that a larger device will be required. There could be hundreds of these 100-bit adders in the design.
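The two-piece split described above can be modeled in a few lines of Python. This is only a behavioral sketch, not HDL: the name `split_add` and the exact register arrangement are assumptions made for illustration, and each "stage" stands in for one clock period.

```python
HALF = 50                      # width of each half of the 100-bit sum
MASK = (1 << HALF) - 1


def split_add(a, b):
    """Two-stage model of a 100-bit add split into two 50-bit halves."""
    # Stage 1 (first clock): add the low halves and register the high halves.
    low_total = (a & MASK) + (b & MASK)
    low_reg = low_total & MASK           # registered low 50-bit sum
    carry_reg = low_total >> HALF        # registered carry out of the low half
    a_hi_reg = (a >> HALF) & MASK        # registered high half of operand a
    b_hi_reg = (b >> HALF) & MASK        # registered high half of operand b

    # Stage 2 (second clock): add the registered high halves plus the carry.
    high_sum = (a_hi_reg + b_hi_reg + carry_reg) & MASK

    # Concatenate the high and low results; the sum wraps at 100 bits.
    return (high_sum << HALF) | low_reg
```

Each stage only has to ripple a carry through 50 bits, which is why the split meets timing where the single 100-bit add does not.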

So, the module calculates the 100-bit accumulation in one go, taking 5ns and 100 fewer registers than the preceding example. Obviously, the input terms must remain stable for two clock periods for the sum to be valid, and this is accounted for in the upper logic layers.

My issue is how to tell the ISE tool-chain that the 5ns total delay is acceptable in these modules. The Constraints Guide may cover this issue, but I do not see it. Can someone give me an example of the appropriate constraint usage?

Thanks in advance.

- Sam

Reply to
Sam Worth

I would create two accumulators and drive their Enable inputs at half frequency, in counterphase.

Peter Alfke

Reply to
Peter Alfke

Hi Sam,

Read the constraints guide. Also, Google this:- multi cycle path

NET "clock_en" TNM=FFS "clock_enable";
TIMESPEC TS1 = FROM : clock_enable : TO : clock_enable : 8ns ;

HTH., Syms.
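The 8ns figure in that TIMESPEC is simply two of Sam's 4ns clock periods. A small Python sketch of the arithmetic follows; the helper name `multicycle_budget` is made up for illustration and is not part of any Xilinx tool.

```python
import math


def multicycle_budget(path_delay_ns, clock_period_ns):
    """Return (cycles, budget_ns, slack_ns) for a path allowed to span whole clock cycles."""
    # Smallest whole number of clock periods that covers the path delay.
    cycles = math.ceil(path_delay_ns / clock_period_ns)
    budget_ns = cycles * clock_period_ns
    slack_ns = budget_ns - path_delay_ns
    return cycles, budget_ns, slack_ns


# Numbers from the original post: 5ns path delay, 4ns clock period.
cycles, budget_ns, slack_ns = multicycle_budget(5.0, 4.0)
print(cycles, budget_ns, slack_ns)
```

With the numbers from this thread, the path needs 2 cycles, which gives an 8ns budget and 3ns of slack. The constraint only relaxes the timing analysis; the design must still hold the inputs stable for both cycles, as Sam describes.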

Reply to
Symon

I believe there have been designs that depended on the delay being long enough.

I would say that 5ns is too close, and in an FPGA you can't predict the delays that well anyway. There are many stories about Cray and his computers related to logic delay. (There is also the Cray-2 resonant box story: to accomplish a 4ns clock cycle, they designed a box resonant at 250MHz. Unfortunately, the clock cycle ended up being 4.2ns, which caused much delay in the product release.)

FPGAs usually have so many FFs that registers are free. Without knowing anything else about the design, though, pipelining is usually a good thing. The question always is where to put the pipeline registers for best effect.

Is there no other loss in reducing the clock rate for that part of the design?

-- glen

Reply to
glen herrmannsfeldt

That sounds interesting and is something I would like to learn about. Do you have an example of such a technique?

- Sam

Reply to
Sam Worth

Follow the red wire here for two register banks on opposite enable phases:

formatting link

-- Mike Treseler

Reply to
Mike Treseler

sorry. Make that:

formatting link

-- Mike Treseler

Reply to
Mike Treseler

Thanks, Symon. That did the trick.

- Sam

Reply to
Sam Worth

Sam, here is an even simpler solution that works if you accumulate for many clock ticks and can sacrifice two or three clock ticks before you get the result. You just divide the long accumulator into 2, 3, or 4 parts, with a single carry flip-flop between adjacent parts (you thus pipeline the carry signal). Then, at the end, you use 1, 2, or 3 clock ticks to flush the carry through the accumulator. It costs you no additional hardware at all (Virtex-5 has the pipeline flip-flop built in), and it runs as fast as a short accumulator. You pay with the latency at the end. There is no free lunch...

Peter Alfke

Reply to
Peter Alfke

Thanks, Peter. The added latency is fine. I am already fine with 2 ticks as it is. But I do not understand what you mean by "flush the carry through the accumulator". Is there an HDL example you can refer to?

Thanks in advance.

- Sam

Reply to
Sam Worth

horude

Reply to
Peter Alfke

Hello, Peter. I am not familiar with that word. What does it mean?

- Sam

Reply to
Sam Worth

Sorry, that was a silly security code that I was asked to enter. Then I gave up... You do not need any HDL code, you need a basic understanding.

The problem with a long accumulator is the carry ripple delay. That's why I cut up the accumulator into smaller chunks. Into each cut I insert a pipeline flip-flop that delays the rippling carry by one clock tick. Whenever there is a 1 in any of these pipeline registers, it represents an "undigested" carry, which means that the accumulator output is incorrect. Obviously, I must get rid of these stored carry bits before I look at the accumulator output. So I run a few cycles of dummy accumulation, with the data inputs at zero or disabled. Each clock tick will move the stored carry into the accumulator. In the worst case, I need one clock tick for every carry-pipeline flip-flop. If you divide the accumulator into two halves, you need only one extra clock tick, and you need it only for read-out, not for the normal accumulator operation.

Think about it. It's really quite simple.

Peter Alfke, Xilinx Applications
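For readers who still want something concrete, the carry-pipelining and flush described above can be sketched as a behavioral model in Python. This is not HDL and not Peter's code: the class `SplitAccumulator` and its method names are assumptions made for illustration, and one call to `clock()` stands in for one clock tick.

```python
class SplitAccumulator:
    """Behavioral model of an accumulator cut into chunks with pipelined carries."""

    def __init__(self, width=100, parts=2):
        assert width % parts == 0
        self.parts = parts
        self.chunk = width // parts
        self.mask = (1 << self.chunk) - 1
        self.acc = [0] * parts            # chunk registers, least significant first
        self.carry = [0] * (parts - 1)    # carry flip-flop between chunk i and i+1

    def clock(self, data=0):
        """One clock tick: each chunk adds its data slice plus the stored carry below it."""
        new_acc, new_carry = [], []
        for i in range(self.parts):
            data_slice = (data >> (i * self.chunk)) & self.mask
            carry_in = self.carry[i - 1] if i > 0 else 0
            total = self.acc[i] + data_slice + carry_in
            new_acc.append(total & self.mask)
            if i < self.parts - 1:
                # Carry out (0 or 1) is stored and used one tick later.
                new_carry.append(total >> self.chunk)
        # The carry out of the top chunk is dropped, so the sum wraps at `width` bits.
        self.acc, self.carry = new_acc, new_carry

    def pending(self):
        """True while any 'undigested' carry is still stored."""
        return any(self.carry)

    def flush(self):
        """Run dummy accumulations with zero data until no carry is stored."""
        ticks = 0
        while self.pending():
            self.clock(0)
            ticks += 1
        return ticks

    def value(self):
        """Concatenate the chunk registers; only correct when nothing is pending."""
        return sum(a << (i * self.chunk) for i, a in enumerate(self.acc))


# Small 8-bit example with two 4-bit halves.
acc = SplitAccumulator(width=8, parts=2)
acc.clock(0x0F)
acc.clock(0x01)
stale = acc.value()      # read before the flush
ticks = acc.flush()
final = acc.value()
```

In the 8-bit example, the read before the flush returns 0x00 because the carry out of the low half is still sitting in the flip-flop; after one flush tick the value is 0x10. With zero data the lowest chunk cannot generate a new carry, so the flush never takes more ticks than there are carry flip-flops.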

Reply to
Peter Alfke

Hello, Peter. Thanks for the expanded explanation. That is much clearer. I like this approach. The extra carry accumulation cycles are only needed when carry is generated. When no carry is generated, the answer is immediately available. Nice. Fast (most of the time in my design). Elegant. Thanks again.

- Sam

P.S. I often ask for HDL examples from others so I can learn how others have solved similar issues.

Reply to
Sam Worth
