V6 BUFR -> BUFG clocking structure (hold issue?)

(snip, someone wrote)

(snip)

It seems to me that they do pretty well.

Well, the effects of voltage and temperature should be pretty much the same for all transistors on a chip. But process variations could be very different.

They verify that the usual paths have delay variations that they can account for, and compute delays based on those. If there are some paths whose delays they can't account for, at least not to the accuracy required, then they don't guarantee those.

As far as I understand, though mostly in general, the idea is to make clock skew in a clock tree small enough, relative to the minimum delay through routing, that two FFs clocked off the same clock can't violate hold time. The skew also must be added to the delay when verifying setup time.
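
To make the two checks described above concrete, here is a minimal Python sketch. It is not how the Xilinx tools compute timing; the function names and all numeric values are hypothetical, and skew is taken as signed (destination clock arrival minus source clock arrival), so a negative skew is the case where skew eats into the setup budget.

```python
def hold_slack(tco_min, data_min, hold, skew):
    """Hold slack in ns.

    skew = destination clock arrival - source clock arrival.
    New data must not reach the destination before its hold window closes.
    """
    return tco_min + data_min - hold - skew


def setup_slack(period, tco_max, data_max, setup, skew):
    """Setup slack in ns for a same-edge, single-cycle path."""
    return period + skew - (tco_max + data_max + setup)


# Hypothetical numbers, for illustration only (ns).
small_skew = hold_slack(tco_min=0.25, data_min=0.20, hold=0.10, skew=0.15)
large_skew = hold_slack(tco_min=0.25, data_min=0.20, hold=0.10, skew=1.851)

print(f"hold slack, small skew: {small_skew:.3f} ns")
print(f"hold slack, large skew: {large_skew:.3f} ns")
```

With a small skew inside one clock tree the minimum Tco plus routing covers hold; with a skew of the size seen between two trees, the same path fails hold unless extra data delay is added.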

But that only works within one clock tree. Computing the variation between two clock trees is different.

Now, it would be nice if they said when some delay is not characterized well enough to use, and so far I haven't seen them say that, but it isn't the tools' fault if the data isn't available.

-- glen

Reply to
glen herrmannsfeldt

Sorry I missed that earlier. You seem to be mixing up skew and delay.

That figure is the BUFG skew, not the BUFG delay. It represents the worst case timing difference between outputs on the same BUFG. It isn't relevant to your problem.

The "Clock Path Skew" is the important figure. It is the difference between the time of arrival of the clock at the source (clocked from BUFR) flip flops and destination (clocked from BUFG) flip flops. In this case it is mostly made up of the BUFG delay.

PAR has to include a routing delay to compensate for that skew.

An earlier comment:

It might not meet setup for half a clock cycle, but it doesn't have to! The skew works in your favour when using opposite edges and the requirement for setup time is half a clock cycle + 1.851ns. Unless you have a GHz clock that doesn't sound too hard.
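
As a rough illustration of the opposite-edge budget described above, the following Python sketch adds the 1.851ns skew from this thread to half a clock period. The function name and the clock frequencies are hypothetical; only the 1.851ns figure comes from the thread.

```python
def opposite_edge_setup_budget(period_ns, skew_ns):
    """Time from the launch edge to the capture edge half a cycle later,
    when the destination clock arrives skew_ns after the source clock."""
    return period_ns / 2.0 + skew_ns


CLOCK_PATH_SKEW_NS = 1.851  # figure quoted in this thread

# Hypothetical clock frequencies, for illustration only.
for f_mhz in (100.0, 200.0, 400.0):
    period_ns = 1000.0 / f_mhz
    budget = opposite_edge_setup_budget(period_ns, CLOCK_PATH_SKEW_NS)
    print(f"{f_mhz:.0f} MHz: setup budget {budget:.3f} ns")
```

The data path (Tco plus routing plus setup time) has to fit inside that budget, which gets tight only at very high clock rates.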

Regards, Allan

Reply to
Allan Herriman

Glen,

All three (voltage, temperature and process) vary over a single die, but not by much. The trick is always "by how much?" Are we willing to live with slower guaranteed performance in order to simplify the analysis, or is it worth it to invest more in the analysis (NRE) to "speed up" the parts (recurring profit)?

Managing hold time is a lot more complicated than it used to be. In the past, the clock skew could always be less than Tco plus minimum routing by design, so they did not even spec hold time for the registers. Over time, the raw speed of the devices has out-stripped the skew of the clock tree, and hold time is a real problem that has to be taken care of in placement and routing. We users just don't have control over the clock tree itself to deal with the problem, like in other domains.

Andy

Reply to
jonesandy

I do remember specs of 0ns hold time. Hold time can even go negative in some cases. I think I remember some TTL parts with negative hold time, but that was some years ago.

Xilinx used to publish actual books about their parts. We could read about them, understand them, and use them appropriately. Yes, I am remembering from some generations ago.

-- glen

Reply to
glen herrmannsfeldt

I don't think I am mixing skew w/ delay.

From DS152 (Virtex 6 AC-DC):

Table 59: Global Clock Switching Characteristics (Including BUFGCTRL)

TBCCKO_O(2) BUFGCTRL delay from I0/I1 to O 0.07 0.08 0.10 0.10 ns

From v6 speedprint:

BUFG Tbgcko_O (33/35) (66/70)

BUFGCTRL

Tbccko_O (33/35) (66/70)

So the delay through BUFGCTRL is small. However, the total clock delay is much bigger since it includes the routing. Xilinx is not verbose with the clock path, or at least I don't know how to generate it....

From the timingan report all I get is:

Clock Path Skew: 1.851ns (2.677 - 0.826)

Yes, it is skew; it is bigger than usual because the 1st clock (0.826ns insertion delay) is driven by the BUFR and the 2nd clock, the target, (2.677ns insertion delay) is driven by the BUFG fed by BUFR. The delta, 1.851ns, is much higher than the prop delay through the buffer. My interpretation is 0.1ns in the buffer, with the rest in routing and/or distribution (both to and from the BUFG); we don't see netlists. I don't think Xilinx has much info about that clock routing available for the general public.
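
The arithmetic behind that interpretation can be sketched in a few lines of Python. The variable and function names are hypothetical; the three numbers are the ones quoted in this post (the two insertion delays from the timing report and the 0.10ns worst-case BUFGCTRL figure from DS152).

```python
SRC_INSERTION_NS = 0.826      # clock arrival at source flops (BUFR)
DST_INSERTION_NS = 2.677      # clock arrival at destination flops (BUFG fed by BUFR)
BUFG_DATASHEET_MAX_NS = 0.10  # TBCCKO_O worst case quoted from DS152


def clock_path_skew(dst_insertion_ns, src_insertion_ns):
    """Skew as reported: destination insertion delay minus source insertion delay."""
    return dst_insertion_ns - src_insertion_ns


skew_ns = clock_path_skew(DST_INSERTION_NS, SRC_INSERTION_NS)
# Whatever is left after the datasheet buffer delay is presumably
# routing and clock distribution; that attribution is an interpretation.
remainder_ns = skew_ns - BUFG_DATASHEET_MAX_NS

print(f"clock path skew:          {skew_ns:.3f} ns")
print(f"not explained by buffer:  {remainder_ns:.3f} ns")
```
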
--
mmihai
Reply to
mmihai

I would not say 'pretty well'; they're not bad but not very good either. Otherwise I would not have problems on the hardware when I'm getting >0.0ns hold slack. And from this thread I would say I am not alone in seeing this problem, and it is not happening with only one tool version or one chip.

Huh? Numbers for STA should cover PVT. That 'P' stands for process. I am not sure what your idea is: that the numbers could be wrong because the process has variations?

I would like to think they're extracting/characterizing all the delays involved in their fabric.... otherwise nobody would use these devices, since they wouldn't work in a reliable fashion.

For a successful STA you need good delay extraction and a good algorithm for understanding the design/constraints and for path computation. In this case both the extraction/delay computation and the timing analysis tools are from Xilinx. In my case it looks like the delays might be off.... but since the delay data is from Xilinx (and you have no 2nd choice) I'll still call it bad STA on their flow....

--
mmihai
Reply to
mmihai

I was starting to review that this weekend ...

It could be that the next logic level had the issue.... because I ended up with half a clock cycle for the next stage. That should not be a problem though; I can add a new set of flops to realign to the proper edge w/o any logic in between.

This is the path I am exploring right now.

--
mmihai
Reply to
mmihai

Yes, I see those figures in the datasheet. They don't make much sense to me though - I'm fairly sure the actual delay through the BUFG is much larger than 0.10 ns worst case. Your STA results seem to be in agreement with me.

This might be one of those cases where the datasheet timing model doesn't match reality. Total delay through the routing to the BUFG plus the BUFGMUX logic plus the distribution tree itself plus the routing out of the BUFG comes to 1.851ns. Since those figures can't really be separated (in that only their sum matters) Xilinx can assign any figure it wants to some internal part that gets published in the datasheet.

All of this is speculation on my part, of course. It's unfortunate that knowledgeable Xilinx staff don't contribute in this newsgroup. You could ask the same question on the Xilinx forums, but I find it's unusual to get a good answer there.

Regards, Allan

Reply to
Allan Herriman

I think we are on the same page here. I just wanted to point out that I don't mix up the skew with the delay :) I don't expect the clock tree to have a single big buffer (i.e. one gate) that drives it. I think the number in the datasheet is for only one (input) gate from the clock tree; the following drivers & routing are lumped into the delay number reported in timingan.

I do agree with you on this one too :)

I've copied the head of this thread to the Xilinx forums... no reply so far.

--
mmihai
Reply to
mmihai
