V6 BUFR -> BUFG clocking structure (hold issue?)

- mmihai

Fri, Nov 30, 2012 12:35 AM

Hi!

I've had a Xilinx webcase open on this for about 2 months and it is going nowhere ... maybe I'll have better luck here.

My problem:
- V6 design
- clocking structure with an IBUF to BUFR which drives a BUFG, so both BUFR/BUFG are on the same clock domain
- the BUFR also clocks a few flops
- BUFG clocks main logic
- par finishes w/o hold errs
- I can detect data transfer errors between the flops clocked by BUFR and the flops clocked by BUFG (direction is data from BUFR flops -> BUFG flops, no logic, just data transfer)
- timingan reports no hold errs on those paths
- different runs (different placement) will produce a fully working design
- ISE 13.4... but it should not matter

Anyone seen this? Any feedback about this structure?

Goal is to be able to produce predictable results... Now I have no way to do that unless I try it on HW ... but my confidence level is low (i.e. if it works on one device will it work on //all//?).

--
Thanks, 
mmihai

- glen herrmannsfeldt

Fri, Nov 30, 2012 3:32 AM

Maybe I am missing something, but unless you tell the timing analysis the relative timing of the two clocks, it can't do setup/hold analysis on them.

Is it supposed to follow the timing through that combination?

-- glen

- mmihai

Fri, Nov 30, 2012 5:03 AM

Both clocks are internal; no extra timing is required. Only Xilinx's tools know the relative timing, since it is just the delay through the FPGA itself.

a) BUFR output is the input of BUFG; same clock domain.
b) the tool is propagating the clock through the design and is aware of the propagation delay through buffers (and routing?)

I can ask timingan to report the paths between the flops and nothing is obviously wrong and hold slack is >=0.

Unfortunately it is not that verbose on the clock propagation time but the clock timing looks like:

  Clock Path Skew:    1.851ns (2.677 - 0.826)
  Source Clock:       ipclk rising at 0.000ns
  Destination Clock:  pclk rising at 0.000ns
  Clock Uncertainty:  0.035ns

ipclk is the output of BUFR, pclk is the output of BUFG. I guess 'Clock Path Skew' contains the added BUFG propagation delay and BUFR->BUFG routing.
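The reported numbers can be put into a rough hold check. The sketch below is only an illustration of the arithmetic, not the tool's actual algorithm: the skew and uncertainty are taken from the report above, while the flop hold requirement (`T_HOLD_NS`) and the data path delays are hypothetical values picked for the example.

```python
# Rough hold-slack arithmetic for a transfer from the BUFR domain to the BUFG domain.
# Skew and uncertainty come from the timing report above; every other number
# is a hypothetical placeholder, not a datasheet or tool value.

def hold_slack_ns(data_delay_ns, clock_skew_ns, uncertainty_ns, t_hold_ns):
    """Fastest data arrival minus (destination clock lateness + uncertainty + hold)."""
    return data_delay_ns - (clock_skew_ns + uncertainty_ns + t_hold_ns)

CLOCK_SKEW_NS = 2.677 - 0.826   # destination minus source clock arrival, as reported
UNCERTAINTY_NS = 0.035          # as reported
T_HOLD_NS = 0.0                 # assumed flop hold requirement (hypothetical)

# Minimum data path delay (clock-to-Q plus routing) needed for zero hold slack.
required_delay_ns = CLOCK_SKEW_NS + UNCERTAINTY_NS + T_HOLD_NS
print(f"data path must be at least {required_delay_ns:.3f} ns")

for data_delay_ns in (1.5, 1.9, 2.1):   # hypothetical data path delays
    slack = hold_slack_ns(data_delay_ns, CLOCK_SKEW_NS, UNCERTAINTY_NS, T_HOLD_NS)
    print(f"data delay {data_delay_ns:.3f} ns -> hold slack {slack:+.3f} ns")
```

Under these assumptions the routing between the two flops has to absorb nearly the whole clock skew, so a small error in the modelled minimum delays is enough to turn a reported zero-or-positive slack into a real failure.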

In summary, as far as I can tell, the path is constrained in the tools and has non-negative slack as reported. But I can see the HW failing...

Anyone using a BUFR feeding a BUFG?

--
mmihai

- glen herrmannsfeldt

Fri, Nov 30, 2012 6:32 AM

(snip, I wrote)

You might look at

formatting link

I looked some, but didn't see the answer.

Just because two signals are internal doesn't guarantee that the timing is known, though.

-- glen

- Allan Herriman

Fri, Nov 30, 2012 12:46 PM

I understand that the circuit looks like this:

Pin--->IBUF--->BUFR--+--->BUFG---+-->
                     |           |
                     |           |
                     |           |
                     |           |
                  +-----+     +-----+
                  | FF Q|---->|D FF |
                  +-----+     +-----+
                           ^
                           |
                  hold time errors here

I ran into that exact same problem a couple of years ago. I was given the task of fixing (someone else's) design that featured a similar misuse of clock buffers in a Virtex 4. I think the tools might have been ISE 8.2.

PAR and Trace said it was fine. Actual tests on the chip over temperature showed otherwise.

Moral: BUFGs have a large delay. Don't expect PAR to be able to make up for that amount of hold time using routing. You need to avoid going from your BUFR domain into the BUFG domain on the same clock edge. One solution might be to insert FFs clocked from the other edge of the BUFG clock. Another solution might be to connect the BUFG input to the IBUF output (not via the BUFR).

Regards, Allan

- jonesandy

Fri, Nov 30, 2012 2:17 PM

While I agree that there should be ways to avoid this with other design choices, the tool is clearly identifying the clocks as related (it reported skew and relevant edges), but apparently it does not always find/report some hold timing violations.

Are there options in Xilinx STA to run 4 corner vs 2 corner timing analysis? Or does it always run 4 corner? If it is not running 4 corner, that could be the reason it is missing the hold time problem. I've seen other tools that offer the choice occasionally miss a timing problem in 2 corner timing mode. 2 vs 4 corner analysis has to do with whether all 4 combinations of min/max clock and min/max data are analyzed. Tricks to make hold time analyses less pessimistic (by accounting for correlations in propagation delays) are always a potential issue.
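The difference between the two modes can be sketched in a few lines of Python. All delay numbers below are hypothetical, and the reading of "2 corner" as "clock and data track each other, both fast or both slow" is an assumption made for the illustration; it is not a description of how any particular Xilinx tool works.

```python
from itertools import product

def hold_slack_ns(data_delay_ns, clock_skew_ns, t_hold_ns=0.0):
    """Hold slack: data must arrive later than the (skewed) capture clock plus hold time."""
    return data_delay_ns - clock_skew_ns - t_hold_ns

# Hypothetical min/max delays in ns, for illustration only.
DATA_DELAY = {"min": 1.9, "max": 2.6}   # launch flop clock-to-Q plus routing
CLOCK_SKEW = {"min": 1.6, "max": 2.1}   # destination clock arrival minus source

def worst_hold_slack(corners):
    """Worst slack over (data corner, clock corner) pairs."""
    return min(hold_slack_ns(DATA_DELAY[d], CLOCK_SKEW[c]) for d, c in corners)

# 2 corner (assumed meaning): data and clock are both fast or both slow.
two_corner = [("min", "min"), ("max", "max")]
# 4 corner: every combination of min/max data with min/max clock.
four_corner = list(product(("min", "max"), repeat=2))

print(f"2 corner worst hold slack: {worst_hold_slack(two_corner):+.3f} ns")
print(f"4 corner worst hold slack: {worst_hold_slack(four_corner):+.3f} ns")
```

With these made-up numbers the 2 corner check passes, while the 4 corner check finds the fast-data/slow-clock combination and reports a violation, which is the kind of miss described above.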

Andy

- glen herrmannsfeldt

Fri, Nov 30, 2012 4:12 PM

(snip)

Seems like according to

formatting link

especially in the summary near the end, that BUFR --> BUFG is allowed, though it doesn't say anything about the timing.

I wonder if MMCMs would help?

As well as I understand it, for FF's clocked off the same clock, (and clock edge) you should never have hold problems. The minimum logic between Q and the next D is long enough that, even with maximum clock skew, a D can never change that fast. (Usually described by saying that the hold time is 0.)

There is much discussion on using MMCMs to generate zero delay clocks.

That is, the MMCM provides enough delay such that, for a clock of constant frequency, it can match the given edge.

As well as I understand it (which might not be all that well) it never tries to make up hold time.

That is what I would have thought one would do.

-- glen

- mmihai

Fri, Nov 30, 2012 7:22 PM

Thanks for your comments.

Most interesting ... different chip(V4) had same issues....

I don't think it is the BUFG's delay; my guess is that it is more related to routing. Based on the datasheet, BUFG delay is 0.10ns .... reported "Clock Path Skew" is 1.851ns... whatever that includes.

I thought about that .... can't do it, the clock is fast and it won't meet setup for half clock cycle.

Can't do that either :-(
a) pin is not BUFG capable
b) even if it was capable, it adds too much delay .... the flops clocked by BUFR sample the input; having a BUFG clocking those won't meet hold time on the IOs because the clock is delayed too much.

Some more notes:
- I don't constrain the placement of BUFR/BUFG.
- out of 27 signals always the ones failing have the smallest hold slack (less than .250ns?). Depending on placement the number of failing signals is anywhere between 0 and 5

I find it strange the tools can not handle the clock tree..... I do not think my structure is that exotic. What is the use of regional clocks if one can not transfer data to a global clock?

Any way to constrain the hold target >0.0 only for some specific paths?
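Independent of whether the tools accept such a constraint, the reported slacks can at least be screened after a run. The sketch below is a hypothetical post-processing check, not a tool feature: the path names and slack values are invented, and the 0.250 ns margin is simply the threshold guessed at above.

```python
# Hypothetical per-path hold slacks in ns, e.g. copied by hand from a timing report.
reported_hold_slack_ns = {
    "d0": 0.412,
    "d1": 0.180,
    "d2": 0.031,
    "d3": 0.655,
    "d4": 0.249,
}

MARGIN_NS = 0.250  # extra hold margin wanted on these paths (assumed value)

def paths_below_margin(slacks, margin_ns):
    """Return path names whose hold slack is under the margin, worst first."""
    risky = [name for name, slack in slacks.items() if slack < margin_ns]
    return sorted(risky, key=slacks.get)

for name in paths_below_margin(reported_hold_slack_ns, MARGIN_NS):
    print(f"{name}: hold slack {reported_hold_slack_ns[name]:.3f} ns is below the margin")
```

A run whose list comes back empty is not proven good over PVT, but a non-empty list is a cheap reason to reject a placement before trying it on HW.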

--
mmihai

- mmihai

Fri, Nov 30, 2012 7:32 PM

I did open a webcase with Xilinx ... I've sent them my routed .ncd. Nobody said my clocking scheme is not allowed or not supported. I don't know if I've moved past 1st tier support though ..... so far I have not received any meaningful help to solve my problem :-(

Thought about this too .... it won't work: input freq can change ... I don't think the PLL/DLL would like that

--
mmihai

- Allan Herriman

Sat, Dec 1, 2012 12:08 AM

MMCMs would not help for the problem I saw - the "clock" was bursty.

I fully agree that it *should* work. However, for at least V4 (and now it seems V6 as well) Xilinx's model of min / max delays on their chips isn't too good, and PAR will fail to compensate for clock skew if that skew is as large as a BUFG delay.

Please note I said BUFG delay not BUFG skew. BUFG delay is > 1ns. BUFG skew is usually < 0.3ns.

PAR always tries to make up hold time. There is an entire pass in PAR dedicated to that process. I don't have any PAR log files handy, but I believe it's easy to spot: look for the timing score. It will have a score for setup and another score for hold. The initial passes in PAR will reduce the setup score, with the hold time score remaining constant. Then towards the end (when it's finished working on the setup times) the hold time score will drop, usually to zero.

Regards, Allan

- Allan Herriman

Sat, Dec 1, 2012 12:23 AM

Simply by specifying clocked logic you have constrained the hold time to be > 0 ns.

I don't know of a way of adding extra margin though. I believe the best approach is to avoid the need for extra margin.

This is a tool bug. You have zero chance of fixing the tool, however you do have a good chance of being able to step around the bug.

Some other suggestions:

- Lock the placement of the BUFG and BUFR. You might find there is some magic combination of placements that just works. Earlier you said that some runs of PAR would produce designs that worked. Copy the placement from those runs as a starting point.

- constrain the logic in the BUFG domain to be physically apart from the BUFR region. This forces longer routes on the chip that will improve your hold time margin.

- Finally the brute force approach: treat the BUFG and BUFR clocks as if they were different clock domains. Use some sort of FIFO that is designed to handle different clock domains to pass data from the BUFR domain to the BUFG domain.
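One common ingredient of such a clock-domain-crossing FIFO, though not something spelled out in this thread, is Gray-coded read/write pointers: only one bit changes per increment, so a pointer sampled in the other clock domain is seen as either its old or its new value, never as an unrelated one. A minimal Python sketch of the encoding follows; the function names are hypothetical and a real design would implement this in HDL.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary pointer value to its Gray-code equivalent."""
    return n ^ (n >> 1)

def gray_to_bin(g: int) -> int:
    """Convert a Gray-coded value back to binary by XOR-ing all right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

POINTER_BITS = 3  # hypothetical pointer width for an 8-entry FIFO
codes = [bin_to_gray(i) for i in range(1 << POINTER_BITS)]

# Consecutive pointers, including the wrap-around, differ in exactly one bit,
# and the decoding round-trips to the original binary value.
for i, code in enumerate(codes):
    next_code = codes[(i + 1) % len(codes)]
    assert bin(code ^ next_code).count("1") == 1
    assert gray_to_bin(code) == i

print([format(c, "03b") for c in codes])
```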

Regards, Allan

- Brian Drummond

Sat, Dec 1, 2012 10:28 AM

Is this a new design for the V6 or a port from another FPGA or a previous ISE release? Are you directly instantiating these primitives and checking that they are still there in the RTL view?

I had a problem some years ago when moving a design from ISE7 to ISE10 and the tools silently changed what I asked into something completely different; in my case it moved a DCM from the BUFG where it generated a nicely aligned x2 clock, to the BUFG input signal - considerably increasing skew between these clocks!

So bugs in this area are not particularly new...

- Brian

- mmihai

Sun, Dec 2, 2012 5:53 PM

Yes, I meant extra margin. It seems the hold target is 0.0ns (I would guess the numbers include some padding).

It looks like a tool bug. It is very disturbing that it is not related to a particular version and it's on multiple [virtex] families...

I would expect the things to work if STA has good numbers. My confidence in the tools took a hit ...

a) I did not see any correlation between passing/failing and particular BUFR/BUFG placement
b) I think it is a risky approach; if I can get a particular map/par run to work on some systems ... I have no guarantee it will be fine on //all// systems, over PVT.

Something like this could be the best solution, if doable ... but it's a pity to add logic because Xilinx tools can't handle the clock tree properly....

My logic looks very much like Figure 1-24/Page 28 from UG362, except I don't use BUFIO, so it is not that exotic.

--
mmihai

- mmihai

Sun, Dec 2, 2012 5:55 PM

New design ... and BUFR/BUFG instantiated by hand. I've looked in fpga_editor and the buffers are there.

--
mmihai

- Brian Drummond

Sun, Dec 2, 2012 7:44 PM