Xilinx DCM outputs for DDR

Greetings,

What's the best way to clock DDR flops from a DCM?

In all the DDR I/O I've produced, I've always used the DCM clock output through a BUFG, feeding the normal and inverted versions of that buffered clock to my I/O registers. I understand that - at least in some silicon - there can be duty-cycle distortion that would compromise this normal/invert approach versus the alternative....

Much of the early DDR information suggested using the 0 and 180 degree outputs from the DCM. I never considered this a good design practice because the outputs are going through different BUFGs with different clock loads on each net resulting in a designed-in skew that would compromise the DDR sampling windows.

Is there any real evidence one way or the other to suggest that one of these approaches is better than the other?

I was sad to see the code for xapp485 appears to use two BUFGs, fed not by CLKFX and CLKFX180 but by CLKFX and ~CLKFX! If that's not a bastardization of the two choices, I'm not sure what is.

Don't newer parts route differential clocks for single global clock nets now? Or is that just the Virtex series? My designs are currently Spartan-3E with DDR I/O in the 400-600 Mb/s range.

- John_H

Using the CLK0 and CLK180 DCM outputs into two BUFGs produces less duty cycle distortion than using one clock at the IOB plus the IOB's locally inverted clock, when doing DDR stuff. This makes sense because the locally inverted clock will always have some delay with respect to its source clock. The Xilinx documentation for high speed RAM interfaces should confirm this.

I believe that Virtex5 is the first family to use differential clocks internally, and this will further reduce duty cycle distortion when using DDR techniques.

Bob

What about clock skew due to different loading of the two clocks? If there's 1.482 ns on one and 1.876 ns on the other, doesn't the 394 ps difference swamp out any duty cycle distortion?

I'm hoping to find something more certain than "gut feel" for this issue. If anyone has produced results saying one way *is* better than the other, I'd love to know.

If the clock routing produced the same delays, I'd agree that duty cycle distortion - different propagation delays for rising versus falling edges - would be a reason to go with clk180. Since the clock routing doesn't produce the same delays (as far as I know there's no way to get them to match) I'm still not sure what the best method is.
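To put rough numbers on that tradeoff, here is a minimal Python sketch of a first-order model. It assumes the falling-edge sample point in the single-net scheme is displaced by the duty-cycle distortion plus the local inverter delay, and that in the two-net scheme it is displaced by the skew between the two clock nets. The function name, the scheme labels, and the zero inverter delay are illustrative assumptions; the net delays and the 90 ps and 400 ps distortion figures are the ones quoted in this thread.

```python
def falling_edge_error_ps(scheme, dcd_ps=0, inverter_delay_ps=0, net_skew_ps=0):
    """First-order offset of the second DDR sample point from its ideal position."""
    if scheme == "single_net_inverted":
        # One BUFG net, inverted locally at the IOB: duty-cycle distortion
        # and the local inverter delay both move the falling-edge sample point.
        return dcd_ps + inverter_delay_ps
    if scheme == "two_nets_0_180":
        # CLK0 and CLK180 on separate BUFGs: both flops use a rising edge,
        # so in this model the error is the skew between the two nets.
        return net_skew_ps
    raise ValueError(f"unknown scheme: {scheme}")


# Net delays quoted in the thread, in picoseconds.
skew_ps = 1876 - 1482

for dcd_ps in (90, 400):  # DCD figures quoted in the thread
    single = falling_edge_error_ps("single_net_inverted", dcd_ps=dcd_ps)
    dual = falling_edge_error_ps("two_nets_0_180", net_skew_ps=skew_ps)
    print(f"DCD {dcd_ps} ps: single net {single} ps, two nets {dual} ps")
```

With these inputs the two-net skew dominates at 90 ps of distortion and is roughly a wash at 400 ps, so the comparison hinges on real numbers for the specific part and placement.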

- John_H

I would tend to agree with you, but this should be easily measurable. Using a DDR flop with D1 high and D0 low (like a clock driver) you just need to look at the duty cycle. Run the design both ways and check it out...
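A minimal Python sketch of how such a measurement could be reduced to a number, assuming the scope or sampler exports alternating rising and falling edge timestamps for the forwarded clock. The function name and the timestamps below are hypothetical, not measured data.

```python
def duty_cycle_distortion_ps(rising_ps, falling_ps):
    """Mean high time minus half the mean period, in picoseconds.

    Expects rising_ps[i] < falling_ps[i] < rising_ps[i + 1].
    A positive result means the high phase is longer than 50%.
    """
    # Full periods are measured rising edge to rising edge.
    periods = [b - a for a, b in zip(rising_ps, rising_ps[1:])]
    n = len(periods)
    if n == 0:
        raise ValueError("need at least two rising edges")
    # Only count high phases that belong to a complete period.
    highs = [f - r for r, f in zip(rising_ps, falling_ps)][:n]
    if len(highs) < n:
        raise ValueError("need a falling edge inside every measured period")
    mean_high = sum(highs) / n
    mean_period = sum(periods) / n
    return mean_high - mean_period / 2


# Hypothetical edge timestamps in ps: 4000 ps period, 2100 ps high time.
rising = [0, 4000, 8000]
falling = [2100, 6100, 10100]
print(duty_cycle_distortion_ps(rising, falling))
```

Running the same reduction on captures from both clocking schemes would give a like-for-like comparison, though a single board only samples one point of the process and placement spread.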

- Gabor

I'm more concerned with the input side, actually. I was thinking the output duty cycle could get distorted just by the nature of the output drivers but I figure LVDS would do a decent job of balancing the line.

Would a 1-up measurement really give an idea of what works better overall? I've read some of the Xilinx info (some a bit old) that suggests the duty cycle distortion can add from 90 ps (Virtex-II many years ago) to 400ps (Spartan-3, more recent) but the "DCD" timing numbers referred to no longer show up in the DC&Switching section of the data sheet.

For any one design, different clock loading will result in different clock skew between a CLKFX and CLKFX180 signal unless the I/O happen to be unaffected by clock loading elsewhere on the chip.

Measurements may start to give me an idea of what's going on in the silicon but I would sure appreciate some concrete information. At least parts with differential clocks (assuming the feature propagates past one family) won't have duty cycle distortion issues.

Bleah.

- John_H

In V2, the two clock net scheme works best.

When doing this, I've avoided the DCM jitter by running the clock into a DIFF_OUT global buffer variant without using a DCM.

I haven't measured this in S3E.

I've used isolated clock nets for just the DDR I/O, with a different net for the global logic clock, in my past V2 designs; this requires intentional LOCing of the global buffers to the clock sites having shared routing resources.

I posted about this a while back on the "DIFF_OUT Buffer Example" thread; see that thread and the comments in its code example.

I measured V2 clock duty cycle and DDR setup/hold, without any DCM involvement, by using a two channel HP8133A and forwarding the clock and DDR data output back out another LVDS pair to a 20 GHz sampler.

V4 and V5 have differential global routes, but in V4 some of the regional resources are still single-ended (and because V4 can't directly clock a global spline from a local clock input having the best DDR I/O performance, some creative clocking workarounds are required to avoid needing the same clock on a global and a local input).

Brian

Thanks for the discussion and the link to the previous thread.

I have the unfortunate need of using the DCMs to multiply the incoming clock so I lose margin from the start. In the other thread you mentioned the DCM does 50% duty cycle correction - as I've known - but the XAPPs *appear* to suggest that 50% duty cycle correction happens local to the DCM - not at the global buffer feedback level - such that duty cycle distortion in the global clock is still an issue. Joy.

I really don't enjoy the idea of 3 global clocks to get the matched 0/180 clocks for the IOBs and the logic clock, but if it gains another 300 ps in margin, perhaps it's worth rearranging things. I appreciate that I'm not the only one who sees the clock loading difference as a root issue that needs to be addressed.

Maybe XAPP485 didn't have it terribly wrong using the CLKFX and ~CLKFX as the inputs to the two clock buffers (which I saw as a bastardization of the 2 approaches). If the 50% duty cycle correction is local to the DCM and not the global clock, this inversion would work. If the 50% duty cycle correction is global, however, the global clock polarity that isn't part of the feedback path would be that much worse off.

I have yet to find an app note or white paper that really addresses the global 50% issue. I'll look deeper into DCM literature and perhaps try some 3-clock receive timing analysis to see how/if the numbers differ.

Thanks,

- John_H

The real issue about duty cycle distortion, in DDR interfaces, is how it affects setup/hold time. Doesn't the use of the (jittery) CLKFX outputs have a much bigger effect on setup/hold time?

Bob

The jitter is miserable for high speed DDR interfaces. But when the clock frequency comes in at 1/7th the bit rate and there's no PLLs in the Spartan3E, the DCMs are a necessary evil. I just want to make the DCM jitter the *only* remaining loss to the design margins that I can twiddle with.
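For scale, here is a small Python sketch of the link arithmetic, assuming a 7:1 serial link captured with DDR flops so the capture clock runs at half the bit rate. The function and field names are illustrative; the bit rates and net delays are the ones mentioned in this thread.

```python
def ddr_link_timing(bit_rate_mbps, bits_per_input_clock=7):
    """Derive clock rates and the bit period for a DDR-captured serial link."""
    input_clock_mhz = bit_rate_mbps / bits_per_input_clock
    ddr_clock_mhz = bit_rate_mbps / 2  # DDR captures two bits per clock cycle
    return {
        "input_clock_mhz": input_clock_mhz,
        "ddr_clock_mhz": ddr_clock_mhz,
        "dcm_multiplier": ddr_clock_mhz / input_clock_mhz,
        "unit_interval_ps": 1e6 / bit_rate_mbps,
    }


skew_ps = 1876 - 1482  # difference between the two net delays quoted earlier

for rate in (400, 600):
    t = ddr_link_timing(rate)
    share = 100 * skew_ps / t["unit_interval_ps"]
    print(
        f"{rate} Mb/s: input clock {t['input_clock_mhz']:.1f} MHz, "
        f"DDR clock {t['ddr_clock_mhz']:.0f} MHz (x{t['dcm_multiplier']:.1f}), "
        f"bit period {t['unit_interval_ps']:.0f} ps, "
        f"skew uses {share:.1f}% of it"
    )
```

The bit period shrinks from 2500 ps at 400 Mb/s to about 1667 ps at 600 Mb/s, so a fixed few hundred picoseconds of skew or distortion costs proportionally more at the top of the range, before any DCM jitter is counted.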

- John_H

Quick update: the issue I thought I had with XAPP485 wasn't an issue after all. I've been revisiting the DCM question after going through a coworker's design. The XAPP485 code he modified did originally use CLKFX180. My memory got the better of me: because of the shorthand used in both the original and the modified code (rxclk35not rather than rxclk35_180 or similar nomenclature), I recalled only the inverted clock in the modified code, not the original version.

Also, rereading the DCM section of the Spartan-3 Generation User Guide with this specific topic in mind, the 50% duty cycle correction appears to be local to the DCM regardless of feedback.

Since the global clock buffers don't have an inverted input as an option, I'm finally happy that the CLK180 output (and CLK270 for that matter) is available.

- John_H

My recollection is that changes to the feedback path affected the forwarded global clock output duty cycle for the V2 tests I did ~5 years back at 200-300 MHz clock rates (400-600 Mbps).

It could be either that my recollection is wrong, or that I was seeing some other effect (i.e. main delay line/feedback delay line dispersion variations) that kicked in when I changed the feedback source or feedback deskew settings.

I'd also second the worries about DCM jitter.

IIRC the fed-back clk0 output has better jitter specs than the 180, giving asymmetrical sampling windows for rising/falling edges.

Can you do a x7 externally?

Brian

The board was set long ago with different revisions already in the hands of various developers. Even the FPGA pinout is hard at this point.

The serial receiver is working well in the units we have in house. The timing numbers on this link give me the willies, however, hence my desire to understand any nuance to improve the timing margin. I'm just sad the unused external PLL channels aren't available to the FPGA. Wouldn't that have been a nice tweak!

- John_H
