Xilinx Multiple Clock Domains

Are there any "how to" documents on how to negotiate two clock domains? I want to run an SRAM with a 3X clock and have everything else run slower. One of my issues is how the slower clock domain knows the phase of the faster domain, so that data can cross from fast to slow at the right time. If I had a clock divider, such issues could be resolved in the logic, but I am using a DCM, and its internal workings don't seem to be as accessible; that is, you just have two outputs, one fast, one slow.

I also need to simulate this in ModelSim. I haven't yet even seen the fast clock signal appear in the signals or waveform generator. Do I need an upgrade? Barring this, I suppose I could develop a component with the core design and then drive it with a VHDL module with a fast clock and another clock divided by three. Is this a good plan?
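A minimal sketch of the stand-in clock source described above might look like the following. The entity name tb_clocks, the 10 ns fast period, and the signal names are assumptions for illustration only; the core design would be instantiated where the comment indicates. Both clocks start low and the slow one toggles every 1.5 fast periods, so every slow rising edge lands on a fast rising edge, with no skew or jitter modelled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: generates an ideal fast clock and a
-- phase-aligned clock at one third of its frequency.
entity tb_clocks is
end entity tb_clocks;

architecture sim of tb_clocks is
  constant FAST_PERIOD : time := 10 ns;  -- assumed value
  signal clk_fast : std_logic := '0';
  signal clk_slow : std_logic := '0';
begin
  -- Fast clock: toggles every half period.
  clk_fast <= not clk_fast after FAST_PERIOD / 2;

  -- Slow clock: toggles every 1.5 fast periods, giving a period of
  -- three fast periods with a 50% duty cycle.
  clk_slow <= not clk_slow after (3 * FAST_PERIOD) / 2;

  -- The core design under test would be instantiated here and
  -- driven by clk_fast and clk_slow.
end architecture sim;
```

Because the two clocks are ideal here, this only checks the logic; it says nothing about whether the real DCM outputs and clock trees meet setup and hold across the domains.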

Reply to
Brad Smallridge

Brad,

All DCM outputs are phase aligned.

So, for example, if you use the CLK0 output, and the CLKFX output with M=3/D=1, every time CLK0 has a rising edge, there will be a rising edge for the CLKFX +/- the jitter of the DCM.

Or saying it differently, every third edge of the CLKFX corresponds to a CLK0 edge.

That is why the DCM is useful: it aligns all of its outputs to known phase relationships.

This accuracy in alignment is covered in the DCM specifications, as the skew between DCM outputs, in the datasheet.

Austin

Reply to
Austin Lesea

Austin,

Has the possibility of skew between the 1x and Nx clocks due to loading and input jitter been eliminated, then? Back when Spartan-II was first released, I had a problem with a design where the incoming clock had enough jitter on it (apparently introduced by outputs switching on the same bank as the clock pin), and the 1x and 2x clocks had vastly different loading. As a result, I had trouble crossing clock domains where a flip-flop in one domain drove the input of a flip-flop in the other domain through the direct slice-to-slice connection inside a CLB. Ever since then, I have been very careful about crossing domains, even if they are generated by the same DLL/DCM.

One way to do it is to make a copy of the slower clock in the faster clock domain, and then use that for clock enables to make sure the signal is sensed away from the edge where it changes.

--

--Ray Andraka, P.E. President, the Andraka Consulting Group, Inc.

401/884-7930 Fax 401/884-7950 email snipped-for-privacy@andraka.com
formatting link

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

Reply to
Ray Andraka

You may find the following useful.

formatting link

Crossing clock domains is also discussed in several documents linked at the bottom of the above referenced page.

Philip

=================== Philip Freidin snipped-for-privacy@fpga-faq.com Host for

formatting link

Reply to
Philip Freidin

I would answer that NO, the skew has not been eliminated. The literature gives the impression that all the DCM outputs are perfectly phase-aligned when it appears to be just not true. How can it be? The DCM can only account for the delay across the BUFG *in its feedback path*. That means only the CLK0 or CLK2X can be perfectly phase-aligned, and not even the latter in the V2Pro because of the erratum disallowing the use of CLK2X for the feedback. The other outputs (CLKFX, CLKDV) have different loads and should have different delays across their respective BUFGs. I don't see how they could possibly be aligned with the input.

I recently had a problem on a V2Pro trying to transfer data from a 2X domain to a 1X domain, where both domains were driven by DCMs. The transfers had multiple errors indicating that the skew between the domains was too large. I resolved the problem by transferring across domains away from the edge of the receiving domain. Everything I have read implies that this isn't necessary.

Creating a copy of the slow clock in the fast domain is the method I use. The slow clock has to be sampled; actually I think I sampled the CLK90 or one of those to ensure I meet setup. With the copied clock I can always do the transfer in the middle of the slow clock cycle (or, in the poster's case, on the first third of the slow clock's cycle).
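One possible way to build such a copy is sketched below. It is not necessarily the exact circuit Ray or I used: instead of sampling the slow clock net directly (or using CLK90), it samples a toggle flip-flop from the slow domain on the falling edge of the fast clock, which is assumed to be far enough from the aligned rising edges that skew and jitter do not matter. The entity and signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: recover the slow clock's phase in the fast domain.
-- Assumes clk_fast is an integer multiple (2x or more) of clk_slow and
-- that the skew between them is well under half a fast period.
entity slow_phase_copy is
  port (
    clk_fast    : in  std_logic;  -- e.g. the 3x DCM output
    clk_slow    : in  std_logic;  -- e.g. the 1x DCM output
    phase_pulse : out std_logic   -- one fast cycle wide, once per slow cycle
  );
end entity slow_phase_copy;

architecture rtl of slow_phase_copy is
  signal tog_slow : std_logic := '0';
  signal tog_neg  : std_logic := '0';
  signal tog_r1   : std_logic := '0';
  signal tog_r2   : std_logic := '0';
begin
  -- Slow domain: changes state once per slow clock cycle.
  process (clk_slow)
  begin
    if rising_edge(clk_slow) then
      tog_slow <= not tog_slow;
    end if;
  end process;

  -- Fast domain, falling edge: sample the toggle away from the
  -- rising edges where it changes.
  process (clk_fast)
  begin
    if falling_edge(clk_fast) then
      tog_neg <= tog_slow;
    end if;
  end process;

  -- Fast domain, rising edge: bring the sample back onto the rising
  -- edge and keep one cycle of history for edge detection.
  process (clk_fast)
  begin
    if rising_edge(clk_fast) then
      tog_r1 <= tog_neg;
      tog_r2 <= tog_r1;
    end if;
  end process;

  -- High for the one fast cycle in which the two samples differ.
  phase_pulse <= tog_r1 xor tog_r2;
end architecture rtl;
```

The pulse sits at a fixed position within the slow cycle, so it can be used directly as a clock enable or delayed by a register stage or two to pick whichever fast edge is furthest from the slow clock's edge.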

The DCMs work very well; I just think that the caveats for their use are not well-specified. An app-note explaining the clock-copying method Ray describes would be very helpful, if such a note does not yet exist.

-Kevin

Reply to
Kevin Neilson

Kevin,

All of the outputs for the DCM are generated in the "outgen" block, which uses matched paths and devices, fully buffered. All of the outputs are derived from the delay line, so all of the timing is related.

That is how we can do it.

The skew is +/- 100 ps (as I recall) to account for all the mismatches possible at the output of the DCM.

Now going into the BUFG trees (which are fully buffered, so loads don't count) puts another uncertainty on the values, but from BUFG to BUFG these are also matched pretty well (less than a few tens of ps mismatch).

The flight time along the BUFG tree will vary, and if you get off near the center, or get off in the top right hand corner on a large part (i.e. 2VP100), you may also have 500 to 700 ps of time difference between these two nets.

Add system jitter, and DCM jitter to it, and if you are not careful, at a high frequency, you might get the result you did. On the other hand, 'being careful' means having to relationally place things, or hand place things, so that may not be something you want to do (I wouldn't unless I had to).

One very common misconception is that using CLK2X FB is somehow better than CLK0 FB: it is not. There is no difference specification wise. The CLK2X gets divided by two just before the phase detector, so any belief that it is better matched somehow assumes we did a perfect job matching a /2 to a straight thru path (which, again, we do our best).

If you believe the CLK2X is better, then why do you not believe the 'outgen' is just as good?

I am all for full synchronous design, (simpler, easier to verify), but it seems that folks keep finding ways to use the DCM in what I might call "isochronous design."

Austin

Reply to
Austin Lesea

Hi,

Does that mean that something like what's below should work without worrying about clock-domain crossing?

     ___                 ___                  ___                    ___
    | R |               | R |                | R |                  | R |
  --| e |---- Comb 1 ---| e |---- Comb 2 ----| e |--\/--- Comb 3 ---| e |---
    | g |               | g |      /         | g |   |              | g |
    | 1 |               | 2 |     |          | 3 |   |              | 4 |
   /|>__|              /|>__|     |    ClkFX-|>__|   |             /|>__|
  |                   |           \_________________/             |
 Clk                 Clk                                         Clk

With ClkFX = n * Clk

Note that it doesn't represent anything in particular, just something out of my imagination. What comes to mind is, for example, an operation that would require 3 multiplexers: you could time-multiplex them this way, and the rest of the pipeline wouldn't know it ...

Sylvain

Reply to
Sylvain Munaut

You seem to be stating that the skew between a heavily loaded clock tree and a lightly loaded clock tree will be close enough that setup/hold will be met when cascaded adjacent flip-flops are clocked from each domain.

In my own DDR designs, with lightly loaded DDR register clocks and heavily loaded internal logic clocks, timing reports and lab testing have shown otherwise, and opposite-edge ( or 90/270 phased ) clocks were needed to properly transfer between the two clock domains.

Again, it's a concern with deskewing the clock tree delays: as Kevin pointed out, you can't feed the loaded 2x clock net back into the DCM, so the 2x clock net will be offset by the BUFG net difference between the 2x clock net and the 1x clock net that you can use as feedback.

( the lack of 2X feedback in V2P & S3 also makes it harder to do certain internal/external clock deskew topologies )

Another related concern for cascaded DCMs, or in cases where you need a known (zero) phase relationship between the DCM input and output: unless you set the DCM to SOURCE_SYNCHRONOUS mode, the DCM output clock will LEAD the input clock (by ~1.5 ns in V2) as a result of the internal DCM feedback delay element used to ensure zero-hold at IOBs.

Brian

Reply to
Brian Davis

The fact that the clocks are in phase isn't the issue. The issue is that the slow domain doesn't "know" where the fast domain is in its cycle. Suppose I send out a read command on fast clk 1, the data comes back on fast clk 3, and should be transferred to the slow domain on fast clk 4. And suppose this cycle repeats every 3 fast clks. How do I get the slow clock to line up with fast clk 4, 7, 10, etc.?

The idea of just using one fast clk, with clock enables on all the slow stuff, is an attractive idea. However, I don't have any experience in determining whether this will lead to timing constraint issues or not. It seems that it will be difficult enough to get the performance from the SRAM interface. How do I loosen the constraints on the slow stuff? All those registers would be on the same clock net.

Ray Andraka suggested making a copy of the slow clock in the fast domain, and using that signal as a clock enable, I guess in the fast domain? I'm not sure how this helps. The idea of using 90/270 clocks to get rid of skew issues seems good.

I have found XAPP253, which is a DDR SDRAM controller with some of the same issues. Trying to work through that.

Reply to
Brad Smallridge

YES, YES, YES, as someone once said in a movie! Brad, you need to put something like this in your UCF file:-

NET "fast_clock" period = 10ns;
NET "slow_enable" TNM=FFS "slow_flipflops";
TIMESPEC TS1000 = FROM : slow_flipflops : TO : slow_flipflops : 30ns;

So, in your VHDL or whatever, you've used a net 'slow_enable' to clock enable the FFs that go at a third the speed of the 'fast_clock'. I've assumed 'fast_clock' is 100MHz for this example. That's the first line of UCF stuff. The second line associates all the FFs that connect to the destinations of the net 'slow_enable' with the timing group name "slow_flipflops". The third line says that signals between members of timing group "slow_flipflops" have 30ns to get to their destination. Note that the FF that generates net 'slow_enable' is NOT included in this group, so the PAR knows to route this net to meet the 10ns requirement of 'fast_clock'. Gotchas include the tools making copies of the net "slow_enable" if it has a large fanout. If this happens, you need to add extra lines like:-

NET "slow_enable_1" TNM=FFS "slow_flipflops";
NET "slow_enable_2" TNM=FFS "slow_flipflops";

You can get the net names from the EDIF file, the floorplanner, wherever.
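For the VHDL side, a minimal sketch of what generates and uses 'slow_enable' might look like this. The entity name, the 8-bit data width and the 'phase' counter are assumptions for illustration; only the net names 'fast_clock' and 'slow_enable' come from the UCF lines above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: one fast clock, with the slow logic
-- running on a clock enable that is high one cycle in three.
entity slow_enable_demo is
  port (
    fast_clock : in  std_logic;
    d          : in  std_logic_vector(7 downto 0);
    q          : out std_logic_vector(7 downto 0)
  );
end entity slow_enable_demo;

architecture rtl of slow_enable_demo is
  signal phase       : integer range 0 to 2 := 0;
  signal slow_enable : std_logic := '0';
  signal q_reg       : std_logic_vector(7 downto 0) := (others => '0');
begin
  -- Divide-by-3 phase counter. slow_enable is registered here, so the
  -- FF that drives it is timed by the fast_clock period constraint.
  process (fast_clock)
  begin
    if rising_edge(fast_clock) then
      if phase = 2 then
        phase       <= 0;
        slow_enable <= '1';
      else
        phase       <= phase + 1;
        slow_enable <= '0';
      end if;
    end if;
  end process;

  -- A "slow" flip-flop bank: it only updates when slow_enable is high,
  -- so it would fall into the slow_flipflops timing group.
  process (fast_clock)
  begin
    if rising_edge(fast_clock) then
      if slow_enable = '1' then
        q_reg <= d;
      end if;
    end if;
  end process;

  q <= q_reg;
end architecture rtl;
```

The relaxed 30ns spec only applies to paths that both start and end at FFs enabled by 'slow_enable', so in a real design 'd' would itself come from another register in that group. The same 'phase' counter also tells the fast logic which of the three fast cycles it is in.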

A little time and effort up front will save you from a world of hurt later!

Good luck mate, Syms.

Reply to
Symon

Hmm. I didn't think of tying the timing nets together with the enable. That makes a lot of sense. Thanks.

Reply to
Brad Smallridge

