sampling error between 2 clocks

- wxy0624

Mon, Dec 17, 2007 3:56 AM

Xilinx V4SX35 ISE 8.2.03 Modelsim

I have CLKI (300 MHz) and CLKI_DIV (150 MHz). CLKI_DIV is generated by a counter (just a flip-flop) clocked by CLKI, and both clocks drive BUFGs. I then use CLKI to sample 160-bit-wide data generated in the CLKI_DIV domain. Simulation produces warnings saying that the setup time is violated during sampling. How can I constrain PAR to get enough setup time?

Because of functional requirements, I cannot use a DCM or OSERDES. The minimum delay between the rising edges of CLKI_DIV and CLKI is much more than the period of CLKI. I have to make sure that all bits that change together are sampled by CLKI on the same edge, but in practice some bits are always sampled one CLKI period later or earlier. I can constrain the maximum delay from the last 150 MHz flip-flop to the first 300 MHz flip-flop, but how can I constrain the minimum delay?

Thank you!!

- Symon

Mon, Dec 17, 2007 10:13 AM

Dear Whoever,

Use CLKI to clock _all_ the synchronous elements. Use CLKI_DIV as the clock enable for all the synchronous elements you were going to clock with CLKI_DIV.

HTH., Syms.

- wxy0624

Tue, Dec 18, 2007 1:50 AM

Thanks!

That is exactly what I am doing now, and the FPGA is working properly under lab conditions. It is just a warning during simulation. I am only worried about what happens when the environment changes, for example the voltage or the temperature.

I have tried moving the flip-flop that generates CLKI_DIV to change the phase relationship between the two clocks, but that is time-consuming and not effective.

Are there other methods to meet the setup time, such as some kind of constraint in the UCF file?

- Sean Durkin

Tue, Dec 18, 2007 2:51 PM

Usually, the timing analysis that is done by the tools uses a worst-case scenario, like 85 °C temperature and a very low VCCINT.

When you look at the logfile "par" produces (it's a text file with the ending .par in your project directory), there are lines like this at the beginning:

"Initializing temperature to 85.000 Celsius. (default - Range: -40.000 to 100.000 Celsius)
Initializing voltage to 1.140 Volts. (default - Range: 1.140 to 1.260 Volts)"

You can even put the setting in your UCF:

TEMPERATURE = 75 C;

sets the temperature to 75 °C (pure magic!). I assume there's a similar setting for the core voltage.

So unless your real-life environment is worse than that, you should be safe.

HTH, Sean

--
My email address is only valid until the end of the month.
Try figuring out what the address is going to be after that...

- KJ

Wed, Dec 19, 2007 12:28 AM

Read Symon's suggestion again....what you've described is NOT what he suggested.

And that 'warning' will turn into an intermittent functional failure for you eventually.

Temp will do it. Try cold spraying or heat gunning your 'working properly under lab condition' FPGA and it is likely that it will fail.

That's because the correct approach is to have one clock in your design and to use the divided-down clock as a clock enable. Example:

    process(CLKI)
    begin
      if rising_edge(CLKI) then
        if (CLKI_DIV = '0') then
          -- Whatever you have that is currently clocked
          -- by 'CLKI_DIV' goes here.
        end if;
      end if;
    end process;

Yes, see above and get rid of all of your processes that are clocked by 'CLKI_DIV'

Only if you want to have a flaky design

Kevin Jennings

- wxy0624

Wed, Dec 19, 2007 2:46 AM

Thanks!

But do you mean that all logic should be clocked by CLKI, with CLKI_DIV used as a clock enable?

That would make all the logic run at 300 MHz. I want to use concurrent logic so that it can run at a lower clock frequency, which is why I am using CLKI_DIV.

Even if V4SX55 can run at 300MHz, I don't think it's a good idea.

And I would still have to worry about the skew of CLKI_DIV and the phase relationship between the two clocks, which is the main problem. If I use a BUFG to drive CLKI_DIV, the phase relationship is hard to control; if I don't, the skew will be a huge problem. These two problems are all I have right now, and they are still there!

I really wanna know your solution in detail.

- RCIngham

Wed, Dec 19, 2007 11:18 AM

Why ask for expert help and then refuse to try the proffered answer to see if it solves the problem?

- KJ

Wed, Dec 19, 2007 12:47 PM

Yes.

That's correct.

Some questions you should ponder then.... What is the point of the 300 MHz then? How low of a clock frequency are you trying to achieve? What is the reason for needing this lower clock frequency?

The only real correct answer to all of the above is that you have some chunk of logic that needs to run at 300, and some other chunks that just can't because they fail timing. If that is not the situation, then use CLKI_DIV as a clock enable and be done with it.

IF that is your situation, you should first consider breaking up the chunk-o-logic that doesn't run at 300 into smaller pipelined chunks that can.

Finally, if you really do need the two clocks, then any communication between the CLKI and CLKI_DIV should be treated as if they are totally asynchronous clocks which generally means inserting fifos to move the data across the clock domains.

1. Explain why you don't think it's a good idea.
2. Then explain why your design is attempting to run at least part of it at 300 MHz.
3. Then explain why your answer to #2 is not in violation of your answer to #1.

CLKI_DIV would no longer be a clock, you would not need to worry about the skew of CLKI_DIV and the phase relationship. Anyplace you previously were looking for a rising edge of CLKI_DIV (as a clock) you would replace with "if CLKI_DIV = '0'" as a clock enable as I demonstrated. If you had any code that was looking for falling edges of CLKI_DIV you would replace it with "if CLKI_DIV = '1'" (as a clock enable).

And the root cause of the problem is that you're using CLKI to generate CLKI_DIV, which inherently creates skew between these two signals. Even in simulation they do not change simultaneously: CLKI_DIV changes on the next simulation delta. When both are used as clocks you'll have the problems that you're seeing; if you use CLKI_DIV as a clock enable, as Symon and I have pointed out, you won't.
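For illustration, here is a minimal sketch of the single-clock arrangement described above. Only CLKI and CLKI_DIV come from this thread; the entity name, the data_in/data_out ports and the slow_data register are hypothetical, and the 160-bit width is taken from the original question. It is one possible way to write it, not the OP's actual design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: only CLKI and CLKI_DIV are names from the thread.
entity ce_sample is
  port (
    CLKI     : in  std_logic;                       -- 300 MHz, the only clock
    data_in  : in  std_logic_vector(159 downto 0);  -- source of the 160-bit word
    data_out : out std_logic_vector(159 downto 0)   -- word sampled at 300 MHz
  );
end entity ce_sample;

architecture rtl of ce_sample is
  -- Toggles at 150 MHz, but is used only as an enable, never as a clock.
  signal CLKI_DIV  : std_logic := '0';
  signal slow_data : std_logic_vector(159 downto 0) := (others => '0');
begin

  -- Divide-by-two toggle flip-flop. Its output stays on ordinary routing
  -- and does not drive a BUFG or any clock pin.
  div : process (CLKI)
  begin
    if rising_edge(CLKI) then
      CLKI_DIV <= not CLKI_DIV;
    end if;
  end process div;

  -- Logic formerly clocked by the rising edge of CLKI_DIV. It updates on
  -- every second CLKI edge, the one on which CLKI_DIV is about to go high.
  slow : process (CLKI)
  begin
    if rising_edge(CLKI) then
      if CLKI_DIV = '0' then
        slow_data <= data_in;
      end if;
    end if;
  end process slow;

  -- 300 MHz sampling register. Source and destination share CLKI, so this
  -- is an ordinary synchronous path checked against the CLKI period.
  fast : process (CLKI)
  begin
    if rising_edge(CLKI) then
      data_out <= slow_data;
    end if;
  end process fast;

end architecture rtl;
```

Because every register sees the same clock edge, all 160 bits of slow_data change together and are captured together, which is the property the original question was trying to get with min/max delay constraints.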

If you want to beat your head against the wall trying to solve this go ahead but all you'll get is a headache and a flaky design that will mysteriously work (or not) when you first power it on, then will not work (or work) once it has been powered up for some period of time....which you'll attribute to some mysterious temperature sensitivity....but the reason for the 'sensitivity' is improper design and failing timing analysis which is what your timing report is telling you right now.

What detail do you think was not disclosed previously?

Kevin Jennings

- Brian Drummond

Wed, Dec 19, 2007 1:20 PM

Exactly.

But why?

No - you let the tools worry about them. If the tools can achieve timings WITHOUT issuing warnings, there is no worrying to be done.

This approach allows the timing constraints necessary on the 150 MHz clock to be correctly inferred by the tools. And there are no clock domain crossings to worry about.

- Brian

- Andy

Wed, Dec 19, 2007 3:08 PM

Another approach might be to generate the 150 MHz clock with a BUFGCE, enabled by clki_div (keep in mind that such a clock may not have a 50% duty cycle). Or use a DCM.
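A rough sketch of that idea, assuming the buffer meant above is the Xilinx BUFGCE primitive (global clock buffer with clock enable) from the unisim library. The entity name and the clk150 port are hypothetical, and how CLKI reaches the buffer's input in a real design depends on the clocking resources available, which this sketch does not address.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

-- Hypothetical wrapper: only CLKI and clki_div are names from the thread.
entity gated_clk150 is
  port (
    CLKI   : in  std_logic;  -- 300 MHz clock source
    clk150 : out std_logic   -- gated clock carrying every second CLKI pulse
  );
end entity gated_clk150;

architecture rtl of gated_clk150 is
  signal clki_div : std_logic := '0';
begin

  -- Divide-by-two toggle; used only as the buffer's enable, not as a clock.
  toggle : process (CLKI)
  begin
    if rising_edge(CLKI) then
      clki_div <= not clki_div;
    end if;
  end process toggle;

  -- The buffer passes CLKI while CE is high and holds its output low
  -- otherwise, so the output edges are derived from CLKI itself rather
  -- than from a flip-flop output routed onto a clock net.
  gate : BUFGCE
    port map (
      I  => CLKI,
      CE => clki_div,
      O  => clk150
    );

end architecture rtl;
```

Since the output is made of CLKI pulses with every other one suppressed, its high time stays that of a 300 MHz pulse, which is why the duty cycle is not 50%.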

Otherwise, you have been given the best advice, and only if you really need the rest of your design to run at 150 (for power savings, etc?) should you actually generate a slower clock signal, and then there are good ways to do it (what I suggested above) and bad ways (what you were doing).

If the logic that you want to run at 150 MHz is too complex to run at 300, then you can use a 150 MHz clock enable as suggested previously, with multi-cycle path constraints to relax timing on those paths that have two 300 MHz cycles to complete. However, this is generally more error-prone, since incorrectly specified multi-cycle paths can usually only be found by a simulation that exercises the incorrectly specified path.

As to the difference between lab operation and simulation warnings, even if you were to test with your voltage at the lowest possible value, and raise the temperature to the highest, consider that your lab experiment works on a sample of one FPGA, from one production lot. If you plan on producing more than just the board you have in the lab, you need to solve the problems indicated by the simulator and/or timing analysis tool. "Works in the lab" is necessary, but not sufficient for "works in production".

Andy

- mk

Wed, Dec 19, 2007 5:14 PM

I don't think this makes sense in general. Depending on the capabilities of the STA tool, timing is checked by starting at a main pin and tracing through the clock buffers/delays for both the source and the target flops to calculate the clock arrival times. The fact that there is a clock divider in one of the paths doesn't make the divided clock asynchronous, and the result is no less dependable than when all the delays are just clock buffers/inverters. There is just one clock->Q delay instead of a buffer, and a good STA tool should account for that.

- KJ

Wed, Dec 19, 2007 5:51 PM

I don't disagree about how STA computes paths, but what I said was "...the CLKI and CLKI_DIV should be treated as if they are totally asynchronous clocks". What I meant by that was that one should approach the design *as if* the two clocks are unrelated and use proper clock domain crossing design techniques. That will lead to a robust design.

Trying to constrain your way to happiness generally results in 'brittle' designs where simple changes cause timing paths to now fail and lead to another round of new constraints, random number seeds, etc. because the synthesis tool was slightly off when it was estimating delays when it used those constraints to try to optimize timing during the mapping/fitting/routing process.

And the OP's static timing analysis already is catching the problem....the fact that he's not following good FPGA design practice is causing him to try to stomp out the problem with constraints that will keep biting until he changes the functional description of his design.

KJ

- John_H

Wed, Dec 19, 2007 6:08 PM

Sorry I'm late to the thread...

The main point I haven't seen communicated is that the clock-enabled logic will, indeed, be running with a 300 MHz clock, but...

...you can specify a multi-cycle path for all the registers fed by the clock enable to two 300 MHz clock cycles.

The place&route and timing analysis will make sure the single-cycle 300 MHz logic works at 300 MHz as you need. The clock enable that needs to reach the enabled flops within a 300 MHz period will be routed properly. The logic that wants to run slower, however, requires only the lazy two cycles worth of delay to get the proper results; the uncertain results when the clock is not enabled will not affect your results.

As long as the multi-cycle constraint is properly applied, this all-synchronous approach works even better than pipelining the logic to be all 300 MHz.

Multi-cycle constraints and one high speed clock will solve many of your troubles.

- John_H

- mk

Wed, Dec 19, 2007 6:43 PM

I think my problem is with the phrases "should be treated ..." and "a robust design", which to me sound as if the result will not be a robust design when the clocks are not treated that way, and I disagree with that. Also, forcing a design to run at 2X MHz will push the synthesis a lot more than running it at X. If one pays some attention to the first 2X->X boundary (i.e. minimal or no logic in between), the STA will correctly manage the clock tree and there will be no problems. This way the design will be at least as robust as treating the clocks as asynchronous and putting in async FIFOs, which are quite complicated to design too.

- John_H

Wed, Dec 19, 2007 7:00 PM

If the synthesis uses multi-cycle constraints, the synthesis will not be pushed. The synthesis will make sure the combinatorial paths conform to the needs of the multi-cycle path's period.

- KJ

Wed, Dec 19, 2007 8:08 PM

OK, we agree to disagree then.

Pipelining or specifying multi-cycle paths addresses that concern.

If generally true, then the OP wouldn't be having the setup problem now would he?

The reason he is getting this warning is the inherent skew between the clock to a flip-flop and the output of that flip-flop; when both are used as clocks, that skew cannot be 'managed' no matter how much attention you pay. This is a clock domain crossing problem, and while the frequencies of the two clocks have a nominal relationship to each other (i.e. one is 2x the other), there is NO controllable relationship between the skew of the edges of these clocks, and THAT is what is causing the timing 'warning' and eventually a real failure.

Well, I certainly wouldn't bother to write the code to implement an async FIFO. I would write code from which the synthesis tool infers whatever hard macro the device inherently has....could be as simple as using 'lpm_fifo_dc' (or using Mr. Wizard if one so chooses). The silicon guys are the ones that can properly implement the async FIFO; the FPGA designer needs to write code that causes that stuff to be used.

Kevin Jennings

- mk

Thu, Dec 20, 2007 5:05 AM

I don't think I can make a judgment on that without having more details of the OP's design.

I think this is an overly strong statement. For a more general case than we're talking about here (not restricted to FPGAs), what is the substantial difference between a clock buffer and the clk->Q delay of a flop? I'm assuming that you'd agree that if two flops were to get their clk inputs from two separate leaves of the clock tree, the skew would be manageable, and that it can even be done if the two branches were related higher in the tree. If one of the branches includes a flop, does the skew become unmanageable no matter how much attention one pays?

Again, "no controllable relationship" is a little strong in general. Imagine a clock tree with two branches; from one branch remove enough buffers to compensate for the clock->Q of a flop and insert a flop. Why do you think the skew between the two branches is any less manageable than before? Or at least manageable with (slightly) more difficulty, but "no controllable relationship"?

I was hoping that you'd say that an async fifo is not needed. Even if one can't manipulate the clock tree (as in an FPGA), the skew between a root clock and a divided clock is a fixed albeit unknown (actually known, exactly the clk->Q of one flop but not compensated in the clock tree) one so what is needed is at most a synchronous fifo. An asynchronous fifo would work but it is definitely unnecessary and overkill. A divided clock doesn't require a clock domain crossing solution where the rates of the two clocks have a changing relationship over time like having two clocks with very similar frequency with a couple of hundred ppm difference where an async fifo is called for.

- Symon

Thu, Dec 20, 2007 11:32 AM

Not at all, I think Kevin is spot on.

mk,

The problem in an FPGA is that you lose timing margin on the clk -> Q uncertainty of the FF, and then you lose a lot more margin on the uncertainty of the routing from the FF to the global clock tree. This loose timing is the undoing of many a design; some builds work, some don't. Now, don't get me wrong, it may be that you can constrain the design so it always works, but I wouldn't like to try that with current technology at 300MHz. I'm painfully aware that the FPGA timing tools aren't the easiest thing to drive when it comes to multiple clock domains.

Be aware that the enabled solution does have issues also, especially at very high clock rates. The enable signal has to get to all destinations in one cycle. Synthesis tools can replicate enable nets with many destinations, leading to naming issues in the UCF. Synthesis tools sometimes need cajoling into using the CE input of the FF or memory. However, I prefer these problems to the alternative.

The async FIFO or a DCM (as suggested by others) are among the other ways to 'properly' solve this issue. The FIFO solution is not 'unnecessary and overkill' IMO. When it comes to crossing timing domains, even overkill isn't overkill. Or something like that! (BTW, XAPP291 is a personal favourite!)

HTH., Syms.

- Mike Treseler

Thu, Dec 20, 2007 7:18 PM

Indeed. This can't be stated too strongly. But I learned this one the hard way, and I suppose that's the only way to remember it.

-- Mike Treseler

One clock, One heart ...

- wxy0624

Fri, Dec 21, 2007 9:04 AM

Anyway, suppose we have to use more than one clock, for example if CLKI is 600 MHz.

Then we have to use CLKI_DIV, don't we?

In that case, what should we do without using a FIFO? If the data is more than 100 bits wide, a FIFO will need a lot of resources.