Synchronizer Flip Flop / Metastability

To exchange flags (1-bit signals) between two unrelated clock domains, a single synchronizer flip-flop is normally used to reclock the signal into the receiving domain. However, if the flip-flop's setup/hold times are violated, metastability can occur.

On a Virtex-II Pro, does metastability occur often? Does adding a second (or even third) flip-flop after the synchronizer flip-flop help, or is that overcautious?

Best Regards, Simon Heinzle

Reply to
Simon Heinzle

Standard practice is two flip-flops, not one. It's been decades since I looked at the math, but as I recall there is a square law involved, i.e. the probability of a double synchronizer failing is roughly the square of the probability of a single synchronizer failing. Someone who has looked at this more recently than I have might say my math is wrong, but in any event there is no question that a double synchronizer is significantly better than a single one.
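A hedged sketch of the math behind that, using the standard metastability MTBF model (tau and T0 below are device-dependent characterization constants, not figures for any particular part):

\[ \mathrm{MTBF}(t_s) \;=\; \frac{e^{\,t_s/\tau}}{T_0\, f_{clk}\, f_{data}} \]

where t_s is the settling time available before the next flip-flop samples, f_clk is the sampling clock rate, and f_data is the toggle rate of the asynchronous input. A second stage gives the signal roughly one extra clock period T_clk to settle, multiplying the MTBF by about e^{T_clk/tau}; when the slack available in each stage is itself about one clock period, the two-stage failure probability is indeed roughly the square of the single-stage one.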

Reply to
B. Joshua Rosen

There have been many good threads on this subject. You may want to Google this group for "metastability." That being said, the answer is of course "it depends." The number of stages required depends on your clock rate, the metastability characteristics of the flip-flops, and your required failure tolerance.

Generally two stages are used. However, it is possible to get away with one if the signal is only used at the next edge of the clock, which is almost like two stages, except that the capturing flip-flop may have less tolerance for metastability if there is a LUT between the two stages. In any case, make sure the input is synchronized only once, to avoid possible logic errors. This may seem obvious, but if your single-stage synchronizer output has many loads, the tools may duplicate the flip-flop for you unless you specifically tell them not to. In the case of a two-stage synchronizer, only the second flip-flop might be duplicated, since the first stage has just one load. Also, the routing delay from the first to the second stage can become important at higher frequencies.
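A minimal sketch of the usual two-stage synchronizer in Verilog. The ASYNC_REG attribute shown is the Xilinx hint for keeping the pair adjacent and out of the optimizer's hands; whether your particular tool version honors it, or needs an equivalent KEEP/placement constraint instead, is an assumption to check:

module sync_2ff (
    input  wire clk_dst,   // destination-domain clock
    input  wire async_in,  // signal arriving from the other clock domain
    output wire sync_out   // synchronized version, safe to use in clk_dst domain
);
    // Mark both registers as a synchronizer chain so the tools keep them
    // together and do not replicate or retime them.
    (* ASYNC_REG = "TRUE" *) reg meta_ff = 1'b0;  // first stage: may go metastable
    (* ASYNC_REG = "TRUE" *) reg sync_ff = 1'b0;  // second stage: settled value

    always @(posedge clk_dst) begin
        meta_ff <= async_in;
        sync_ff <= meta_ff;
    end

    // Fan out only the second stage; never use meta_ff directly.
    assign sync_out = sync_ff;
endmodule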

Just my 2 cents Gabor

Reply to
Gabor

Xilinx has an application note on this general subject:

formatting link

While adding multiple stages can reduce the risk of a metastable failure, it is not the only way to do so. Both the odds of a metastable failure and the consequences of that failure are important.

The odds can be improved by increasing the settling time; see Figure 2 in the application note. To make sure there is enough settling time, a typical design might both put an additional timing constraint between the synchronizer FF and the next FF or FFs, and put a physical constraint (LOC or RLOC) on these FFs to place them for minimal delay.
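To make that concrete, the settling time in question is the slack left over in the destination clock period, roughly

\[ t_s \;\approx\; T_{clk} - t_{clk \to Q} - t_{route} - t_{logic} - t_{setup}, \qquad \mathrm{MTBF} \propto e^{\,t_s/\tau} \]

(the usual textbook decomposition, not anything specific to the app note). Every fraction of a nanosecond of routing or LUT delay removed between the synchronizer FF and the next FF therefore buys an exponential improvement in MTBF, which is what the timing and placement constraints are for.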

The consequences can sometimes be improved by making the rest of the design more robust. Examples might be to design the logic to recover from a metastable failure, to make the software check the data and ignore/correct invalid values, etc.

--
Phil Hays to reply solve: phil_hays at not(coldmail) dot com  
 If not cold then hot
Reply to
Phil Hays

Simon, Firstly, congratulations on having parents who can spell. Secondly, you may be interested in this link:-

formatting link
Lots of good stuff, complete with a link to this:-
formatting link
It's a simple and reliable circuit posted by Rick Collins for transferring a flag from one domain to another (a sketch of that style of circuit is below, after the p.s.). Cheers, Symon.

p.s. Aren't Google ads great? At the bottom of the page linked above, I saw an advertising link for beach footwear! Guess which type....
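Here is a hedged sketch of that style of flag-crossing circuit: a toggle in the source domain, a two-FF synchronizer, and an edge detector in the destination domain. This is a reconstruction of the general technique, not necessarily Rick's exact schematic:

module flag_cross (
    input  wire clk_src,   // source-domain clock
    input  wire flag_src,  // one-cycle pulse in the clk_src domain
    input  wire clk_dst,   // destination-domain clock
    output wire flag_dst   // one-cycle pulse in the clk_dst domain
);
    // Source domain: convert each pulse into a level toggle.
    reg toggle_src = 1'b0;
    always @(posedge clk_src)
        if (flag_src) toggle_src <= ~toggle_src;

    // Destination domain: synchronize the toggle, then detect its edges.
    (* ASYNC_REG = "TRUE" *) reg [1:0] sync = 2'b00;
    reg sync_d = 1'b0;
    always @(posedge clk_dst) begin
        sync   <= {sync[0], toggle_src};
        sync_d <= sync[1];
    end

    assign flag_dst = sync[1] ^ sync_d;  // high for one clk_dst cycle per flag
endmodule

The usual caveat: flags must arrive more than a couple of destination clock periods apart, or successive toggles will be merged and flags lost.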


Reply to
Symon

Simon, metastability is a problem that cannot be "solved"; we can only reduce the probability of errors due to metastability. The extra delay at the Q output of a flip-flop (or latch) with an undefined timing relationship between D and Clk is theoretically unbounded. This can be a surprise to digital designers who are accustomed to deterministic behavior. Metastability is not deterministic; it is a statistical phenomenon. The good news is that modern CMOS flip-flops and latches recover very fast, within a few ns, as shown in the Xilinx app note XAPP094, which is based on actual measurements, not on theory.

In many cases you will find that the mean time between failures is millions or billions of years. Just make sure that the data path from the metastable-going flip-flop to the (single!) next synchronizing flip-flop is as fast as possible: no extra logic, absolutely minimal routing delays. Peter Alfke, Xilinx Applications
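As a purely illustrative back-of-the-envelope calculation (the constants are invented for the example, not Virtex-II Pro characterization data): with tau = 0.1 ns, T0 = 1 ns, f_clk = 100 MHz, f_data = 1 MHz, and 5 ns of settling slack,

\[ \mathrm{MTBF} \;=\; \frac{e^{5/0.1}}{(1\,\mathrm{ns})(100\,\mathrm{MHz})(1\,\mathrm{MHz})} \;\approx\; \frac{5\times10^{21}}{10^{5}\,\mathrm{s^{-1}}} \;\approx\; 5\times10^{16}\,\mathrm{s} \;\approx\; 1.6\times10^{9}\ \mathrm{years}. \]

That is the sense in which "billions of years" falls out of the math, and also why every nanosecond of avoidable routing or logic delay matters: it comes straight out of the exponent.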

Reply to
Peter Alfke

Thanks a lot guys!

Reply to
Simon Heinzle

This certainly depends on what you mean by "solved". Even machines intended to be completely deterministic are built of components which have various wear-out and statistical failure mechanisms. However, engineers usually consider a problem solved if the MTBF of the conglomeration of gears, relays, tubes, CMOS transistors, etc. is longer than the expected operational life of the widget by sufficient orders of magnitude. In that sense, one could easily consider the metastability problem solved when the probability of the register train not resolving is far less than that of the device getting vaporized in a direct meteor strike, in which case even a CMOS NAND gate would no longer produce the correct voltage.

Which means that the power supply, if not electromigration, background radiation, bonding wires, or even a meteor strike, would be far more likely to cause any instance of the "solved" FPGA design to fail first.

IMHO. YMMV.

--
Ron
rhn A.T nicholson d.O.t C-o-M
Reply to
rhnlogic

Your wear-out analogy does not apply. Metastability failure is completely statistical and probabilistic. Even when the MTBF is a million years, the failure can occur in the next second. Not likely, but possible. That's why I claim that the problem can never be solved; we can only reduce the probability to an acceptable level. I really believe that this is an important distinction versus any failure based on wear-out. Peter Alfke

Reply to
Peter Alfke

As has been said before, it's normally safe to make sure the MTBF is (say) double the length of time you wanna be at your present company. Cheers, Syms.

Reply to
Symon

I bet that's what the Army Corps of Engineers thought when they designed the levees in New Orleans. The design spec was for a 200-year event; none of those guys expected to be around in 200 years. Well, guess what: a 200-year event is another way of saying that there was a 0.5% chance of that event happening in any one year. The chance of two such events in the same year is 0.0025%, which sounds pretty small, doesn't it? But it happened, because 0.0025% isn't zero.
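Spelling out the arithmetic (and treating the events as independent, which the next post questions):

\[ P(\text{two 200-year events in one year}) \;\approx\; 0.005^2 \;=\; 2.5\times10^{-5} \;=\; 0.0025\%. \]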

When I was a physics major, the rule was that anything with a probability of less than once per 10^18 seconds could be treated as zero. 10^18 seconds is on the order of the age of the universe.

Reply to
B. Joshua Rosen

That's a bit of a cheat because I don't believe Rita would have breached the levees on her own. The structures hadn't been properly rebuilt after Katrina. Also, it's possible that multiple big hurricanes in one season aren't statistically independent events. If the summer's sea/weather conditions are good for one hurricane, why not for two? Cheers, Syms.

Reply to
Symon

I did not use only a wear-out analogy, although I did mention some possibly wear-related factors. On the flip side, many circuit components also have infant-mortality failure rates.

The same is true of a meteor strike on the circuit in question (given that many meteors are in chaotic orbits), as well as of radiation upset events, thermal noise, even someone tripping over the power cord, etc.

Then the same must also be true of completely synchronous logic, because various other design issues also have probability factors which must be reduced to acceptable levels to meet design goals. Good engineering usually involves balancing several risk factors so that all meet acceptable limits, without wasting extra effort on those that are well away from being the limiting factor.

And I'm not saying that metastability is not a problem at all (having had to debug a few poorly designed asynchronous-input-to-synchronous-logic circuits in my junior engineer days). But there are a number of papers on the subject which essentially reduce some forms of it to solved problems for many given reliability levels: use properly designed and characterized flip-flops (as I assume your employer has done) with sufficient settling time and/or delay cycles, so that the resolution probability meets or exceeds that required for the given product reliability.

IMHO. YMMV.

--
Ron
rhn A.T nicholson d.O.t C-o-M
Reply to
rhnlogic

This has degraded into quibbling over semantics. I have documented quantitative data about the MTBF of metastability (Xilinx app note XAPP094), and I have stated publicly that it is not a real problem in all but very extreme cases. So nobody can say that I advocate scare tactics. It is, however, a fact that, whatever the MTBF, metastability-caused errors are always a (very remote) possibility. Enough said. I will not continue this "discussion". Peter Alfke

Reply to
Peter Alfke

Don't forget to scale by the number of units you might ship and the cost of a failure. Does your PC crash? Does your server trash its database? Does your space probe head for the wrong planet?
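To put rough numbers on that scaling (the figures are invented for illustration): the fleet-wide failure rate is the per-unit rate times the number of units shipped, so

\[ \mathrm{MTBF}_{fleet} \;=\; \frac{\mathrm{MTBF}_{unit}}{N}, \qquad \frac{1000\ \text{years}}{10^{6}\ \text{units}} \;\approx\; 8.8\ \text{hours between failures somewhere in the field}. \]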

Reply to
Hal Murray
