
Re: opinions are OK

Let me try to repair the damage I did with my impatience:

When capturing data that is asynchronous with the clock, the flip-flop

will inevitably go metastable sooner or later.

Metastability manifests itself in unpredictable additional clock-to-out delay.

The user knows the clock frequency, probably knows the data frequency

at least roughly, and should know the amount of tolerable extra delay,

or the acceptable Mean-Time-Between-Failure.

Then one can consult the app note and table and see the connection.

MTBF is always inversely proportional to the product of the clock and data

frequencies.

Last October I published a Xilinx TechXclusives paper which shows that at

a 300 MHz clock rate and 50 MHz data rate, the MTBF is one microsecond

for a total clock-to-Q plus set-up time of 1.0 ns. MTBF then increases a

million times for every additional half nanosecond available as extra delay.

At 3 ns, the MTBF is over a billion years.

All MTBF values must be scaled by the product of the two frequencies:

At 100 MHz clock and ~10 MHz data, the MTBF is, therefore, 15 times

longer.
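The scaling described above can be sketched in a few lines of Python. This is only an illustration built from the numbers quoted in this post (1 microsecond at 1.0 ns, 300 MHz clock, 50 MHz data, a factor of a million per extra 0.5 ns); the function and constant names are made up for the example and do not come from the app note.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Reference point quoted in the post.
REF_MTBF_S = 1e-6        # MTBF of one microsecond ...
REF_DELAY_NS = 1.0       # ... at 1.0 ns total clock-to-Q plus set-up time
REF_CLOCK_HZ = 300e6
REF_DATA_HZ = 50e6
GAIN_PER_STEP = 1e6      # MTBF grows a million times ...
STEP_NS = 0.5            # ... per additional half nanosecond


def mtbf_seconds(delay_ns, clock_hz=REF_CLOCK_HZ, data_hz=REF_DATA_HZ):
    """Estimate MTBF in seconds by scaling the quoted reference point."""
    # Exponential growth with the extra delay that is available.
    settling_gain = GAIN_PER_STEP ** ((delay_ns - REF_DELAY_NS) / STEP_NS)
    # Inverse scaling with the product of clock and data frequencies.
    rate_scale = (REF_CLOCK_HZ * REF_DATA_HZ) / (clock_hz * data_hz)
    return REF_MTBF_S * settling_gain * rate_scale


if __name__ == "__main__":
    years = mtbf_seconds(3.0) / SECONDS_PER_YEAR
    print(f"3 ns at 300 MHz / 50 MHz: {years:.2e} years")
    ratio = mtbf_seconds(3.0, 100e6, 10e6) / mtbf_seconds(3.0)
    print(f"100 MHz / 10 MHz is {ratio:.0f}x longer")
```

The 15x figure falls out of the frequency products alone: 300 MHz x 50 MHz is 15 times 100 MHz x 10 MHz.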

So, in short:

Metastability is unavoidable. All attempts to avoid it are inherently

doomed, but the quantitative impact of metastability is quite tolerable.

That's it.

Peter Alfke, Xilinx Applications


Re: Metastable Modeling

Is this correct ?

Wouldn't the 3.3 ns to 10 ns increase in clock time buy you (10-3.3)/0.5

lots of 'a million times' scalings ?

Or do you mean the time to trigger an event, not fail due to one ?

What about this issue:

With a CLK.Data stream, the CLK pulses that are not

adjacent to the DATA edges, cannot have metastable events, so

should not enter the scaling ?

The best model would seem to be a Data.Aperture AND a Clock.Aperture,

(both very small, but I don't think they HAVE to be equal ) and

when they overlap/come closer than a critical time threshold,

the metastable dice rolls. What happens after the roll depends on

how far away the next clock is (call this the settling tail)

Prediction stats would be an area-overlap basis, and assuming

async signals ( non zero phase velocity ) the area product would

be proportional to

(Data.Aperture/Data.EdgeT) x (Clock.Aperture/Clock.EdgeT)

Typically, Data.EdgeT = Data H or L time

Clock.EdgeT = ClkPeriod

This is average trigger/dice roll prediction, but the

actual 'metastable dice roll profile' will depend on the

phase velocity, and will have peaks much higher than the average.

What if your system hits/moves very slowly over this 'phase jackpot' ?

Here, area-mitigation stats are not much use, and you have to rely

mainly on the settling tail to next clock ( and maybe a small amount on

the natural system jitter )

IIRC Peter quoted 0.05fs virtual aperture time, and

natural jitter is likely to be some few ps - certainly large relative to

the aperture ?

An experimental setup designed to focus on this phase jackpot,

would give interesting results, and allow peak estimates, as well as a

higher occurrence for more useful Tail stats gathering.

Summary : Best predictor model would have Data.Aperture, Clock.Aperture

and a Settling Tail.

Exact nature of the settling tail is system measurable over a range

of a few decades, but extrapolation is dangerous.

Agreed. I still think from an 'average user' perspective, that a

specific 'design cell' approach would help.

Also, from a technical detail viewpoint, implementing a

'regenerative latch triplet' [Pre-Latch + Flip Flop] or [Dual Flip Flop]

in a single local space, removes routing delays from one metastable

tail.

It does NOT 'fix' metastable behaviour, but it does encapsulate it,

and move it to the best the silicon can provide, and eliminates

the potentially variable routing delays.

It also allows for future technical research and improvements to

reduce the apertures, and the settling tail.

- jg

Re: Metastable Modeling

Jim, I have seen your name here before, but I don't know what your

level of understanding of metastability is. So forgive me if I sound

like I am talking down to you. I don't know if you are trying to

discuss fine details of this topic or if you are new to the issues of

metastability.

If you look up references about metastability you will find that the

MTBF time scales linearly with clock and data rate, but exponentially

with settling time. There is a constant for each part of the equation.

These two constants are what characterize a particular FF design and

process used to build it. Peter's comment is saying that if you allow

just 3 ns settling time with his rates and parts, you will have an MTBF

of a billion years. Certainly you can go longer and get MTBF times

longer than the age of the universe. So yes, 10 ns would be way more

than enough.
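For readers who want the relationship in symbols, it is usually written roughly as below. The symbols are assumptions chosen for this sketch and are not defined anywhere in the thread: t is the settling time allowed, tau is the flip-flop's resolution time constant, T_0 is the second device constant, and f_c and f_d are the clock and data rates. Peter's "a million times per half nanosecond" figure then pins down tau for his parts:

```latex
\mathrm{MTBF}(t) = \frac{e^{t/\tau}}{T_0 \, f_c \, f_d},
\qquad
e^{0.5\,\mathrm{ns}/\tau} = 10^{6}
\;\Rightarrow\;
\tau = \frac{0.5\,\mathrm{ns}}{\ln 10^{6}} \approx 36\,\mathrm{ps}
```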

No, a metastable event will happen with a much higher rate based only on

the rate of the clock and data. But it will have no impact on your

circuit if you don't use the output until after the metastability has

settled out. Given a time period this calculation determines how often

the metastable event will persist and cause an error.

This is already considered in the calculation. That is why the

frequencies are multiplied. The assumption is that the two rates are

truly asynchronous and are not correlated in any way. Then the chance

of them happening in just the right timing relation is a function of how

often each of them is occurring.

This may sound good, but it is no different than the current model and

would be much harder to measure. It is best not to think too hard about

this, but rather to be a bit on the empirical side. That seems to be

one way that Peter is very smart. His measurements seem very good to me

and many others. It is no good to rationalize about things you really

can't measure.

I think "phase velocity" is ***way*** over the top. Before improving on the

current formula, it would be good to find something wrong with it. Is

there anything about it that falls short?

All of this is really just a way to relate what is happening. Since the

noise in the circuit is relatively large, I would expect tons more

jitter in the "window" than the actual width. So really the fs window

is just a concept, not a very real event.

Can you explain how this would be better than the current model?

Or you can just use the double FF approach and require a routing time

for this path that is at least 3 ns less than the clock period. Again,

simple, empirical and effective.

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com


Re: Metastable Modeling

Sorry, was I that unclear ?

I think we are saying the same thing.

I was asking Peter to clarify; only he can know what he meant.

I beg to differ. The best understanding comes from finding models that

are easy to explain, and can be used in the widest manner, and that

also help guide (new) measurements and understanding.

Being a designer, I am all for 'hard numbers'.

Above you state

" The assumption is that the two rates are truly asynchronous

and are not correlated in any way."

If the model cannot cope with other than this

hypothetical ideal, that's rather 'falling short' ?

The concept of phase velocity is not way over the top, as it

introduces the important concept/point of what to do,

when 'truly async' does not apply, and also what to do, if

your design needs PEAK rather than simple average tolerance.

In some designs, that aspect will be important :

A system can have an 'average MTBF' number of some years,

but still fail a number of times in one hour.

IIRC Austin L. gave a good real-world example ?

Agreed, that's why I called it a virtual aperture.

The idea of Clock and data apertures also gives the correct

dimensions to the answer.

See above. I don't see it as radically different than the current thinking,

( the tail model is the same, only I'd be more cautious about

far-extrapolation )

but it does allow better handling of peak/average predictions, and

it leads to real measurements to define these two.

But you have to know enough to take those steps, and it is still exposed,

more than encapsulated.

I'm thinking of the newest breed of graduate, and they represent more the

average user than you or I :)

-jg

Re: Metastable Modeling

I have a new idea how to simplify the metastable explanation and calculation.

Following Albert Einstein's advice that everything should be made as

simple as possible, but not any simpler:

We all agree that the extra metastable delay occurs when the data input

changes in a tiny timing window relative to the clock edge. We also

agree that the metastable delay is a strong function of how exactly the

data transition hits the center of that window.

That means, we can define the width of the window as a function of the

expected metastable delay.

Measurements on Virtex-IIPro flip-flops showed that the metastable

window is:

• 0.07 femtoseconds for a delay of 1.5 ns.

• The window gets a million times smaller for every additional 0.5 ns of delay.

Every CMOS flip-flop will behave similarly. The manufacturer just has

to give you the two parameters ( x femtoseconds at a specified delay,

and y times smaller per ns of additional delay)

The rest is simple math, and it even applies to Jim's question of

non-asynchronous data inputs. I like this simple formula because it

directly describes the actual physical behavior of the flip-flop, and

gives the user all the information for any specific systems-oriented

statistical calculations.
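As a rough sketch of the "simple math", the two parameters can be turned into a window width and then into an MTBF. The window figures are the ones quoted above; the function names are invented for the example, and the rate formula assumes clock and data are uncorrelated, with the data frequency taken as the rate of data transitions.

```python
FEMTO = 1e-15

# Figures quoted in the post for Virtex-II Pro flip-flops.
REF_WINDOW_S = 0.07 * FEMTO   # window width at the reference delay
REF_DELAY_NS = 1.5
SHRINK_PER_STEP = 1e6         # window shrinks a million times ...
STEP_NS = 0.5                 # ... per additional 0.5 ns of delay


def window_seconds(delay_ns):
    """Width of the metastable capture window for a given tolerated delay."""
    steps = (delay_ns - REF_DELAY_NS) / STEP_NS
    return REF_WINDOW_S / SHRINK_PER_STEP ** steps


def mtbf_seconds(delay_ns, clock_hz, data_hz):
    """MTBF for uncorrelated clock and data: 1 / (window * f_clock * f_data)."""
    return 1.0 / (window_seconds(delay_ns) * clock_hz * data_hz)
```

With these figures, a 1.5 ns allowance at a 300 MHz clock and 50 MHz data gives an MTBF of just under one second, which lines up with the one-microsecond-at-1.0-ns figure quoted earlier in the thread.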

Peter Alfke, Xilinx Applications


Re: Metastable Modeling

Quite agree.

eg: Take a system that is not randomly async, but by some quirk of

nature, actually has two crystal sources, one for clock, and another

for the data. These crystals are quite stable, but have a slow

relative phase drift due to their 0.5ppm mismatch.

Now let's say I want to know not just the statistical average, but to

get some idea of the peak - the real failure mode is not 'white noise', but

has distinct failure peaks near 'phase lock', and nulls clear of this.

Seems management wants to know how bad it can get, for how long,

not just 'how good it is, on average', so we'll humour them :)

That's a "specific systems-oriented statistical calculation".

Please demonstrate how to apply the above x & y, to give me

all the information I seek.

-jg

Re: Metastable Modeling

Interesting.

Let's say we have two frequencies, 100 MHz even, and 100.000 050 MHz,

which is 50 Hz higher. These two frequencies will beat or wander over

each other 50 times per second.

Assuming no noise and no jitter, each step will be 10 ns divided by 2

million = 5 femtoseconds. That is roughly 70 times wider than the capture window

for a 1.5 ns delay. Therefore we can treat this case the same way as my

original case with totally asynchronous frequencies. I think even jitter

has no bearing on this, because it also would be far, far wider than the

capture window. That means, this slowly drifting case is not special at

all, except that metastable events would be spaced by multiples of 20 ms

(1/50 Hz) apart. But that's irrelevant for events that occur on average

once per year or millennium.
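The step-size arithmetic can be checked with a short Python sketch. The variable and function names are illustrative only; the frequencies and the 0.07 fs window are the figures used in this thread.

```python
def phase_step_seconds(clock_hz, data_hz):
    """Change in clock-to-data phase from one clock cycle to the next."""
    return abs(1.0 / clock_hz - 1.0 / data_hz)


CLOCK_HZ = 100e6
DATA_HZ = 100.000050e6
WINDOW_S = 0.07e-15   # capture window for a 1.5 ns delay, as quoted earlier

step = phase_step_seconds(CLOCK_HZ, DATA_HZ)
beat_period = 1.0 / abs(DATA_HZ - CLOCK_HZ)

print(f"phase step per cycle: {step * 1e15:.2f} fs")
print(f"step / window:        {step / WINDOW_S:.0f}")
print(f"beat period:          {beat_period * 1e3:.0f} ms")
```

The step comes out at about 5 fs, or about 71 window widths with the 0.07 fs figure. Each clock edge therefore moves past the window by far more than the window's own width, which is why the slowly drifting case behaves like the fully asynchronous one.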

Now, you will never ever, under any circumstances, get a guarantee not

to exceed a long delay, since by accident the flip-flop might go

perfectly metastable and stay for a long time. It's just an extremely

small probability, expressed as a very, very long MTBF. That is the

fundamental nature of metastability.

To repeat, I like the capture window approach because it is independent

of data rate and clock rate.

Greetings, and thanks for the discussion. It helped me clear up my mind...

Peter Alfke


Re: Metastable Modeling

I don't want to beat a dead horse, but I do want to make clear that the

capture window model does not eliminate the frequency of the clock and

data from the failure rate calculation. The basic probability of a

failure from any single event is clearly explained by the window model,

but to get a failure rate you need to know the clock rates to know how

often the possible event is tested, so to speak. If you double

either the clock or the data rate, you double the failure rate.

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com


Re: Metastable Modeling

The asynchronous system produces even distribution across the sampling clock

cycle.

The synchronous system with arbitrary phase gives you a lumped distribution

at the phase offset.

The critical point to realize is that you won't get a system to be consistently

going metastable, because there is ***significant*** jitter in the sampling and

data clocks relative to the metastability window.

Determine the distribution of the data relative to the sample point. The

peak of this (Gaussian?) distribution will be the worst-case error point.

What percentage of that statistical distribution is within the 0.07

femtosecond window? This provides for the "worst case" for management or

for engineers.
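A minimal sketch of that worst-case estimate, assuming the data-edge position relative to the sample point really is Gaussian. The 1 ps RMS jitter value is an assumption for illustration only (the thread only says "some few ps"), and the function name is made up for the example.

```python
import math


def fraction_in_window(window_s, jitter_sigma_s, offset_s=0.0):
    """Probability that a Gaussian-distributed data edge lands in the window.

    offset_s is the distance from the distribution's mean to the window
    centre; 0 is the worst case (window sitting on the peak).
    """
    scale = jitter_sigma_s * math.sqrt(2.0)
    upper = (offset_s + window_s / 2.0) / scale
    lower = (offset_s - window_s / 2.0) / scale
    return 0.5 * (math.erf(upper) - math.erf(lower))


WINDOW_S = 0.07e-15        # window quoted in the thread
JITTER_SIGMA_S = 1e-12     # assumed 1 ps RMS jitter, for illustration only

worst_case = fraction_in_window(WINDOW_S, JITTER_SIGMA_S)
print(f"worst-case fraction per sampled edge: {worst_case:.2e}")
```

Because the window is so much narrower than the jitter, the result is very close to the window width times the peak of the Gaussian density, i.e. window / (sigma * sqrt(2 * pi)).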

It may not have been as easy when the metastability window was much larger

than the system jitter.

Re: opinions are OK

When I was at Agilent I analysed the causes of failures in some FPGA

developments.

About half of all FPGA design related bugs (weighted by the time spent

finding them) were associated with asynchronous logic and clock domain

crossings. I guess that's not too surprising.

What you may find surprising is that 0% of the clock domain crossing

bugs had anything to do with metastability. Glitches and races were

the cause.

My interpretation:

I think that most designers have heard of metastability, so they put

retiming flip flops everywhere. Consequently, metastability related

problems don't occur often.

YMMV.

Regards,

Allan.

Re: opinions are OK

I don't understand why this is so hard to understand. Nothing personal,

it is just that a lot of people keep trying to make the comparator

solution work. The problem is that the output of the comparator has the

same problem as the output of FF1. It can be indeterminate (between

logic 0 and logic 1) for an indeterminate amount of time. "meta" can be

in transition at the time that FF2 is clocked, which will clearly lead to

FF2 going metastable.

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com


Re: opinions are OK

I accept that with op amps with non-infinite gain, just at the

edges of the comparator detection the comparator timing might be slow

or unknown (what you have called metastable, I don't know why), but this

does not matter.

Let's just suppose that ff1 was only just metastable and was nearly

high. In this scenario with meta unknown then, the mux on ff2 may be

switching wildly or trying to average (I don't know modern semi

theory) the two inputs to ff2. However ff1 is just about 1 and so is

"not ff2" and ff2 will see a solid 1.

There may be a hole in my "design" but unless there is something about

op amp comparators I have forgotten (I did the theory about 15 yrs

ago) then this isn't it.

The problem with a forum like this is that someone steps out of the

box and other people think that the "out of boxer" has no experience

at all. As my last sentence sort of implied I'm not looking for a

solution, the maths for metastability is well understood and I'm not

looking to pay the obvious penalties for not using the classic

solution.

Re: opinions are OK

There is an issue if FF1 is metastable on two successive clock edges

when the input is slowly rising as FF2 will toggle twice, which simple

design practice can solve, but there is no issue if FF1 is meta twice

with an input pulse at exactly the clock period (FF2 toggling twice is

what the input has done). Other than that I haven't understood you.

Re: Original (5V) Xilinx Spartan ?

Hi Peter,

Amazing!

What a pity!

I just said that, if you want, we can recapitulate the metastability

stuff. During your absence we found the cure for metastability (Rick

designed the circuit), we built, tested and patented a perpetual

motion machine (Ray worked a lot here) and finally, we almost finished

a prototype of a Time Machine. We are preparing a Dinosaur Hunt, do

you want to join us?

Now, coming back to Earth.

You said (@ Thinking out loud about metastability): "I have never seen

strange levels or oscillations ( well, 25 years ago we had TTL

oscillations). Metastability just affects the delay on the Q output."

But Philip Freidin showed (@ Mitigating metastability) some pictures

of the FF output during metastability that disagree.

Do the Xilinx FFs have a different behavior?

One more question.

You also said (@Thinking out loud about metastability): "Remember:

Metastability causes an extra 3 ns of unpredictable delay once in a

billion years... Seems to be an affordable risk.".

What kind of input? What clock frequency?

Luiz Carlos

