Original (5V) Xilinx Spartan ? - Page 2



Re: opinions are OK
Let me try to repair the damage I did with my impatience:

When capturing data that is asynchronous with the clock, the flip-flop
will inevitably go metastable sooner or later.
Metastability manifests itself in unpredictable additional clock-to-out delay.
The user knows the clock frequency, probably knows the data frequency
at least roughly, and should know the amount of tolerable extra delay,
or the acceptable Mean-Time-Between-Failure.

Then one can consult the app note and table and see the connection.
MTBF is always inversely proportional to the product of the clock and data
frequencies.

Last October I published a XilinxTechXclusives paper which shows that at
a 300 MHz clock rate and 50 MHz data rate, the MTBF is one microsecond
for a total clock-to-Q plus set-up time of 1.0 ns. MTBF then increases a
million times for every additional half nanosecond available as extra delay.
At 3 ns, the MTBF is over a billion years.
All MTBF values must be scaled by the product of the two frequencies:
At 100 MHz clock and ~10 MHz data, the MTBF is, therefore, 15 times
longer.
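A minimal Python sketch of the scaling described above, assuming the quoted figures are exact: 1 microsecond MTBF at 1.0 ns with a 300 MHz clock and 50 MHz data, a million-fold gain for every extra 0.5 ns, and MTBF inversely proportional to the product of clock and data frequency. The function name `mtbf_seconds` and the constants are illustrative, not taken from the TechXclusives paper.

```python
# Anchor point quoted above (assumed exact for this sketch):
# MTBF = 1 microsecond at 1.0 ns total clock-to-Q plus set-up time,
# with a 300 MHz clock and 50 MHz data.
REF_MTBF_S = 1e-6
REF_T_NS = 1.0
REF_F_CLK_HZ = 300e6
REF_F_DATA_HZ = 50e6
GAIN_PER_HALF_NS = 1e6  # MTBF grows a million-fold per extra 0.5 ns

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mtbf_seconds(t_ns: float, f_clk_hz: float, f_data_hz: float) -> float:
    """Extrapolate MTBF from the single published anchor point.

    Exponential in the available time, inversely proportional to the
    product of the clock and data frequencies.
    """
    exponent = (t_ns - REF_T_NS) / 0.5
    freq_scale = (REF_F_CLK_HZ * REF_F_DATA_HZ) / (f_clk_hz * f_data_hz)
    return REF_MTBF_S * GAIN_PER_HALF_NS ** exponent * freq_scale


if __name__ == "__main__":
    for t in (1.0, 1.5, 2.0, 2.5, 3.0):
        years = mtbf_seconds(t, 300e6, 50e6) / SECONDS_PER_YEAR
        print(f"t = {t:.1f} ns -> MTBF = {years:.3g} years")
    ratio = mtbf_seconds(3.0, 100e6, 10e6) / mtbf_seconds(3.0, 300e6, 50e6)
    print(f"100 MHz / 10 MHz vs 300 MHz / 50 MHz: {ratio:.0f}x longer")
```

The 15x figure falls straight out of the frequency product: (300 x 50) / (100 x 10) = 15.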

So, in short:
Metastability is unavoidable. All attempts to avoid it are inherently
doomed, but the quantitative impact of metastability is quite tolerable.

That's it.
Peter Alfke, Xilinx Applications
===================
Jim Granville wrote:

Re: Metatstable Modeling

 Is this correct ?
Wouldn't the increase in clock period from 3.3 ns to 10 ns buy you (10 - 3.3)/0.5
lots of 'a million times' scalings ?

 Or do you mean the time to trigger an event, not to fail due to one ?

 What about this issue:
  With a CLK.Data stream, the CLK pulses that are not
adjacent to the DATA edges cannot have metastable events, so
shouldn't they be left out of the scaling ?



 The best model would seem to be a Data.Aperture AND a Clock.Aperture
(both very small, but I don't think they HAVE to be equal), and
when they overlap or come closer than a critical time threshold,
the metastable dice rolls. What happens after the roll depends on
how far away the next clock is (call this the settling tail).

 Prediction stats would be on an area-overlap basis, and, assuming
async signals (non-zero phase velocity), the area product would
be proportional to
(Data.Aperture/Data.EdgeT) x (Clock.Aperture/Clock.EdgeT)

Typically, Data.EdgeT = Data H or L time
Clock.EdgeT = ClkPeriod

 This is average trigger/dice roll prediction, but the
actual 'metastable dice roll profile' will depend on the
phase velocity, and will have peaks much higher than the average.

 What if your system hits/moves very slowly over this 'phase jackpot' ?

 Here, area-mitigation stats are not much use, and you have to rely
mainly on the settling tail to the next clock (and maybe a small amount on
the natural system jitter).
IIRC Peter quoted a 0.05 fs virtual aperture time, and
natural jitter is likely to be a few ps - certainly large relative to
the aperture ?

 An experimental setup designed to focus on this phase jackpot
would give interesting results, and allow peak estimates, as well as a
higher occurrence for more useful tail stats gathering.
 
 
 Summary : Best predictor model would have Data.Aperture, Clock.Aperture
and a Settling Tail.
 Exact nature of the settling tail is system measurable over a range
of a few decades, but extrapolation is dangerous.  



Agreed. I still think from an 'average user' perspective, that a
specific 'design cell' approach would help.

 Also, from a technical detail viewpoint, implementing a
'regenerative latch triplet' [Pre-Latch + Flip Flop] or [Dual Flip Flop]
in a single local space, removes routing delays from one metastable
tail.

 It does NOT 'fix' metastable behaviour, but it does encapsulate it,
and move it to the best the silicon can provide, and eliminates
the potentially variable routing delays.
 It also allows for future technical research and improvements to
reduce the apertures, and the settling tail.

- jg



Re: Metatstable Modeling

Jim,  I have seen your name here before, but I don't know what your
level of understanding of metastability is.  So forgive me if I sound
like I am talking down to you.  I don't know if you are trying to
discuss fine details of this topic or if you are new to the issues of
metastability.  

If you look up references about metastability you will find that the
MTBF scales inversely with clock and data rate, but exponentially
with settling time.  There is a constant for each part of the equation.
These two constants are what characterize a particular FF design and
the process used to build it.  Peter's comment is saying that if you allow
just 3 ns settling time with his rates and parts, you will have an MTBF
of a billion years.  Certainly you can go longer and get MTBF times
longer than the age of the universe.  So yes, 10 ns would be way more
than enough.  
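For reference, the usual textbook form of this relationship is written out below. The symbols t_r (settling time), tau and T_0 (the two device constants) are the conventional names, not ones used in this thread. Matching Peter's "a million times per extra 0.5 ns" pins down tau:

```latex
\mathrm{MTBF}(t_r) = \frac{e^{t_r/\tau}}{T_0 \, f_{\mathrm{clk}} \, f_{\mathrm{data}}},
\qquad
e^{0.5\,\mathrm{ns}/\tau} = 10^{6}
\;\Rightarrow\;
\tau = \frac{0.5\,\mathrm{ns}}{\ln 10^{6}} \approx \frac{0.5\,\mathrm{ns}}{13.8} \approx 36\,\mathrm{ps}
```

T_0 depends on where t_r is measured from, so in practice it is fitted to a measured point rather than derived.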



No, a metastable event will happen at a much higher rate based only on
the rate of the clock and data.  But it will have no impact on your
circuit if you don't use the output until after the metastability has
settled out.  Given a settling time, this calculation determines how often
the metastable event will persist long enough to cause an error.  



This is already considered in the calculation.  That is why the
frequencies are multiplied.  The assumption is that the two rates are
truly asynchronous and are not correlated in any way.  The chance
of them happening in just the right timing relation is a function of how
often each of them is occurring.  

 

This may sound good, but it is no different than the current model and
would be much harder to measure.  It is best not to think too hard about
this, but rather to be a bit on the empirical side.  That seems to be
one way that Peter is very smart.  His measurements seem very good to me
and many others.  It is no good to rationalize about things you really
can't measure.



I think "phase velocity" is *way* over the top.  Before improving on the
current formula, it would be good to find something wrong with it.  Is
there anything about it that falls short?  



All of this is really just a way to relate what is happening.  Since the
noise in the circuit is relatively large, I would expect tons more
jitter in the "window" than the actual width.  So really the fs window
is just a concept, not a very real event.  



Can you explain how this would be better than the current model?  



Or you can just use the double FF approach and require a routing time
for this path that is at least 3 ns less than the clock period.  Again,
simple, empirical and effective.  

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com
Re: Metatstable Modeling

Sorry, was I that unclear ?



I think we are saying the same thing.


I was asking Peter to clarify; only he can know what he meant.


 I beg to differ. The best understanding comes from finding models that
are easy to explain, can be used in the widest manner, and
also help guide (new) measurements and understanding.
 Being a designer, I am all for 'hard numbers'.


 Above you state
" The assumption is that the two rates are truly asynchronous
and are not correlated in any way."

 If the model cannot cope with other than this
hypothetical ideal, that's rather 'falling short' ?

 The concept of phase velocity is not way over the top, as it
introduces the important point of what to do
when 'truly async' does not apply, and also what to do if
your design needs PEAK rather than simple average tolerance.
 In some designs, that aspect will be important:

 A system can have an 'average MTBF' number of some years,
but still fail a number of times in one hour.

IIRC Austin L. gave a good real-world example ?


Agreed, that's why I called it a virtual aperture.
The idea of Clock and data apertures also gives the correct
dimensions to the answer.


See above. I don't see it as radically different than the current
thinking (the tail model is the same, only I'd be more cautious about
far-extrapolation), but it does allow better handling of peak/average
predictions, and it leads to real measurements to define these two.


 But you have to know enough to take those steps, and it is still
exposed more than encapsulated.
 I'm thinking of the newest breed of graduate, and they represent more
the average user than you or I :)

-jg

Re: Metatstable Modeling
I meant to write a lengthy rebuttal and explanation, but rickman said it all.
Thanks !
Peter Alfke

Re: Metatstable Modeling
I have a new idea for how to simplify the metastable explanation and calculation.
Following Albert Einstein's advice that everything should be made as
simple as possible, but not any simpler:

We all agree that the extra metastable delay occurs when the data input
changes in a tiny timing window relative to the clock edge. We also
agree that the metastable delay is a strong function of how exactly the
data transition hits the center of that window.
That means, we can define the width of the window as a function of the
expected metastable delay.

Measurements on Virtex-IIPro flip-flops showed that the metastable
window is:

0.07 femtoseconds for a delay of 1.5 ns.
The window gets a million times smaller for every additional 0.5 ns of delay.

Every CMOS flip-flop will behave similarly. The manufacturer just has
to give you the two parameters (x femtoseconds at a specified delay,
and y times smaller per ns of additional delay).
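A sketch of the window model in Python, taking x = 0.07 fs at 1.5 ns and a million-fold shrink per extra 0.5 ns (so y = 10^12 per ns) as given. The names are hypothetical. Multiplying the window by both frequencies turns it back into a failure rate, which gives a cross-check against the MTBF figures from the first post in this thread.

```python
# Virtex-IIPro figures quoted above, treated as exact for this sketch.
X_WINDOW_S = 0.07e-15     # x: capture window width ...
X_DELAY_NS = 1.5          # ... at this metastable delay
SHRINK_PER_HALF_NS = 1e6  # window shrinks a million-fold per extra 0.5 ns


def capture_window_s(delay_ns: float) -> float:
    """Width of the data-vs-clock window that produces at least delay_ns."""
    return X_WINDOW_S / SHRINK_PER_HALF_NS ** ((delay_ns - X_DELAY_NS) / 0.5)


def failures_per_second(delay_ns: float, f_clk_hz: float, f_data_hz: float) -> float:
    """Rate of delays longer than delay_ns for uncorrelated clock and data."""
    return capture_window_s(delay_ns) * f_clk_hz * f_data_hz


if __name__ == "__main__":
    # 300 MHz clock, 50 MHz data, 1.0 ns: should land near 1 microsecond MTBF.
    mtbf = 1.0 / failures_per_second(1.0, 300e6, 50e6)
    print(f"MTBF at 1.0 ns: {mtbf * 1e6:.2f} microseconds")
```

With these numbers the window at 1.0 ns is 0.07 ns, and 0.07 ns x 300 MHz x 50 MHz gives roughly one failure per microsecond, consistent with the earlier figure.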

The rest is simple math, and it even applies to Jim's question of
non-asynchronous data inputs.  I like this simple formula because it
directly describes the actual physical behavior of the flip-flop, and
gives the user all the information for any specific systems-oriented
statistical calculations.

Peter Alfke, Xilinx Applications

Re: Metatstable Modeling

Quite agree.


 eg: Take a system that is not randomly async, but by some quirk of
nature, actually has two crystal sources, one for clock, and another
for the data. These crystals are quite stable, but have a slow
relative phase drift due to their 0.5ppm mismatch.

 Now let's say I want to know not just the statistical average, but to
get some idea of the peak - the real failure mode is not 'white noise', but
has distinct failure peaks near 'phase lock', and nulls clear of this.
 Seems management wants to know how bad it can get, for how long,
not just 'how good it is, on average', so we'll humour them :)

 That's a "specific systems-oriented statistical calculation".
Please demonstrate how to apply the above x & y, to give me
all the information I seek.

-jg

Re: Metatstable Modeling
Interesting.
Let's say we have two frequencies, 100 MHz even, and 100.000 050 MHz,
which is 50 Hz higher. These two frequencies will beat or wander over
each other 50 times per second.
Assuming no noise and no jitter, each step will be 10 ns divided by 2
million = 5 femtoseconds. That is about 70 times wider than the capture window
for a 1.5 ns delay. Therefore we can treat this case the same way as my
original case with totally asynchronous frequencies. I think even jitter
has no bearing on this, because it also would be far, far wider than the
capture window.  That means this slowly drifting case is not special at
all, except that metastable events would be spaced by multiples of 20 ms
(1/50 Hz) apart. But that's irrelevant for events that occur on average
once per year or millennium.
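The arithmetic in the two-crystal example can be checked with a few lines of Python. The 0.07 fs window is the figure from the earlier post; everything else follows from the two frequencies. The step-to-window ratio works out to roughly 71.

```python
F_CLK_HZ = 100e6
F_DATA_HZ = 100.000050e6   # 50 Hz higher
WINDOW_1P5NS_S = 0.07e-15  # capture window for a 1.5 ns delay

beat_hz = F_DATA_HZ - F_CLK_HZ             # how often the phases wrap around
step_s = 1.0 / F_CLK_HZ - 1.0 / F_DATA_HZ  # phase slip per clock cycle
ratio = step_s / WINDOW_1P5NS_S            # how many windows one step spans
spacing_s = 1.0 / beat_hz                  # spacing between "phase jackpots"

print(f"beat = {beat_hz:.1f} Hz, step = {step_s * 1e15:.3f} fs")
print(f"step / window = {ratio:.0f}, jackpot spacing = {spacing_s * 1e3:.0f} ms")
```

Since each step is many window-widths wide, successive clock edges cannot dwell inside the window, which is the basis for treating the drifting case like the asynchronous one.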

Now, you will never ever, under any circumstances, get a guarantee not
to exceed a long delay, since by accident the flip-flop might go
perfectly metastable and stay for a long time. It's just an extremely
small probability, expressed as a very, very long MTBF. That is the
fundamental nature of metastability.

To repeat, I like the capture window approach because it is independent
of data rate and clock rate.
Greetings, and thanks for the discussion. It helped me clear up my mind...

Peter Alfke
=================================
Jim Granville wrote:

Re: Metatstable Modeling

I don't want to beat a dead horse, but I do want to make clear that the
capture window model does not eliminate the frequency of the clock and
data from the failure rate calculation.  The basic probability of a
failure from any single event is clearly explained by the window model,
but to get a failure rate you need to know the clock rates to know how
often the possible event is tested, so to speak.  If you double
either the clock or the data rate, you double the failure rate.  
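Rick's point can be illustrated with a small Monte Carlo sketch in Python. It assumes data transitions arrive uniformly at random relative to the clock and counts those landing inside a window just after a clock edge. The 1 ns window is a hypothetical, hugely exaggerated stand-in for the real femtosecond window, chosen only so that the simulation records enough hits to count; `f_data_hz` here means data transitions per second.

```python
import random


def simulated_hit_rate(f_clk_hz, f_data_hz, window_s, duration_s, seed=1):
    """Hits per second: data transitions within window_s after a clock edge."""
    rng = random.Random(seed)
    period_s = 1.0 / f_clk_hz
    n_transitions = int(f_data_hz * duration_s)
    hits = 0
    for _ in range(n_transitions):
        t = rng.uniform(0.0, duration_s)  # asynchronous arrival time
        if t % period_s < window_s:       # phase relative to the last clock edge
            hits += 1
    return hits / duration_s


if __name__ == "__main__":
    base = simulated_hit_rate(100e6, 10e6, 1e-9, 0.01)
    fast_clk = simulated_hit_rate(200e6, 10e6, 1e-9, 0.01)
    fast_data = simulated_hit_rate(100e6, 20e6, 1e-9, 0.01)
    expected = 1e-9 * 100e6 * 10e6  # window x f_clk x f_data
    print(f"base {base:.3g}/s (window model predicts {expected:.3g}/s)")
    print(f"2x clock: {fast_clk / base:.2f}x, 2x data: {fast_data / base:.2f}x")
```

Doubling the clock doubles the fraction of each period covered by the window; doubling the data rate doubles the number of transitions tested. Either way the hit rate roughly doubles.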

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com
Re: Metatstable Modeling

 I'm collecting empirical results - do you have any URLs, especially
covering the 'double either' aspect ?

-jg

Re: Metatstable Modeling


The asynchronous system produces an even distribution across the sampling
clock cycle.
The synchronous system with arbitrary phase gives you a lumped distribution
at the phase offset.
The critical point to realize is that you won't get a system to go
metastable consistently, because there is *significant* jitter in the
sampling and data clocks relative to the metastability window.

Determine the distribution of the data relative to the sample point.  The
peak of this (gaussian?) distribution will be the worst-case error point.
What percentage of that statistical distribution is within the 0.07
femtosecond window?  This provides for the "worst case" for management or
for engineers.

It may not have been as easy when the metastability window was much larger
than the system jitter.
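A Python sketch of this worst-case estimate, assuming the data edge position relative to the sampling point is Gaussian with rms jitter sigma and that the 0.07 fs window sits exactly on the peak. The 1 ps rms jitter and 100 MHz edge rate in the usage lines are hypothetical values, not measurements from this thread.

```python
import math


def peak_hit_probability(window_s: float, sigma_s: float) -> float:
    """P(|offset| < window/2) for a zero-mean Gaussian edge offset."""
    return math.erf(window_s / (2.0 * sigma_s * math.sqrt(2.0)))


def peak_event_rate(window_s: float, sigma_s: float, edges_per_s: float) -> float:
    """Worst-case events per second when every edge sits on the jitter peak."""
    return peak_hit_probability(window_s, sigma_s) * edges_per_s


if __name__ == "__main__":
    window = 0.07e-15  # capture window for a 1.5 ns delay
    sigma = 1e-12      # hypothetical 1 ps rms jitter
    p = peak_hit_probability(window, sigma)
    # For window << sigma this approaches window / (sigma * sqrt(2 * pi)).
    print(f"per-edge probability {p:.3g}, "
          f"worst case {peak_event_rate(window, sigma, 100e6):.3g} events/s")
```

Because the window is far narrower than the jitter, the result depends only on the peak height of the jitter distribution; substituting the window for a longer delay shows how quickly the worst case improves.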



Re: opinions are OK
On Tue, 09 Sep 2003 11:58:10 -0700, Austin Lesea


When I was at Agilent I analysed the causes of failures in some FPGA
developments.

About half of all FPGA design related bugs (weighted by the time spent
finding them) were associated with asynchronous logic and clock domain
crossings.  I guess that's not too surprising.

What you may find surprising is that 0% of the clock domain crossing
bugs had anything to do with metastability.  Glitches and races were
the cause.

My interpretation:
I think that most designers have heard of metastability, so they put
retiming flip flops everywhere.  Consequently, metastability related
problems don't occur often.

YMMV.

Regards,
Allan.

Re: opinions are OK
Hi Austin,

I was just kidding with Peter. No offense at all. I just said that, if
he wants, we can recapitulate all the metastability stuff, so he does
not need to be sad about not being here! See my other post.

As he came back from Portugal, I wrote in Portuguese!

Luiz Carlos

Re: opinions are OK

I don't understand why this is so hard to understand.  Nothing personal,
it is just that a lot of people keep trying to make the comparator
solution work.  The problem is that the output of the comparator has the
same problem as the output of FF1.  It can be indeterminate (between
logic 0 and logic 1) for an indeterminate amount of time.  "meta" can be
in transition at the time that FF2 is clocked, which will clearly lead to
FF2 going metastable.  

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com
Re: opinions are OK

I accept that with op amps of non-infinite gain, right at the
edges of the comparator's detection range the comparator timing might be slow
or unknown (what you have called metastable; I don't know why), but this
does not matter.

Let's just suppose that ff1 was only just metastable and was nearly
high. In this scenario with meta unknown then, the mux on ff2 may be
switching wildly or trying to average (I don't know modern semi
theory) the two inputs to ff2. However ff1 is just about 1 and so is
"not ff2" and ff2 will see a solid 1.

There may be a hole in my "design" but unless there is something about
op amp comparators I have forgotten (I did the theory about 15 yrs
ago) then this isn't it.

The problem with a forum like this is that someone steps out of the
box and other people think that the "out of boxer" has no experience
at all. As my last sentence sort of implied I'm not looking for a
solution, the maths for metastability is well understood and I'm not
looking to pay the obvious penalties for not using the classic
solution.

Re: opinions are OK

There is an issue if FF1 is metastable on two successive clock edges
when the input is slowly rising as FF2 will toggle twice, which simple
design practice can solve, but there is no issue if FF1 is meta twice
with an input pulse at exactly the clock period (FF2 toggling twice is
what the input has done). Other than that I haven't understood you.

Re: Original (5V) Xilinx Spartan ?
Sorry, Luiz. I can handle German, French, Italian and Scandinavian.
But Spanish and Portuguese are not my forte...
What is it you want to discuss?
Peter

Luiz Carlos wrote:

Re: Original (5V) Xilinx Spartan ?
Hi Peter,


Amazing!


What a pity!


I just said that, if you want, we can recapitulate the metastability
stuff. During your absence we found the cure for metastability (Rick
designed the circuit), we built, tested and patented a perpetual
motion machine (Ray worked a lot here) and finally, we almost finished
a prototype of a Time Machine. We are preparing a Dinosaur Hunt, do
you want to join us?

Now, coming back to Earth.
You said (@ Thinking out loud about metastability): "I have never seen
strange levels or oscillations ( well, 25 years ago we had TTL
oscillations). Metastability just affects the delay on the Q output."
But Philip Freidin showed (@ Mitigating metastability) some pictures
of the FF output during metastability that disagree.
Do the Xilinx FFs have a different behavior?

One more question.
You also said (@Thinking out loud about metastability): "Remember:
Metastability causes an extra 3 ns of unpredictable delay once in a
billion years...  Seems to be an affordable risk.".
What kind of input? What clock frequency?

Luiz Carlos

Re: Original (5V) Xilinx Spartan ?
Peter,

Forget my last question, I saw your post at "opinions are OK".

Luiz Carlos

Re: Original (5V) Xilinx Spartan ?




I have a lot of respect for Phil, we are personal friends and have
worked together for over 20 years. I think he used old TTL pictures.

300 MHz clock, ~50 MHz data, Virtex-IIPro. See TechXclusive on the
Xilinx web.

Peter
