FPGA and metastability once again

- v_mirgorodsky

Wed, Nov 23, 2005 9:38 AM

Hello ALL,

I have a design with two global clocks, and I need to transfer data from one clock domain to the other. I am aware that FIFO blocks exist :), but it seems too expensive to spend a block RAM and other resources on every boundary crossing. To avoid using FIFO blocks we created a handshake scheme based on some triggers and a small FSM. This solution has run error-free in hardware for almost 40 hours. The first domain's clock frequency is 25MHz or 125MHz (depending on mode); the second domain's clock frequency is 166MHz.

Naturally, some triggers in our design are metastable. Is it possible to get an intermediate voltage level at the output of a trigger in an FPGA if the signal on its data input violates setup or hold times? In my design I assume I don't get any intermediate voltage levels at the trigger outputs. What about a signal I bring into the FPGA from outside? Is it possible to get intermediate voltage levels on the trigger outputs by violating setup/hold times and/or the IO standard's voltage levels?

With best regards, Vladimir S. Mirgorodsky

- Peter Alfke

Wed, Nov 23, 2005 4:39 PM

Vladimir, at the frequencies mentioned, the decision-making flip-flop will go metastable, which means it will occasionally have an unpredictable extra delay. It will have an extra 2.0 ns delay statistically once every 2 weeks. It will have an extra 2.5 ns delay roughly once every 20 000 years. For every additional 100 ps of delay, the mean time between failures (MTBF) increases by more than a factor of 10. See my Xilinx app note XAPP094 for more details. Don't worry about strange levels. It's the unpredictable extra delay that is the problem.

Peter Alfke, Xilinx Applications
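As a rough cross-check of the figures quoted above, the two data points can be fitted to the usual exponential model of metastability resolution. This is only a sketch: the exponential form and the names `tau_ns` and `mtbf_seconds` are assumptions made for illustration, and the resulting time constant comes from the two quoted numbers alone, not from any data sheet.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# The two data points quoted in the post: (extra delay in ns, MTBF in seconds).
delay_a_ns, mtbf_a_s = 2.0, 14 * 24 * 3600               # once every 2 weeks
delay_b_ns, mtbf_b_s = 2.5, 20_000 * SECONDS_PER_YEAR    # once every 20 000 years

# Assumption: MTBF(t) = MTBF_a * exp((t - delay_a) / tau), where tau is a
# resolution time constant fitted here from the two points above.
tau_ns = (delay_b_ns - delay_a_ns) / math.log(mtbf_b_s / mtbf_a_s)

# Growth in MTBF for each additional 100 ps (0.1 ns) of tolerated delay.
factor_per_100ps = math.exp(0.1 / tau_ns)


def mtbf_seconds(extra_delay_ns: float) -> float:
    """Extrapolated MTBF for a given tolerated extra delay (fitted model only)."""
    return mtbf_a_s * math.exp((extra_delay_ns - delay_a_ns) / tau_ns)


print(f"fitted tau: {tau_ns * 1000:.0f} ps")
print(f"MTBF growth per extra 100 ps: x{factor_per_100ps:.1f}")
```

The fitted growth per 100 ps comes out above 10, which is consistent with the "more than a factor of 10" statement in the post.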

- Avrum

Wed, Nov 23, 2005 8:50 PM

Since you haven't posted any details about how you are doing the clock crossing, we can't say if the system is robust or not. Dealing with clock crossing is a tricky issue, not just in terms of dealing with metastability, but also in terms of how to ensure that data is not missed, replicated, or otherwise mangled. 40 hours of testing is relatively short when you are dealing with metastability; all you have demonstrated is that your MTBF is not substantially less than 40 hours. It is certainly possible that you could have a system that works most of the time, but still fails once per day, once per week, once per year...
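To put a number on how little a 40-hour run proves, here is a small sketch. It assumes failures arrive as a Poisson process with a constant rate of 1/MTBF, which is a modeling assumption rather than anything stated in the thread; `p_no_failure` is a hypothetical helper name.

```python
import math


def p_no_failure(test_hours: float, mtbf_hours: float) -> float:
    """Probability of seeing zero failures during the test, assuming
    failures follow a Poisson process with rate 1 / MTBF."""
    return math.exp(-test_hours / mtbf_hours)


TEST_HOURS = 40.0
for label, mtbf_hours in [
    ("fails once per day", 24.0),
    ("fails once per week", 7 * 24.0),
    ("fails once per year", 365.25 * 24.0),
]:
    chance = p_no_failure(TEST_HOURS, mtbf_hours)
    print(f"{label}: {chance:.0%} chance of a clean 40-hour run")
```

Under this model, a design that fails on average once a week would still pass a 40-hour test roughly four times out of five.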

Clock crossing through a FIFO is a mechanism that is well understood, and can be designed to be reliable in spite of metastability issues. Other mechanisms certainly exist, and, depending on the frequencies involved, can be implemented with less logic than a FIFO. However from 125MHz to 166MHz, many of the "easy" robust mechanisms won't work; a FIFO is your best bet.

It is understandable that you don't want to "waste" block RAMs for the clock crossing. However, one nice feature of the Xilinx architecture is the availability of distributed RAMs. It is relatively "cheap" to use these dual-ported RAMs to build 16-word-deep FIFOs (in any width), which are enough for most clock crossing applications.

Avrum

- Philip Freidin

Thu, Nov 24, 2005 8:26 AM

Hi Vladimir,

40 hours of testing is not nearly enough for almost any system. You desperately need to read this:

formatting link

to get a better understanding of the issues. Since you don't say what FPGAs you are using, I can't guide you to specific numbers, but there is significant information at the Xilinx web site that describes the metastable parameters for various product families.

For example, XAPP094 and

formatting link

There were also some Xcell articles I believe.

You need to use this data plus a careful analysis of your design and its post place and route timing and cycle margin, to calculate Mean Time Between Failures (MTBF). For example, Peter has given you some numbers in his answer to your question, but he has assumed that you are using very recent products, and has not made clear that the metastable characteristics depend on the device you are using.

This is why you need to read the FAQ article above. Metastables can give many possible anomalous types of behavior, depending on the actual circuit design. You can get metastables that transition cleanly between logic 0 and logic 1; the metastable behavior is then that the transition occurs later than Tcko MAX (the maximum clock-to-output delay for the flip-flop, taken from the data sheet).

As Peter said in his response:

"Don't worry about strange levels. It's the unpredictable extra delay that is the problem."

This doesn't mean that strange levels don't occur. But regardless of the signal levels a metastable passes through on its way to a final stable value, the end result is that the time to reach that stable value is unbounded.

It is not clear what you mean by "triggers". When a signal goes metastable, it can do many strange things: oscillate, sit at intermediate levels, show normal levels, make multiple transitions, ...

Anything that violates setup and hold to clock can cause metastables.

Cheers, Philip

Philip Freidin Fliptronics

- v_mirgorodsky

Thu, Nov 24, 2005 9:08 AM

Hello Avrum,

Our design is very simple. We have a register where the actual data is stored, and a strobe register which asserts one cycle after the main command register has been filled with valid information. This way the data is always stable in the command register for at least one full period of the 125MHz clock. Then the value of the strobe is latched by another metastable trigger, whose clock is 166MHz. The system does not transition to any state until it sees the strobe in its 166MHz clock domain. Everything else is really simple: we just generate a couple of clears for both strobe registers and a couple of enables for the data registers in both domains. Since we transfer only commands across the clock boundary, we don't need very high throughput, although the design is able to transfer a word of data across the clock boundary every three cycles of the slower clock.

If we miss a strobe this clock cycle, we will recognize it later. Our system can easily tolerate that since we have a command pipeline and we don't need to supply a new command very frequently.

This whole system depends on the assumption that a trigger cannot have a metastable (intermediate) voltage level at its output when its input violates setup or hold times, or both. I have heard that old discrete triggers had that problem: they could generate intermediate voltage levels on their outputs when their input voltage level was intermediate. I am curious whether this is still the case for modern FPGAs.

With best regards, Vladimir S. Mirgorodsky

- Avrum

Thu, Nov 24, 2005 1:46 PM

It sounds like what you have is workable; the key is the relatively low throughput through the clock crossing.

However, you do have to worry about metastability and its effects in a system like this.

What I would do is enable the strobe on the 125MHz domain on the same clock that the command is loaded into the flops on the 125MHz domain. Then bring this strobe across to the 166MHz domain using TWO flip flops back to back with a constraint on the wire between the Q of the first and the D of the second. This constraint should artificially constrain the propagation delay between these two points to be significantly less than the 6ns clock period of the 166MHz clock; say 2ns to give 4ns for metastability resolution (as Peter was talking about).

Then you can use this signal to enable the FFs on the 166MHz domain to sample the data being held in the FFs on the 125MHz domain.

If the data is guaranteed to be stable in the 125MHz domain for "long enough" (enough to get the strobe synchronized into the 166MHz domain and the data latched), then you don't necessarily need a handshake mechanism back to the 125MHz domain; you can edge detect the strobe in the 166MHz domain and only load the 166MHz FFs on the first clock when the strobe is high.
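The structure described above (two back-to-back flip-flops followed by an edge detector) can be sketched as a cycle-level behavioral model. This is an illustration only: the class and register names are hypothetical, and the model does not simulate metastability itself or the routing constraint between the two flops. It only shows the register ordering and the single-cycle load enable.

```python
class StrobeSynchronizer:
    """Cycle-level model of a two-flop synchronizer plus rising-edge detector,
    clocked in the destination (166MHz) domain."""

    def __init__(self) -> None:
        self.ff1 = 0       # first flop: the one that may go metastable in hardware
        self.ff2 = 0       # second flop: samples ff1 one destination clock later
        self.ff2_prev = 0  # delayed copy of ff2, used for edge detection

    def clock(self, async_strobe: int) -> int:
        """Advance one destination clock edge.

        Returns 1 for exactly one cycle when the synchronized strobe rises;
        that is the cycle on which the destination data registers would load.
        """
        # All registers sample the old values at once (the right-hand side is
        # evaluated before any assignment), as flip-flops on one clock would.
        self.ff2_prev, self.ff2, self.ff1 = self.ff2, self.ff1, int(async_strobe)
        return int(self.ff2 == 1 and self.ff2_prev == 0)


# The strobe from the source domain is held high for four destination clocks.
sync = StrobeSynchronizer()
load_enables = [sync.clock(s) for s in [0, 1, 1, 1, 1, 0, 0]]
print(load_enables)
```

In this model the load enable appears on the second destination clock after the strobe is first sampled high, which is why the source data has to stay stable for at least that long.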

If you do need to handshake back to the 125MHz domain, then you will need to do the same thing in the opposite direction; double synchronize the strobe going back and place a constraint on the wire between the two flops. Remember that a pulse in the 166MHz domain is not guaranteed to be seen in the 125MHz domain; in general, you can only catch a pulse in a different clock domain if the period of your source clock is larger (and I would say significantly larger) than the period of your destination clock plus the setup time and the hold time of the flip flop on the destination domain. If this is not the case, then you can stretch the pulse in the source domain to cover more than one clock period. However, if you do this, you must do it so that the stretched pulse is the output of a flipflop, and not (say) the output of any combinational gate.
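The pulse-width rule above can be turned into a quick calculation of how many source clock periods a pulse has to span. This is a minimal sketch: the setup and hold values are hypothetical placeholders rather than figures for any particular device, and the function encodes only the bare inequality, not the extra margin recommended above.

```python
import math


def min_pulse_cycles(src_mhz: float, dst_mhz: float,
                     t_setup_ns: float, t_hold_ns: float) -> int:
    """Smallest number of source clock periods the pulse must last so that its
    width exceeds the destination period plus setup time plus hold time."""
    src_period_ns = 1000.0 / src_mhz
    dst_period_ns = 1000.0 / dst_mhz
    window_ns = dst_period_ns + t_setup_ns + t_hold_ns
    # The width must be strictly larger than the window, hence floor + 1.
    return math.floor(window_ns / src_period_ns) + 1


# Hypothetical flip-flop timing, for illustration only.
T_SETUP_NS = 0.5
T_HOLD_NS = 0.5

to_slow = min_pulse_cycles(166.0, 125.0, T_SETUP_NS, T_HOLD_NS)
to_fast = min_pulse_cycles(125.0, 166.0, T_SETUP_NS, T_HOLD_NS)
print(f"166MHz -> 125MHz: hold the pulse for at least {to_slow} source clocks")
print(f"125MHz -> 166MHz: hold the pulse for at least {to_fast} source clock(s)")
```

With these placeholder numbers, a single-cycle pulse from the 166MHz domain is too short for the 125MHz domain and needs stretching, while the 125MHz to 166MHz direction passes the bare inequality with only about 1 ns to spare.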

With all the synchronizing required, I expect that the throughput of this system will be somewhat less than one command every three clocks in the 125MHz domain. If you can tolerate the slower throughput, then this will work. If not, then you will have to use a FIFO!

Avrum
