Xilinx input path: why does the optional delay element with the input FF help me?

Hi everyone

I had a question earlier regarding some timing issues, and it led me to another question: how is the optional delay element, together with an input flip-flop, able to help me?

What I don't understand is how the extra delay can help. Doesn't more delay mean worse timing? Yet combining it with an input flip-flop is supposed to help?

I can only see how this input flip-flop helps if it is controlled by another clock. I have searched Xilinx and Google, but the same texts appear in several explanations, saying nothing more than that it simply helps. And still - why is a delay good when you have the flip-flop?

I think that the 1 ns (Spartan-3) non-optional delay is before the input flip-flop and not after it - or am I wrong about that?

Can somebody please explain how this input path really works, and not just that it helps? I can't accept that it helps if I don't know why.

Thanks, Preben

Reply to
Preben Holm

Preben, the extra delay is there to avoid a hold-time requirement on the input data. Here is how: when you use a global clock to clock in the data, that clock has to be able to drive thousands of destinations. That means a "significant" clock delay, more than the data delay to the input flip-flop.

Whenever the clock delay is longer than the data delay, there is a hold-time requirement. Assume a perfect flip-flop with zero set-up time: the data must then be held valid for some time after the clock. This is ugly, since the data source sees no reason to keep the old data valid after the new clock.

Increasing the data delay to make it match the clock delay reduces the hold-time requirement, hopefully to zero or a negative value. The extra delay increases the set-up time, and thus reduces the max clock rate a little. But it avoids failure when the data source is very fast.
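
To put some numbers on that argument, here is a minimal sketch of the arithmetic with made-up delay values (they are not Spartan-3 datasheet figures):

    # Hold-time check at the input flip-flop, seen from the device pins.
    # All delays are in ns and are illustrative placeholders only.

    t_clk_pin_to_ff = 2.5    # clock pad -> global buffer -> flip-flop clock input
    t_data_pin_to_ff = 0.8   # data pad -> flip-flop D input, without the delay element
    t_extra_delay = 2.0      # optional input delay element in the data path
    t_ff_hold = 0.0          # assume a perfect flip-flop, as in the post above

    def hold_requirement(data_path, clock_path, ff_hold):
        # Time the external source must keep the data valid after the clock
        # edge at the pin; <= 0 means there is no hold requirement at the pin.
        return clock_path + ff_hold - data_path

    print(hold_requirement(t_data_pin_to_ff, t_clk_pin_to_ff, t_ff_hold))
    # -> 1.7 ns of hold needed without the delay element
    print(hold_requirement(t_data_pin_to_ff + t_extra_delay, t_clk_pin_to_ff, t_ff_hold))
    # -> about -0.3 ns, i.e. no hold requirement once the data path is padded out

The same padding shows up as a larger pin-level set-up time, which is the speed penalty mentioned above.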

"Better to sacrifice some top speed, but avoid unreliable operation at all (even slow) clock rates". Peter Alfke, Xilinx Applications (from home)

Reply to
Peter Alfke

The data must then be held valid for some time after the INCOMING EDGE ON THE clock PIN.

Just to avoid confusion.

Peter

Reply to
Peter Alfke

This clock delay - is that what Xilinx ISE calls "clock skew"? Often there is "no" delay described, unless you route the clock some other way (not through the global nets).

And there is a but: is the input flip-flop (IFF) placed before this optional delay element? Should the IFF be driven from another clock source (some external one)?

Well, in my case, the A/D converter holds the data valid 2 ns after the clock, and only 3 ns before the clock (running at 100 MHz).

But this delay also pushes out the "valid-from" time, so if the data is no longer valid at the rising edge after this extra delay, then using the extra delay element will not help - it will make things worse.

And I still need to know - why is the IFF so special? Isn't it just a flip-flop if the optional delay element is not used?

Thanks Preben

Reply to
Preben Holm

Preben, here it is again, slowly: the input flip-flop is different, since it gets its data from the outside, not from something created by the internal clock.

The delay from the chip's clock input to the clock arriving at the input flip-flop (or any other flip-flop) is much longer than the delay from the data input pin to the D input of the input flip-flop. The clock must drive thousands of destinations, the data only one.

Whenever the clock arrives later than the data, the data must be held (hold time), which is an ugly parameter that most flip-flops do not specify.

That's why it helps to delay the data input, and Xilinx has done that for the past 18 years on all our parts, but in Virtex-4 it has become much more sophisticated.

Peter Alfke

Reply to
Peter Alfke

And this is the only difference? So it gets instantiated automatically when I write a process, and this process takes its input from the "outside" of the FPGA?

But can a "minimum" delay of the clock be guaranteed - all these parameters is based on maximum delay of clock, maximum delay of input-buffer and so on.. But in my case I find the minimum a rather usefal value, since the data is valid after 7ns (3ns before rising edge, when 100MHz) and delaying this a 5ns total is bad for me if the internal clock isn't delayed more than 2ns (the data will not be ready to the rising edge of the clock). So a minimum 2ns delay is a very useful value whether or not to use the delay element (it works fine in my test-application, but only 50MHz testing and supposed to run 100MHz in the end).

Okay, I'm very new to FPGAs and still learning - I learned VHDL a few years ago, and now I'm really trying to learn about the FPGAs themselves.

Xilinx has very nice XAPPs for many things, but this is one place where an explanation is missing.

Thanks Preben Holm

Reply to
Preben Holm

Preben, it is best to look at the I/O from a hardware point of view (and not talk about "writing a process", etc.). You point out correctly that the min clock delay can cost you performance. Here is something I wrote a month ago; it is also covered in XAPP702. It allows you to compensate for many variable delays.

Capturing the Input Data Valid Window.

Let's assume a continuously running clock and a 16-bit-wide data input bus. Let's assume the clock is source-synchronous, i.e. its rising transition is aligned with the data transitions, and all these transitions have little skew. (Preben, in your case you know that the data is valid 3 ns before to 2 ns after the clock edge, so we might call it "aligned with the clock". If you do not want to make that assumption, you are in for a more complex training data pattern, but it could be done if absolutely necessary - not needed in your case.)

The user faces the problem of aligning the clock with respect to the data in such a way that set-up and hold-time specs are obeyed and (hopefully) data is captured close to the center of the data-valid window. Given the fairly wide spread between worst-case set-up and hold time as specified by the IC manufacturer, a carefully worst-cased design will achieve only modest performance, since the designer is forced to accommodate the specified extreme set-up and hold-time values of the input capture flip-flops. Typical values are a positive 300 ps set-up time and a negative 100 ps hold time, which implies a 200 ps window. The actual capture window is only a small fraction of a picosecond, but, depending on temperature, supply voltage or device processing, it might be positioned anywhere inside the specified wide window.

Here is a self-calibrating design approach that achieves much better performance by largely eliminating the uncertainty of the flip-flop characteristics.

This approach assumes reasonable tracking of the input flip-flops driven by the data and clock inputs, and assumes programmable delay elements at each input buffer.

The incoming clock is buffered and used to clock all data input flip-flops. The incoming clock is also used as if it were data: run through its own delay element X, it then drives the D input of a clocked flip-flop. Its output is then used to control a state machine that manipulates X to find the two edges of the valid window, where the flip-flop output changes. Note that changing X has no impact on the bus data capture operation; it only affects the control flip-flop. Once both edges are found, the state machine calculates the center value and applies it in common to all data input delays.

This auto-calibration circuit can run continuously (or non-continuously), since it does not interfere with normal operation. It means that the user can completely ignore the flip-flop set-up and hold-time specifications, the spread between set-up and hold times, and their possible variation with temperature and Vcc. This circuit does not compensate for skew between data lines, or any skew between data and clock; it assumes good tracking between all input flip-flops, and relies on a reasonably fine granularity in the delay adjustments. Fundamentally, this auto-calibration reduces the data capture uncertainty from a first-order problem to a second-order issue, thus permitting substantially higher data rates and/or higher reliability of operation. Virtex-4 programmable input delays have 75 picosecond granularity. A low-skew data bus can thus be captured at bus data rates in excess of 1 Gbps, even when the data valid window is smaller than 200 ps.

Peter Alfke, 3-31-05
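
For what it's worth, here is a rough behavioural model of the edge-finding scheme described above; the tap count and window limits are invented for illustration, and this is not the actual XAPP702 implementation:

    # Behavioural model of the delay-sweep calibration described above.
    # A pretend "hardware" function reports what the control flip-flop
    # captures for a given tap setting of the delay element X.

    TAP_PS = 75      # Virtex-4 style tap granularity (picoseconds)
    NUM_TAPS = 64    # assumed number of taps, for illustration only

    def control_ff_sample(tap, window_start_ps=900, window_end_ps=2100):
        # Returns 1 while the delayed copy of the clock is sampled high by the
        # control flip-flop, else 0.  The window limits are arbitrary model values.
        delay = tap * TAP_PS
        return 1 if window_start_ps <= delay <= window_end_ps else 0

    def find_center_tap():
        # Sweep the tap setting, find the two taps where the sampled value
        # changes (the two edges of the valid window), and return the midpoint.
        samples = [control_ff_sample(t) for t in range(NUM_TAPS)]
        edges = [t for t in range(1, NUM_TAPS) if samples[t] != samples[t - 1]]
        if len(edges) < 2:
            raise RuntimeError("could not find both edges of the window")
        return (edges[0] + edges[-1]) // 2

    center = find_center_tap()
    print(f"apply tap {center} ({center * TAP_PS} ps) to all data input delays")

In real hardware the sweep would of course be a small state machine driving the delay element's increment/decrement controls, but the search-and-center idea is the same.
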
Reply to
Peter Alfke

Hi Preben,

This post is not only a reply to your question, but also my first try at using Outlook as a newsreader. If the formatting is odd, please forgive the mess! I'm eager to see what this looks like when it's posted...

Your question leads me to believe you are not familiar with the actual analog behavior of a flip flop. I have some information, complete with famous quotes, located at:

formatting link

Hopefully this will give you a bit of background. If you are using a flip flop, you need to make sure you satisfy the input setup and input hold requirements of the flip flop -- otherwise, the flip flop won't work like you expect! If you have a flip flop, with some parameters Tsu, Th, and Tq, you might visualize that:

  1. Adding delay on the output path increases Tq, but leaves Tsu and Th unchanged.
  2. Adding delay on the input path increases Tsu, reduces Th, and leaves Tq unchanged.
  3. Adding delay on the clock path decreases Tsu, increases Th, and increases Tq.

Xilinx provides input delay mechanisms so that you can implement option #2, above. It is also the case that by way of a digital clock manager, you can arrive at implementations of #3, above -- or the complement of #3, above, where you effectively remove delay on the clock path and achieve the opposite effect.
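
To make the bookkeeping in rules 1 to 3 concrete, here is a minimal sketch; the base Tsu/Th/Tq figures are invented, not taken from any data sheet:

    # Bookkeeping for rules 1-3 above: how extra path delay shifts the timing
    # seen at the pins.  Base flip-flop numbers (ns) are invented for illustration.
    TSU, TH, TQ = 0.5, 0.2, 1.0

    def external_timing(input_delay=0.0, clock_delay=0.0, output_delay=0.0):
        # Returns (Tsu, Th, Tq) as seen externally when delay is added to the
        # input, clock, and output paths.
        tsu = TSU + input_delay - clock_delay
        th = TH - input_delay + clock_delay
        tq = TQ + output_delay + clock_delay
        return tsu, th, tq

    print(external_timing())                  # the raw flip-flop: (0.5, 0.2, 1.0)
    print(external_timing(output_delay=1.0))  # rule 1: only Tq grows
    print(external_timing(input_delay=1.0))   # rule 2: Tsu grows, Th shrinks (negative here)
    print(external_timing(clock_delay=1.0))   # rule 3: Tsu shrinks, Th and Tq grow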

Your original question, I believe, is "Why?". People use these features to modify the I/O timing -- Tq, Tsu, and Th of the input and output flip flops. "Why?" To achieve a specific I/O timing, required by design, or required by specification. Or, to simply make life easier...

Let me give you a specific example of how you might apply these techniques to make life easier. Let's say you are doing a fully synchronous design, like a PCI bus. In general, to ensure proper operation of the bus, you need to not only analyze the internal behavior of each chip, but how they interface to each other -- synchronously, with a distributed reference clock. You need to satisfy the setup and hold requirements of the input flip flops of each chip. There are two sets of calculations that are required. One, to check the input setup. Two, to check the input hold.

If you look at the equations in the presentation I referenced above, you will see that you can construct something that is guaranteed to satisfy the hold time requirements -- by increasing the input delay to get a zero (or slightly negative) hold time -- thereby eliminating the need to even check it. A sum of real physical delays is always going to be greater than zero (or than a slightly negative number). That leaves you only needing to check the input setup.
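
As a sketch of that hold-side calculation (the presentation's exact equations are not reproduced here, and all the numbers below are placeholders):

    # Hold check at a receiving chip on a common-clock bus, all values in ns.
    # Placeholder numbers, not from any datasheet or the PCI specification.

    tco_min = 2.0          # driver's minimum clock-to-output
    board_delay_min = 0.5  # minimum trace delay between the two chips
    clock_skew_max = 2.0   # receiver's clock may lag the driver's by this much
    t_hold_ff = 1.0        # intrinsic hold time of the capturing flip-flop

    def hold_margin(input_path_delay):
        # Margin by which the earliest new-data arrival clears the hold window
        # at the capturing flip-flop; positive means the hold check passes.
        return (tco_min + board_delay_min + input_path_delay) - (clock_skew_max + t_hold_ff)

    print(hold_margin(0.3))        # pad-to-D delay only: about -0.2 ns, hold check fails
    print(hold_margin(0.3 + 1.5))  # with 1.5 ns of added input delay: about +1.3 ns, passes

The delay you add to guarantee the hold side then has to be absorbed by the set-up check, which is the trade-off described in rule 2 above.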

In fact, this idea is applied inside Xilinx FPGAs as well. All of the flip flops in the FPGA are designed to have zero (or slightly negative) hold time, when clocked with a single global clock signal. This ensures a fully synchronous design with one clock will not have hold violations on any flip flops.

Hope that helps, Eric

Reply to
Eric Crabill
