Influence of temperature and manufacturing on propagation delay

Hi,

we are running into trouble with our current design for a Xilinx Spartan 3 xc3s1500.

It does signal processing, and it seems that samples get lost with increasing temperature. Immediately after power-on everything works well; some minutes later, once the final temperature is reached, some samples are missed. I didn't have a thermometer handy, but I can always touch the FPGA for a long time, so it may be around 50 °C. It runs with a clock of 76.8 MHz, PAR states a maximum frequency of 78.777 MHz, and logic utilization is about 60%.

One board works as expected and two others show the described effect. The boards have the same layout but were made by different manufacturers; at least the non-working ones are lead-free.

Just now, we had a discussion about the influence of temperature on propagation delay. I don't believe that it affects clock lines and other logic resources in (very) different ways. Is that true or not?

I read the thread "Propagation delay sensitivity to temperature, voltage, and manufacturing", but the answers there relate mostly to DCMs.

Tom

Reply to
Thomas Reinemann

Every IO buffer and SRAM device I have ever worked with is sensitive to temperature. (So is just about everything else electronic for that matter)

As the device (more strictly the cell) heats up, the propagation delay _and_ the rise/fall times _will_ deteriorate.

What is the core temperature rising to? My post-PAR tools give guaranteed timing across temperature (assuming you set them up that way), including self-heating (which you have to plug in yourself).

At that speed, I would assume the core (or at least those parts toggling at that rate) to be at least 25C above ambient.

The major FPGA mfrs provide thermal analysis tools to predict the power dissipation and temperature rise of their devices - have you used those and then plugged those numbers into post-PAR static analysis?

Cheers

PeteS

Reply to
PeteS

Are you transferring data between related time domains either with or without a DCM? Your FPGA and software version could have a slight impact on results.


Reply to
John_H

Thomas

Try a rebuild with a tighter frequency constraint. The tools tend to give you just what you ask for, not anything much better. Some versions of ISE (i.e. some speed tables) also seem to be a bit marginal with designs that are close to the limit, as yours is. In a similar vein, check that you have the latest version and service pack of ISE.

Other things to watch when the unexpected happens are the power supplies and whether your clock source is stable. Ensure they don't dip or suffer significant noise (decoupling and power-plane strategy), particularly if you are using a DCM. If you are using a DCM, do have a good look at Vccaux.

John Adair, Enterpoint

Reply to
John Adair

Thomas,

Comments below,

Austin

-snip-

How much slack does your timing report say it has?

I can infer that the slack is 327 ps (1/78.777 MHz - 1/76.8 MHz). That is pretty darned small. If you have 400 ps of jitter on your clock, you now have 327 - 1/2 (400) = 127 ps of slack ...

If you have 800 ps of jitter, then you are failing (slack less than 0).

If you have any paths that were not constrained properly, then 327 ps of slack is a fiction; you really may have paths that are failing to meet timing.

Parts will vary: some will be faster than specified; none will be slower than specified.

Temperature affects all delays. The resource affected may be different, and may be affected more, or less, but they will all slow down at hotter temperatures.

DCMs vary, too. So does everything else. Peter has a rule of thumb of 0.3% slowdown per degree C, but it is a rule of thumb, not anything we characterize or guarantee.

You are just too close to the edge: go back and review your constraints (to see if they cover all the critical paths), and perhaps apply a smaller clock period as a timing specification.
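Austin's arithmetic above can be reproduced with a quick back-of-the-envelope script (a sketch using only figures quoted in this thread; the function names are my own):

```python
# Back-of-the-envelope timing check using the numbers from this thread:
# a 76.8 MHz system clock against a PAR-reported Fmax of 78.777 MHz.

def slack_ps(f_clk_hz, f_max_hz):
    """Slack in picoseconds: required clock period minus achieved period."""
    return (1.0 / f_clk_hz - 1.0 / f_max_hz) * 1e12

def derate_for_jitter(slack, pp_jitter_ps):
    """Subtract half the peak-to-peak clock jitter, as Austin does above."""
    return slack - pp_jitter_ps / 2.0

raw = slack_ps(76.8e6, 78.777e6)
print(round(raw))                          # ~327 ps, matching Austin's figure
print(round(derate_for_jitter(raw, 400)))  # ~127 ps left with 400 ps P-P jitter
print(derate_for_jitter(raw, 800) < 0)     # True: 800 ps of jitter means failure
```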

Reply to
Austin Lesea

Now come, Austin. If the tool tells me I have positive margin (however small) I expect that to be true. I've done designs where I calculated the worst case and had a _guaranteed_ margin of 8 ps. Note the word guaranteed. I thought the post-PAR analysis tools gave me guaranteed timings.

Lead-free processes stress a part more than leaded ones; there is also the issue that the joints may not have the same high-speed performance, and your application is high-speed enough to be susceptible to soldering issues. However, I agree with Austin that your margin is sorta small.

I would also check whether you are accounting for the signal rise/fall times (which can get quite long for non-RocketIO pins) as temperatures increase.

Cheers

PeteS

Reply to
PeteS

You are obviously clocking things from somewhere. A margin of 300 ps or so can get lost in the rise/fall times of a hot clock source. Have you characterised your clock inputs properly for post-PAR analysis?

Cheers

PeteS

Reply to
PeteS

PeteS,

See below,

Austin

-snip-

Sure, for whatever you constrained. Did everything get constrained properly? This is not a trivial task: verifying timing closure may sometimes take huge amounts of time (to verify that every critical path has sufficient slack and was properly constrained). And 8 ps of margin means that if you have 16 ps P-P of jitter, you have no slack left, and errors may occur if the jitter increases by even one ps...

But if you have a clock with 400 ps P-P of jitter (not uncommon), and you have lots of IOs switching, and you have power supply variations, and so on, you might be at the edge, or over the edge. (Probably are...as was obvious in this case)

Yes, it is. I recommend having at least the peak-to-peak worst case measured jitter as margin (slack). That means you have a factor of two for safety. Considering peak-to-peak jitter is unbounded (is 14 sigma enough? 16 sigma? 20 sigma?), you really do not want to cut things too close (or you will eventually fail in some process/voltage/temperature/jitter 'corner').

True, if the rise time is long, then you probably also have a lot of jitter (as a long slow rise time leads to imprecise transitions).
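The margin rule above (demand at least the full measured peak-to-peak jitter as slack) amounts to a trivial check; a sketch using the thread's example figures, with a hypothetical helper name:

```python
# Hypothetical margin check implementing Austin's rule above: demand at
# least the measured peak-to-peak jitter as slack, which leaves a factor
# of two over the half-P-P derating used earlier in the thread.

def margin_ok(slack_ps, measured_pp_jitter_ps):
    """True if the slack covers the full peak-to-peak jitter."""
    return slack_ps >= measured_pp_jitter_ps

print(margin_ok(327, 400))  # False: the OP's 327 ps does not cover 400 ps P-P
print(margin_ok(327, 150))  # True: acceptable with a quieter clock source
```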

Reply to
Austin Lesea

Inline

Austin Lesea wrote:

No it isn't. I have spent days on end making sure I have considered everything involved.

I am anal about my constraints. I've done really high-speed stuff (10 Gb/s), but even the marginal things (DDR above 200/400 comes to mind) will bite you if you haven't properly constrained the design. The last time I did that, I even added the delays (and filter effect) of the bond wires from the pad to the die. Perhaps the OP is not so anal about it, but needs to be. My design worked first time, for what it's worth.

Deterministic jitter will kill a design, but it's predictable (see my comments about being anal). PCB tracks can add enough jitter to swamp 300 ps of margin, especially if there are vias involved. There are techniques to alleviate this, of course (maybe I should publish how I did it ;)). One cannot, however, eliminate it.

That's my point. A FF will clock at some point in the active region. The indeterminacy of this region *must* be added to the error budget if you want guaranteed timing.

Cheers

PeteS

Reply to
PeteS

Thanks for throwing this back in after your "Now come, Austin" comment. The tools will report what the chip will *absolutely* support under worst case conditions as long as none of the input conditions for that specification are exceeded. Maximum acceptable jitter is specified in the data sheet but must also be considered *internal* to the device since a poor set of switching I/Os and/or improperly bypassed and distributed rails can affect the amount of jitter seen by the time it gets to the global clock routing.

- John_H

Reply to
John_H

As far as I am aware, unless you punch it in, the tools won't take account of your input clock jitter, which can get quite large - I think that was the point?

The tools have to make some assumptions somewhere - they don't have the information :)

Jeremy

Reply to
Jeremy Stringer

That is true provided you have a jitter-free clock. The tools do not know, nor can they predict, how much jitter is on your clock. You need to build a jitter margin into your clock constraints, as any jitter erodes the minimum clock period. Keep in mind that you not only have the jitter introduced by the DCMs (if you use them), but also the jitter inherent in your clock source, plus jitter added by noise on the board and, more importantly, by modulation of the Vcco of the clock pins when other pins on the bank switch, or by fluctuations in the power rails. If you subtract your maximum cycle-to-cycle jitter from the clock constraint, then you wind up with guaranteed operation.
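In the ISE flow, one place to declare this is the UCF PERIOD constraint, which accepts an INPUT_JITTER keyword; a sketch assuming a clock net named clk and this thread's 76.8 MHz / 400 ps figures (check the ISE Constraints Guide for the exact syntax of your version):

```
NET "clk" TNM_NET = "tnm_clk";
# 13.021 ns = 1/76.8 MHz; INPUT_JITTER tells the static timing tools to
# derate the period by the specified peak-to-peak clock jitter.
TIMESPEC "TS_clk" = PERIOD "tnm_clk" 13.021 ns HIGH 50% INPUT_JITTER 400 ps;
```

With the jitter declared this way the timing tools do the derating, rather than you tightening the constraint by hand.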

In a lab environment, you can usually get away with ignoring the jitter, as you usually won't be anywhere near the slow corner of voltage, process and temperature. In the field though, not allowing for sufficient jitter tolerance is likely to come back and bite you hard in the shorts.
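The "subtract the jitter from the constraint" approach works out like this for the clock in question (a sketch; the 400 ps value is just the example figure used earlier in the thread):

```python
# Tighten the PERIOD constraint by the worst-case cycle-to-cycle jitter,
# per the advice above. 76.8 MHz and 400 ps are the thread's example numbers.

def constrained_period_ns(f_clk_hz, cc_jitter_ps):
    """Nominal clock period minus max cycle-to-cycle jitter, in ns."""
    return 1.0 / f_clk_hz * 1e9 - cc_jitter_ps / 1e3

print(round(constrained_period_ns(76.8e6, 400), 3))  # 12.621 ns instead of 13.021 ns
```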

Reply to
Ray Andraka

Keep in mind that those timing specs are only good to some sigma, probably three, which means that there are some (very, very few) devices that get shipped that don't meet their timing. Xilinx warrants those devices with free replacements (labor not included!), provided you can prove that their part is not meeting its specs. My point is, you need to have a reasonable margin to cover those one-in-a-million parts, or you need to be prepared to replace them when they fail.

Since you have two devices that fail, and one that does not, I don't think this is _your_ issue. I suspect that clock jitter is not being completely/correctly taken into account. However the issue of commercial component specifications is real, and leads to significant derating in high reliability applications (military, space, medical, etc.)

Andy


Reply to
Andy

Are you making up statistics (probably 3 sigma) or do you actually have direct experience or documentation to back up this rather bizarre claim?

I have trouble with engineers who decide "this is the way things are" without a shred of proof. I hear it almost daily.

- John_H


Reply to
John_H

John,

I, too, have a problem with people making assumptions about our product quality.

If you are interested, we do publish what our criteria are, and the probability that a part is a test escape of some sort, or fails upon first insertion, etc. is something we do document, and care deeply about.


Obviously, we strive like most companies for a '0 defect' goal, and like all companies, we somehow are unable to ship only perfect components (funny how the real world conspires against perfection).

Since every bitstream is different for each application, and we don't know any of them, it makes assuring 100% perfection a daunting task, yet one that we willingly accept and strive towards.

In fact, if you really want a component that is best tested for exactly your bitstream (design), then you should be using the EasyPath(tm) program, as that program uses a custom test program for the FPGA that exercises the paths and logic you actually depend on (based on your design).

Austin

Reply to
Austin Lesea

I try to be fair, and in defence of Xilinx, I will say that the tools are accurate when properly set up, and when all constraints are indeed properly set, but that leaves a huge hole in the analysis, sometimes known as the external circuitry and circuit board.

A single via can induce 50ps or more of deterministic jitter on a highspeed line, and we won't even get into proper termination or the dozens of other gotchas in timing budget analysis.

My point would simply be that I have to be able to trust the tools to give me the information I need to do a thorough timing budget analysis; but simply because one part of the design is supposed to work does not relieve me of the responsibility of making sure it all works together ;)

Cheers

PeteS

Reply to
PeteS

In further defence of IC manufacturers and in particular FPGA vendors, I have _never_ had a failure due to FPGA timing with Xilinx (and others) parts except for my own failure to properly analyse the system.

I have designed with others, but Quicklogic was 12 years ago, and holding any issues from then against them would be unfair, to say the least. I've also used Lattice and Altera parts, and provided I used the tools properly, they worked as expected as well.

The key (as I note elsewhere) is that provided I get guaranteed characteristics (which I _do_ insist on, whatever they may be), I can use them as part of an overall budget; and it is _my_ responsibility to make sure I give the tools sufficient information to give me the information I need, to say nothing of taking all external effects into account.

I wouldn't normally buy parts that are statistically characterised (AQL) *unless* it's a mature part with a track record of not having issues on the particular tests that are AQL-only.

Both the vendor and the designer have responsibilities, and although I expect quite a bit from the vendor, I am perfectly willing to assume my part of those responsibilities.

Cheers

PeteS

Reply to
PeteS

I do not have direct numbers for any supplier (that I can share, anyway), but the "norm" for commercial electronics components is between 3 and 6 sigma, with very few suppliers close to 6. This is from a military reliability conference a few years ago that a colleague attended and presented at.

My point is not what the exact quality level of any supplier's component is, it is that none are at 100.0000%, and you need margin to account for that. If y'all want to miss the forest for the trees arguing over exactly what level of quality a specific supplier delivers, go ahead...

Andy


Reply to
Andy

PeteS wrote:

Yes, four FPGAs have to run synchronously, so an external PLL is used. The clock frequency is specified in the UCF.

I solved the problem simply by increasing the frequency specified there, i.e. by over-constraining the design.

Is there a guideline for avoiding such problems before they emerge? Where can I specify the jitter of an external clock source?

Tom

Reply to
Thomas Reinemann

I never said that the IC manufacturers don't _guarantee_ their specifications on every device they ship! But unless you are buying really simple ICs, and/or paying lots of extra $, every single device is not 100.000% _tested_ to meet every one of those specs. The vendor still guarantees that every device will meet every spec, and will replace a defective device for free. Xilinx and others do a fantastic job of testing as many of the specs, over as much of the device, as they can, but they (and we) cannot afford 100.000% testing of every spec on every device delivered. Austin even states this. So they use _statistics_ to cover the untested specifications/devices, to make sure that the probability that you will receive a bad device is acceptably small. This is not news, folks.

I have never had an FPGA fail due to not meeting its specifications either. But I design in margin to make sure, again to a statistical level, that they don't fail.

Andy


Reply to
Andy
