Signal Integrity, ground bounce, crosstalk, SSOs, BGA pin-outs, parasitic inductance...

Those are the subjects of the next webseminar. Over 700 engineers have signed up already. You can still join us this Tuesday (11:00 Pacific Time) when Dr. Howard Johnson explains the effects of ground bounce and crosstalk caused by simultaneously switching outputs. This is a highly technical talk (you will love it) with many screen shots taken with an 8 GHz scope, and with detailed comparisons of good and bad BGA packages.

I give the short introduction and conclusion, but it is Howard's show. You may know him as the author of "High-speed Digital Design", the standard reference book found on many of our bookshelves (and benches).

Howard is not only a well-known and respected expert in this treacherous field; he is also a lively speaker and an excellent teacher. Enjoy! You can register for this live webseminar, and also for the two archived predecessors:

formatting link

Peter Alfke, Xilinx Applications

Reply to
Peter Alfke

I think Howard Johnson gave a super presentation, definitely "technical". We had 800 listeners, and we got lots of questions. From my point of view, it was a smashing success. The newsgroup is still very quiet, maybe that is better than the biting and bitching after the prior seminars ... Coincidentally (?) Altera had a press release about Signal Integrity on the same day. Their David Greenfield (of comp.arch.fpga fame...) claims that "benchmarks demonstrate another significant advantage...", but their press release tells us that Altera's benchmark claims come from simulations based on IBIS models.

On our board, we measured and compared the real hardware, for both families, under identical conditions. I prefer an 8 GHz Tektronix scope picture over an IBIS-based simulation any time of the day. More work, more expensive, but definitely more believable and more encompassing. Peter Alfke, from home.

Reply to
Peter Alfke

Still no comments on the newsgroup. Those of you who want to (re)visit Howard Johnson's presentation can do this by clicking on

formatting link

Peter Alfke, Xilinx Applications

Reply to
Peter Alfke

Peter,

I suppose they are still digesting Howard's presentation....

Bottom line, the new SparseChevron(tm) package for V4 has substantial benefits which enable the high speed designs to work.

It doesn't help to have better dI/dt if all it does is create SI problems. Perhaps we design our IOs to meet requirements, such that they can actually be used? (Answer: yes we do. It is part of the requirements not to 'blow the lid off' with dI/dt that would be of no benefit to a customer).

The key is what is going to happen when all the IOs (and CLBs) in your application do what you want them to do: Will the noise trash the IO? Will the noise cause so much jitter during the "boing" (I love HJ's sound effects...) that the timing margins are violated, and errors result?

Errors caused by system jitter (which in turn is caused by ground & supply bounce, which in turn is caused by SSOs and logic toggling) are the number one SI problem in high speed designs today (and not exclusive to FPGAs at all).

Aust> Still no comments on the newsgroup.

Reply to
Austin Lesea

"Peter Alfke" schrieb im Newsbeitrag news: snipped-for-privacy@f14g2000cwb.googlegroups.com...

Nervously waiting??

Another strike against Altera. Hmm, the differences are clearly visible, and the explanations sound reasonable. If we assume that PR did not have too many opportunities to fake, aeehhh, arrange the data, this looks like a clear victory for Xilinx, doesn't it?

Regards Falk

Reply to
Falk Brunner

It was a good presentation; I got to learn a couple of new things, and came away with new ideas. Right now, though, I have a question about packaging and interconnection. I was just looking at the old AFX prototype board for the venerable XCV series of FPGAs, and then I asked myself: how do those P4 chips actually resolve their SI issues without using BGAs, and while being placed in sockets? Considering they draw ~50 W (I might be wrong here, but it's a lot of current...), wouldn't this introduce parasitic inductance, hence ground loops and eventually noise induced in "victim" pins?

thanks Jacques

Reply to
jaxato

Hi!

I really enjoyed the presentation. I didn't realize how fast the time passed, and before I was able to write down all my questions it was over. So next time, reserve at least half an hour for a Q&A session. Besides the comparison between Virtex-4 and Stratix-II, I was hoping to see a comparison between Virtex-II/Pro and Virtex-4.

Regards, Igor Bizjak

Reply to
IgI

There was no hanky-panky. We designed the dual-board to be as good as possible, and strictly identical (or fair) on both halves. And Howard did his analysis with no pre-conceived answers. Marketing was completely out of the loop (doesn't always work that way) and Howard Johnson has obviously too much invested in his own reputation to even think of risking that on any shady deals... This was pure science, and a fun project. That the results are heavily in our favor is a just reward for having designed things the proper way. Peter Alfke

Reply to
Peter Alfke

Jacques,

A socket makes things worse, but only in proportion to the distance it adds relative to the total loop distance.

So, for HJ, he assumed a 0.035" trip into the pcb, and a smaller number for the package (because packages are thinner).

Add another ? inches (or cm, mm, or whatever) and that makes the vertical loop larger, and will increase the noise in the proportion of the added distance. Yes, a socket makes it worse.

That is why there are (very expensive) low profile sockets (to reduce the inductance of the socket path).

Sockets just make everything worse. If the total loop is small to start with, then doubling it doubles the noise.

If the loop was terrible to start with, adding the same distance to it as before hardly makes anything worse.
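Austin's scaling argument can be sketched numerically. This is only an illustration: the per-inch inductance and the aggregate di/dt below are assumed round numbers, not measurements from the webcast.

```python
# Ground-bounce noise scales with the loop inductance, which scales
# with the vertical path length: V = L * di/dt.
# 15 nH/inch and 0.5 A/ns are assumed rule-of-thumb values.
L_PER_INCH = 15e-9   # loop inductance per inch of path, henries (assumed)
DI_DT = 0.5e9        # aggregate SSO current slope, A/s (assumed)

for path_in in (0.035, 0.070, 0.250):  # pcb-only, doubled, tall socket
    L = L_PER_INCH * path_in           # loop inductance, henries
    print(f"{path_in:.3f} in -> {L * DI_DT * 1e3:.0f} mV of bounce")
```

Doubling the 0.035" path doubles the bounce, matching the point above; a tall socket makes it far worse.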

One thing is true: once you start with a good package, you can make it worse (with a socket, or bad pcb layout).

It is also true that once you start with a marginal or poor package, there is nothing at all you can do to make it better (and hardly anything makes it worse because it is so bad already).

A comment that was made, but that HJ couldn't get to answer, was: "If you use virtual grounds (IOs as ground), won't that improve the noise (make it smaller)?" The answer is yes, it will. But using IOs as grounds is not as effective as real grounds, and any part that uses IOs as grounds improves (this does not apply only to a bad package).

Prior to the V4 packages, we had made as many improvements as we could have made at that time, knowing what we knew. Those V2, V2P packages don't fare (all that) poorly in comparison to the new SparseChevron packages (worse by ~2:1 in terms of inductance of loops), but are still better than other 'competitive' FPGA packages. As HJ said, it is all in the number of power and ground pins, their arrangement, and the bypassing on chip (to make power and ground pins equivalent from a signal return point of view).

Aust> It was a good presentation, got to learn a couple of new thing, and

Reply to
Austin Lesea

Igor,

We did that too. As I already posted, the difference was about 2:1 better with the new V4 packages.

Even our previous packages were not all that bad. But, they could be improved upon, so we did for this generation.

For a first order comparison, look at the di/dt for V2P (about the same as V4), and the ball pattern. Count the average IO per power/ground pin ratio and examine placement to compare the two.

Ultimately, we have the SSO table to reflect the package capabilities, so if you are within those guidelines, there is no performance penalty expected. We also consider other factors in the tables (like system jitter, not just Vil(max)), so designers have successful boards, and we do not get cases.

Aust> Hi!

Reply to
Austin Lesea

So, is there real black magic behind the socket of my Athlon 1.8 GHz processor???

Reply to
jaxato

I didn't catch the webcast, but looked at that BGA paper last night. It appears to be limited in scope to discussing package related output switching effects, which is not the whole story for high speed I/O.

Looking at the output waveforms shown in figure 20, my first reaction was that it clearly showed that Xilinx hasn't done much to improve their I/O cell capacitance [1] since V2.

And, from DS302, V4 Cin = 10 pF, identical to the V2 spec.

Meanwhile, the marketeering data rate has gone from "840 Mbps" for V2 to "1 Gbps" for V4.

Perhaps Dr. Johnson could proffer his honest opinion of a "1 Gbps" LVDS receiver with a Cin of 10 pF [2].

While the reduced output slew rate due to capacitive loading may be of marginal "benefit" for low speed I/O standards, the disadvantages of high I/O capacitance far outweigh the advantages, especially for parts whose I/O is marketed as 1 Gbps capable.

Since you have that spiffy board at hand, I'd love to see plots of the following:

A) X vs. A ICCO for the "Hammer Test" at several toggle rates

B1) X vs. A waveforms for a high speed single ended standard (xSTL)
B2) X vs. A ICCO for a high speed single ended standard (xSTL)

C1) X vs. A waveforms for 1 Gbps differential LVDS
C2) X vs. A ICCO for 1 Gbps differential LVDS

D) X vs. A differential TDR input waveforms into a DT termination at 100, 200, 500 ps input edge rates

What I'd expect to see from those plots, if the Altera I/O capacitance is really half that of the Xilinx part:

A) dynamic ICCO would increase faster with frequency for the Xilinx output driver

B) the output waveforms would look worse at higher speeds for the Xilinx driver

C) Differential output switching would mitigate the SSO package effects somewhat as compared to single ended switching at the same rate

D) input reflections would be worse for the Xilinx part
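The expectation in (A) is just CMOS dynamic current scaling linearly with both capacitance and toggle rate (Icco ≈ C·V·f per output). The values below are illustrative assumptions, not data from either vendor's part:

```python
# Dynamic supply current for n full-swing outputs toggling at f:
# Icco ~ C * V * f * n (ignores crowbar and pre-driver current).
def icco_amps(c_pin, v_swing, f_toggle, n_outputs):
    return c_pin * v_swing * f_toggle * n_outputs

for f in (50e6, 100e6, 200e6):
    hi_c = icco_amps(10e-12, 2.5, f, 500)   # hypothetical 10 pF part
    lo_c = icco_amps(5e-12, 2.5, f, 500)    # hypothetical 5 pF part
    print(f"{f/1e6:.0f} MHz: {hi_c:.3f} A vs {lo_c:.3f} A")
```

If the capacitance ratio really were 2:1, the slope of Icco versus toggle rate in test (A) should differ by the same 2:1.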

The last time I pointed out the impacts of high I/O capacitance in this forum [3], a certain overzealous Xilinx engineer flamed the thread into oblivion. Hopefully this thread will suffer a gentler fate, with rational technical discussion prevailing.

Brian, a longtime (mostly) Xilinx user who wants to see better parts

[1] While I like the flexibility of the Xilinx general-purpose nearly-all-IOBs-have-LVDS capability, if Cin could be improved by having some I/O banks without DCI or certain of the I/O standards, I'd still buy the parts.

[2] I'd be happy to quote a Cdiff instead, if someone could tell me where it is documented.

Ideally, the differential input model would include both the single ended shunt Cin values as well as a differential across-the-pair Cdiff, so I could model both the differential and common mode reflections.

If Cdiff is negligible, and the input waveform is purely differential, then Cdiff = 1/2 Cin, as Austin has argued before.

[3]
formatting link
or
formatting link
Reply to
Brian Davis

Jaxato,

Your Athlon is not a very IO-intensive chip; the only other chip it has to talk to is the northbridge, and the bus it uses to do this does not run overly fast (several hundred MHz, nowhere near the fast serial IO speeds achievable using modern FPGA IOs). Newer Athlons use the HyperTransport interface, which is more comparable in speed, but even that interface involves a total of only 16 differential IOs; it is vastly easier to ensure good signal integrity in such a small IO interface (even if all of the bits were simultaneously switching) than to do so for all of the IO banks of an FPGA. So, in short, I guess no "black magic" is necessary for the current crop of CPU package designs.

Ljubisa Bajic

Reply to
Ljubisa Bajic

"Brian Davis" schrieb im Newsbeitrag news: snipped-for-privacy@l41g2000cwc.googlegroups.com...

I don't think that Xilinx intentionally slows down its IOs by adding capacitance. I guess they control the slew rate a little bit more cleverly (using intentionally slower transistors).

Regards, Falk

Reply to
Falk Brunner

Brian,

Good posting. A few comments, below,

Aust> V4 SI: The package is thrilling, but the Cin is bleak

There is more to Cin than you are aware of. I'll go into that below.

By all means, no it was not. It was just one small (but critically important) aspect that limits the system performance.

Why should we do that? What is it about the Cout that is such a big deal? Driving the pcb trace, and the load at the other end swamps the intrinsic C of the pin in almost all cases.

To do what we do (which is more than the competitor), we need the silicon area. Silicon area = C.

Sure has. Works great. Eye diagrams look fantastic (on a real board).

I am sure he will answer that if asked in a fair and impartial way. Perhaps he will also point out that there is a lot more to IO performance than just C?

Ho ho ho. That is funny. Take the problem of slew rate out of control, and try to cast our C as BAD because it slows us down SO WE WORK? Ha ha ha. I am rolling on the floor. Be serious. The C is what it is. It does not limit performance in any way.


Bull-feathers. We work great. Altera works great, too (on a few pins, without anything else switching).

We can do that.

I will ask Mark to get some measurements of the Icco for the "hammer" test between the two boards.

But something tells me that with the dI/dt being DOUBLE in the S2, you might not be so happy to see the results (again simple pin C is NOT the whole story for total power).

If all the power is in the pin C, perhaps you will see a 30% improvement. Again, we may be talking less than 6 milliwatts per pin. Big advantage when the S2 won't work in a system.

Oh my, my 72 pin bus switching at 200 MHz with 2.5V has ~430 mW more power than an S2......but it WORKS!
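As a sanity check on the arithmetic above (assuming the per-pin capacitance difference is ~5 pF, i.e. half of the 10 pF quoted elsewhere in the thread, and ordinary CMOS switching power P = C·V²·f per toggling pin — both are my assumptions, not figures Austin stated):

```python
# Extra power burned charging the extra pin capacitance alone:
dC, V, f, pins = 5e-12, 2.5, 200e6, 72   # assumed 5 pF delta per pin
p_per_pin = dC * V**2 * f                # watts per toggling pin
p_bus = p_per_pin * pins                 # whole 72-pin bus
print(f"{p_per_pin*1e3:.2f} mW/pin, {p_bus*1e3:.0f} mW for the bus")
```

That lands around 6 mW per pin and ~450 mW for the bus, in the same ballpark as the "less than 6 milliwatts per pin" and ~430 mW figures above.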

Excuse me, the waveforms look fine. Excessive rise and fall times don't buy you anything but misery. HJ just proved that.

Yes, the C is half differentially.

Yes. But, since our termination is internal, and the driver is terminated, it doesn't matter.

Do the simulation, the eye the receiver sees is just fine. Reflected signal (small) is absorbed by the transmitter, and does not cause distortion in the receiver.

With good reason. You are not correct in assuming bad SI always results from pin C. If you terminate externally, I would agree with you.

Great. One customer who will accept non-uniform IO. Thanks. We'll keep that in mind if we ever get to where we have to do this to meet all specs and standards. So far, we do, so we don't have to (have different IO pins).

Uh, last I looked at circuit theory, it is still C/2 for the diff pair. It also agrees with simulations (if you instantiate the V4 receiver, and compare it to a circuit model of the same thing).
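The C/2 figure is elementary circuit theory: looking differentially across the pair, the two single-ended shunt capacitances (one per leg) appear in series. A one-line check, using the 10 pF datasheet Cin quoted earlier in the thread:

```python
# Two shunt-to-ground caps of Cin each, seen in series across the pair:
# C_eff = (Cin * Cin) / (Cin + Cin) = Cin / 2
Cin = 10e-12                             # per-leg single-ended C (from DS302)
C_diff_eff = (Cin * Cin) / (Cin + Cin)   # series combination, ~5 pF
print(C_diff_eff)
```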

Reply to
Austin Lesea

Falk,

No, we do not intentionally slow them down by adding C, but we do take full advantage of the intrinsic C that is there, so we do not have to slow them down as much as we would have to otherwise.

And we most definitely slow them down, as to not slow them down results in an SI nightmare (like the one HJ described for the 2S60 1020 package).

Austin

Falk Brunner wrote:

Reply to
Austin Lesea

I didn't claim that the I/O capacitance was the only thing slowing the LVCMOS outputs down.

What I was attempting to point out was that high FPGA I/O capacitance can limit the performance for both inputs and outputs for high edge rate I/O standards.

If driving a high-C FPGA input from a fast LVDS or ECL driver, proper analysis and verification needs to be undertaken to assess the impact of the FPGA Cin, which is much larger than you'd find in a dedicated ECL or LVDS receiver.

( Particularly in a multidrop situation, which is created whenever you need to probe the lines in system for verification purposes. )

For a demonstration of how high C affects a fast output standard, look at figure 26 of XAPP-622: in order to forward a 622 MHz clock, an AC coupling kludge is needed because the V2 LVDS outputs don't swing far enough to properly cross at 622 MHz ( 1.2 Gbps ).

Brian

Reply to
Brian Davis

Brian,

Yes, it (Virtex-II) was not designed for 622 Mb/s.

The "kludge" as you call it, is not required for V4.

Same pin C.

So, yet again, pin C is not involved.

Besides, if I have 2X the dV/dt, guess what happens when I drive 1/2 the pin C?

Yes, math still works: I get exactly the same di/dt, which leads to the same reflection for both cases.
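The arithmetic here is just i = C·dV/dt. A sketch with assumed numbers (the 10 pF and the 1 V/ns edge are illustrative, not vendor data):

```python
# Halving the pin C while doubling the edge rate leaves the charging
# current, and hence the launched reflection, unchanged: i = C * dV/dt.
C1, slew1 = 10e-12, 1.0e9      # 10 pF, 1 V/ns (assumed values)
C2, slew2 = C1 / 2, 2 * slew1  # half the C, twice the dV/dt
i1 = C1 * slew1
i2 = C2 * slew2
assert i1 == i2                # identical charging current in both cases
print(i1, i2)
```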

As long as the receiver is terminated inside the chip, and the transmitter is also terminated (i.e., the LVDS standard), small reflections at the receiver are absorbed by the transmitter, and SI is fine.

In fact, simulating 1/2 the C with 2X the dV/dt shows exactly the same reflections and issues as the full C with 1X the dV/dt .....

Non issue.

Austin

Brian Davis wrote:

Reply to
Austin Lesea

Austin,

Because 10 pF and 1 Gbps are a poor match.

Note that I used "I/O cell capacitance" in my post as I attempted to point out the impact on both inputs and outputs.

However, as the only parameter given in the V4 datasheets is called Cin, I wasn't consistent in that name usage.

Hereafter I shall attempt to use C to refer to the I/O structure capacitance, as applies to both inputs & outputs.

Not in my experience, particularly when dealing with connections from 'real' 1 Gbps logic to the FPGA.

My heretical $0.02:

DCI = not worth the penalty of excess C

So ditch DCI, keep the DT terminators, and invent a controlled slew driver with low C for the LVCMOS-ish standards.


Where in Xilinx's V4 documentation might one find these pictures and eye diagrams, including real world vs. simulated waveforms at the driver, receiver, and points in between ?

Those 1 Gbps and 10 pF numbers are straight from Xilinx's own V4 datasheet- I don't see how you can claim any partiality on my part for merely pointing out your own numbers.

When have I ever claimed that it is the only factor?

Particularly in a post where my lead-in paragraph ended with the phrase "... which is not the whole story for high speed I/O."



ROTFL right back at ya

I suggested this test as a quick way of verifying Altera's claims of improved C - given 500 switching outputs, a few points along the power vs switching rate curve should give us something to ponder.

I never said anything about what percentage of device power this would represent.

BTW, how many of those 500 outputs connect to PCB traces, how many only to a BGA solder pad?

Dr. J demonstrated that Xilinx's package is better.

He did not address the issue of whether the I/O capacitance of the V4 was amenable to 1 Gbps operation.

"Too Fast For the Package" is bad. "Just Fast Enough" is great. "Can't get out of my own way due to high C" is also bad.

As these are general purpose I/O, the case of multidrop as well as point-point needs to be considered, along with non-FPGA 1 Gbps drivers.

There's more to this one than just output C ( balanced driver ICCO; some degree of aggressor cancellation ).

If you can repeat Figure 19 with 250 LVDS/LDT type pairs ( or as many as you can fit into both devices ), that would be an interesting comparison.

ROTFL yet again

I'll note here that, unlike the current V2/V4 material, the old Virtex-E LVDS application notes actually addressed the issues of C, reflections, and multidrop configurations, with waveforms plotted at points other than only the receiver of a point-point connection.

When, and where, have I EVER said that pin C is the ONLY source of SI problems?

BTW, thanks to Xilinx for putting those DT terminators back into the S3E parts.

Although the original LVDS specification did not directly specify a max Cin value, newer specifications such as HyperTransport do; for example, HyperTransport requires a maximum 2 pF (single-ended) Cin for receivers rated > 800 Mbps.

and also:

See for instance Table 13, footnote 1 of XAPP622, which clearly states that, although tested interoperable, the V2 devices do not meet the rise/fall requirements of the SFI-4 specification


I'm not sure exactly what you're disagreeing with here.

I was attempting to point out that real differential input buffers have a mix of both "shunt to plane" and "shunt across the pair" C, the values of each I'd like to see documented separately for modeling purposes.

Perhaps I should have said "effective Cdiff = 1/2 Cin" in my last sentence about the special case?

Brian

Reply to
Brian Davis

Brian,

I think the best thing to do here is to agree to disagree on some of the points, and realize that we are in agreement on most of the rest.

C isn't the whole story.

We have more tests to do yet.

The story about packaging (HJ) was limited to inductance of the loops formed by signals, power, and ground.

dV/dt can be a problem.

Cin can be a problem.

Cin can be an insurmountable problem if the rate is high and the termination is external.

Standards written for ASSP's and ASIC's are often not met by FPGAs (to every dotted 'i', and crossed 't'). That does not mean that we do not get used in all of the standards. It just means we have to show how these small differences either can be dealt with, or don't limit the performance of the standards in question.

You have stated that you would use our parts if we had segregated the IO banks (and reduced the Cin), and you have stated that losing DCI would not be a factor in your use. Thank you for your marketing input. It turns out that DCI uses the existing output transistors, so its loss would reduce Cin by about 10% or less. It would speed up (lower the delay) through the IOB, however.

Virtex E did not have a true LVDS driver (used an external R network with single ended drivers).

As for the rest, I'll let it go for now, and come back later with data.

Austin

Brian Davis wrote:

Reply to
austin
