fastest FPGA

It is good that you are open to possibilities.

Another odd thing about power that I learned was that the placement of the decoupling caps is not at all critical. This guy had designed a test board with caps at less than an inch from the test points and at about 3, 6 and 9 inches. The graphs of impedance between the planes are nearly identical regardless of which capacitor was populated. So it is clear that the caps don't need to be right on top of the power pins. But to make the smaller caps work well, you do need more of them than you need of the larger value caps. Not that I am doubting that you had enough. I am just pointing out what the data indicates is required.

I have no doubt that there are some designs that will not do well in an FPGA. These are complex chips and it has got to be hard to design them for all applications. It is also possible that there are SI issues inside the chip. As you point out, ground and power bounce are very real problems and there is nothing you can do on the outside to mitigate them.

Reply to
rickman

I love eye-opening engineering. It's not often that there's a spark of "oh yeaaaaah" like this.

The SRF of a capacitor gives you the minimum impedance - as close to the ideal dead short as you can get. Above that frequency, the cap goes inductive, but it's still a darn good impedance for several octaves of frequency. If you have another capacitor with a higher SRF, what you end up with isn't the *series* resonance, but an anti-parallel resonance (the "anti" looks strange, but those were the terms used). Going back to the early book-learning, this anti-parallel resonance ends up as an open, not a direct short. The only thing saving that frequency point from being a dead open is the non-ideal, resistive character of those parasitics. SRF in a capacitor is a good thing. Resonance between an inductor and a capacitor in parallel isn't so good.

If you model the parallel L and C you get an open. If you sprinkle some R into those textbook equations, the impedance doesn't keep heading up to la-la land. Really neat stuff. I haven't had a chance to try this out myself, but I sure would like to.
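A quick numeric sketch of the idea (the component values and parasitics here are made up for illustration, not taken from the thread): two real capacitors with ESR and ESL, evaluated at each cap's SRF and in between, where the anti-parallel resonance sits. Without the R terms the parallel impedance would head for an open at the anti-resonance; with them, the peak stays bounded.

```python
import math

def cap_impedance(f, C, esr, esl):
    """Complex impedance of a real capacitor: ESR + j(wL - 1/(wC))."""
    w = 2 * math.pi * f
    return complex(esr, w * esl - 1.0 / (w * C))

def parallel(z1, z2):
    """Impedance of two branches in parallel."""
    return z1 * z2 / (z1 + z2)

# Illustrative (assumed) parasitics for a 10 uF and a 100 nF MLCC.
C1, ESR1, ESL1 = 10e-6, 0.01, 1e-9    # SRF ~ 1.6 MHz
C2, ESR2, ESL2 = 100e-9, 0.02, 1e-9   # SRF ~ 16 MHz

for f in (1.6e6, 5e6, 16e6):
    z = abs(parallel(cap_impedance(f, C1, ESR1, ESL1),
                     cap_impedance(f, C2, ESR2, ESL2)))
    print(f"{f/1e6:6.1f} MHz  |Z| = {z*1000:5.1f} mohm")
```

Running this shows the combined |Z| rising to a modest peak near 5 MHz (between the two SRFs) rather than to "la-la land" - exactly the damping effect of sprinkling R into the textbook equations.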

Reply to
John_H

For some good plotted examples of what you describe, see this PDF

formatting link

You'd need to be careful in the description. In this case, the caps decouple the plane, and the plane decouples the device. So short connections to the plane are VERY important, and once you have those, then yes, the absolute cap placement is less critical.

- see fig 15 in the pdf above, for the plane vs trace nH/cm

I think some PCB stackups offer thinner plane separations and higher permittivity, both to drive down the plane impedance (for some cost trade-off).

-jg

Reply to
Jim Granville

Jim Granville schrieb:

formatting link
?articleID=192300291

Hmm. 128 * 100M * 4 = 51.2G, not 100G. Somehow I am missing a factor of two here.
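The arithmetic in question, as a quick check (the interpretation of the three factors - 128 units at 100 MHz times 4 - is assumed from the post, not stated in the thread):

```python
# Sanity check of the quoted numbers: 128 x 100 MHz x 4.
total = 128 * 100e6 * 4
print(f"{total / 1e9:.1f} G")  # 51.2 G - about half of a claimed 100 G,
                               # hence the "missing factor of two"
```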

Kolja Sulimma

Reply to
Kolja Sulimma

Now I'm totally confused TC. What does "highly random" data mean? Data with a 75% toggle rate is halfway towards the ultimate pathological case of "0101010101...". To me, highly random data implies a fairly flat distribution with a 50% average toggle rate, period. If any given flop has a 50% chance of toggling on any given clock cycle, your summation seems to imply that some flops (or srl cells) are toggling twice per cycle. That's a neat trick, and I don't get it.

Is crypto/FP data that different than other data? Perhaps there's an opportunity here. A clever recoding could replace highly toggling symbols with equivalent low-toggle-rate sequences. You could change your 75% to 25% and get a 3:1 power reduction (um, tongue in cheek there).
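As a quick sanity check on the 50% figure for random data, a short simulation (setup is mine, not from the thread): uniform random bits toggle on about half of all clock cycles, which is why a sustained 75% rate implies something other than "highly random".

```python
import random

random.seed(1)  # fixed seed so the run is repeatable
N = 100_000
bits = [random.getrandbits(1) for _ in range(N)]

# Toggle rate: fraction of cycles where the bit differs from the previous one.
toggles = sum(b1 != b0 for b0, b1 in zip(bits, bits[1:]))
rate = toggles / (N - 1)
print(f"toggle rate of uniform random data: {rate:.3f}")  # close to 0.5
```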

One parting comment, John: you're a fairly bright guy (I truly admire anyone with the skills and patience to put together a language compiler), but everyone errs from time to time. In my callow youth, I designed a path in an XC3000 that relied on route/gate delay to take a signal across a clock boundary, to catch it on the next clock (in retrospect, 3000s were really tiny!). Come the die shrink, I got burned. Sounds like you did too, even more literally. I sympathize, but we live and learn. The caustic expletives don't help your cause, and really put folks off here.

Regards all, Just John

Reply to
JustJohn

I don't understand this. LUT SRLs are not true shift registers, if I understand them correctly. They don't move the data, they move the pointer. So the only power used is in the internal address counter and the writing of the data to one cell in the LUT. Is that really so much power? If you use the LUTs as RAM and make your own counter, I expect the power per shift register is actually higher at that point.

Reply to
rickman

Reply to
Peter Alfke

I wasn't aware of that. Is this only true for the V5 parts, or has this always been true for SRLs in Xilinx parts that have them?

Reply to
rickman

Yep ... my error ... I was tired and started thinking in terms of run probability, which is clearly wrong.

I should have simply stood on my real concern: bit-serial machines will see worst-case data from time to time (or, even more likely, near-worst-case data) and have to eat the power for that state for a word length of clocks. That means the power system needs to be able to deliver worst-case power for a word latency, or you get a power rail "THUMP" and a reset, or worse, corrupted data.
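A back-of-the-envelope sketch of that requirement (every number below is assumed for illustration; none come from the thread or any datasheet): how much bulk capacitance it takes to hold the rail up through a word-length burst of worst-case current, using C >= I*dt/dV.

```python
# Assumed figures, for illustration only.
I_step = 5.0      # A, extra current drawn during the worst-case word
f_clk  = 100e6    # Hz, clock rate
word   = 32       # clocks of worst-case data (one word length)
dV_max = 0.05     # V, allowed rail droop before a "THUMP"

dt = word / f_clk              # duration of the burst, in seconds
C_min = I_step * dt / dV_max   # capacitance needed: C >= I*dt/dV
print(f"burst lasts {dt * 1e9:.0f} ns; "
      f"need at least {C_min * 1e6:.0f} uF of bulk capacitance")
```

With these numbers the burst lasts 320 ns and needs on the order of 32 uF of bulk capacitance near the rail - the regulator's loop is usually too slow to respond within one word time, so the caps have to carry it.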

You are right ... and at the same time, my tolerance for Austin and Peter is near zero when they assert things on reputation and their employer's name that are direct personal attacks. I'll be nice and mind my language better.

My respect level for Xilinx is at a near all time low.

Reply to
fpga_toys

SRLs have been actual shift registers for quite some time. A recent discussion pointed out that the Pre-V5 SRLs shifted charge from one memory cell to the next (I'm assuming like a CCD) but this "trick" didn't scale well beyond 90 nm so the V5 parts use master/slave latches to halve the number of available elements in the SRLs.

SRLs physically have a Din and a MUX out with a Dout optionally available to extend the SRL length.

Reply to
John_H

I thought the reason that SRLs were added was that they took little extra resources. This sounds like it requires significantly more logic in the LUT. But I guess the utility of the feature has allowed it to be practical in spite of the increased complexity.

Reply to
rickman

A better analogy, is that you and Austin have been having great fun kicking the homeless, and are pissed because you got busted for assault by an undercover cop.

It's never fair to shoot down someone's statements as totally lost and clueless unless you also make the effort to defend that position with a sound, reasoned argument.

Trashing someone, on your reputation alone, or Xilinx's reputation alone is simply calling them clueless because you and Xilinx say they are, to hide the failings in your product line.

Reply to
fpga_toys

I know what you mean. There have been times when I could not take the "attitude" that shows in some of the posts by company representatives here. It is a waste of time to get into a protracted argument with anyone at any time on the Internet, and comp.arch.fpga is no exception. Those people are not likely to change, and it can make you look bad. But that does not mean you are wrong in your discussion, just that it's not really worth the effort.

I will say that my attitude towards Xilinx has shifted as well. I used to use Xilinx exclusively unless there was a specific reason to use another brand. Now I consider them all to be about the same unless there is a specific reason to do otherwise. In particular, I find that the Xilinx parts always have something that makes them a PITA to use. The early Virtex parts and Spartan II had the huge power-on surge problem, which was presented as unavoidable for "modern" chips. This was fixed in later generations. Then the early Spartan III parts had an unusual sensitivity to voltage excursions, which was presented as unavoidable for "modern" chips. This was fixed in later versions of the same chips. The last few generations have used 2.5 volt interfaces for configuration, while the next version, I have heard, will go back to 3.3 volt interfaces. I have not looked at the more recent parts to see what is up with them, but I am sure whatever it is, it is "unavoidable".

As to the issues of discussing this sort of problem here, "It's just Chinatown", or in this case "It's just the Internet!"

Reply to
rickman

Could this "trick" have been part of the observed problem?

-jg

Reply to
Jim Granville

Why would you suspect it might? The SRLs have been a joy to work with over the years. They've worked flawlessly.

Reply to
John_H

In this discussion about FPGA failures, when pushed past the edge, a couple of things have made me ponder. The dominant Icc spike source in an FPGA is the clock tree, and data spikes will of course add to that, but I was a little surprised that data patterns seemed able to 'break' a design.

So I was wondering if the Icc spike on the SRL shift is more extreme, and/or if the nature of the chaining trick made it more susceptible to the inductive-kick effects.

Given that newer devices _are_ claimed to cope better, it is apparent that some engineering changes have been made?

-jg

Reply to
Jim Granville

Understandable. Remember though that there are many of us here that really appreciate your contributions.

Have a good trip! Will you get some vacationing in before or after the conference? I haven't been to Madrid, but I'd love to go.

Best regards, Eric

Reply to
Eric Smith

Peter Alfke wrote

Safe journey. But before you go, is your HotChips talk about V5 up on the web?

Tim

Reply to
Tim

and

I have some experience with overclocking RAM and PC CPUs, and with water cooling as well. First, overclocking rarely gains more than 20% in speed, and often requires additional voltage in order to keep signal integrity (rise-time problems). 10% more voltage plus 20% more speed will produce more heat, and heat in turn lowers the attainable speed. To run a device under these conditions, extra cooling has to be applied - at least a water cooling system - but even at the lowest temperatures, signals will fail above a certain frequency. Regarding cost, two FPGAs are the better choice. :-)

Water cooling is thus only a choice if an FPGA (or a group of them) runs with perfect timing but has temperature problems because of a hot environment. I have not done that with FPGAs yet, but with my first dual-PC system I was able to run two old 1200 MHz Athlons (which used to get very hot) at 1800 MHz with only around 40 degrees surface temperature on both of them. They were dissipating more than 60 watts each.

Reply to
alterauser

Rickman,

The SRLs do move the data, and always have. The shift register in pre-V5 parts, as I understand it, is built from latches rather than master-slave FFs. Maybe that is what caused the confusion. The SRL16s do move the data, though.

Think of the structure as a 16 bit shift register with a 16:1 mux on the outputs. The LUT address goes to the 16:1 mux select lines. The shift register is not shifted when used as a LUT, but is shifted when used as an SRL16.
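That structure can be sketched as a small behavioral model (Python standing in for HDL here; the class and method names are mine, not Xilinx's):

```python
class SRL16:
    """Behavioral sketch of an SRL16: a 16-bit shift register whose
    output is a 16:1 mux driven by the 4-bit LUT address lines."""

    def __init__(self):
        self.reg = [0] * 16

    def clock(self, din, shift_enable=True):
        # In SRL16 mode the data physically moves through the register
        # on every enabled clock; in plain LUT mode it never shifts.
        if shift_enable:
            self.reg = [din] + self.reg[:-1]

    def q(self, addr):
        # The LUT address selects the tap: addr = n gives a delay
        # of n+1 clocks.
        return self.reg[addr]

srl = SRL16()
for bit in [1, 0, 1, 1]:
    srl.clock(bit)
print(srl.q(0))  # the most recently shifted-in bit
print(srl.q(3))  # the bit shifted in four clocks ago
```

The same model with `shift_enable=False` behaves as a plain LUT lookup, which matches the "shift register with a 16:1 mux on the outputs" description above.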

Reply to
Ray Andraka
