Xilinx padding LC numbers, how do you feel about it?

Good luck :)

These details DO matter, as they affect the credibility of the whole document. If they are allowed to creep in one dimension because it "sounds better", what's next - marketing nanoseconds?

-jg

Reply to
Jim Granville

I agree, it's nice when we can agree. :)

Again we agree... almost. I think it is a small thing to fix the data sheets so they show the data rather than a marketing interpretation of the data. You seem to think it is a small thing for your customers to repeatedly calculate the correct number since they know the numbers in your data sheet are not accurate.

Yes, this *is* the issue, having to "investigate" rather than just read!

Wouldn't the best form be the accurate one without the marketing fudge factor?

I like Peter's attitude of being willing to work to change the expectations of your marketing rather than the expectations of your customers.

Reply to
rickman

Rick,

I have said (this will be the third time) that Peter and I agreed to 'try' to do this differently going forward.

Wish us luck.

Austin

Reply to
Austin Lesea

Austin,

I'm glad to see this attitude taking shape. Thrilled, actually, at the thought of your MarComm group presenting data in datasheets.

As far as repeating yourself, consider that different newsgroup readers (or settings) present the posts in different orders. The first (or second) response should be sufficient since the thread takes different branches; please don't take it personally that there are still a few posts that come in in the hours after you first post the good news.

Thanks for the effort on our behalf,

- John_H

Reply to
John_H

John,

Thanks.

I understand that posts come from many places, through many portals...

Austin

Reply to
Austin Lesea

I fully agree - I was just looking for an incremental change, that those in marketing might not notice... :)

-jg

Reply to
Jim Granville

Actually, you can read the entire data sheet for electrical specifications and not find any discussion about derating the device for toggle rate, in terms of either power or heat. You can go on to read the user guide and find a discussion about derating for switching I/O that is very good and complete. The absence of similar data on LUTs/FFs/routing/BRAMs is what is troubling, and the omission implies in some respects that it's not a problem. We do know better, right? So don't hide the data; put it in the places newbie design engineers will read carefully.

Yes ... but show us where, in the data sheet that is supposed to have the specifications for the device, this derating is even hinted at. In the user manual?

Trial and error is not part of the design process we call engineering. Having solid data, properly disclosed where it can NOT BE AVOIDED by a typical engineer reading the data sheets and user manual, is proper disclosure.

No, that is experimentation ... engineering is having the data up front.

Delivering reliable reconfigurable computing platforms where the end user is free to put any netlist on the chip requires doing worst case designs for power and thermal. Worst case data is simply not available, and as far as I can tell, the device is non-functional with worst case netlists, or soon will be from thermal runaway.

Reply to
fpga_toys

Does that mean we will see either total power or derating curves for LUT/FF toggles, defining power and thermal limits as a percentage of active logic or as the size of the active design?

While you may not see this as important from the perspective of past uses in traditional hardware design, as reconfigurable computing takes off there is a completely different mindset hitting your market. Software tools WILL be packing active algorithms into devices to get the best computational efficiency per device. Good packing WILL yield close to 100% active logic in a device, and yield exactly what you are mocking:

Devices that are by design not expected to be more than 10% active will greatly disappoint their customers. You will increasingly see large applications reduced to netlists which have a very high toggle percentage. The user will see that his application compiles to X number of LUTs, and will be buying devices with just over that number of LUTs to run the application. If the user does not encounter a clear warning to derate the device, then we have a truth in advertising problem.

Something akin to purchasing a family car advertised to cruise on the freeway at 75mph with an economy of 35mpg. Customers will not find the car acceptable if it can only be operated at those speeds for 10 minutes and then requires a 50 minute inactive period on the side of the road to cool down, because some engineer decided to save a few horsepower by removing the water pump and the rest of the cooling system. Cars are assumed to have short term duty cycles measured in hours, not minutes.

While hardware design engineers may have grown used to the omissions in the data sheet about duty cycles for FPGAs, the new reconfigurable computing market is going to be much less tolerant of devices that can only be operated at a 10% duty cycle, or at 10% of rated capacity.

If the data sheet says 1M LUTs capable of 800MHz, then that is what is expected by end users looking at the data sheets to purchase FPGAs for computing. If the device is really only capable of 10% of 1M LUTs at 800MHz, then from a reconfigurable computing perspective it's really just that: a 100K LUT device, not a 1M LUT device.

Likewise, if there is only one routing solution that will achieve 100% utilization of the device, and a typical reconfigurable computing netlist will be route limited at 65% utilization, then a 1M LUT device is NOT a 1M usable LUT device, but rather only a 650K LUT device for typical reconfigurable computing uses.

So, this is not the 1980's any more, or even the 1990's ... a new emerging primary market for FPGAs is reconfigurable computing and we need devices which are clearly specified for worst case loads that will be typical ... not the exception ... in that market.

Reply to
fpga_toys

I think you are a little off track here. As a practical matter, without a significant amount of handcrafting, you are not going to pack a device close to 100% AND achieve corner of the envelope operating speeds. It just isn't going to happen with a typical synthesized RTL design.

Further, unless the design is all bit-serial, you are going to have average toggle rates across the die less than 15%. Even with 100% bit serial designs, you'll very rarely see average toggle rates exceed even 50%.
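
As a toy illustration of why average toggle rates stay low (my own sketch in Python, not anything from a datasheet): in a plain binary counter, bit i toggles only every 2^i cycles, so the average toggle rate shrinks as the word gets wider.

    # Average toggle rate of a 16-bit binary counter: bit i toggles
    # every 2**i cycles, so total toggles per cycle approach 2
    # regardless of width -- roughly a 2/n average toggle rate.
    n_bits, cycles = 16, 1 << 16
    mask = (1 << n_bits) - 1
    toggles, prev = 0, 0
    for count in range(1, cycles + 1):
        cur = count & mask
        toggles += bin(prev ^ cur).count("1")
        prev = cur
    print(toggles / (cycles * n_bits))  # ~0.125, i.e. 12.5% average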

Yes, you can drive a part hard, and in some cases you'll need a heatsink on it, but the synthesized RTL designs being turned out are not typically going to get you into that corner.

Reply to
Ray Andraka

Just so that I understand you (as someone watching from the peanut gallery): are you suggesting that there is not enough information to determine whether a heavily-loaded device will work? The maximum allowable junction temperature is in the data sheet; the heat transfer values are package-specific, so they're included in the package documentation. Since each I/O can handle such large currents, I'm assuming (I do not know for certain) that the Vccint and Gnd pins can handle any current you push into the device. It *is* an engineering exercise to determine the power draw of any non-simple FPGA and provide an appropriate power and cooling environment - or do you believe otherwise?

So... If someone can attach a heat-pipe cooling apparatus with 0.295 deg C/watt (recently seen for high ambient temperature CPU cooling) and a power supply capable of pushing in unlimited current, will the large devices just not work?
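
As a rough sketch of that exercise (Python; the 0.295 deg C/W figure and the 125C junction limit are the only numbers taken from this thread, while the ambient and the load are my own assumptions):

    # Back-of-the-envelope junction-temperature check. Treats
    # 0.295 deg C/W as the entire junction-to-ambient path, which is
    # optimistic -- junction-to-case resistance adds on top.
    t_j_max   = 125.0    # max junction temperature, deg C
    t_ambient = 50.0     # assumed worst-case ambient, deg C
    theta_ja  = 0.295    # deg C/W, the heat-pipe figure above

    p_max = (t_j_max - t_ambient) / theta_ja
    print(f"max dissipation: {p_max:.0f} W")   # ~254 W with this cooler

    p_load = 40.0        # a hypothetical heavy design, W
    print(f"Tj at {p_load:.0f} W: {t_ambient + p_load * theta_ja:.1f} C")  # ~61.8 C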

I love the idea of getting the information "clean" in the data sheet, but I don't understand what information could be provided to an unsupervised newbie engineer biting off more of a design than he should be allowed to develop (since he obviously doesn't have the experience to understand engineering tradeoffs in programmable devices) without clouding the data sheet with significant amounts of information of little interest to the 98% of engineers who actually do their work with the information already at hand.

The argument in this thread was that misleading numbers suck for engineers. Do you honestly believe that it's misleading not to hand-hold the engineer by adding 20 more pages of drivel that should be in app notes or analysis tools?

I'd really hate to see the goal of communicating information effectively to the data sheet's target audience be hampered by the attitude that any engineer should be able to learn everything about how to design FPGAs from the data sheet alone.

So, if you're not just venting because you blew up a board in the past by not doing a proper engineering job: WHAT information about a properly cooled, powered, analyzed FPGA design should NOT be in a decent application note but in every device data sheet? "Derating" doesn't float with me unless you have specific ideas on how this information should be effectively delivered in the technical document intended to communicate the details of the specific FPGA device family.

- John_H

Reply to
John_H

toys,

Our FPGAs do not thermally "run away."

Yes, you can melt solder, with enough power.

But, we also provide the tools (web/spreadsheet/xpower) to predict that the power might be excessive. That is engineering.

There is no way to 'derate' a design until we know the entire design (or at least enough of it to run the web/spreadsheet power tools).

Sorry, there is just no other way to deal with the 200,000+ seats of software, and the 20,000+ designs happening at any given moment.

It is not an ASIC nor an ASSP that does one thing, one way.

It is a programmable device that can do just about anything from applications just below 1 watt, to applications just above 25 or 30 watts.

Why don't you try using the power prediction tool, and get a feel for what you can and cannot do?

How would you put that in a data sheet?

The recommended use of the tools is already in the users guides.

Austin

Reply to
Austin Lesea

I haven't tried hard yet, and easily got "close" with several large pipelined demos. I am concerned about what someone really pushing for performance may fairly easily do, and in particular what that translates into regarding design envelopes for those who make PCI add-in FPGA accelerator boards, and what the customer should clearly be aware of and have disclosed as selection guidelines for Reconfigurable Computing (RC) platforms. Issues like monitoring the on-die temp diode are barely optional. The programmers for these platforms are not expected to be EEs with 10 years of designing FPGA systems under their belts. Having good, easy-to-read-and-understand limits for the dynamic power that the board can handle by design is critical, both for platform selection and for operation. The Virtex-4 user guide is very clear about derating I/Os for concurrent switching limits. We need similar data for the FPGA core resources, in equally well documented clarity, so that these "programmers" have clear bounds.

:) For traditional state machine and data path designs, sure - old, accepted numbers everyone uses as a rule of thumb. For packed, hand-crafted RC application accelerators I think it's a lot worse than 15%, as even my not-so-hand-crafted demos have easily done better.

An RC5 pipelined cracking demo I did last year easily got the XCV2000E's on the Dini board HOT quickly. Even after adding heat sinks with forced air they were HOT, and I was only using a little over half the device for the demo. Similar experience with XC2V6000's. That's not "some cases" from my view, and we have a very different view of "typical" for RC uses. I haven't gotten my hands on XC4VLX200's yet, but static analysis suggests similar problems.

And yes, I've since cooked a couple of XCV800's, so I have the sense now to check FPGAs when testing a new application or demo. When RC cards do not even have the temp diode monitor IC connected to the FPGA, it's difficult for an RC programmer even to check the chip temp when it's stuffed in a box, let alone automatically shut the app down or throttle back the clocks.
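
The guard loop itself is trivial once the board exposes the diode; a sketch in Python (read_die_temp(), set_clock_mhz(), and halt_app() are hypothetical board-API hooks, and the thresholds are my own guesses):

    # Sketch of the protection an RC platform should make possible.
    import time

    T_WARN, T_TRIP = 85.0, 100.0   # assumed thresholds, deg C

    def guard(read_die_temp, set_clock_mhz, halt_app, clock_mhz=200):
        while True:
            t = read_die_temp()
            if t >= T_TRIP:
                halt_app()                    # shut the app down
                break
            if t >= T_WARN:
                clock_mhz = max(25, clock_mhz // 2)
                set_clock_mhz(clock_mhz)      # throttle back the clocks
            time.sleep(1.0)

None of that is possible when the temp diode isn't even wired up.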

So that brings us back to "typically" and "worst case" as acceptable design standards for RC boards. Allowing the end user to easily toast $10K boards, and return them to the manufacturer as warranty repairs, isn't good design practice. Nor does such poor reliability reflect well on the supply chain for the devices, or the industry in general.

Reply to
fpga_toys

But you can measure the core supply current. It should be easy enough to do some correlation between current and temperature, at least for a user warning.

--
         Georg Acher, acher@in.tum.de
         http://www.lrr.in.tum.de/~acher
         "Oh no, not again !" The bowl of petunias
Reply to
Georg Acher

For anyone using a PCI add-in FPGA, the FPGA still needs to be configured. If they're using Xilinx P&R tools, they're responsible for substantial engineering aspects of the design. The board manufacturer should have specific published design limits for the board. Armed with the junction-to-ambient thermal resistance of the realized design and the engineering tools provided with the Xilinx toolset, proper engineering analysis can be done.

If the RC is provided through alternative tools, those tools should be able to deal with the design limits of the device; in this case the person reconfiguring the PCI add-in board certainly won't be able to draw useful information from the data sheet and may not know which data sheet values even apply to them. They're dealing with the board, not the part. The board data is what guides them.

Reply to
John_H

Since there is no other P&R for current Xilinx parts, this is a given, not a conditional.

I'm not aware of one that does, nor have I used one. The norm is to ship you a board with some vendor's FPGA on it, provide some minimal documentation for the board and a pointer to the FPGA's data sheets, and that's it ... other than that, the sales dept may wish you good luck too. This is true of PCI RC boards, student boards, and proto boards from all sources.

You must be a far better engineer than most if, armed with the data sheets, user manual, and this magical "Xilinx toolset", you can do a proper engineering analysis. I'm amazed, dumbfounded.

Interesting how you might do it ... I've yet to figure out a way to determine peak ground path current for a specific device/package to get a clue about the die-observed Vccint and ground ripple voltages. Nowhere do I see a V4 spec for max ground currents, or even Vccint currents for that matter, in order to make sure the die sees the spec'd minimum 1.14V between Vccint and ground on die, after the drops on the carrier/package. The V4 user guide discussion is VERY general about max I/O switching per bank - nothing about peak currents - but it does allow us to sum up the estimated ground currents for the I/Os. Peaks for I/Os are probably several times that. But I would love to know where you found the peak current for CLB/FF/BRAM switching following a clock. I'd also like to see where you found the voltage drop between the balls and the die for this aggregate (I/O + Vccint) current, for both the Vccint and ground paths, in terms of both the resistance and the via inductance of the chip carrier from balls to die.

Last time I went into the power estimator for the XC4VLX200 and put in 200MHz with 25% toggle for LUTs/FFs across 97% of the device, there was not a viable thermal solution where the die remained within its 125C spec. The estimated power with very modest I/O was well above 40W, generating ground currents that would most likely peak well above 120A at clock edges. I suspect the part will not work even derated this much, much less at a clock rate that would be expected, or at a utilization higher than 25%. Admittedly that was a long time ago; maybe it's better today.
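
For what it's worth, the arithmetic behind that worry (Python; the 40W, 1.2V, and 1.14V figures appear above, while the crest factor and the milliohm path are my own guesses, since nothing is published):

    # Rough core-current arithmetic behind the ground-bounce worry.
    p_core, vccint = 40.0, 1.2     # W, V -- the estimator output above
    i_avg  = p_core / vccint       # ~33 A average core current
    i_peak = 4 * i_avg             # assumed 4x crest at clock edges -> ~133 A

    # Resistive drop across an assumed 0.5 milliohm ball-to-die path:
    r_eff = 0.5e-3
    print(i_avg, i_peak, i_peak * r_eff)   # ~33.3 A, ~133.3 A, ~0.067 V

    # 67 mV already exceeds the entire 60 mV margin between a 1.2 V
    # rail and the 1.14 V minimum -- and the inductive L*di/dt term
    # comes on top, with neither L nor di/dt specified anywhere.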

So, since you are clueful about all this, what are the numbers? Is it really necessary to derate the device by better than 50% in rated clock performance just to stay within these thermal worst case numbers? Can the large FF packages handle better than 100A peak current at clock edges without leaving the device unstable? What is the die ripple at these values?

For that matter, none of the board vendors disclose what their gnd/pwr plane-to-ball resistance and inductance is either, so the end user is doubly stuck ... no detailed board data, and no detailed FPGA vendor data, with which to do a "proper engineering job" and figure out just how much the board needs to be derated to not violate power limits.

These are the easy questions :(

So, where did you find any RC vendor that managed to do their own P&R?

Reply to
fpga_toys

toys,

A comment about modeling the FPGA and package:

Look up Howard Johnson's presentations. Ball/pad inductance is useless (last year's solution); I am surprised you bring it up. HJ proves that it is the loops and their topology that matter, and without a 3D field solver, you have no hope of learning anything at all.

Except if you use our FPGAs: we do the hard stuff so you don't have to (e.g. ParseChevron(tm), patents pending).

The number of planes, and of bumps/balls for the Vcc's and ground, ensures that the voltage drops are kept within the required limits, even for your 40 watt design. It is up to you to design the PDS for your board, and we have application notes to help you in that regard.

Comments we have received include folks who now tell us that the SparseChevron(tm) packages are superior to IBM's "cross" design.

I'm happy to just be in the same league.

Austin

Reply to
Austin Lesea

Oops,

SparseChevron(tm).

Austin

Reply to
Austin Lesea

Numeric apps will not exceed about 15% toggle by the nature of the number system, unless you are alternating positive and negative numbers on alternate clock cycles. It isn't so much a function of the design; it is a function of the properties of the number system.

Yes, I know you can make a part hot with an average design, been there done that many times. However, that was not what I was arguing. I was refuting your claim of near 100% utilization with near 100% toggle rates. I think if you evaluate the average toggle rates in your designs, you aren't going to find them to be much above 15% unless for some reason you are alternating the sign on alternate cycles. If that is the case, you can substantially reduce your power consumption by moving to a sign-magnitude number representation instead of 2's complement. It is pretty easy to come up with a design that will abuse the part by toggling every flip-flop and SRL16 at 100% with a clock near the upper limit. What does that prove? Well, about all it proves is that you can exceed the design limits of the system your FPGA is installed in if the FPGA has not been adequately cooled for that abuse.
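
A quick toy computation shows the effect (my own Python sketch, 16-bit values):

    # Alternating-sign data is the worst case in two's complement:
    # compare bit flips between +5 and -5 in the two representations.
    def twos(v, n=16):
        return v & ((1 << n) - 1)

    def signmag(v, n=16):
        return ((1 << (n - 1)) | -v) if v < 0 else v

    flips = lambda a, b: bin(a ^ b).count("1")
    print(flips(twos(5), twos(-5)))        # 15 of 16 bits toggle
    print(flips(signmag(5), signmag(-5)))  # 1 bit toggles (just the sign)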

As to getting the FPGA too hot for the installation, that is not the fault of the FPGA; rather it is a fault of the board design. If the board is designed for general purpose use, either the power supply (virtually all FPGA boards now have on-board power supplies) should current limit at some value less than where the FPGA or board are physically damaged, or the design should make use of the temperature diodes. These are provided for exactly that purpose. BTW, limiting power supply output is a very effective way of limiting the power dissipation of the FPGAs. The current limits are going to depend on how well the FPGA is cooled, so they are going to vary application to application.
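
For example (a sizing sketch in Python, with invented numbers):

    # Sizing a supply current limit as a crude power cap.
    p_safe, vccint = 25.0, 1.2   # assumed board cooling budget (W), core rail (V)
    i_limit = p_safe / vccint    # ~20.8 A -- trip or fold back above this
    print(f"{i_limit:.1f} A limit caps core power at {p_safe:.0f} W")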

As for predicting the FPGA dissipation, that is difficult enough to do when the design is completely known, placed and routed, especially for designs that process data sourced off the FPGA, as the power dissipation is very much data pattern dependent. Xilinx provides tools for estimating power, which do about as good a job as they can given the constraints. For the spreadsheet, you, the designer, are responsible for estimating toggle rates, routing complexity, and part utilization. It is meant as a planning tool, although you can also back your completed design into it. Since it has no knowledge of the routing or your actual data, or for that matter the actual circuit, it is just an estimate. I find that it tends to be on the conservative side, especially with floorplanned designs, but as far as getting an actual number, figure +/-12dB. The XPower tool uses the actual netlist and simulation vectors, and will give you very accurate worst case power ... for the test vectors given. If the actual use doesn't match the test vectors exactly, the power will be different. That makes even the awkward-to-use XPower tool of only marginal usefulness in designs that process data from outside the FPGA, simply because you can't accurately model all the possible data sets (if you could, you could replace the FPGA with a ROM).
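
Under the hood, that kind of planning estimate is essentially the standard CMOS dynamic-power model; a sketch of the idea (my own Python, not the actual spreadsheet internals, with an invented round capacitance number):

    # Generic CMOS dynamic power:  P = C_eff * V**2 * f * toggle * N,
    # summed over resource types.
    def p_dyn(resources, vccint=1.2):
        return sum(c_eff * vccint**2 * f_hz * toggle * n
                   for (c_eff, f_hz, toggle, n) in resources)

    # Hypothetical: 100k LUT/FF pairs at 200 MHz, 25% toggle, ~1 pF
    # effective switched capacitance each (invented):
    print(p_dyn([(1e-12, 200e6, 0.25, 100_000)]))  # ~7.2 W

Scale that toward full utilization and add clock trees, BRAMs, and I/O, and you land in the tens of watts being argued about here - but every one of those inputs is a guess until the real netlist and real data exist, which is the point.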

So my point is, the FPGA vendors give you the information they can about power dissipation. They can't know your design to provide you better numbers without you actually simulating the post PAR design with vectors that accurately reflect the actual usage. For a general purpose board, the board designer can limit the FPGA dissipation to whatever is safe for the board cooling environment by using thermal diodes or power supply current limits to avoid damage to the FPGA or board, and he can do that without knowing anything about your design. Isn't that a better tree to bark up?

Reply to
Ray Andraka

So, my point is, and nobody seems to disagree, that it's unrealistic to assume that the devices can be 100% packed, use the marketing numbers for system design clock speeds, at a modest toggle rate, and not blow right thru the power the device can handle. Disagree?

If not, then the device HAS TO BE DERATED from marketing numbers for RC use. Disagree?

Reply to
fpga_toys

I've watched his presentation ... and it explained the horrible problems I had with Virtex and Virtex-II packages when pushing the power envelope.

The point of the 40W design is that it's already derated. Double, or better, the clock rate to get the best possible device performance per the P&R timings, with a netlist that is busier than 25%, and you quickly hit the power limits of even an active cooler. The data sheets imply clock rates that would easily produce designs well over 100W in this package ... and that is where I start to seriously worry about the cross section of copper/lead in the package.

The worry is because none of the data is specified to know where the limits are ... i.e. max currents for ground and for each power group, and what those current profiles look like in rise times.

That provokes the question of whether derating is necessary, if so how much, and whether we can easily get numbers to calculate that. Not currently.

John

Reply to
fpga_toys
