Utilisation of Xilinx FPGAs

Dear All,

I hope that you can help.

I am trying to fit a number of IP cores into a single Xilinx FPGA. I have heard that it is not possible to completely utilise all the FPGA resources (RAM, logic cells, DCMs, etc.) because of routing problems.

Does anyone know a rough percentage of the FPGA resources that can be expected to be utilised before issues arise in trying to route the design?

I have heard that it is as low as 60% but am hoping that this is not the case.

I understand that it really does depend on what you put into the FPGA, but to only be able to utilise 60% on average seems a little poor to me.

Kind Regards

Simon

Reply to
stockton

Big things like BRAMs and DLLs easily reach 100% utilization. The critical thing will be LUT utilization, which in my experience (Xilinx) tops out at around 60% until you use the "disable register ordering" switch; then you can get over 90% at a slight speed hit.

-Jeff

Reply to
Jeff Cunningham

Let's start with the dedicated resources: the BlockRAMs, multipliers, DCMs, global clocks, PPCs, MGTs and the general-purpose I/O. You know before you start your design what percentage of these resources you need, and I bet it seldom will be 100% of each of them. So here you get everything you need (or you already know you need a different, bigger chip). That leaves us with the balancing between logic resources (LUTs, flip-flops, SRL16s), routing capabilities, and your specific needs. Routing used to be the limiting factor in the distant past, but modern FPGAs have such an abundance of routing resources that this is seldom the limitation. Performance may suffer when the chip gets really crowded, but that depends very much on your requirements.

60% sounds to me like an insanely conservative number. But with over 100,000 new designs started every year, there is an enormous spread, and one should never say "never". Let's settle on "extremely unlikely". With the increasing complexity available in modern FPGAs, designers also have developed a more mature attitude, realizing that some spare room is a good thing to have. 15 years ago, some cried when a few LUTs went unused: "Such a waste!" Now they say: "Nice to have some room for future changes."

Peter Alfke, Xilinx Applications

Reply to
Peter Alfke

Depends. For example, on a Xilinx Spartan-3, if you use the full width (x36) of the large memories known as block RAMs, you can't use the multipliers in the same tile. Block RAMs can be configured as 16K x1, 8K x2, 4K x4, 2K x9, 1K x18, or 512 x36. If the IP you buy has nothing but full-width block RAMs (configured as 512 x36) and uses multipliers, you might not even get past 50% on the large resources. Yet another question to ask the IP vendors.
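As a quick check on the aspect ratios listed above, here is a minimal Python sketch that multiplies each depth by its width. The function name `total_bits` is just for illustration. The x1, x2 and x4 shapes come to 16K bits, while the x9, x18 and x36 shapes come to 18K bits because they also expose the parity bits.

```python
# Block RAM aspect ratios quoted in the post above: (depth, width in bits).
CONFIGS = [
    (16 * 1024, 1),
    (8 * 1024, 2),
    (4 * 1024, 4),
    (2 * 1024, 9),
    (1024, 18),
    (512, 36),
]


def total_bits(depth, width):
    """Number of bits addressable in one block RAM at this aspect ratio."""
    return depth * width


for depth, width in CONFIGS:
    print(f"{depth:>6} x {width:<2} -> {total_bits(depth, width)} bits")
```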

On the other hand, I've used 100% of the block rams on several different designs.

The key question is: What is the limiting resource? Pins? Memory? Multipliers? Lookup tables? Time? Money? If you need X pins and that requires a given size part, you may have no use for the extra logic and memory that comes with that part. If you need Y bytes of internal RAM and have no use for multipliers, you may well use 0% of the multipliers. If you need a design done in a short design time (and your volumes are not huge), using a smaller percentage of the part can save time making things fit. If you hope to make and sell millions of items, spending a lot of time to cram the design into a smaller part can make sense. It usually wouldn't make sense if the world wide lifetime demand is exactly eight units.
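The "what is the limiting resource?" question can be sketched in a few lines of Python. Everything numeric here is made up for illustration: the resource counts do not describe any real part, and the names `utilisation` and `limiting_resource` are hypothetical helpers, not part of any Xilinx tool.

```python
def utilisation(required, available):
    """Fraction of each resource the design would use on one candidate part."""
    return {name: required.get(name, 0) / available[name] for name in available}


def limiting_resource(required, available):
    """Return (resource name, fraction used) for the most heavily used resource."""
    util = utilisation(required, available)
    name = max(util, key=util.get)
    return name, util[name]


# Hypothetical design needs and hypothetical part capacities (not real data).
design = {"pins": 200, "block_rams": 10, "multipliers": 0, "luts": 5000}
part = {"pins": 250, "block_rams": 40, "multipliers": 40, "luts": 20000}

name, fraction = limiting_resource(design, part)
print(f"limiting resource: {name} at {fraction:.0%}")
```

With these invented numbers the part is chosen by pin count, and most of its logic, memory and multipliers go unused, which is the situation described above.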

-- Phil Hays Phil-hays at posting domain (- .net + .com) should work for email

Reply to
Phil Hays

I agree with Jeff that with modern parts (V2Pro and newer) and latest tools, many designs should be able to achieve 90% LUT utilization. It does depend somewhat on the design (clock rate, number of clock domains, number of levels of logic, and how efficiently the logic gets placed) though, so if logic estimates show much over 80%, I'd make sure to run it all the way through the tools for a final confirmation.
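The 80% rule of thumb above is easy to turn into a small check on synthesis estimates. This is only a sketch: the device size is a made-up figure, and `needs_full_tool_run` is a hypothetical helper name, with the 0.80 default taken from the threshold mentioned in this post.

```python
def lut_utilisation(estimated_luts, device_luts):
    """Estimated LUT utilisation as a fraction of the device."""
    return estimated_luts / device_luts


def needs_full_tool_run(estimated_luts, device_luts, threshold=0.80):
    """True when the estimate is high enough that only a full MAP/PAR run can confirm the fit."""
    return lut_utilisation(estimated_luts, device_luts) > threshold


DEVICE_LUTS = 20000  # hypothetical device size, for illustration only

for estimate in (12000, 17000):
    flag = needs_full_tool_run(estimate, DEVICE_LUTS)
    print(f"{estimate} LUTs: {lut_utilisation(estimate, DEVICE_LUTS):.0%}, full run needed: {flag}")
```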

Lastly, my experience has been that FF count is a poor indicator of projected device utilization. LUT utilization is nearly always higher, and often considerably higher (I've seen ~25% higher in a few cases).

Have fun,

Marc

Reply to
Marc Randolph

We test all of our IP Cores on a few Xilinx Development boards we have in house and are stuck with device sizes.

We always have a small SoC that we attach our "Core to be tested" to. I have seen utilization of 98% or more, especially when we make heavy use of ChipScope. As long as you stick with an EDK-based SoC and its 100 MHz bus speed limitation, you are usually OK up to 98% utilization.

Best Regards,
rudi
=============================================================
Rudolf Usselmann, ASICS World Services,
Your Partner for IP Cores, Design, Verification and Synthesis

Reply to
Rudolf Usselmann

"Peter Alfke" wrote in message news: snipped-for-privacy@c13g2000cwb.googlegroups.com...

Yes, but it would be nice to leave some (more) control to the user. Some time ago, I started a thread about the s****d XST or mapper, which adds an insane number of route-through LUTs to a design (1200 LUTs for logic, plus 350 route-throughs) that runs at a slow 36 MHz clock. The case couldn't be solved even with the help of our Xilinx FAEs. I know synthesis and mapping are complex processes, and we have to accept a certain amount of unexplained "black magic", but sometimes it's just too much.
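To put the figures above in proportion, here is a small Python sketch of the arithmetic; the function name `route_thru_share` is just an illustrative label.

```python
def route_thru_share(logic_luts, route_thru_luts):
    """Return (share of all occupied LUTs, overhead relative to logic LUTs)."""
    total = logic_luts + route_thru_luts
    return route_thru_luts / total, route_thru_luts / logic_luts


share_of_total, overhead_vs_logic = route_thru_share(1200, 350)
print(f"{share_of_total:.1%} of occupied LUTs are route-throughs")
print(f"{overhead_vs_logic:.1%} overhead on top of the logic LUTs")
```

For 1200 logic LUTs and 350 route-throughs this works out to roughly 22.6% of the occupied LUTs, or about 29.2% on top of the logic itself.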

Regards Falk

Reply to
Falk Brunner

Our experience has been about 70-80%. Interestingly, on large parts you can have trouble if you aren't using enough resources because the tools 'spread' the logic across the chip too far and cause timing errors. So... ideally? I would say 60-80% utilization.

Reply to
Bo

If you do not constrain the timing, the tools "are lazy" and make their life easy, by spreading the logic around, so as to avoid congestion. The proper solution is to constrain the timing and thus force the tools to make more appropriate, more intelligent, and more demanding decisions.

It's the tools' equivalent to Parkinson's Law: "Every job grows to use up all the available resources in time, space, and money". It's up to you to delineate the available resources!

Peter Alfke, Xilinx Applications

Reply to
Peter Alfke

Howdy Peter,

I restored part of Bo's message above... he mentions timing errors, which I assume means he did have timing constraints. In the past, I have also seen MAP or PAR do what he describes - rather than spreading the logic across the chip, it groups it towards the edges, leaving a larger unused area in the middle. Adding a few pipeline stages corrected the timing problems.

But in the grand scheme of things, I think I prefer this behavior because it implies to me that MAP or PAR attach a pretty high importance to putting the logic close to the IOBs, which turns it into an easy way to get (very basic) floorplanning if IOB locations are chosen wisely.

No doubt about either one of those statements!

Have fun,

Marc

Reply to
Marc Randolph
