99% Utilisation!

Should we ever get to that? I know A and X both typically recommend 80-85% resource usage, and so do a lot of others. But besides having no provision for expanding the design and probably extremely long P&R times, what are the other dangers of such high resource utilisation if our clock is only 40 MHz? Also, what if we are using all 8 RocketIOs in a device?

Reply to
Adarsh Kumar Jain

You're worried because you have 99% slice utilization? Don't! Check your LUT and register usage and you'll find you're probably *well* under the 99% mark. The P&R software tends to spread things around in the fabric, one element per slice until the slices are each occupied with something, then begin to backfill the extra slice resources to get the design in the part. It seems inefficient, but it's what we have to deal with.
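
To put numbers on the slice-versus-LUT distinction, here is a minimal Python sketch. It assumes two LUTs per slice (as in Virtex-II and Spartan-3 class parts) and that every occupied slice uses at least one LUT; the slice counts are made up for illustration and the function name is hypothetical.

```python
# Assumption: 2 LUTs per slice (Virtex-II / Spartan-3 style fabric).
LUTS_PER_SLICE = 2


def lut_utilisation_bounds(total_slices, occupied_slices):
    """Return (lowest, highest) LUT utilisation in percent that is
    consistent with a given slice occupancy.

    Lowest case: each occupied slice uses only one of its LUTs.
    Highest case: each occupied slice uses all of its LUTs.
    Assumes every occupied slice holds at least one LUT.
    """
    total_luts = total_slices * LUTS_PER_SLICE
    lowest = 100.0 * occupied_slices / total_luts
    highest = 100.0 * occupied_slices * LUTS_PER_SLICE / total_luts
    return lowest, highest


# Hypothetical device with 1000 slices, 990 of them occupied (99%).
low, high = lut_utilisation_bounds(1000, 990)
print(f"LUT utilisation is between {low}% and {high}%")
```

With 990 of 1000 slices occupied, the LUT utilisation could be anywhere from 49.5% to 99%, which is why the LUT and register counts in the map report are the numbers to check.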

I look forward to the day when the slice components can be freely rearranged by the P&R software; why have two registers locked together at the map phase when P&R needs to make the tough decisions?

Reply to
John_H

Most of the area of an FPGA is routing. Maybe you should try to maximize the usage of that. Sounds like you could gain more efficiency there than in the LUT usage.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

Except that, for marketing and perception reasons (there's no good way to say "routing utilization" that users currently understand), the routing is significantly overprovisioned for most designs to allow high chip utilization.

Running at high utilization is a LOT easier if a large amount of the logic is floorplanned/placed; that makes both placement and routing easier.

--
Nicholas C. Weaver.  to reply email to "nweaver" at the domain
icsi.berkeley.edu
Reply to
Nicholas Weaver

Reference: Xcell Journal, Issue 50, Fall 2004:

"Introduced in September 2003, ISE 6 adds a new timing driven map option that helps get better design utilization for your FPGA devices, particularly if the device is already 90% utilized. Timing driven map is a next generation enhancement to ISE physical synthesis placement with logic slice packing for Virtex-II, Virtex-II Pro and Spartan-3 devices to improve placement quality for unrelated logic."

I think I tried using this with ISE 6.2i with Spartan-3. Looked to me like it was a keeper.

-Newman

Reply to
newman5382

Spot on, Nicholas. One further point; often the time you spend on Floorplanning is more than recovered in P&R times, certainly for repeatedly used RPMs. Cheers, Syms.

Reply to
Symon

And don't forget the performance win. I have a deliberately dinky 3-pipeline-stage encryption core: placing just PART of the core allows the 125 MHz timing to be met easily, in a fraction of the tool time.

RLOC is your friend.

Reply to
Nicholas Weaver

All valid points. 99% utilization is probably the slice usage, not the LUT usage, so you may still have plenty of margin (you are really somewhere between 50% and 99% LUT utilization; check the map results for LUT and memory usage). At 40 MHz, you will probably not have a problem if you were smart about your design (no logic with layer upon layer of LUTs). As the device gets filled, you may find that routing starts becoming an issue. That can be mostly alleviated by doing some floorplanning. Many of our designs have LUT utilization above 80%. The PAR times can get long, especially with less than optimal placement. At 40 MHz, you may also be able to take advantage of a multiplied clock to reduce logic should you get painted into a corner. I think you've got plenty of wiggle room to get out of a tight spot if you get there in future design spins. You may have to be smart about the design to fit a larger change in.
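
The multiplied-clock idea can be illustrated with a little arithmetic. This is a rough Python sketch, not a sizing method: it assumes the operations are identical and independent, and it ignores the multiplexers, extra registers and control logic that sharing adds. The function name and the example figures are hypothetical.

```python
import math


def shared_units_needed(parallel_ops, clock_multiple):
    """Number of hardware units needed when each unit is reused
    `clock_multiple` times per data-rate clock period.

    parallel_ops   -- identical operations required per data-rate cycle
    clock_multiple -- fabric clock frequency / data-rate clock frequency
    """
    if clock_multiple < 1:
        raise ValueError("clock_multiple must be at least 1")
    return math.ceil(parallel_ops / clock_multiple)


# Hypothetical case: 8 identical operations per 40 MHz cycle,
# with the logic clocked at 160 MHz (a 4x multiplied clock).
print(shared_units_needed(8, 4))
```

In this made-up case two time-shared units replace eight parallel ones, at the cost of the sharing overhead and a tighter timing target for that part of the logic.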

--

--Ray Andraka, P.E. President, the Andraka Consulting Group, Inc.

401/884-7930 Fax 401/884-7950 email snipped-for-privacy@andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

Reply to
Ray Andraka

There is another reason to consider hand placement even if you don't need the performance. When you build up a floorplan, it's as likely to go slower as faster, so it has to be done incrementally, keeping only the better placement decisions. As the plan fills out, you get a much better feel for how much area different logic functions take up. It's all very time consuming though! Worth doing for datapaths, but only for control logic if timing really forces it.

For instance, dual-port LUT RAMs, SRL16s and mux4s usually take 2 LUT sites, have to be paired with a related FF, and leave 1 FF site unused. Apart from those, it's almost possible to use up 99% of the FFs in a datapath, as long as, say, the register widths are even and related registers are controlled by the same signals. With that in mind, it then becomes possible to adjust the logic design so that more datapath logic will fall nicely into the unused LUT columns where there might be a row of plain FFs.
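
The point about even register widths can be sketched in Python. The sketch assumes two FFs per slice and that registers with different control signals cannot share a slice (the FFs in a slice share clock, clock enable and set/reset); the register widths below are invented for illustration and the function name is hypothetical.

```python
# Assumption: 2 flip-flops per slice, sharing clock, CE and set/reset.
FFS_PER_SLICE = 2


def ff_packing(register_widths):
    """Return (slices_used, unused_ff_sites) when each register is
    packed into its own slices and never shares a slice with another
    register.
    """
    # Ceiling division: an odd-width register needs one extra,
    # half-filled slice.
    slices = sum((w + FFS_PER_SLICE - 1) // FFS_PER_SLICE
                 for w in register_widths)
    unused = slices * FFS_PER_SLICE - sum(register_widths)
    return slices, unused


# Two 16-bit registers and one 9-bit register, each on its own enable.
print(ff_packing([16, 16, 9]))
```

Every odd-width register strands one FF site under these assumptions, so a datapath of even-width registers is what lets FF usage approach the full count.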

This brings up one little gripe with the XST mapper. When a clock enable has large fanout and drives many registers of different widths, the FF driving the enables will be split into clones (the good part), but often the branches will enable groups of FFs in a way that is less than optimal and cuts across a slice pair.

In my CPU project, with some 20 regular 16-bit registers on 1 enable, I get told to remove 1 FF from the middle of a few of these registers because of this odd splitting, which is tiresome. It's too early to manually split such enables.

Are there any switches to force grouping of replicated FF signals to stay within pairs? Timing driven placement seemed to help, as well as not placing the ck enable FFs.

My other gripe about floorplanning is that the LUT structures/names are liable to change on me even if the logic that created them doesn't, so I try not to place those, since they tend to get placed/pulled near the connected FFs that I did place. Still lots to learn :-)

regards johnjakson_usa_com

Reply to
john jakson

As long as you are able to meet your design constraints (speed, power, etc.) with acceptable software/manual effort, you should not really worry about the device utilization. I always try to fit the design in the smallest possible device. Sometimes I am able to use more than 90% of the LUTs and sometimes considerably fewer. Everything depends on your design constraints.

Reply to
digari

Hmmm. Every time I've used this it's always slowed down my design. Cheers, JonB

Reply to
Jon Beniston

JonB,

It definitely takes map longer to run. I was using XST 6_2i sp3 with a goal of 100 MHz (Spartan 3 - 1500). I initially kept all the hierarchy, but when I went over 90%, I started to get nervous and had XST flatten the design. The utilization went down by 5% to 7%, but the timing deteriorated. After several recode attempts to speed up the critical path, only to have another unrelated path show a slowdown, I went to timing driven map and used selective keep-hierarchy attributes in some of the modules (perhaps a lazy man's floorplan). Utilization stayed about the same, and internal timing was met with ease. Perhaps I attributed too much of the timing improvement to the timing driven map, and not enough to the selective keep hierarchy.

-Newman

Reply to
newman5382

John,

There is something you can do about it: use a local copy of the control signal and put a keep buffer on it. The snippet below is a simple VHDL example:

signal lcl_ce : std_logic;
attribute syn_keep : boolean;
attribute syn_keep of lcl_ce : signal is true;

begin
lcl_ce <= ...

Reply to
Ray Andraka

I did this some time ago with an XC2S30 that had 2 LUTs left; it went through P&R successfully and then ran on real hardware. I also have a test build that we use on our Broaddown2 XC3S400 that uses every I/O and 99%+ of the registers with no problems.

John Adair Enterpoint Ltd. - Home of Broaddown2. The Ultimate Spartan3 Development Board.

Reply to
John Adair

snipping

Thanks Ray, I will Verilog those suggestions into the layout, I'm sure that will help.

regards johnjakson_usa_com

Reply to
john jakson