ISE 10.1 - Initial experience

- emeb

Sat, Mar 29, 2008 3:18 PM

I've got a fairly large design that I've been working with in ISE 9.2.04 for a while - it takes about 90% of a V2P100 and runs to completion in about 3.5 to 4 hours on my Linux x86_64 system (Athlon dual core 3800 w/ 4GB) using home-made make scripts. I decided to take 10.1 out for a spin to see if it really helps speed things up. Here's what I've seen on the first few runs:

- XST seems to run about as fast as it used to.

- NGDBUILD seems faster and seems to find errors in timing exceptions more quickly.

- MAP works about the same.

- PAR takes a lot longer to run. I'm seeing 8 hour runs that used to take 2 - 2.5 hours in 9.2.04 with the same constraints. It appears to be coming up with bad placements (Phase 12.27 seems to take _forever_) that are impossible to route successfully.

I'm in the process of adding more timing exceptions and this seems to help, but I still haven't had a successful PAR run. Let me re-iterate:

9.2.04 didn't have any trouble with this design using the exact same source and constraints.

Summary: 10.1 isn't working as well as 9.2.04. I'll probably be shelving it and waiting for the service pack.

Comments / Criticism / Suggestions welcome.

Eric
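For anyone who wants per-stage numbers like the ones above from their own scripts, here is a minimal sketch of a wall-clock timing wrapper. It is not Eric's make setup: `time_stages` and `format_report` are hypothetical helper names, and the commands are placeholders to be replaced with the real tool invocations.

```python
import subprocess
import sys
import time


def time_stages(stages):
    """Run each (name, argv) stage in order and return [(name, seconds)].

    Stops after the first stage that exits with a non-zero status.
    """
    results = []
    for name, argv in stages:
        start = time.perf_counter()
        completed = subprocess.run(argv)
        elapsed = time.perf_counter() - start
        results.append((name, elapsed))
        if completed.returncode != 0:
            break
    return results


def format_report(results):
    """Render one 'stage  h:mm:ss' line per stage."""
    lines = []
    for name, seconds in results:
        total = int(round(seconds))
        hours, rem = divmod(total, 3600)
        minutes, secs = divmod(rem, 60)
        lines.append(f"{name:<10} {hours}:{minutes:02d}:{secs:02d}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the real tool
    # invocations from your own build scripts.
    placeholder = [sys.executable, "-c", "pass"]
    stages = [(name, placeholder) for name in ("xst", "ngdbuild", "map", "par")]
    print(format_report(time_stages(stages)))
```

Running the same wrapper under both tool versions gives a like-for-like table of stage times.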

- Frank Buss

Sat, Mar 29, 2008 3:29 PM

Sounds like the usual software quality from Altera and Xilinx. The general advice: Don't use a new release until the first service pack is released. And for some releases even the versions with service packs are not recommended, e.g. Quartus 7.0 is fine and I've heard Quartus 7.2 is fine, but with Quartus 7.1 a design which worked with Quartus 7.0 didn't work any more.

--
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

- job

Sat, Mar 29, 2008 3:50 PM

Evaluate and send feedback to the Xilinx team.

But do not use a new version before its first service pack is released!

- Laurent


- Jim Granville

Sun, Mar 30, 2008 12:22 AM

Of course, if _everyone_ did this, they would not release a service pack, as

a) The beta testers (aka customers) have all gone

b) They would think the zero-complaint feedback meant there was nothing to fix

;)

-jg

- austin

Sun, Mar 30, 2008 1:07 AM

Jim,

I agree, but you do need to get feedback, somehow.

Yes, as a practical matter, when I was using FPGA devices to solve problems, and trying to make money doing so, I would NEVER use an initial release, but I would always freeze the development of a product at some previous release.

"New software" = ? (a potential for risk): be it Microsoft, Xilinx, or IBM.

However, the new release always has benefits, so I would have one engineer looking at the new release, so we would be "ready" when the time came to freeze for the next real product.

I do not think any engineering manager is doing much different today.

You need something stable, and known, to develop the product (in order to manage the risks).

You also need the new release to support the new families and features you need for the next product family.

And, the vendor needs to know what is working, and what is broken.

So, (to everyone), keep the feedback coming, (life goes on)

Austin

- A.D.

Sun, Mar 30, 2008 6:27 AM

What about the IDE? Since ISE 8.1 it has become increasingly heavy and slow "per se", without adding any relevant new features. While the main tools (XST, P&R, etc.) improved a lot over time, the IDE simply got slower and slower, requiring a lot of resources just to open the Project Navigator... I hope ISE 10.1 reverses this trend. (OK, I know, you can use third-party tools or just the command line, but if this is the final result it would be better to keep it as light and efficient as possible...)

Antonio

- Antti

Sun, Mar 30, 2008 7:01 AM

Austin,

Just saying it for the nth time - Xilinx should do REAL beta-testing BEFORE the releases! Not do it the Microsoft way and release untested versions and use ALL the clients as beta-testers.

Antti

- austin

Sun, Mar 30, 2008 4:06 PM

Antti,

You (and everyone else) will probably not believe me, but there have been beta testers running this for more than 6 months.

Austin

- Antti

Sun, Mar 30, 2008 4:23 PM

Austin,

Sure I believe you! But they are not doing their job! How hard is that to understand?

If every new major release causes instant frustration and/or has a TTFFF (time to first fatal failure) of less than 30 minutes, then your beta-testers have continuously failed to do their job. I can't see it any other way.

Antti

- Alain

Sun, Mar 30, 2008 5:41 PM

Hi all,

Here are my first impressions:

First, I won't repeat the "what's new" list, but I've seen some real improvements, mainly in the GUI (such as the reports, and cross-probing from warnings to the HDL editor, which is very useful). As for the P&R results, I was quite frustrated at first: many timing constraints failed in 10.1 that all passed in 9.2. It turns out 10.1 analyzes more constraints (in my design, constraints between clocks, which were in fact not relevant). After adding some "TIG" constraints everything was OK. At present, I'm just worried about the LUT and FF increase after MAP: more LUTs and FFs (??? I have to investigate...), but more LUTs used as shift registers and fewer slices occupied. Concerning EDK 10.1, I would be interested to know your thoughts about the improvements or problems ...

Cheers !

Alain.

- Zara

Mon, Mar 31, 2008 5:44 AM

Well, my designs have all compiled OK. A complex design (83% of an XC3S400, with a MicroBlaze and a whole bunch of proprietary peripherals included, with minor differences in pinout and contents over two different boards) has compiled consistently in 20% less time (25 minutes brought down to 20 minutes), adding only 7 slices to the final design. (PC: WXP SP2 ES, dual Athlon.)

The GCC compiler has some differences, as it will migrate some data from bss to text (which might be a problem for me, as they may be located on different RAM types, but I have not yet identified the differences). 4 different SW projects running on both of the above-mentioned boards are working OK.

So, for me, everything seems to work fine so far. Of course, the work is being done on a branch; the main development remains on 9.2.4(ISE)2(EDK).

Now, I'd really love receiving my license update to register my products (EDK and Chipscope), so I may begin working with it....

Best regards,

Zara

- Morten Leikvoll

Mon, Mar 31, 2008 8:46 AM

I just wanted to mention..

We tried the exact same build on two different computers here (both running ISE 9.2.04i). Machines and build times:

A Dell Precision, 2 GB mem, 3 GHz Xeon, running XP x32: 25 min
An HP xw9300 workstation, 4 GB mem, 2.40 GHz dual-core AMD Opteron, running XP x64: 1 h 25 min

We did not expect this difference. Maybe x64 is too slow? I suspect the timing depends not only on ISE itself, but on several unknown factors. I wish ISE could run some diagnostics to help us find non-ideal conditions.

- Kolja Sulimma

Mon, Mar 31, 2008 1:31 PM

That one is easy. Depending somewhat on the exact model, the Xeon is likely to have 4MB unified cache, whereas the Opteron probably has 1MB cache per core. This means that for a compute-intensive application that can only use one core, the Xeon provides 4x the cache size.

The x64 application makes things even worse because it has a larger memory footprint.

Kolja Sulimma

- Antti

Mon, Mar 31, 2008 2:29 PM

OK, here come my comments too: FIRST TRIAL with 10.1, 4 minutes after first installation.

"ERROR:Portability:3, please open webcase" - it tells me that it has run OUT of memory, and that current memory usage is 312200 kb.

This is a brand new PC with 2GB RAM and a FRESH new OS install, with no applications running other than ISE.

So time to first fatal error has decreased from 20 minutes (ISE 9.X) to 4 minutes.

Antti is waiting for 10.x service packs.

- Antti

Mon, Mar 31, 2008 2:36 PM

After dismissing the initial message popups, another one came up with "fatal in gui something", then ISE self-terminated. After opening the project again and doing a clean and rerun, the following comes:

FATAL_ERROR:Simulator:Fuse.cpp:164:$Id: Fuse.cpp,v 1.35 2007/11/07 21:25:47 sonals Exp $ - Failed to link the design Process will terminate. For technical support on this issue, please open a WebCase with this project attached at

formatting link

$Id: Fuse.cpp,v 1.35 2007/11/07 21:25:47 sonals Exp $ - Failed to link the design Process will terminate. For technical support on this issue, please open a WebCase with this project attached at

formatting link

Well done Xilinx, I just installed it on a new PC to check out something quick... and all I get quickly are fatal errors and self-termination and requests to open webcases...

Antti

- Antti

Mon, Mar 31, 2008 7:24 PM

ROTFL

No, the first fatal error and self-termination were caused by the reasons explained in AR30373.

However, after using the supposed workaround "scenario 3" it took only another 20 minutes to cause the next fatal failures and forced termination of ISE.

Antti

- Jim Granville

Mon, Mar 31, 2008 7:55 PM

Perhaps you triggered the special 'anti-Antti detector' that Xilinx put in all their software now ;)

-jg

- emeb

Mon, Mar 31, 2008 10:14 PM

You know, I hadn't noticed that, but I am seeing a fairly large 'inflation' of the design as well.

Here is the final utilization report from XST

9.2.04
------
Device utilization summary:
---------------------------

Selected Device : 2vp100ff1704-5

 Number of Slices:                   34416  out of  44096    78%
 Number of Slice Flip Flops:         48682  out of  88192    55%
 Number of 4 input LUTs:             55604  out of  88192    63%
    Number used as logic:            29643
    Number used as Shift registers:  25229
    Number used as RAMs:               732
 Number of IOs:                         61
 Number of bonded IOBs:                 61  out of   1040     5%
    IOB Flip Flops:                     36
 Number of BRAMs:                       39  out of    444     8%
 Number of MULT18X18s:                 324  out of    444    72%
 Number of GCLKs:                        1  out of     16     6%
 Number of DCMs:                         1  out of     12     8%

10.1.0
------
Device utilization summary:
---------------------------

Selected Device : 2vp100ff1704-5

 Number of Slices:                   34881  out of  44096    79%
 Number of Slice Flip Flops:         53619  out of  88192    60%
 Number of 4 input LUTs:             54316  out of  88192    61%
    Number used as logic:            29631
    Number used as Shift registers:  23953
    Number used as RAMs:               732
 Number of IOs:                         61
 Number of bonded IOBs:                 61  out of   1040     5%
    IOB Flip Flops:                     36
 Number of BRAMs:                       39  out of    444     8%
 Number of MULT18X18s:                 324  out of    444    72%
 Number of GCLKs:                        1  out of     16     6%
 Number of DCMs:                         1  out of     12     8%

Not much difference - 10.1 is a little larger, but not much.

Here are the usage summaries from the top of the PAR file

9.2.04
------
 Number of MULT18X18s     324 out of   444   72%
 Number of RAMB16s         44 out of   444    9%
 Number of SLICEs       37241 out of 44096   84%

10.1.0
------
 Number of MULT18X18s     324 out of   444   72%
 Number of RAMB16s         44 out of   444    9%
 Number of SLICEs       40732 out of 44096   92%

Wow - looks like MAP really flubbed it - occupied slices went from 84% to 92% of the device (37241 to 40732, about 9% more slices) from 9.2.04 to 10.1.0.

And here are the final lines of the place & route status:

9.2.04
------
Phase 6: 45317 unrouted; (0) REAL time: 1 hrs 19 mins 41 secs

Intermediate status: 37260 unrouted; REAL time: 1 hrs 52 mins 45 secs

Phase 7: 0 unrouted; (0) REAL time: 2 hrs 1 mins 33 secs

Phase 8: 0 unrouted; (0) REAL time: 2 hrs 3 mins 42 secs

10.1.0
------
Phase 6: 57617 unrouted; (693) REAL time: 6 hrs 16 mins 47 secs

Intermediate status: 45825 unrouted; REAL time: 6 hrs 55 mins 31 secs

Phase 7: 0 unrouted; (1443123) REAL time: 7 hrs 16 mins 57 secs

Phase 8: 0 unrouted; (1443123) REAL time: 7 hrs 18 mins 53 secs

Phase 9: 0 unrouted; (1433735) REAL time: 7 hrs 19 mins 10 secs

PAR was _not_ happy. It took roughly 3.5x longer to run. This is with the exact same source & control files.

A little something for the developers to chew on.

Eric
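The comparison in this post can be reproduced from the report lines themselves. The following is a small sketch, not part of any Xilinx tool: `real_time_seconds` is a hypothetical helper, and the regular expression assumes the "REAL time: N hrs N mins N secs" wording quoted in the PAR output above.

```python
import re

# Matches the elapsed-time field as it appears in the quoted PAR status lines.
REAL_TIME = re.compile(r"REAL time:\s*(\d+)\s*hrs\s*(\d+)\s*mins\s*(\d+)\s*secs")


def real_time_seconds(line):
    """Return the REAL time on a PAR status line in seconds, or None."""
    match = REAL_TIME.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Final status lines copied from the two runs quoted in the post.
old_final = "Phase 8: 0 unrouted; (0) REAL time: 2 hrs 3 mins 42 secs"
new_final = "Phase 9: 0 unrouted; (1433735) REAL time: 7 hrs 19 mins 10 secs"

old_s = real_time_seconds(old_final)
new_s = real_time_seconds(new_final)
slowdown = new_s / old_s

# Occupied slices from the top of the two PAR reports.
slices_old, slices_new = 37241, 40732
slice_growth = (slices_new - slices_old) / slices_old

print(f"PAR slowdown: {slowdown:.2f}x")      # PAR slowdown: 3.55x
print(f"Slice growth: {slice_growth:.1%}")   # Slice growth: 9.4%
```

Note that the slice figure is relative growth in the slice count; the 84% to 92% change in the reports is 8 percentage points of the device.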

- Kolja Sulimma

Tue, Apr 1, 2008 7:44 AM

To the contrary. Look again at the same report:

9.2.4 Number of 4 input LUTs: 55604 out of 88192 63%

10.1.0 Number of 4 input LUTs: 54316 out of 88192 61%

Actually there is a 2.3% reduction in area. The number of slices is meaningless; it only tells you how the LUTs are distributed. You might as well say you had 100% utilization because all four quadrants of the chip are used.

Any Slice with less than the maximum number of LUTs used still has space for more logic that can and will be used by the tools.

Always report LUT and DFF numbers and ignore the Slices.

Kolja Sulimma
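As a quick check of the figures under discussion, here is a short sketch that computes the relative change for each resource from the two XST summaries quoted earlier in the thread. `relative_change` is a hypothetical helper, and the counts are copied from those reports.

```python
def relative_change(old, new):
    """Fractional change from old to new (0.10 means 10% more)."""
    return (new - old) / old


# (9.2.04 count, 10.1.0 count) taken from the XST utilization summaries.
counts = {
    "4 input LUTs": (55604, 54316),
    "Slice Flip Flops": (48682, 53619),
    "Slices (XST estimate)": (34416, 34881),
}

for name, (old, new) in counts.items():
    change = relative_change(old, new)
    print(f"{name:<22} {old:>6} -> {new:>6}  {change:+.1%}")
```

On these numbers the LUT count drops by about 2.3% while the flip-flop count rises by about 10.1%, so the two resources move in opposite directions between the two releases.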

- Jim Granville

Tue, Apr 1, 2008 9:50 AM

On that subject, any idea why the flip-flop count went UP by 5 points of device utilization (55% to 60%) when the LUT count went down by 2 points (63% to 61%), and why the tools decided to do that with the same param settings?

A change that large is not good to see on a 'signed off' design....

9.2.04
------
 Number of Slice Flip Flops:  48682 out of 88192   55%
 Number of 4 input LUTs:      55604 out of 88192   63%

10.1.0
------
 Number of Slice Flip Flops:  53619 out of 88192   60%
 Number of 4 input LUTs:      54316 out of 88192   61%

-jg