I've got a fairly large design that I've been working with in ISE
9.2.04 for a while - it takes about 90% of a V2P100 and runs to completion in about 3.5 to 4 hours on my Linux x86_64 system (Athlon dual core 3800 w/ 4GB) using home-made make scripts. I decided to take
10.1 out for a spin to see if it really helps speed things up. Here's what I've seen on the first few runs:
- XST seems to run about as fast as it used to.
- NGDBUILD seems faster and finds errors in timing exceptions more quickly.
- MAP works about the same.
- PAR takes a lot longer to run. I'm seeing 8 hour runs that used to take 2 - 2.5 hours in 9.2.04 with the same constraints. It appears to be coming up with bad placements (Phase 12.27 seems to take _forever_) that are impossible to route successfully.
I'm in the process of adding more timing exceptions and this seems to help, but I still haven't had a successful PAR run. Let me reiterate:
9.2.04 didn't have any trouble with this design using the exact same source and constraints.
Summary: 10.1 isn't working as well as 9.2.04. I'll probably be shelving it and waiting for the service pack.
Sounds like the usual software quality from Altera and Xilinx. The general advice: don't use a new release until the first service pack is out. And for some releases even the service-pack versions are not recommended; e.g. Quartus 7.0 is fine and I've heard Quartus 7.2 is fine, but with Quartus 7.1 a design that worked in Quartus 7.0 didn't work any more.
Frank Buss, firstname.lastname@example.org
I agree, but you do need to get feedback somehow.
Yes, as a practical matter, when I was using FPGA devices to solve problems, and trying to make money doing so, I would NEVER use an initial release, but I would always freeze the development of a product at some previous release.
"New software" = ? (a potential for risk): be it Microsoft, Xilinx, or IBM.
However, the new release always has benefits, so I would have one engineer looking at the new release, so we would be "ready" when the time came to freeze for the next real product.
I do not think any engineering manager is doing much different today.
You need something stable, and known, to develop the product (in order to manage the risks).
You also need the new release to support the new families and features you need for the next product family.
And, the vendor needs to know what is working, and what is broken.
So, (to everyone), keep the feedback coming, (life goes on)
What about the IDE? Since ISE 8.1 it has become steadily heavier and slower per se, without adding any relevant new features. While the main tools (XST, P&R, etc.) improved a lot over time, the IDE simply got slower and slower, requiring a lot of resources just to open Project Navigator... I hope ISE 10.1 represents a reversal of this trend. (OK, I know, you can use third-party tools or just the command line, but if this is the end result it would be better to keep the IDE as light and efficient as possible...)
Sure, I believe you! But they are not doing their job! How hard is that to understand?
If every new major release causes instant frustration and/or has a TTFFF (time to first fatal failure) of less than 30 minutes, then your beta-testers have continuously failed to do their job. I can't see it any other way.
First, I won't repeat the "what's new" list, but I've seen some real improvements, mainly in the GUI (like the report viewer, cross-probing from warnings to the HDL editor - very useful - and so on...). Concerning P&R results, at first I was quite frustrated: many timing constraints failed in 10.1 that all passed in 9.2. It turns out 10.1 analyzes more constraints (in my design, constraints between clocks that were in fact not relevant). After adding some "TIG" exceptions everything was OK. At present I'm just worried about the LUT and FF increase after MAP: more LUTs and FFs (??? I have to investigate...), but more LUTs used as shift registers and fewer slices occupied. Concerning EDK 10.1, I'd be interested to hear your thoughts about the improvements or problems...
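The sort of "TIG" exception mentioned above, sketched as a UCF fragment (all net and group names here are placeholders, not from any design in this thread):

```
# Tell the timing analyzer to ignore paths between two unrelated clock
# domains that 10.1 started checking. Names are hypothetical examples.
NET "clk_a" TNM_NET = "tn_clk_a";
NET "clk_b" TNM_NET = "tn_clk_b";
TIMESPEC "TS_a2b" = FROM "tn_clk_a" TO "tn_clk_b" TIG;
```

Only do this for crossings that really are asynchronous and properly synchronized in logic; a TIG on a real timing path just hides the problem.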
Well, my designs have all compiled OK. A complex design (83% of an XC3S400, with a MicroBlaze and a whole bunch of proprietary peripherals included, with minor differences in pinout and contents across two different boards) has compiled consistently in 20% less time (25 minutes brought down to 20 minutes), adding only 7 slices to the final design. (PC, WXP SP2 ES, dual Athlon.)
The GCC compiler has some differences, as it will migrate some data from bss to text (which might be a problem for me, as they may be located in different RAM types, but I have not yet pinned down the differences). Four different SW projects targeting both of the above-mentioned boards are working OK.
So, for me, everything seems to work fine so far. Of course, the work is being done on a branch; the main development remains on
Now, I'd really love to receive my license update to register my products (EDK and ChipScope), so I may begin working with it....
We tried the exact same build on two different computers here (both running ISE 9.2.04i):
A Dell Precision, 2 GB RAM, 3 GHz Xeon, running XP x32
A HP xw9300 workstation, 4 GB RAM, 2.40 GHz dual-core AMD Opteron, running XP x64
We did not expect this difference. Maybe x64 is just slow? I suspect the timing depends not only on ISE itself but on several unknown factors. I wish ISE could run some diagnostics to help us find non-ideal conditions.
That one is easy. Depending somewhat on the exact model, the Xeon is likely to have a 4 MB unified cache, whereas the Opteron probably has 1 MB of cache per core. This means that for a compute-intensive application that can only use one core, the Xeon provides 4x the cache size.
The x64 application makes things even worse because it has a larger memory footprint.
You know, I hadn't noticed that, but I am seeing a fairly large 'inflation' of the design as well.
Here are the final utilization reports from XST (9.2.04 first, then 10.1.0):

------ 9.2.04 ------
Device utilization summary:

Selected Device : 2vp100ff1704-5

Number of Slices:                  34416 out of 44096   78%
Number of Slice Flip Flops:        48682 out of 88192   55%
Number of 4 input LUTs:            55604 out of 88192   63%
   Number used as logic:           29643
   Number used as Shift registers: 25229
   Number used as RAMs:              732
Number of IOs:                        61
Number of bonded IOBs:                61 out of  1040    5%
   IOB Flip Flops:                    36
Number of BRAMs:                      39 out of   444    8%
Number of MULT18X18s:                324 out of   444   72%
Number of GCLKs:                       1 out of    16    6%
Number of DCMs:                        1 out of    12    8%

------ 10.1.0 ------
Device utilization summary:

Selected Device : 2vp100ff1704-5

Number of Slices:                  34881 out of 44096   79%
Number of Slice Flip Flops:        53619 out of 88192   60%
Number of 4 input LUTs:            54316 out of 88192   61%
   Number used as logic:           29631
   Number used as Shift registers: 23953
   Number used as RAMs:              732
Number of IOs:                        61
Number of bonded IOBs:                61 out of  1040    5%
   IOB Flip Flops:                    36
Number of BRAMs:                      39 out of   444    8%
Number of MULT18X18s:                324 out of   444   72%
Number of GCLKs:                       1 out of    16    6%
Number of DCMs:                        1 out of    12    8%
Not much difference - 10.1 is a little larger, but not much.
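The deltas can be checked with a few lines of throwaway Python (numbers copied from the two XST summaries above):

```python
# Deltas between the two XST summaries above (10.1.0 minus 9.2.04):
ise_92  = {"slices": 34416, "slice_ffs": 48682, "luts": 55604}
ise_101 = {"slices": 34881, "slice_ffs": 53619, "luts": 54316}
for k in ise_92:
    print(f"{k:9s} {ise_101[k] - ise_92[k]:+d}")
# slices +465, slice_ffs +4937, luts -1288
```

So after synthesis 10.1 actually has fewer LUTs but noticeably more flip-flops.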
Here are the usage summaries from the top of the PAR files (9.2.04 first, then 10.1.0):

------ 9.2.04 ------
Number of MULT18X18s      324 out of   444   72%
Number of RAMB16s          44 out of   444    9%
Number of SLICEs        37241 out of 44096   84%

------ 10.1.0 ------
Number of MULT18X18s      324 out of   444   72%
Number of RAMB16s          44 out of   444    9%
Number of SLICEs        40732 out of 44096   92%
Wow - looks like MAP really flubbed it - almost 8% growth from 9.2.04 to 10.1.0.
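That growth figure, as quick throwaway Python (numbers from the PAR summaries above):

```python
# The slice growth from the PAR summaries above, as a share of the device:
slices_92, slices_101, total = 37241, 40732, 44096
growth_points = 100 * (slices_101 - slices_92) / total
print(f"{growth_points:.1f} percentage points")  # prints "7.9 percentage points"
```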
And here are the final lines of the place & route status (9.2.04 first, then 10.1.0):

------ 9.2.04 ------
Phase 6: 45317 unrouted; (0)        REAL time: 1 hrs 19 mins 41 secs
Intermediate status: 37260 unrouted;  REAL time: 1 hrs 52 mins
Phase 7: 0 unrouted; (0)            REAL time: 2 hrs 1 mins 33 secs
Phase 8: 0 unrouted; (0)            REAL time: 2 hrs 3 mins 42 secs

------ 10.1.0 ------
Phase 6: 57617 unrouted; (693)      REAL time: 6 hrs 16 mins 47 secs
Intermediate status: 45825 unrouted;  REAL time: 6 hrs 55 mins
Phase 7: 0 unrouted; (1443123)      REAL time: 7 hrs 16 mins 57 secs
Phase 8: 0 unrouted; (1443123)      REAL time: 7 hrs 18 mins 53 secs
Phase 9: 0 unrouted; (1433735)      REAL time: 7 hrs 19 mins 10 secs
PAR was _not_ happy. It took almost 4x longer to run. This is with the exact same source & control files.
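For anyone comparing runs the same way, here is a rough Python helper - purely hypothetical, not a Xilinx utility - that pulls the cumulative REAL time stamps out of PAR report lines like the ones quoted above:

```python
import re

# Parse lines of the form
#   "Phase 8: 0 unrouted; (0) REAL time: 2 hrs 3 mins 42 secs"
# where any of the hrs/mins/secs fields may be absent.
TIME_RE = re.compile(
    r"REAL time:\s*(?:(\d+)\s*hrs?)?\s*(?:(\d+)\s*mins?)?\s*(?:(\d+)\s*secs?)?"
)

def real_time_seconds(line):
    """Return the REAL time on a PAR report line, in seconds, or None."""
    m = TIME_RE.search(line)
    if not m or not any(m.groups()):
        return None
    h, mi, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mi * 60 + s

# The final lines of the two runs quoted above:
final_92  = "Phase 8: 0 unrouted; (0) REAL time: 2 hrs 3 mins 42 secs"
final_101 = "Phase 9: 0 unrouted; (1433735) REAL time: 7 hrs 19 mins 10 secs"
ratio = real_time_seconds(final_101) / real_time_seconds(final_92)
print(f"{ratio:.2f}x slower")  # prints "3.55x slower"
```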
9.2.4 Number of 4 input LUTs: 55604 out of 88192 63%
10.1.0 Number of 4 input LUTs: 54316 out of 88192 61%
Actually there is a 2.3% reduction in area. The number of slices is meaningless; it only tells you how the LUTs are distributed. You might as well say you had 100% utilization because all four quadrants of the chip are used.
Any Slice with less than the maximum number of LUTs used still has space for more logic that can and will be used by the tools.
Always report LUT and DFF numbers and ignore the Slices.
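To make the point concrete, here is the comparison for the 10.1.0 numbers quoted earlier in the thread, as throwaway Python:

```python
# LUT vs. slice "fullness" for the 10.1.0 run quoted earlier in the thread:
luts_used, luts_total = 54316, 88192      # from the XST report
slices_used, slices_total = 40732, 44096  # from the PAR summary
print(f"LUT utilization:   {100 * luts_used / luts_total:.1f}%")      # 61.6%
print(f"Slice utilization: {100 * slices_used / slices_total:.1f}%")  # 92.4%
```

A device that is 92% "full" by slices is only 62% full by LUTs, which is why the LUT count is the number worth tracking.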