Fastest ISE Compile PC?

Has anyone recently done any benchmarking of Windows PCs for Xilinx
ISE compiles?

Is ISE multithreaded?
Can it use multiple processors (or cores)?
Do big CPU caches help?

Regards
Marc
P4-3GHz HT  2GB DDR2-533 RAM


Re: Fastest ISE Compile PC?


I'm not sure whether Quartus II or ISE is multithreaded, but I wasn't
impressed with the first-generation dual-core systems. My company
provided me with a dual-core system for development work.

When my system wasn't living up to my expectations, I did a little
research. The MS Windows performance meter and some third-party tools
showed little CPU activity most of the time. When the CPU was busy, it
ran at about 85% or better, occasionally pegged. I did some other
poking around on my system and discovered that they had cheaped out on
the graphics card and hard drives.

My advice, which is part experience and part gut feeling, is to compare
the prices of the Extreme and Dual Core chips. If the difference is
negligible, look for the one with the faster front side bus (FSB)
speed. The second item I'd look into is a caching SATA controller that
supports mirroring, plus some really fast hard drives. Avoid striping
the drives, as there is a performance hit, but try mirroring. From my
observations, the development tools I use are mostly memory and hard
drive bound. When you compile and PAR your design, a fast CPU is
beneficial, but the tools are also working with a lot of supporting
files and storing and retrieving information from memory.

In the past, most users have reported the biggest benefits from more
and faster memory.

Derek


FAQ: Re: Fastest ISE Compile PC?
Marc, why didn't you try to google for the answer? This question is
asked every other week on comp.arch.fpga. I only reply because I have
some new numbers.

marc_ely wrote:
> Has anyone recently done any benchmarking of Windows PCs for Xilinx
> ISE compiles?

No, but I have for Quartus, which is very similar.

> Is ISE multithreaded?

No, and not for a while to come.

> Can it use multiple processors (or cores)?

Nope.

> Do big CPU caches help?

Oh yeah, but once you have that, core frequency is all that matters.

I recently went from an Athlon 64 2.0 GHz/1 MiB L2$ to an E6600 Core 2
Duo 2.4 GHz/4 MiB L2$. For my benchmark, the time for Synth/P&R went
from 12m34/33m40 to ~6m/~15m, thus more than double the P&R
performance. When overclocked to 3.3 GHz, the result scaled to
5m54/11m12, thus 3X the P&R performance of the Athlon. Other
experiments confirm that P&R time scales linearly with frequency
(assuming memory scales equally).
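
As a quick check of the arithmetic in the numbers above, here is a minimal Python sketch. The helper name `to_seconds` is made up for illustration, and the "~15m" figure is only approximate in the post, so the ratios that depend on it are rough.

```python
def to_seconds(stamp: str) -> int:
    """Convert a time written like '12m34' (minutes, then seconds) to seconds."""
    minutes, _, seconds = stamp.partition("m")
    return int(minutes) * 60 + int(seconds or 0)

# P&R times quoted in the post.
par_athlon = to_seconds("33m40")     # Athlon 64 at 2.0 GHz
par_c2d_stock = to_seconds("15m")    # E6600 at 2.4 GHz (approximate in the post)
par_c2d_oc = to_seconds("11m12")     # E6600 overclocked to 3.3 GHz

speedup_stock = par_athlon / par_c2d_stock   # Athlon -> stock E6600
speedup_oc = par_athlon / par_c2d_oc         # Athlon -> overclocked E6600
clock_ratio = 3.3 / 2.4                      # what linear scaling would predict
measured_ratio = par_c2d_stock / par_c2d_oc  # what the P&R times show

print(f"P&R speedup, stock:       {speedup_stock:.2f}x")
print(f"P&R speedup, overclocked: {speedup_oc:.2f}x")
print(f"clock ratio {clock_ratio:.3f} vs measured P&R ratio {measured_ratio:.3f}")
```

The overclocked P&R ratio lands close to the clock ratio, which is consistent with the claim of roughly linear scaling with frequency.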

I have expensive memory, but from my experiments the benchmark results
showed very little sensitivity to memory bandwidth and latency.

The 4 MiB Core 2 Duo is a very fast chip for FPGA work, probably the
fastest x86 available, but it's still not fast enough to reduce the
compilation times to an acceptable level.

Tommy


Re: FAQ: Re: Fastest ISE Compile PC?
Hi Tommy

Thanks for the info. Yes, I found your posts about 2 minutes after I
sent mine out (after searching and finding nothing current). That's the
problem with info on the web... it's often out of date, and finding the
right stuff can be like looking for a needle in a haystack.

I think I will go for a Core 2 Duo with 4 MB of L2 cache.

Marc



Re: FAQ: Re: Fastest ISE Compile PC?


Could you give us some info on what the disk subsystems look like for
each machine (IDE or SATA, what speed, any RAID, etc.)?


What do you think explains the lack of change in synthesis time when
the C2D goes from 2.4 GHz to 3.3 GHz?

Re: FAQ: Re: Fastest ISE Compile PC?

I could, but it would be misleading, as it's completely irrelevant to
the posted numbers. The benchmark is operating almost exclusively out
of the buffer cache, and even then it's not reading that much data.

That said, for everything else disk latency matters a lot, so I used a
single 150 GB SATA Raptor (10,000 RPM) in the new box. The old box had
a quiet, average-speed Samsung PATA drive (7,200 RPM).


My measurements were too informal. There is a change, just not as
substantial. I'd need to study this closer to understand what's going
on.

Tommy


New Quartus 6.1 is multi-threaded
Relevant to several recent threads, Altera just announced their Stratix
III and with it Quartus 6.1 of which the first bullet item is:

"Multiprocessor support:  Allowing parallel processing during
compilation for computers with multiple processors results in a
reduction in compile times. Quartus II software offers the first
multiprocessor support from an FPGA vendor to take advantage of the new
multiple-core processors."

The actual software is available *now* (according to the press
release). Trying to get it reveals that *now* is really December 4th
:-)

I look forward to seeing how it scales with multiple cores.

Tommy


Re: New Quartus 6.1 is multi-threaded
Hi Tommy,


On two cores we've seen between 1.6X and 1.9X the performance
(depending on the algorithm) for the parallelized sections of code,
yielding up to a 20% compile time reduction.  Adding more cores gives
you big speed-ups on those portions of code -- but Amdahl's Law kicks
in pretty fast.  The remaining single-threaded algorithms become a
larger portion of the run-time as you add processors, diminishing the
overall returns.
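
To put rough numbers on the Amdahl's Law point, here is a minimal Python sketch. It assumes the quoted 20% compile-time reduction goes with the 1.9X end of the two-core range, which the post does not actually state, and the function names are made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, section_speedup: float) -> float:
    """Overall speedup when only `parallel_fraction` of the run time is sped up."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / section_speedup)

def parallel_fraction_for(time_reduction: float, section_speedup: float) -> float:
    """Invert Amdahl's Law: time_reduction = p * (1 - 1/s), so p = reduction / (1 - 1/s)."""
    return time_reduction / (1.0 - 1.0 / section_speedup)

# ASSUMPTION: a 20% total reduction with the parallel sections running 1.9X faster.
p = parallel_fraction_for(0.20, 1.9)
print(f"implied parallelized share of run time: {p:.1%}")

# Hypothetical section speedups for more cores; inf is the unreachable upper bound.
for s in (1.9, 4.0, 8.0, float("inf")):
    overall = amdahl_speedup(p, s)
    print(f"section speedup {s:>4}: overall {overall:.2f}x, "
          f"compile time cut by {1.0 - 1.0 / overall:.0%}")
```

Under that assumption less than half of the run time is parallelized, so even unlimited cores could not cut the compile time by more than that share. That is the diminishing return described above.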

FPGAs are getting bigger faster than CPUs are getting faster; this has
been true for a long time.  Without innovation in the software, compile
times would grow with each generation.  Thankfully, we've been able to
close this gap, and even improve our run-time (and memory consumption)
over time.  Multi-core support is just the next step in this evolution.
Modern CAD systems such as Quartus II contain numerous algorithms, all
of which contribute significantly to the run-time of the system.  Each
algorithm presents its own challenges for parallelization (if that's a
word).  Over time, as we parallelize more and more of the tool, the
benefits and scalability will increase.

Memory consumption is also a challenge as FPGAs continue to scale in
size.  Keeping memory use in check yields many benefits -- cheaper
machines, sticking with 32-bit OSes, and better cache locality (and
hence run-time).  You'll find QII 6.1 (even for Stratix III) performs
well on this metric too.


Customers can get the software today via their local Altera sales
representative or distributor sales office.  General/full availability
is December 4th, as you've indicated.

Regards,

Paul Leventis
Altera Corp.


Re: New Quartus 6.1 is multi-threaded

I came across the posting for the Stratix III the other day on their
website. Short of putting engineering samples in everybody's hands,
you'd think they would want to coordinate the release of the new
version of Quartus II with the announcement of the new devices so that
engineers can see how their designs fare in the new software and
devices.

I only had a few minutes to look at the website, but it looks like
they have made the new devices more granular and have doubled their
frequency.

I am a Quartus II user and my sales rep, Linda, has always done a good
job of getting me a copy of the software. So, I have one more thing to
look forward to in December.

Derek





Re: Fastest ISE Compile PC?
My system has arrived and I did a quick benchmark:

my lab-system: P4, 2.6 GHz, 2GBytes RAM
another system: P4, 3 GHz, 2GBytes RAM
my new machine: Core 2 Duo, E6700, 2GBytes RAM
with Asus P5LD2 Deluxe


a full run with ISE 6.3 (from synthesize to bitgen)
with a recent design takes:

my lab-system: 30 minutes
another system: 28 minutes
my new machine: 14 minutes


I would say it is worth the money and I guess we'll
buy some more of those machines ...


bye,
Michael

Re: Fastest ISE Compile PC?
I took the plunge and built up a 2nd PC using a Core2Duo.

Here are the specs:
Old PC: P4 3GHz HT, 2GB DDR2-533 RAM, Gigabyte GA81915 mobo, stock
cooler
New PC: Core2Duo E6600, 2GB DDR2-800 RAM, ASUS P5B Mobo, ArcticFreezer7
cooler

Using a Spartan3 design running clean from scratch in ISE 8.2.3i
Old PC: 82mins
New PC: 35mins
New PC (overclocked to 3.2GHz):  25mins

I'm really pleased with the Core2Duo and would recommend it.

Marc


Re: Fastest ISE Compile PC?




So is the conclusion that dual cores (multiprocessor) benefit Xilinx ISE substantially?



Re: Fastest ISE Compile PC?
No, cache size matters... As far as I know, neither ISE nor Quartus uses the
second core, but both benefit from the huge cache.

Thomas

www.entner-electronics.com



Re: Fastest ISE Compile PC?

It's not just the regular L2 cache: I suspect the TLB (the address
cache) matters even more, although that is harder to characterize and
explain. When the data set is still beyond even the bigger combined
cache of a dual core, the increased associativity of the bigger TLB
kicks in to reduce how often the OS has to refill MMU page tables,
which can turn nanosecond cache hits into accesses of several hundred
nanoseconds on a full cache miss.

I ran a test on an older 2 GHz Athlon XP2400 and a 2.6 GHz D805 with a
loop that just randomly accesses ints from a 512 MB array, using a mask
to vary the address range from 256 ints up to the 128M maximum, and
running the loop 1M times for each case.

I believe this represents the worst possible behaviour of any CAD
application that must traverse huge graphs or trees that can not fit
cache but easily fit DRAM.
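
The shape of that test can be sketched in Python as below. This is not the original test code, which was not posted: the function name, the smaller 16 MB array, and the reduced iteration count are all assumptions chosen to keep the run short. Python's interpreter overhead per iteration is far larger than a 7-10 ns cache hit, so this only illustrates the structure of the loop; a C version would be needed to reproduce the kind of timings described here.

```python
import array
import random
import time

def time_random_reads(data, mask, iterations=100_000, seed=1):
    """Read data[r & mask] for random r; return (average ns per read, checksum)."""
    rng = random.Random(seed)
    # Pre-generate the indices so the timed loop is mostly the array reads.
    indices = [rng.getrandbits(32) & mask for _ in range(iterations)]
    total = 0
    start = time.perf_counter()
    for i in indices:
        total += data[i]
    elapsed = time.perf_counter() - start
    return elapsed * 1e9 / iterations, total

# ASSUMPTION: 4M ints (typically 4 bytes each, so about 16 MB), not the 512 MB of the post.
SIZE = 1 << 22
data = array.array("i", [0]) * SIZE

# Widen the address range by 2 bits per step, from 256 ints up to the whole array.
span = 256
while span <= SIZE:
    ns, _ = time_random_reads(data, span - 1)
    print(f"range {span:>8} ints: {ns:7.1f} ns per read")
    span <<= 2
```

The mask works because every range is a power of two, so `span - 1` keeps exactly the low address bits.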

The D805 generally runs 30% faster, as the clock suggests, while the
tests are entirely cache bound. The Athlon has 256K of L2 with 256
ways in the TLB; the D805 has 1 MB of L2 in each core, and I expect its
TLB has 1k ways of associativity. Only one core is used. I expect the
CoreDuo or 64-bit Athlons to perform somewhat better.

For the in-cache cases, the loop iterates in 7 ns on the D805 and 10 ns
on the XP2400. As the range of addresses increases past 64K, the Athlon
staircases to 60 ns, then from around 2M degrades to 80-150 ns, and at
the 128M range settles at 400 ns per iteration, versus the original
10 ns: 40 times slower to crawl memory.

The D805 fares somewhat better: it tolerates another 2 bits of address
before degrading to 60 ns at the 256K level, then reaches 130 ns at the
128M level. In other words, when the L2 cache always misses, the D805
spends far less time patching up the TLB and MMU page tables.

The D805 runs Windows 2000 with 1 GB of DDR400 and the Athlon runs BeOS
on 1 GB of DDR266, but that's not really important.

The conclusion is that paying for bigger TLBs is probably far better
than paying for more CPUs, since it keeps the uniprocessor closer to
its ideal performance for code that has poor locality of reference.
Adding more cores probably makes things worse, as the quad core shows,
unless the code is really multithreaded.

John Jakson
transputer guy


Re: Fastest ISE Compile PC?

I'm sure the second core will make a difference - while the one long
task is occupying one core, other minor tasks will run on the other
core.  While these other tasks might only take a tiny proportion of the
processor time, you avoid the penalties of task switching (like losing
your cache) on the working processor.



Re: Fastest ISE Compile PC?
In reply to David Brown (Mon, 06 Nov 2006 09:50:02 +0100):

That assumes you set the thread affinity for the long task. If you
watch top on Linux or Task Manager on Windows XP or Vista, you will see
that the task consuming 99.9% of the CPU is migrated from CPU to CPU
quite frequently. I am not sure why the scheduler of either OS does
this.
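
For the Linux case, pinning a process to one CPU can be sketched in Python as below. The helper name `pin_to_cpu` is made up for illustration, and `os.sched_setaffinity` is only available on Linux, so this does not cover the Windows setup discussed in this thread.

```python
import os

def pin_to_cpu(cpu: int, pid: int = 0) -> set:
    """Restrict process `pid` (0 means the calling process) to a single CPU.

    Linux-only: os.sched_setaffinity does not exist on Windows or macOS.
    Returns the affinity set the kernel reports afterwards.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)

if hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    print("CPUs allowed before pinning:", allowed)
    # Pin this process to the first CPU it is currently allowed to run on.
    print("CPUs allowed after pinning: ", sorted(pin_to_cpu(allowed[0])))
```

Child processes started afterwards inherit the affinity mask, so a long-running tool launched from a pinned script would stay on that CPU.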

Re: Fastest ISE Compile PC?
I started using a Mac Pro a few weeks ago: dual Core2Duo Xeons, 2 GB RAM,
running XP SP2. Although ISE isn't multi-threaded, I found a use for the
second processor yesterday: I ran a second instance of ISE. I'm working on
a multi-chip design, and I synthesized one project while routing a
second project. I set the affinity so that they executed on different
processors (at least I think they were on different processors). I
didn't benchmark the execution speed, but the time didn't seem out of line.



---
Joe Samson
Pixel Velocity

Re: Fastest ISE Compile PC?


While the CoreDuo looks like the thing to have right now, on the disk
side I'd be interested to know whether the new IDE Flash drives that go
up to 32 GB are any use as a replacement for high-RPM drives.

According to the only reviews I have seen (Tom's, IIRC), they have much
lower latency but not yet much throughput, around 30 MBytes/sec, but at
least the ms delays should now be us delays. At this stage I wouldn't
be concerned about wearout, as I expect these things to get replaced
sooner or later; prices seem to be falling on Flash much faster than on
DRAM now, and the throughput is bound to get closer to PATA max rates.

just a thought
John Jakson

