I am compiling a design for the Acex EP1k50 using Quartus II version
The FPGA device is about 20% full, but compilation times are getting longer and longer (currently 14 minutes).
They also seem to be increasing disproportionately to the amount of extra functions added. For example, compilation was taking 7-8 minutes last week. Since then I have added only a few functions, but compile time has been creeping up on a daily basis to the current 14 minutes.
I have tried playing with the compiler settings under mode and fitting (as in "save more files for fast compilation") with little effect
One odd thing I noticed is that if I remove a few source modules, compilation time stays at the full 14 minutes.
It feels as though the compiler is getting bloated somehow, possibly by accumulating irrelevant data in some files somewhere.
Is there anything I can do to alleviate this?
I don't think the problem has much to do with the design itself, which is fully synchronous. Compilations for similar projects in the past have only taken 3-4 minutes.
Ted, if you can, please send me the two archives of the design: the one that compiles in the 7-8 minute range, and the one with the additional logic that causes the increased compilation time. We would like to analyze the design.
Is there any chance a 64bit version of Quartus will be released? I got my 15+ min. builds down to 5+ by upgrading to the fastest available cpu, but I was thinking this process might benefit from the new 64 bit AMD and upcoming Intel procs.
We'll release a 64-bit version of Quartus when one is needed in order to address more memory. Windows XP maxes out at 3 GB (with a command-line flag), and Linux maxes out at ~3.7 GB. In all our testing of Stratix II, I don't think I've seen any compile (even on a 2S180) that required more than 2 GB of memory, and most compiles require much less than this -- and we get some very stressful, pathological designs from our internal test/product engineering groups. Also, as memory needs keep increasing due to larger and larger chips, we strive to beat down on the memory requirements of Quartus. In addition, architectural changes in Stratix II result in reduced memory consumption compared to the same design compiled in Stratix.
So there is still plenty of room left before we _have_ to go 64-bit -- and we will be ready when the market needs a 64-bit version of Quartus.
Contrary to popular belief, the "bitness" of a processor does not (necessarily) equate with speed. The primary advantage of switching to 64-bit computing is the increase in (easily) addressable memory that you get with 64-bit pointers. You are limited in 32-bit machines to 4 GB of addressable memory (minus the up to 2 GB of address space the OS reserves). There are tricks to get this higher, but they aren't fun to program to.
From a processing speed perspective, switching to 64-bit code may help and may hinder things. It can help in those cases where you would have had to break up something that wanted a > 32-bit representation across multiple integers. But most things fit fine in 32 bits, and promoting those variables to 64 bits just pollutes your data cache, effectively reducing the amount of cache memory available to the processor. And 64-bit code can be larger (because of larger pointers, instruction extensions, larger data values, etc.) thus chewing up valuable instruction/trace cache room. Luckily, recompiling C code to 64-bits does not change your integers to 64-bits -- just your pointers -- but this still has some impact on data and instruction cache consumption. I am glossing over many pros and cons of 32/64-bit code, but you get the idea.
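To make the cache-footprint point concrete, here is a minimal sketch using Python's ctypes module. The node layout (`Node32`, `Node64`) and the 64-byte cache line are assumptions chosen for illustration, not anything taken from Quartus. It only shows that a pointer-heavy record grows when its pointers widen, even though the integer field stays at 32 bits.

```python
import ctypes

# Sizes as seen by the C ABI this interpreter was built for.
INT_BYTES = ctypes.sizeof(ctypes.c_int)
POINTER_BYTES = ctypes.sizeof(ctypes.c_void_p)


class Node32(ctypes.Structure):
    """Hypothetical tree node with 32-bit 'pointers' (modelled as uint32)."""
    _fields_ = [
        ("value", ctypes.c_int32),
        ("left", ctypes.c_uint32),
        ("right", ctypes.c_uint32),
    ]


class Node64(ctypes.Structure):
    """The same node with 64-bit 'pointers' (modelled as uint64)."""
    _fields_ = [
        ("value", ctypes.c_int32),
        ("left", ctypes.c_uint64),
        ("right", ctypes.c_uint64),
    ]


def nodes_per_cache_line(node_type, line_bytes=64):
    """How many whole nodes fit in one cache line (64 bytes is an assumption)."""
    return line_bytes // ctypes.sizeof(node_type)


if __name__ == "__main__":
    print("int bytes:", INT_BYTES, "pointer bytes:", POINTER_BYTES)
    for node_type in (Node32, Node64):
        print(node_type.__name__, ctypes.sizeof(node_type), "bytes,",
              nodes_per_cache_line(node_type), "per cache line")
```

The `value` field is the same width in both layouts; only the pointer fields (plus any alignment padding the ABI inserts) make the 64-bit node larger, so fewer nodes fit in each cache line.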
Intel plans to ramp the Prescott (90 nm version of P4) core up to 4 GHz+, and AMD will be making Opteron/Athlon64 for years to come. As they up the speed, you will get more performance on your 32-bit applications. And you can bet future versions of the processors will support fast 32-bit processing, since it will take a LONG time before many programs make the switch to 64-bit.
If you are interested in some early benchmarks comparing 32-bit and 64-bit x86 performance using a beta of Windows XP 64-bit, see
Another bit of performance data:
Do a search on results from "AMD Athlon" and click through to find results that were on the same machine, one running 32-bit Windows, the other 64-bit SuSE Linux w/64-bit gcc compilation. On the few machines I looked at, the SPEC level was ~5% better on 32-bit. If you look at the "VPR" component of the SPECint test, this is an academic FPGA place and route tool, and it too runs ~7-8% slower on 64-bit x86. Of course, there could be immaturity of compilers, different OS efficiencies, etc. in here, but 64-bit will be no silver bullet when it comes to performance.
There are a few benefits of the AMD-64 architecture beyond the 64-bit width (in general I agree with most of what you've written here - it's a good explanation). In particular, the larger number of registers is a help in many types of application. Also, for some types of application, convenient 64-bit data items can lead to other benefits - for example, povray runs slightly faster in 64-bit mode than 32-bit mode on an Athlon-64, but more importantly it runs more accurately, giving finer detail. I don't know whether this could apply to tools like Quartus (povray deals with approximations to reality rather than absolute logic), but perhaps it might have benefits for simulation.
Of course, for a real break-through in compilation speeds the key would be effective multi-threading, but I understand that that's a bit difficult for current algorithms.
Will Quartus II 4.x *run in 32-bit mode* on an AMD64 based machine?
I use Opteron based systems due to their excellent performance, even on 32-bit applications. However, Quartus II 3.x does not run since a silly csh based driver script fails to detect the architecture (if the script simply *tried* to run the x86 binary it would probably work):
Thanks for the reply. I guess I was thinking more along the lines of processing the data in 64 vs. 32 bit chunks as opposed to increasing address space.
I guess I know little of the data structures involved in synthesis and fitting. I do image processing where twice the bit width means twice the number of pixels per fetch etc. This results in a near linear increase in processing speed for a given clock rate.
Of course I have to go in and tweak my inner loops to operate on the larger chunks, but well worth the payoff.
Many computer algorithms don't really fall into the category of processing "chunks" of data -- CAD algorithms included. It's more like "look at a piece of data. If it's this, go do that. If not, go do this other thing and multiply it by some other piece of data located over where this other thing points to...".
If we wanted to (say) perform arithmetic on multiple 16- or 32-bit values at a time, this is different from 64-bit computing, and is known as SIMD (Single Instruction Multiple Data). MMX & SSE are both examples of SIMD instruction extensions to x86, and they exist in current processors. I believe that some compilers (Intel's, I think) will automatically vectorize pieces of code that are handling small values one at a time and convert them to using MMX/SSE instructions to handle multiple pieces of data concurrently.
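As a rough illustration of what "multiple 16-bit values at a time" means, the sketch below packs four 16-bit lanes into one 64-bit word and adds them lane by lane with ordinary integer operations. This is the software trick sometimes called SWAR, not MMX/SSE itself (which do the equivalent in hardware), and the function names are made up for this example.

```python
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1
HIGH_BITS = 0x8000_8000_8000_8000  # top bit of each 16-bit lane
WORD_MASK = (1 << 64) - 1


def pack(values):
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def add_lanes(a, b):
    """Add all four lanes at once, each wrapping modulo 2**16.

    The top bit of every lane is cleared before the add so a carry can never
    spill into the neighbouring lane; the top bits are then fixed up with XOR.
    """
    low = (a & ~HIGH_BITS) + (b & ~HIGH_BITS)
    return (low ^ ((a ^ b) & HIGH_BITS)) & WORD_MASK


if __name__ == "__main__":
    total = add_lanes(pack([1, 2, 3, 4]), pack([10, 20, 30, 40]))
    print(unpack(total))
```

This only pays off when the same operation is applied to every element of a large array, which is the image-processing case described earlier in the thread, not the branchy, pointer-following code described above.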
Some applications such as media encoding/decoding, graphics processing, etc. which really are just doing big array manipulations or a lot of math do get a huge benefit from performing the same operations on multiple pieces of data in parallel. But CAD algorithms are totally different beasts.
Good point -- I forgot to mention the doubling of the integer and SSE register files. And moving to other 64-bit platforms could bring even more architectural advantages (x86-64 still doesn't have that many registers available), though it's sounding like x86-64 is going to be the primary 64-bit architecture (Intel has indicated they plan to release a 64-bit x86-based chip too, though they aren't saying if it is the same instruction set as Athlon64).
I don't think x86-64 brings any additional accuracy to floating-point based code. x87 always had 32-bit and 64-bit floating point available, and internally operates on 80-bit floating point numbers for increased accuracy, especially for things like the sin(x) function it supports. Intel now encourages programmers (and more importantly, compilers) to use SSE/SSE2 for general floating point computation and hence x86-64 brings no update to the older stack-based floating-point unit. Instead, it adds another 8 128-bit SSE registers to bring the total up to 16. Floating point representations continue to be 64-bit double precision in x86-64.
One way that the move to 64-bit integers can result in improved accuracy is when programs employ fixed-point representations. For example, say I know that some value will vary from 0 to 1. I could represent that value as a 32-bit integer "x", and implicitly know that the actual number is x / 2^32. Programmers used to do this sort of thing a lot back in the 386/486 days when integer operations were significantly faster than floating-point ops. This is less true than it used to be since integer multiplication/division is now handled by the same unit that does floating-point multiplies (well, except in the 90 nm version of the P4). But still there could be some advantage since processors have more integer ALUs than floating-point/SSE ALUs and are generally geared towards processing integer data faster than floating-point data, especially for addition/subtraction/shift operations (since they are needed for addressing in addition to integer math). On the other hand, if you turn all your floating-point into fixed-point, then your floating-point units go unused -- if you kept operations in floating-point, then you can get parallelism inside the CPU, with it using the integer ALUs at the same time as the FPU. So the net effect of using fixed-point these days is unclear to me.
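A minimal sketch of the x / 2^32 scheme, with hypothetical helper names. It also shows where wider integers come in: multiplying two 32-bit fractions produces an intermediate that needs up to 64 bits before it is scaled back down.

```python
FRAC_BITS = 32
ONE = 1 << FRAC_BITS   # scale factor; the value 1.0 itself is not representable
MAX_RAW = ONE - 1      # largest raw value, just under 1.0


def to_fixed(value):
    """Convert a float in [0, 1) to its raw 32-bit fixed-point integer."""
    if not 0.0 <= value < 1.0:
        raise ValueError("value must be in [0, 1)")
    return int(value * ONE)


def to_float(raw):
    """Recover the real number x / 2^32 from the raw integer x."""
    return raw / ONE


def fixed_mul(a, b):
    """Multiply two fractions; the intermediate product can be 64 bits wide."""
    product = a * b
    return product >> FRAC_BITS


if __name__ == "__main__":
    half = to_fixed(0.5)
    quarter = to_fixed(0.25)
    print(to_float(fixed_mul(half, quarter)))
    print((MAX_RAW * MAX_RAW).bit_length(), "bits needed for the widest product")
```

Python integers never overflow, so the width of the intermediate is only reported here; in C the same multiply would need a 64-bit type for `product`.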
Anyway, I digress. After digging around a bit in google, it looks to me like povray uses floating-point representations, so I'm not sure why it would be more accurate. Do you have a link I could follow that claims increased accuracy?
As for Quartus -- we don't have many cases of approximations which would benefit from increased accuracy. Place and route is a mix of integer and floating point code -- but most floating-point calculations don't need to be terribly precise. Floating point is used (for example) when coming up with a "score" for a given placement choice or to express how badly a signal wants to use a wire during routing. We rarely even need double-precision, since by their nature optimization decisions don't need to be super-precise: you're really just trying to figure out which resource/configuration is better than another. If we got to the point that our cost functions were so good at predicting the right configuration that double-precision round-off was detrimentally affecting our optimization decisions, I think the CAD optimization problem would be solved :-)
For things like post p&r simulation and delay annotation, the accuracy provided by greater-than-double-precision would not be needed since we can't hope to ever model the exact workings of the chip to that fine a level of accuracy anyway. If we are ps accurate, that's pretty good -- and compared to a critical path of (say) 100 MHz, that's 1/10000 accuracy, which is easily handled by single precision. Of course, I'm glossing over other benefits of increased precision, such as reducing accumulation of error when adding up many small numbers (like in a post p&r timing sim), but still I doubt double-precision loses steam...
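Both halves of that argument can be checked with a few lines. The sketch below emulates single-precision rounding by round-tripping values through the struct module's 4-byte float format; the helper names and the choice of picoseconds as the unit are assumptions made for this example.

```python
import struct


def as_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def add_float32(a, b):
    """Add two values as a single-precision unit would, rounding the result."""
    return as_float32(as_float32(a) + as_float32(b))


PERIOD_PS = 10_000.0         # one 100 MHz clock period, in picoseconds
LONG_RUN_PS = 1_000_000_000.0  # one millisecond of simulated time, in picoseconds

if __name__ == "__main__":
    # 1 ps is resolvable against a 10 ns critical path in single precision.
    print(add_float32(PERIOD_PS, 1.0) != PERIOD_PS)
    # Once the running total is large, a 1 ps increment is rounded away.
    print(add_float32(LONG_RUN_PS, 1.0) == LONG_RUN_PS)
    # Double precision still resolves the same increment.
    print(LONG_RUN_PS + 1.0 != LONG_RUN_PS)
```

Single precision carries a 24-bit significand, so 1 part in 10000 is comfortable, while a running total that grows past about 16.7 million ps can no longer absorb 1 ps steps. That is the accumulation caveat mentioned above.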
Theories abound on that one. But for practical purposes, AMD's x86-64 is available here and now on sensibly-priced processors, and is thus a far better target than Intel's 64-bit x86 extensions (which are currently unspecified vapourware) or the Itanium (which costs a great deal more, and is slower than the Athlon/Opteron for integer work).
x86-64 does not change the accuracy of floating-point work, as far as I know, but as you say it would make a difference in fixed point work (with appropriate source code). The article mentioning povray accuracy is at:
I don't know anything about why povray is more accurate on the Opteron beyond what is in that article.
Having freely available 64-bit integer arithmetic would allow you to do these things in integers rather than floating point, which could improve speed and accuracy. In particular, they would let you hold your times in ps, and have as long delays as you want without having to worry about overflows or ranges. Whether it would be worth the effort or not, I have no idea.
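For a sense of scale, this small sketch works out how much time a picosecond counter can hold at each integer width. The function name is made up for illustration, and signed counters are assumed by default.

```python
PS_PER_SECOND = 10**12


def max_time_seconds(bits, signed=True):
    """Longest time, in seconds, a picosecond counter of this width can hold."""
    max_count = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    return max_count / PS_PER_SECOND


if __name__ == "__main__":
    # A signed 32-bit counter overflows after about 2.1 milliseconds.
    print(max_time_seconds(32) * 1e3, "ms")
    # A signed 64-bit counter lasts roughly 106 days.
    print(max_time_seconds(64) / 86_400, "days")
```

So a 32-bit picosecond count is exhausted within a few milliseconds of simulated time, while a 64-bit count is effectively unlimited for this purpose.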