a clueless bloke tells Xilinx to get a move on

mk,

How well does it deal with memory?

Can you place 16 or 32 Gbytes in the box?

Austin

mk wrote:

Reply to
Austin Lesea

Austin Lesea wrote:

Much more important question.

How mature is it? The Xilinx software itself causes enough trouble on its own. Adding a very beta-stage operating system will bring a lot of "fun" to the software guys.

Regards Falk

P.S. Whoa, hold on a bit with the suggested partitioning. Incremental design is a must at this level of size/complexity. You don't take a big staircase in one step, do you? Unless you fall down ;-)

Reply to
Falk Brunner

Nope, when you install Vista x64, 4 of the 8 2 GB DIMMs pop out of the computer... Seriously though, in my experience it's no different than Linux. If you have a two-CPU-socket machine with 8 DIMMs, you can make a quad-core machine with 16 GB, and all of it is available to the 64-bit processes.

Reply to
mk

As I have mentioned, Vista is the second version of 64-bit Windows. It's stable enough that ISE 8.2 runs to completion where it wouldn't on Win32, because there it gets only 2 GB of address space, whereas on Win64, 32-bit binaries get the full 4 GB address space.
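If you want to see what your own box grants a process, here's a quick sketch (just an illustration I threw together, nothing out of ISE) that prints the virtual address space Windows reports; build it as a 32-bit binary with the MSVC /LARGEADDRESSAWARE linker flag and compare the number on Win32 and Win64:

/* Illustration only: report the virtual address space this process sees.
 * A 32-bit binary linked with /LARGEADDRESSAWARE shows ~2 GB on Win32
 * but ~4 GB when run under 64-bit Windows. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);
    if (!GlobalMemoryStatusEx(&ms)) {
        fprintf(stderr, "GlobalMemoryStatusEx failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Virtual address space for this process: %.1f GB\n",
           ms.ullTotalVirtual / (1024.0 * 1024.0 * 1024.0));
    printf("Physical RAM visible to the OS:         %.1f GB\n",
           ms.ullTotalPhys / (1024.0 * 1024.0 * 1024.0));
    return 0;
}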

Reply to
mk

That says a lot about the computing required past Virtex-5, as things scale up another factor of two or four. I assume the total CPU cycles required for P&R are scaling even faster, making both memory and total instruction counts serious bottlenecks.

I assume you are supporting both the Itanium (IA-64) and AMD x86-64 architectures? Desktop 64-bit machines of this size aren't exactly plentiful; server-class machines are a bit more common, but still frequently pretty expensive. I have two quad and three dual Itanium Linux servers here in an MPI cluster, each with at least 8 GB, plus a large MPI/PVM cluster farm of Intel P3 and P4 machines, but I suspect that's pretty rare in this readership given the stiff cost of building HPC clusters. None of them are easy to work near, as the required airflow makes them very noisy individually, let alone aggregated into a server room.

With 8 GB data sets, many algorithms fail to scale, since there isn't enough data locality to make effective use of L2 or L3 caches that are at most in the 9 MB range. Relatively random access to an 8 GB data set generally brings the processor to a grinding halt on memory, with a net instruction rate around 1-2% of L1-cache performance. Generally, applications which have outgrown 32-bit CPU caches require nearly a complete rewrite, with new algorithms and new data structures, to gain enough locality to get good memory performance on a large-address-space 64-bit machine.
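You can see the effect with a toy benchmark (purely illustrative, no relation to any vendor's code): walk the same buffer once sequentially and once in a random order, and watch the random walk crawl once the buffer is far bigger than L2/L3:

/* Toy locality benchmark: sequential vs. random traversal of one buffer.
 * Once the buffer is much larger than the L2/L3 cache, the random walk
 * is dominated by memory latency rather than instruction throughput. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)   /* 256 MB of ints; raise it if you have the RAM */

static unsigned long long rng = 88172645463325252ULL;
static unsigned long long xorshift64(void)
{
    rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
    return rng;
}

static double seconds(void) { return (double)clock() / CLOCKS_PER_SEC; }

int main(void)
{
    int *data = malloc((size_t)N * sizeof *data);
    size_t *order = malloc((size_t)N * sizeof *order);
    if (!data || !order) { fprintf(stderr, "out of memory\n"); return 1; }

    for (size_t i = 0; i < N; i++) { data[i] = (int)i; order[i] = i; }

    /* Fisher-Yates shuffle to build a random visiting order. */
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64() % (i + 1));
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }

    long long sum = 0;
    double t0 = seconds();
    for (size_t i = 0; i < N; i++) sum += data[i];          /* cache friendly */
    double t1 = seconds();
    for (size_t i = 0; i < N; i++) sum += data[order[i]];   /* cache hostile  */
    double t2 = seconds();

    printf("sequential: %.2f s, random: %.2f s (sum=%lld)\n",
           t1 - t0, t2 - t1, sum);
    return 0;
}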

Almost always a major restructuring based on some flavor of divide and conquer is required to bring the memory footprint back inside the L2/L3 caches. Generally this means rewriting most or all of the C++ code back into plain C to get rid of the dynamic allocation and randomly allocated linked lists, along with the other data structure changes necessary to manage memory footprint and cache performance. Redesigning for SMP and clusters becomes critical, as more CPUs/caches become available to concurrently process a larger active working set than a fast CPU/core with a large cache can handle alone.
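As a rough sketch of what that restructuring looks like (made-up "net" records here, not anyone's real data structures): the per-node malloc'd list becomes a flat, index-addressed table that the cache and prefetcher can actually stream, and that can later be sliced per thread or per process:

/* Illustrative only: two ways to hold the same "net" records.
 * The malloc-per-node list scatters nodes across the heap, so every
 * traversal step is a likely cache miss; the flat table keeps the whole
 * working set contiguous and indexable. */
#include <stdlib.h>

/* Before: typical dynamically allocated structure, pointer-chased. */
struct net_node {
    int src, dst;
    float delay;
    struct net_node *next;      /* heap-scattered pointer chase */
};

/* After: structure-of-arrays, one contiguous block per field. */
struct net_table {
    int   *src;
    int   *dst;
    float *delay;
    size_t count;
};

static int net_table_init(struct net_table *t, size_t count)
{
    t->src   = malloc(count * sizeof *t->src);
    t->dst   = malloc(count * sizeof *t->dst);
    t->delay = malloc(count * sizeof *t->delay);
    t->count = count;
    return t->src && t->dst && t->delay;
}

/* Traversal is a linear scan over contiguous memory, and the slice
 * [lo, hi) can be handed to a worker thread or separate process. */
static double total_delay(const struct net_table *t, size_t lo, size_t hi)
{
    double sum = 0.0;
    for (size_t i = lo; i < hi; i++)
        sum += t->delay[i];
    return sum;
}

int main(void)
{
    struct net_table t;
    if (!net_table_init(&t, 1000000)) return 1;
    for (size_t i = 0; i < t.count; i++) {
        t.src[i] = 0; t.dst[i] = 1; t.delay[i] = 0.5f;
    }
    return total_delay(&t, 0, t.count) > 0.0 ? 0 : 1;
}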

Once your people start considering divide-and-conquer attacks to split the operation up for SMP threads or MPI/PVM clusters, it's certainly worth taking a better look at partitioning the problem to run on an N-way SMP 32-bit machine AND on clusters too. Most of the newer high-end 32-bit Intel processors will also handle 8+ GB in a machine, but only 2 GB per process under Linux. Using divide and conquer, the application should easily function as a 4-8 process application with some modest restructuring. With multiple cores/CPUs now typical in high-end SMP 32-bit machines, this should be Xilinx's primary strategy for a target host environment.
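Here's the kind of skeleton I mean (placeholder region and work types only, not a real placer): carve the device into regions and hand each slice to a worker thread on an SMP box, or the same slices to separate 2 GB processes across cluster nodes:

/* Skeleton of a divide-and-conquer pass: split the device into regions
 * and let one worker per core process its slice independently.  The
 * "region" and the work done on it are stand-ins, not a real placer. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_REGIONS 64
#define NUM_WORKERS 4           /* one per core on a quad-core box */

struct region {
    int id;
    double cost;                /* placeholder for per-region P&R work */
};

struct worker_arg {
    struct region *regions;
    int begin, end;             /* half-open slice [begin, end) */
};

static void *process_slice(void *p)
{
    struct worker_arg *a = p;
    for (int i = a->begin; i < a->end; i++) {
        /* stand-in for place/route work on one region */
        a->regions[i].cost = a->regions[i].id * 0.5;
    }
    return NULL;
}

int main(void)
{
    struct region regions[NUM_REGIONS];
    struct worker_arg args[NUM_WORKERS];
    pthread_t tid[NUM_WORKERS];

    for (int i = 0; i < NUM_REGIONS; i++) { regions[i].id = i; regions[i].cost = 0.0; }

    int per = NUM_REGIONS / NUM_WORKERS;
    for (int w = 0; w < NUM_WORKERS; w++) {
        args[w].regions = regions;
        args[w].begin = w * per;
        args[w].end = (w == NUM_WORKERS - 1) ? NUM_REGIONS : (w + 1) * per;
        if (pthread_create(&tid[w], NULL, process_slice, &args[w]) != 0) {
            perror("pthread_create");
            return 1;
        }
    }
    for (int w = 0; w < NUM_WORKERS; w++)
        pthread_join(tid[w], NULL);

    double total = 0.0;
    for (int i = 0; i < NUM_REGIONS; i++) total += regions[i].cost;
    printf("total cost across %d regions: %.1f\n", NUM_REGIONS, total);
    return 0;
}

The same [begin, end) slicing maps straight onto MPI ranks, so the SMP and cluster versions can share one partitioning scheme.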

I'm surprised that Xilinx didn't realize this before now, and roll out a Linux SMP/cluster version of its tools before getting backed into the 64-bit corner.

Sometimes the "we've always done it that way, it's the only way" problem becomes severe, as limited resources prevent considering solutions that require a major investment, i.e. a major rewrite with new algorithms and internal architectures.

Data set and algorithmic scaling into HPC-class facilities isn't a typical skill set for average programmers. Maybe one in a few thousand programmers has experience at this level, and probably fewer still with EDA experience.

There are probably far more open source programmers with experience at this level than there are Xilinx EDA programmers who are comfortable architecting HPC SMP/cluster software solutions. Many of the same folks have an interest in high-end reconfigurable computing too ... which seriously needs tools capable of doing P&R in near real time.

And then, there is always the NIH problem ....

Reply to
fpga_toys

This is not just a Xilinx problem. Across the industry, slow, poor tools have resulted from tight-fisted IP policies regarding the FPGA internal design and routing databases needed to build bitstreams. High cost, low performance.

And it means the tools are limited by the creativity and (lack of) experience of the vendor's in-house tools programmers. A mix of NIH and paranoia over disclosure is self-defeating when it comes to selling FPGA chips in high volume. We hear complaints from the vendor that they don't have unlimited resources and must focus on selected key customer needs (a.k.a. high-volume customers' demands). This same lack of resources has prevented innovative redesign of the tools to take advantage of multicore processors and cluster technologies.

More importantly, the vendor doesn't have a broad systems view of its own products, and has failed to capitalize on building low-cost design systems that are representative of the very market it is feeding .... FPGA-centric designs. Consider that a well-executed motherboard built around multiple FPGAs with PPC CPU cores could easily have far more place-and-route performance than any equivalently priced PC workstation, by using the FPGAs as high-speed parallel coprocessing routing engines. This isn't a new idea .... see

formatting link

That they block both 3rd parties and open source from having access to the FPGA internals and tools internals means their customers are limited to whatever tools their resource-limited development teams can cobble together.

With more open disclosure, it would be interesting to see what both open source and for-profit 3rd parties could do to make a market out of providing high-performance FPGA tools and integrated development systems with FPGA-assisted routing.

Reply to
fpga_toys

As a side note, it's not either/or between being a hardware or software company.

Most major Open Source products are staffed with paid developers from multiple supporting For-Profit companies to leverage industry development dollars as far as possible. Linux exists as a viable commercial product because of hundreds of millions of dollars in salaries paid by many (MANY) large hardware and software corporations to develop the product.

They did this to get out of the other extreme, which is everyone having a mediocre product due to limited development dollars because everyone was reinventing the same wheel and claiming theirs was somehow better. A for-profit project (UNIX) changed that model and had everyone supporting a common UNIX development goal, which over time outgrew UNIX and became open source in a number of UNIX-clone forms.

Pooling paid labor from both FPGA/PLD companies and major end-user companies with in-house EDA programmers, plus educational and volunteer labor, does over time generate a better product. Mostly because of the professional paid developers who are mutually committed to making it the best for THEIR companies' use and sale.

Reply to
fpga_toys

To this day, I'm flabbergasted that silicon vendors actually charge money for their tools!?! If you want to push your silicon, then wouldn't giving away tools only serve to sell more devices???

I'm *sure* more than one decision on Altera vs Xilinx has been made purely on the cost and strength (or otherwise) of the tools!?!

As for open-source, I'd love to see it myself, but it would be a vendor's nightmare! "My design simulates fine but doesn't work in silicon... I'm using the Red Hat distro of Quartus II v7.4.5 patch level 12 with the turbo annealing enhancements v2.1b4. Oh, and I found a bug so I patched the timing analyzer myself too..."

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, 
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply to
Mark McDougall

It's not any different for high-end server companies like IBM, HP, SGI, etc., where stable, crash-free operation is a CRITICAL selling point for million-dollar hardware sales.

There is a reason each of these companies has legions of developers supporting the open source products on salary.

On the other hand, if the vendors had absolutely crash-free, reliable products that were feature-rich and fast and met everyone's needs, there probably wouldn't be a discussion.

So, the reliability argument, I believe, is a red herring. IBM, SGI, HP, and Red Hat all ship stable open source products, which some argue are significantly more stable and secure than proprietary alternatives (like Microsoft's) in the same high-reliability server market. It's probably baseless to believe that any vendor would allow its in-house programming team supporting the open source EDA tools to have any less of a quality initiative than the tools they might augment or replace.

With a much wider user base and multiple-vendor support, one would expect the broader testing and broader developer base to actually produce better tools, better tested and more stable in comparison to cash-strapped proprietary efforts. The common parts of the product should reach much better maturity and stability. As for vendor-specific parts, that's no different than today, since each vendor's in-house team will be pretty much solely responsible for its own chip support, just as we see IBM, SGI, HP, etc. all solely responsible for their systems architecture and device-driver code.

Reply to
fpga_toys

Well then, it's about time you ported it to the Mac, isn't it? A full 64-bit OS, with quad top-of-the-range processors on the desktop machine and gobs of RAM using this new weirdo serial-network RAM interface...

Given that you've got a Linux version (which uses X), and an X server on the Mac (which also runs a pretty plain-vanilla Unix for its OS), the only real barrier ought to be the QA overhead... Perhaps not *quite* as simple as typing 'make', but almost certainly within the confines of an intern's summer job :)

Well ?

Simon :)

Reply to
Simon
