Choice of FPGA device

- Varun Jindal

Wed, Nov 24, 2004 6:25 AM

hello all,

I have been reading a lot about performance comparisons between the leading FPGA chip makers on their web sites. Both claim an improvement over the other of 20 - 40 %, though neither has ever described what exactly was compared.

Are there resources available on the net that compare the different architectures in detail (and impartially)?

-- Varun.

- Mike Treseler

Wed, Nov 24, 2004 7:40 AM

Brand A and X are roughly equivalent. Use the vendor that gives you the best service and distribution.

-- Mike Treseler

- Austin Lesea

Wed, Nov 24, 2004 3:56 PM

Varun,

Your best bet is to contact the FAEs (Field Applications Engineers) for both companies, and have them explain exactly what their claims are based on.

What speed grades were compared (e.g. their fastest with our mid-grade)?

What were the settings of the synthesis tools (e.g. their default vs our default -- we default for speed of synthesis, theirs for a compromise of performance)?

What effort was made to use device specific features (e.g. theirs a lot, ours a little)?

What choice of device was made (e.g. their only one choice, versus our three options to best fit: LX for logic, SX for DSP, and FX for networking and comms)?

Or, you could do as the other posters suggest: IGNORE IT and do your own benchmark by examining specifications and trying out some intended critical logic, and/or examining the offering of IP from each company (and its performance).

Who is to say which device is 'better'? Only after careful study, and use of specific features that may offer an improvement can one make a decision. And that decision only holds for that one (type of) design!

The "speed superiority" claims appeared three days after we announced the availability of three V4 parts as engineering samples.....compared to their unavailability. Hey it ain't fun when your foundry can't supply the parts to you, is it?

Our 90nm offerings are succeeding because we did engage early with our fab partners, and did shake the process out. If you wait until the process is stable, you will wait forever. If you don't want the process, you are dependent on other larger customers of the fab.....and maybe they are making 130nm ASICs and are perfectly happy to wait until someone else has paid for the 90nm wafer starts to shake out the new process. And who will use the triple oxide process for reduced leakage and power-on currents? No one but an FPGA vendor. No process, no performance.

Our fabs like us for our willingness to be full partners in the development of a new advanced process. I think our customers understand that sometimes there will be rough spots in a new introduction of a new product on a new process, but overall we continue to offer superior products (in my opinion).

Austin

- glen herrmannsfeldt

Wed, Nov 24, 2004 6:10 PM

There are different metrics, and each design has different needs.

Even so, it is possible for each to be 20-40% improved over the other. That is why geometric mean is preferred for benchmarks.

Say you have two benchmark programs. Machine A runs the first in 1 minute, the second in two. Machine B runs the first in two seconds, the second in one.

Machine A runs the first 2 times as fast and the second 0.5 times as fast, the average then is (2+0.5)/2 or 1.25 so machine A runs, on the average, 1.25 times as fast as machine B.

If you do it the other way, you find machine B is 1.25 times as fast as machine A.

Be very careful when you read benchmark numbers, and always use geometric mean.
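To make the arithmetic concrete, here is a minimal Python sketch of the two-machine example. It assumes both machines' times are in the same unit (minutes), as the 2x and 0.5x ratios imply; the function and variable names are illustrative only.

```python
import math

# Run times for the two benchmark programs, assumed to be in minutes.
times_a = [1.0, 2.0]  # machine A
times_b = [2.0, 1.0]  # machine B

def speedups(base, other):
    """Per-benchmark speed of `base` relative to `other` (other's time / base's time)."""
    return [t_other / t_base for t_base, t_other in zip(base, other)]

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exp of the mean of the logs, i.e. the n-th root of the product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

a_over_b = speedups(times_a, times_b)  # A relative to B
b_over_a = speedups(times_b, times_a)  # B relative to A

print("arithmetic:", arithmetic_mean(a_over_b), arithmetic_mean(b_over_a))
print("geometric: ", geometric_mean(a_over_b), geometric_mean(b_over_a))
```

The arithmetic mean reports each machine as 1.25 times as fast as the other, while the geometric mean comes out at about 1.0 in both directions, so it does not depend on which machine is taken as the baseline.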

-- glen

- Symon

Wed, Nov 24, 2004 8:22 PM

What? Did you mix up minutes and seconds in there? Don't you just add the times to see which one is quicker? And take into account which type of program you run most often? Cheers, Syms.
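Adding the times, weighted by how often each program is run, could be sketched like this in Python. The times are taken as minutes for both machines, and the run counts are made-up numbers for illustration.

```python
# Run times per program, assumed to be in minutes for both machines.
times = {
    "A": {"prog1": 1.0, "prog2": 2.0},
    "B": {"prog1": 2.0, "prog2": 1.0},
}

def total_time(machine, runs):
    """Total time for `machine` given how many times each program is run."""
    return sum(times[machine][prog] * count for prog, count in runs.items())

# Hypothetical workloads: equal use, and one dominated by prog1.
equal_mix = {"prog1": 1, "prog2": 1}
mostly_prog1 = {"prog1": 9, "prog2": 1}

for name, runs in [("equal mix", equal_mix), ("mostly prog1", mostly_prog1)]:
    print(name, {m: total_time(m, runs) for m in times})
```

With an equal mix the totals tie; once one program dominates the workload, the machine that is faster on that program wins.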

- Dave Greenfield

Thu, Nov 25, 2004 12:58 AM

Varun, Altera's benchmarking methodology is documented at

formatting link

In particular I'd suggest looking at the benchmarking methodology white paper at

formatting link

which articulates our exact methodology. I believe that this clear description of a benchmarking methodology is unique.

Altera benchmarking is done by our engineering group which uses these results to optimize new architectures, to improve place and route algorithms, and to improve synthesis results. Marketing uses these results in a peripheral manner (i.e. they are not run by marketing and they are not run for marketing).

If you have further interest, Altera will be hosting a net seminar describing this benchmarking methodology, specifying the results, and explaining architectural differences that facilitate the significant performance advantages. Details on the net seminar are found at

formatting link

[note - there will be marketing participation in this net seminar].

Altera has been shipping Stratix II devices since June and began shipping the Stratix II 2S130 device (biggest FPGA ever shipped by 50%) last week. The comment on unavailability is misguided.

Dave Greenfield Altera Marketing

- Varun Jindal

Thu, Nov 25, 2004 4:36 AM

As already discussed by Glen, WHAT and HOW the comparison is done is of core importance.

Again, Dave, the link provided by you does highlight the methodology of benchmarking. The discussion regarding the use of sub-optimal benchmark designs to generate misleading comparisons is very true. But, having discussed this, there is no opinion on the issue of an incorrect/incomplete choice of performance ratio. What if the comparison parameters are biased against an architecture?

Even if non-disclosure of benchmark designs is valid, what about the performance ratio!? My question is: what is the equation of your performance ratio!? Why is it so difficult to disclose this equation? A user has a right to know what importance you have given to each parameter in order to calculate the performance ratio.

I will give you a small example. Due to limited funding, my choice of which FPGA device to purchase is heavily governed by the size of chip that can implement my design. But I am not aware whether you have taken this point into account while calculating the performance ratio. If you have, what weight has been given to it!? All this is still a black box to me. How do you, or anybody else for that matter, expect me to rely on the comparison results provided on the web sites!?

Can different people have different reasons for buying a FPGA device!? How can 'chip size'-based decisions or 'chip performance'-based decisions be made from one set of performance ratio!?
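As a rough illustration of why the weighting matters, here is a small Python sketch in which the same two devices rank differently depending on the weights. Everything in it is an assumption made for illustration: the device names, the figures, the use of cost as a stand-in for chip size, and the scoring formula itself. It is not any vendor's actual performance ratio.

```python
# Hypothetical, made-up figures for two devices; not vendor data.
devices = {
    "device_1": {"fmax_mhz": 200.0, "cost_usd": 120.0},
    "device_2": {"fmax_mhz": 170.0, "cost_usd": 80.0},
}

def score(metrics, w_speed, w_cost):
    """Weighted score: higher fmax is better, lower cost is better.

    Each metric is normalized against the best value among all devices,
    so both terms fall in (0, 1].
    """
    best_fmax = max(d["fmax_mhz"] for d in devices.values())
    best_cost = min(d["cost_usd"] for d in devices.values())
    speed_term = metrics["fmax_mhz"] / best_fmax
    cost_term = best_cost / metrics["cost_usd"]
    return w_speed * speed_term + w_cost * cost_term

def best_device(w_speed, w_cost):
    """Name of the device with the highest weighted score."""
    return max(devices, key=lambda name: score(devices[name], w_speed, w_cost))

print(best_device(w_speed=0.9, w_cost=0.1))  # speed-driven buyer
print(best_device(w_speed=0.3, w_cost=0.7))  # budget-driven buyer
```

A single published ratio fixes one set of weights; a buyer with different priorities can reach the opposite conclusion from the same measurements.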

-- Varun.

- Paul Leventis (at home)

Thu, Nov 25, 2004 5:42 AM

Hi Varun,

I think that the simple answer is that a purchase for a single design cannot be made purely off of general benchmarking results. You need to evaluate the performance of our chips (and any others) for your design and its requirements. And you need to factor in other chip features and performance parameters, the price you can get from your distributor/FAE, the packaging choices, device availability, etc.

Let's step away from questions of benchmarking validity, averaging methods and such. In the end, we get a spread of results. If your design happens to be one of the designs that experiences equivalent performance, or say you are the data point at the extreme left in Figure 1 at

formatting link

then our 39% means nothing to you.

All benchmarking results do is provide you with some guidance of what to expect. Based on our Stratix II benchmarking results, you can expect a chip that will likely outperform Virtex-4. This could mean that you hit your performance target in one and not the other. Or it could mean that you can buy a cheaper speed grade in Stratix II but need a more expensive speed grade in Virtex-4. Similarly, you can expect Cyclone II to be ~60% faster than Spartan-3.

If you only have time to try one chip, I think it should be Stratix II or Cyclone/Cyclone II (depending on your needs), given the average results we see. If you have time to try two chips, I still think you should just buy ours ;-) -- but I will grudgingly accept that you will probably try out what Xilinx has to offer too :-)

Does that mean Stratix II is the right chip for you? Not necessarily.

- Paul Leventis (at home)

Thu, Nov 25, 2004 5:43 AM

Hi Austin,

I believe Dave has addressed the overall question on benchmarking methodology. I'd like to address a few specific benchmarking questions in your posting (which I believe are addressed in the links David provided).

We always produce a comparison of the fastest speed grades available in the software, and we will sometimes publish other comparisons with explicit indications of speed grade. Our philosophy is that if a speed grade is not in software, users can't design to it, and thus it is not real. When a new speed grade becomes available (from either vendor), we re-measure our benchmarks.

Note that sometimes speed grades appear in the software that are difficult/impossible to actually get from the vendor. We do not factor this into our benchmarking results. This can be an advantage for competitors, since we haven't had a speed grade availability issue that I know of.

We do apples to apples comparisons. We usually use a 3rd party synthesis tool, same version, same settings for both chips. If we are trying to compare architecture speeds, we push synthesis for speed (for both architectures).

We also sometimes publish results using the available integrated synthesis. This is particularly relevant in the low-cost market where CAD tool costs can be a factor. If we are comparing architectural speed, we select the settings (for both tools) that yield the best speed results. We do not cripple either tool, and we go as far as running many experiments to try to determine the best settings for our competitors' tools.

We make a fairly significant effort. We do not go as far as rearchitecting the design to be specifically optimised for a chip. We try to standardize the HDL between the target architectures, with the exception of "cores" such as explicitly instantiated memories, multiplier/accumulator blocks, PLL/DCMs, etc.

Does this mean for a given design we've extracted the most we can? I'd say no, since that would require an enormous engineering effort on (typically) poorly documented designs (all we get is the HDL, and sometimes it's been anonymized). But our benchmark set comprises designs that were originally targeted to our chips of current and past generations, competitors' chips, ASICs, vanilla HDL, etc, so there will probably be headroom left in both architectures under comparison.

We select the smallest device that fits the design, since we believe that our customers would likely do so as well. The whole Virtex-4 alphabet soup issue is new. But since only LX is available, it's moot for now -- no point comparing to a family that is not available.

As for FX, that's a non-issue as we're talking about core fabric performance. Stratix GX and your FX parts offer additional hard IP that will factor into some customers' decisions, and probably in a way that no amount of benchmarking will be able to quantify.

I'm not quite sure which three days you are referring to, but the primary reason for the timing of our release was availability of ISE support for Virtex-4. We can't benchmark against a chip that doesn't exist in the software. If we knew we had a 39% performance advantage earlier, do you think we would have sat on it?

I'm not sure what it's like when a foundry can't supply us parts, Austin, so I can't feel your pain. Sorry. We have one fabulous fab partner in TSMC, and it's the only one we need.

Regards,

Paul Leventis Altera Corp.

- austin

Thu, Nov 25, 2004 7:58 PM

Paul,

Thanks for the reply.

I disagree with pretty much everything you say, but you are certainly entitled to offer a defense.

Thanks for admitting that you did not compare similar speed grade parts.

Austin

- Stifler

Thu, Nov 25, 2004 11:28 PM

Your claims of 39% performance advantage for Stratix II over V4 are a lie. I don't believe a word of it. You don't give any proof for your wild claims.

My guess is that here is how it works.

Step 1. Grab a bunch of customer designs submitted to correct the thousands of bugs contained in your software.

Step 2. Software engineers fine tune all your fitting and routing algorithms to make the best possible performance on these designs.

Step 3. Run these designs again and again with different fitting and routing settings to zero in on the best result possible.

Step 4. After these designs are fine tuned for Altera p&r, run the design through the Xilinx software one time with no special efforts to get good results.

Step 5. Pick V4 devices that are of a slower speed grade than your Stratix II device. Fastest SII vs. middle V4.

Step 6. Lie about how great your benchmarking methodology is.

Step 7. Trot out a Ph.D. professor at the U of T that I believe was made a millionaire by Altera when they bought his company to say how fair and great you are.

Step 8. Have a web seminar on Dec. 7 with same Altera made millionaire to try and get people to believe your lies.

Eric Cleage is gone. I would have expected you to become more honest in your marketing efforts after that.

formatting link

- Paul Leventis (at home)

Fri, Nov 26, 2004 1:29 AM

Hi Stifler,

A pleasure hearing from you again. I probably should ignore your entire post, but here I go...

We expend considerable effort to get the best results we can out of Xilinx P&R. I know that's hard to believe, but it's what we do.

Please tell me how to select a faster V-4 device in the software and we'll benchmark against it. If you don't like the comparison, derate our result by the speed difference between -3 and -4 Stratix II devices. We will still be winning by ~25%. But I think the comparison is valid as it stands; if you need an FPGA with the most speed, the "fast" speed grade for V4 is not an option given that it isn't shipping nor is it in the software.
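The derating arithmetic can be checked with a short Python sketch. The 39% and ~25% figures come from this thread; the actual gap between the speed grades is not stated here, so the ratio below is only what those two figures imply, and the names are illustrative.

```python
claimed_advantage = 1.39   # Stratix II vs. Virtex-4, fastest grades in software
derated_advantage = 1.25   # approximate figure quoted after derating

# Gap between the two Stratix II speed grades implied by the two figures above.
# This is an inference, not a number given in the thread.
implied_grade_ratio = claimed_advantage / derated_advantage

def derate(advantage, grade_ratio):
    """Advantage remaining after slowing the faster part by `grade_ratio`."""
    return advantage / grade_ratio

print(f"implied speed-grade gap: {(implied_grade_ratio - 1) * 100:.1f}%")
print(f"derated advantage: {(derate(claimed_advantage, implied_grade_ratio) - 1) * 100:.1f}%")
```

In other words, the two quoted figures are consistent with a speed-grade gap of roughly 11%.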

Dr. Brown is a Senior Director in our Toronto office and oversees (among other things) benchmarking and performance analysis. He is also a professor at the University of Toronto who's done a lot of fundamental work in FPGA research. He also wrote a pretty good textbook on digital design. He's good at a lot of things, but lying really isn't one of them.

Steve was not part of any company acquired by Altera; we have more than one U of T professor working for us.

Regards,

Paul Leventis Altera Corp.

- Paul Leventis (at home)

Fri, Nov 26, 2004 1:39 AM

Hi Glen,

For your reference, the +39% Stratix II vs. Virtex-4 benchmark result uses geometric mean. The A vs. B and B vs. A result you give is one motivation. Another is that if you use arithmetic average, a large outlier will heavily skew results, whereas in geometric mean, the impact is less.
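The outlier effect can be seen in a few lines of Python. The per-design ratios below are invented for illustration and are not taken from any benchmark set.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exp of the mean of the logs, i.e. the n-th root of the product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical per-design speed ratios (one vendor's fmax over the other's).
ratios = [1.0, 1.1, 0.9, 1.2, 1.0]
ratios_with_outlier = ratios + [5.0]  # one design that happens to be 5x faster

for label, data in [("without outlier", ratios), ("with outlier", ratios_with_outlier)]:
    print(f"{label}: arithmetic={arithmetic_mean(data):.3f} "
          f"geometric={geometric_mean(data):.3f}")

# How far the single outlier moves each average
arithmetic_shift = arithmetic_mean(ratios_with_outlier) - arithmetic_mean(ratios)
geometric_shift = geometric_mean(ratios_with_outlier) - geometric_mean(ratios)
```

The single 5x design moves the arithmetic mean by roughly twice as much as it moves the geometric mean.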

Paul Leventis Altera Corp.

- Nick

Fri, Nov 26, 2004 11:24 AM

I thought I'd give my 2 cents on this matter.

Altera and Xilinx both make excellent products. They make state of the art VLSI circuits, using technologies other manufacturers dream of.

90nm being one of them.

As for who's the best, that's very hard to say. My suggestion is, don't listen to Austin or Paul. Or, for that matter, anyone from Altera or Xilinx. They are great people, and always help out whenever they can on this newsgroup. But, let's face it, they're a bit biased.

If Austin said Altera is better than Xilinx, he'd be admitting he's no good at what he does. Same goes for Paul.

As for benchmarks, I believe what Altera says. Or most of it. What they can do is what we all do, and that is compare devices that are available on the market. And, at this point in time, the Stratix II is the best FPGA on the market. So, if you need it today, go out and buy a Stratix II.

Having said that, the V4, once in full production, will probably be a better device. But by that time Altera might have another device in the pipeline, almost ready to go out.

So there you have it. It's a pendulum effect. Right now, go for Stratix II. At some point in the future, the advantage will be Xilinx's. And at some other point, it will swing back to Altera.

The important thing is, of course, that Xilinx and Altera will keep on making excellent devices, all the better for us users!!

Have a nice day everyone.

- Simon Peacock

Sat, Nov 27, 2004 4:06 AM

I would like to second Nick's opinion... The key thing to remember is that Altera and Xilinx are competitors... so of course each one's product is better than the other's, and benchmarks are just that... 'bench' marks.. Altera and Intel have been playing the smoke and mirrors of benchmarking for years.... even MS is getting into the act against Linux ... There are pros and cons for both implementations but I think the most important is the software used to create the FPGA. That's what it boils down to in the long run.. There's an old saying... if you don't like the weather in Florida wait a few minutes and it'll change.. same is true for FPGAs. Unless you need bleeding edge either manufacturer will do... I/O is another thing.. Xilinx seems to support LVDS better than Altera... (especially BUS LVDS).. Also the way you write code will affect the benchmark to some degree too...

So my advice to someone is write your code using no built in libraries.. then compile using free tools on both... look at the speeds, delays etc... is it what you want ? is it everything you need ? Do both manufacturers work ? Then throw that out and ask your Xilinx and Altera agents who will give the best price ?? We had a 30% discount for not using Altera in our design :-) When it all comes down to it it's not who's better... but if they both work... which gives you the best profit .. it's no good having a product that's too expensive to sell!

A somewhat sceptical approach.
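The selection procedure described above (check that each part works for your design, then let price decide) can be sketched in Python as follows. The part names, fields, and prices are hypothetical placeholders, not real quotes.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    fits: bool            # design fits in the device
    meets_timing: bool    # static timing meets the target
    quoted_price: float   # price quoted by the distributor

def pick(candidates):
    """Keep only parts that work for the design, then take the cheapest."""
    workable = [c for c in candidates if c.fits and c.meets_timing]
    if not workable:
        return None
    return min(workable, key=lambda c: c.quoted_price)

# Made-up entries for illustration only.
candidates = [
    Candidate("vendor_1_part", fits=True, meets_timing=True, quoted_price=100.0),
    Candidate("vendor_2_part", fits=True, meets_timing=True, quoted_price=70.0),
    Candidate("vendor_2_small_part", fits=False, meets_timing=True, quoted_price=40.0),
]

choice = pick(candidates)
print(choice.name if choice else "no workable part")
```

The cheapest part overall is rejected because the design does not fit in it; among the parts that work, the lower quote wins.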

Simon

- austin

Sat, Nov 27, 2004 4:42 AM

Nick,

Hmmm.

We introduced V4 before SII (as in announced shipments to customers).

We shipped thousands of V4 ES parts, more than S2.

We shipped LX60, LX25, and SX35 (three devices from the family).

We are getting ready to ship all the others.

In production on 90nm a full year ahead (Spartan 3).

So how are they 'ahead' this time?

One family, no processor(s), no MGTs, no built in FIFOs, ....

Too many deficiencies to name.

But, you are right, don't take my word for it. How could I possibly be fair?

I'll leave the marketing to the marketeers.

But when I see patently false statements, I will respond.

Austin

- Hal Murray

Sat, Nov 27, 2004 7:41 AM

Anybody have a rule-of-thumb on how much performance you give up if you don't tweak your code to take advantage of a vendor's features/quirks? (both software and hardware)

Are the free tools appropriate if we are discussing volumes big enough to be worth arguing about?

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's.  I hate spam.

- Mike Treseler

Sat, Nov 27, 2004 9:18 AM

Yes, start with your own functional synthesis code and compare utilization and static timing for all devices under consideration.

That's also a good way to design portable code.

> Anybody have a rule-of-thumb on how much performance you give up if ...

My rule is try it and see. It doesn't take long to run static timing to get the exact answer for the actual routed design.

Sometimes I can catch the synth on small inefficiencies, but then it often kicks my rear on bigger things. And if I spend even one minute fussing with each of 50,000 logic cells, I'd probably be brewing mocha lattes today.

-- Mike Treseler

- Paul Leventis (at home)

Sat, Nov 27, 2004 4:29 PM

Hi Hal,

The Quartus II Web Edition is almost fully-featured when it comes to push-button place & route. Web Edition provides all synthesis & fitter options intended for optimizing design performance and/or area, with the exception of the physical synthesis options. Physical synthesis can be a huge boost on many designs (10-15%?), so its omission is a bit of a downside to using the Web Edition. It may be included in a future edition of the product.

The other issue is device support. If you require a large Stratix II device, they are not shipped with the Web Edition, so you will not be able to evaluate those devices.

Regards,

Paul Leventis Altera Corp.

- Nicholas Weaver

Sat, Nov 27, 2004 5:24 PM

Placement and careful pipelining are big wins. My AES design of a few years ago was about 1/2 the size and 2x the performance of the academic synthesized versions because I hand-mapped to the Virtex E/Spartan II family including placement.

Placement alone can be a good 20%+ performance win (and better tool times as well, as I'm rediscovering), the application-specific mapping gave the area savings, and architecturally-aware pipelining is a HUGE win on the performance.

So just one data point, but taking advantage of layout, architectural mapping, and specific quirks (eg, the BlockRAMs are the right size to do both the AES S-box and ONE of the Galois multiplies of the mix-column operation. The register can be used independently of the LUT under MOST conditions. SRL16s need an output register but are good for retiming chains, etc) is hugely critical for area & performance.

To my mind, the reason to select Brand A vs Brand X should come down to two major things:

Familiarity. I'm a Brand X guy, because I'm more familiar with the architecture (and Xilinx has given a lot of support to UC Berkeley over the years, which has created this familiarity).

NON-FPGA features: For the work I'm currently doing (working on NIDS-in-FPGA), the MGTs are an absolute essential for the network interfaces, and the processor may become useful in the very near-term future, so it's very useful to have in place.

--
Nicholas C. Weaver.  to reply email to "nweaver" at the domain
icsi.berkeley.edu