FPGA vs ASIC

The Xilinx fast carry logic path for Virtex-5 is described in this document

formatting link
page 193. As you can see, it is not a multi-bit carry lookahead or anything similarly complicated. It's a hardwired implementation of ripple-carry logic, which can be duplicated relatively easily using standard-cell full adders.
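
To make the "ripple carry out of full adders" point concrete, here is a minimal behavioural sketch in Python. The function names are made up for illustration, and this models only the logic, not the Virtex-5 circuit or any standard-cell library.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width, cin=0):
    """Add two unsigned integers one bit at a time.

    The carry out of bit i feeds bit i+1, so the critical path grows
    linearly with width. Returns (sum, carry_out).
    """
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For example, `ripple_carry_add(15, 1, 4)` wraps around and reports the overflow in the carry out.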

Reply to
mk

Thank you, but I have no need to learn the details of the Virtex-5 carry logic at the moment. My point is that in the FPGA technologies I know, the carry-chain path has a much shorter delay per bit than a normal gate. This means you can do 8- to 16-bit addition in a pipeline running at nearly the maximum technology speed for FF-gate-FF; you may even need more time to multiplex the data before or after the adder stage than for the addition itself. In ASICs I've only seen gates where the carry-chain delay per bit is comparable to a normal gate delay, which means that a 16-bit ripple adder can't run anywhere near FF-gate-FF speed but needs something like FF-17xgate-FF. In that case the adder stage will likely dominate your pipeline frequency. You can only speed up a ripple-carry adder by placing its cells tightly together, but that has to be done for any timing-critical path anyway.
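
A rough way to put numbers on this argument is a first-order delay model: one gate delay to get onto the chain plus one carry hop per bit. This is only a sketch under that assumption; the function names and the picosecond values in the usage note are hypothetical, not taken from any datasheet.

```python
def stage_delay(bits, gate_delay, carry_delay):
    """Register-to-register delay of a ripple adder stage.

    Assumes one gate delay to enter the chain and one carry hop per bit.
    Units are whatever the caller uses (e.g. picoseconds).
    """
    return gate_delay + bits * carry_delay


def slowdown_vs_single_gate(bits, gate_delay, carry_delay):
    """How many times slower the adder stage is than a plain FF-gate-FF path."""
    return stage_delay(bits, gate_delay, carry_delay) / gate_delay
```

With the carry hop as slow as a gate (say 50 and 50), a 16-bit adder comes out at 17 gate delays, matching the FF-17xgate-FF estimate. With a carry hop far shorter than a gate (say 500 and 25), the same adder costs less than two gate delays.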

bye Thomas

Reply to
Thomas Stanka

There are also different adder implementations available at the synthesis-tool level. For example, Design Compiler with the proper libraries supports the following adders: ripple-carry, carry-look-ahead, delay-optimized flexible parallel-prefix, Brent-Kung, conditional-sum and ripple-carry-select.
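
To give a flavour of what a parallel-prefix adder does differently from a ripple adder, here is a small behavioural sketch. It uses the Kogge-Stone arrangement because it is the simplest to write down; Brent-Kung applies the same generate/propagate operator over a sparser tree. The function name is made up, and this is not what Design Compiler actually emits.

```python
def kogge_stone_add(x, y, width):
    """Parallel-prefix addition of two unsigned ints, carry-in assumed 0.

    Returns (sum, carry_out). Carries are computed in about log2(width)
    combining steps instead of rippling through every bit.
    """
    g = [(x >> i) & (y >> i) & 1 for i in range(width)]    # bit generates a carry
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]  # bit propagates a carry
    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        # Merge each group with the group ending `dist` bits below it.
        new_g, new_p = G[:], P[:]
        for i in range(dist, width):
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
    # G[i] is now the carry out of bit i.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    carry_out = G[width - 1] if width > 0 else 0
    return total, carry_out
```

The price for the shorter logic depth is many more combining cells and wires, which is the area-for-speed trade the list of architectures above is about.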

--Kim

Reply to
Kim Enkovaara

Then I'm not sure what we're discussing here but I'll try one more time.

The carry chain only looks fast because the "normal gate delay" in an FPGA is so slow, owing to the programmable gates and routing. Normally, when one is designing custom adders, a carry-ripple adder is the slowest and smallest adder, the baseline against which more sophisticated carry-select, carry-skip, carry-lookahead etc. designs are judged. One can buy more speed by paying with area and/or power for one of these architectures. The fact that in an FPGA a dedicated carry-ripple adder is the fastest option just shows the inefficiency of the fabric. But one gets programmability with that inefficiency, so the compromise usually works out. Actually, what FPGAs have should be named "dedicated/hard-wired carry ripple logic & routing", as there is not much "fast" about it. What would have been fast is if they had added some carry-lookahead logic.

Reply to
mk

Within a CLB, there certainly is carry look-ahead. It is abstracted out in the user's guides as an implementation detail that is not visible to the user. Be assured, however, that there is carry look-ahead going on in the physical hardware.

Reply to
Ray Andraka

Reply to
Peter Alfke

Peter, if I remember correctly, there is a TTL IC which has a 4-bit CLA in it. It would fit into the CLB nicely :-)

Reply to
mk

Are you thinking of the 74181 (= 9341 in Fairchild parlance)? Nostalgia from 1970... It's a 4-bit ALU, but that means 22 signal pins in a 24-pin package:

4+4 operand inputs and 4 result outputs, 5 mode controls (one of them to switch between logic and arithmetic), carry in, carry out, carry generate, and carry propagate, plus an A=B output thrown in for free. Hard to put into a CLB, unless you define the mode by configuration.

Peter
Reply to
Peter Alfke

That reminds me of the time I was in one of two row boats racing to shore. I thought I could "help" the race by jumping out and pushing the boat with my swimming. Boy did we suddenly slow down!

A 4-bit CLA in a CLB (wait - is there a CLK? no, no...) is marginally helpful in only the most extreme cases. Very long adders would still need a "generate" signal from each segment in a multi-segment adder (even the 74181 is a 4-bit ALU) when the individual sum is all-1s, to go along with the carry from the carry chain. If you wait for the result to decode the generate, you need to get on and off the carry chain and go through two levels of logic to detect a 16-bit "generate", or else double your counters with A+B and A+B+1 results to come up with a C and G at the same time. What does this gain? The carry from your 4-bit CLA needs 2 levels of logic to generate the "lookahead" from the 4 segments. A 64-bit counter would need 4 levels of logic plus routing *or* twice the number of adders with 2 levels of logic on top of routing. This is a quick way to make things go very slow.
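
The "A+B and A+B+1" doubling is the carry-select idea. A minimal behavioural sketch, assuming the width is a multiple of the segment size (the function name and the 4-bit default are illustrative only):

```python
def carry_select_add(x, y, width, seg=4):
    """Carry-select addition of two unsigned ints; returns (sum, carry_out).

    Each segment computes both A+B and A+B+1; the incoming carry only
    picks one of them (a mux), so no segment waits on a ripple through
    the segment below. Assumes width is a multiple of seg.
    """
    mask = (1 << seg) - 1
    total, carry = 0, 0
    for lo in range(0, width, seg):
        a = (x >> lo) & mask
        b = (y >> lo) & mask
        r0 = a + b        # result if the carry in is 0
        r1 = a + b + 1    # result if the carry in is 1
        r = r1 if carry else r0
        total |= (r & mask) << lo
        carry = r >> seg
    return total, carry
```

In hardware both results exist at the same time, which is exactly the "twice the number of adders" cost described above.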

If you want to accelerate small adders in FPGAs, you can't do it with FPGA logic. The carry chains already have very low propagation delay, though there might be an opportunity to get on and off the carry chains more quickly with focused silicon development, perhaps compromising other performance aspects of the chip to achieve that improved on/off adder delay.

If you want to accelerate very large adders, there are methods that can provide better results than a CLA. You know about carry select, carry skip, etc.; you should also know that for small adders there's no help to be had in the FPGA.

Don't believe me? Take a splash. Watch the boat slow down. Synthesis is cheap! Or was the smiley face showing an attempt at humor that I just can't grasp?

If you're suggesting adding dedicated CLA functionality to the FPGA fabric, think of what it takes to produce the generate with the carry and aggregate the signals into the CLA structure. Do you think it could possibly be worth it for 99% of the adders in user designs?

- John_H

Reply to
John_H

Actually I was thinking of the 74182. The resemblance between it and page 193 is quite interesting, no? 4 pairs of inputs in addition to the carry from the previous block. The outputs need changing a little, though.
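
For anyone who hasn't looked at a lookahead carry generator in a while, this is roughly the logic in question: four generate/propagate pairs plus a carry in, producing three intermediate carries and a group generate/propagate. The sketch below uses active-high signals for readability (the real 74182 uses active-low G and P), and the function name is made up.

```python
def lookahead_carry_unit(g, p, c0):
    """Lookahead carries for four adder blocks, 74182-style.

    g, p: sequences of four generate / propagate bits (block 0 first).
    c0:   carry into block 0.
    Returns ((c1, c2, c3), group_generate, group_propagate).
    Every output is a flat two-level AND-OR of the inputs, so no carry
    has to ripple through the blocks below it.
    """
    g0, g1, g2, g3 = g
    p0, p1, p2, p3 = p
    c1 = g0 | (p0 & c0)
    c2 = g1 | (p1 & g0) | (p1 & p0 & c0)
    c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0)
    group_g = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
    group_p = p0 & p1 & p2 & p3
    return (c1, c2, c3), group_g, group_p
```

The carry out of the whole group is `group_g | (group_p & c0)`, which is what a second-level unit would consume.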
Reply to
mk

The XC4000 carry logic was pretty well documented, unless they implemented it completely differently from how it was documented.

I would have called it a form of carry select logic using special properties of pass transistors to minimize delay.

The internal details of the current devices are not as well documented, but the internal carry should still be faster than CLB-based carry lookahead for adders of most reasonable lengths.

-- glen

Reply to
glen herrmannsfeldt

The internal carry chain structure had a wholesale change when either Virtex or Virtex-II was introduced (I don't recall which now).

Reply to
Ray Andraka

The only thing that really matters is the best combination of logic, circuitry, and transistor technology that achieves the smallest incremental carry delay per bit. And everybody can easily do a static timing analysis to calculate that incremental delay. I think it is about 30 ps per bit.
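
One simple way to pull that incremental number out of static timing results is to time two adders of different widths and take the slope. A sketch only: the function name is made up and the figures in the usage note are invented to land on 30 ps, not measured on any device.

```python
def incremental_carry_delay(width_a, delay_a, width_b, delay_b):
    """Per-bit carry delay as the slope between two adder timing results.

    The fixed cost of getting on and off the chain cancels out in the
    difference, leaving only the per-bit increment.
    """
    return (delay_b - delay_a) / (width_b - width_a)
```

For instance, if a 16-bit adder reported 1480 ps and a 48-bit adder 2440 ps (hypothetical values), `incremental_carry_delay(16, 1480, 48, 2440)` gives 30.0 ps per bit.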

Peter Alfke, Xilinx

Reply to
Peter Alfke
