Updated Stratix II Power Specs & Explanation

Paul Leventis · 2005-02-14T21:02:58+00:00

Hi,Today we released our updated power specs for Stratix II. Somehighlights of the updates found in the Stratix II Early Power EstimatorV2.1 tool:(1) Reduced static power by up to 47%. We've measured many units fromacross the product family, and have the data to tighten the speccompared to our previous conservative/estimated values. The amount ofchange varies from family member to family member, and is a function ofjunction temperature and whether typical or worst-case silicon isselected.(2) Static current on the VccPD rail now reflected (it is tiny)(3) There is no more in-rush Icc current. The previous currentreflected a result measured on early units from one family member plussome excessive guard-bands. The underlying cause was rectified and allStratix II devices now exhibit a monotonic ramp for Icc and no in-rush.(4) We previously reported around 100 mA of static power per used MRAMin the chip. This turns out to have been a measurement error and nowthere is no added static power.See details on the updates and where to get the EPE. Quartus 5.0 willreflect these updated specs when it is released in Q2.Paul LeventisAltera Corp.

M

Marc Randolph 21 years ago

Something similar crossed my mind when we first talked with our local Xilinx sales rep several months ago about how the S2 compares to the V4.

Like has seemingly occured with microprocessors, will there come a time when FPGA's are fast enough for all but a small number of of fringe applications? Except for the gamers, you don't hear people talking about needing a faster CPU anymore - they are "fast enough." Before the vendors jump all over me, I'm not saying that FPGA's have reached that point yet. And yes I realize that there is still plenty of innovation that they can do on their side. And each one of those responses would miss my point.

We push our FPGA's pretty hard where I work, and in the past, have always come up with a list of things that we want or need for the next generation device. But when Xilinx came around asking their thousand questions about what what they can improve on over V4 for the next gen device, our suggestion list was very short. I am most certainly not saying innovation is over when some engineers at a startup can't come up with really good ideas for next gen parts. I'm just saying that from where we sit, the FPGA's seem to be approaching "good enough." Not there yet, but approaching.

I'm sure we'll want a 40 Gbps SERDES (with CDR) on every I/O pin someday, and we'll want to run several levels of logic at 1 GHz or more. But the wish list is pretty short compared to what it used to be

- and I'm willing to wait a number of years for it to come true (where as in the past, there's usually been something that we needed immedately and had to "design around" the current architecture).

Anyone else out there see this? Anyone seeing something that a V4 or S2 won't do fairly well, that you think someone might want or need in the next year or three?

While I agree with Glen that the quibbling about power was highly annoying, especially when everyone knew they were dealing with pre-release numbers and products that the vendors know are not final, after putting all the information through the FUD filter, I did come away with a better understanding of the issues involved, helped somewhat by the (probably rushed) update of the S2 power numbers.

Have fun,

Marc

Vote

A

Austin Lesea 21 years ago

Glen,

Well, here is something useful: suppose I told you that with decreasing geometries, the models are getting both faster, leakier, AND slower, and less leaky?

This brings up an interesting question, what if the next product had two extra speed grades SLOWER that the slowest? Perhaps leakage grades as well?

Basically this is the implication of designing with the ever increasingly small geometries: some transistors are faster, but some will be slower, and the process control will be more difficult.

For all those who do not need the speed, one could offer lower cost parts, as well as offer four (or five) more speed grades at an increasing premium.

Sort of like, if you get lemons, make lemonade.

The issue right now is the sales force freaks out when they hear that the next generation is both FASTER, AND SLOWER (it can be both, as it turns out).

But I agree with you, that not everyone needs the fastest part. A survey of system clock speeds was quite revealing: big use at 33 MHz,

66 MHz, 100 MHz, 155 MHz, with a decreasing tail past 200 MHz. Funny thing, all these frequencies are "magic" and coincide with PCI, SONET/SDH, SDRAM, etc. No magic at all?

Aust> Aust> (big snip)

Vote

A

Austin Lesea 21 years ago

Vaugn,

Shell and pea game: no, you do not get the entire benefit of reduced C.

Also, not all layer dielectrics are Lo-K. For example, the clock tree is near the top, where regular dielectric is used, isn't it?

At least, we evaluated both with, and without Lo-K devices (from the same masks and fab), and were surprised to see only a 5% improvement.

Did you do the same experiment? We were surprised.

Turns out, there is a lot more in the equations that just C.

If it was just that simple, extracted simulations in spice would be unneeded.

Aust> Thanks to John for a thoughtful posting. I enjoy reading this newsgroup and

Vote

N

Nicholas Weaver 21 years ago

Yes, I do. Gigabit Copper has become this all purpose glue: a cheap way of connecting stuff together. Currently, it takes an external PHY or MAC/PHY: not a big deal on an expensive board with an expensive FPGA, but its a big deal on a cheap board.

I'd love to see a Spartan/Cyclone FX, with multiple 10/100/1000-T MAC/PHYs as hardcores, from one on the smallest part to perhaps as many as eight.

I see a world which needs a ton of high speed, low cost programmable Gb network devices (mostly security applications, but who knows what else?)

Never happen, but an interesting thought.

Nicholas C. Weaver. to reply email to "nweaver" at the domain icsi.berkeley.edu

Vote

J

Jim Granville 21 years ago

This chip was interesting, as it includes SATA PHY, at 1.5GHz

formatting link

so I am sure we will see the same in FPGAs. Longer range PHYs, probably are determined by power/voltage swings

-jg

Vote

B

Brian Drummond 21 years ago

I suspect that as hot-spots are tackled, new ones will appear. For example, multipliers used to take up a huge area.

I'm finding big multiplexers to be an issue, for a couple of reasons. Barrel shifters, and normalisation (which will become hugely important if the floating point synthesisable packages take off) for one, and the replacement of internal tri-states with mux logic for another.

They tend to be quite large, not very well structured (for floorplanning), and not particularly well pipelined, as they tend to synthesise to several levels of logic without using the slice FFs.

On a couple of recent designs I'd estimate 30 to 50% of the area has been multiplexers (and I didn't have the option of using multipliers as barrel shifters). MUXF5s help ... a little.

I believe something like a 4/8/16 to 1 MUX function 16(18?) bits in width (preferably also configurable as a barrel shifter) would be worthy of consideration as a next generation block function (or MegaFunction)

- Brian

Vote

M

Marc Randolph 21 years ago

local

or

in

important

the

as

worthy

MegaFunction)

Howdy Brian,

Now knowing your device size, I'm not sure how many muxes you have, but any way you think of it, 50% of an FPGA for muxes is quite a bit. But I think the V4 DSP block does most of what you've discussed (wide muxes, barrel shifters, and counters), doesn't it?

Have fun,

Marc

Vote

M

Marc Randolph 21 years ago

or

in

PHY

Yes, it is - and I agree that there are tons of applications just begging for gigabit ethernet connectivity (PVR/DVR's or HDTV's, not to mention almost all "normal" communication or networking equipment). But the analog circuitry required to support the five level (!) signaling that gigabit copper uses seems like a little much to ask of a multi-hundred MHz digital device with tens of thousands of gates toggling all at the same time. Even so, you gave me an idea... what if you could have the analog front end in a cheapish external device and use the internal DSP blocks to do the signal processing? Of course, to make it worthwhile, that cheapish device would need to be at least 3 or

4x lower cost than the $10/port gigabit copper phy's you can buy right now. As you said - unlikely that it will ever happen.

The MAC's are a completely different issue. They could do that easily right now, and in all devices. So I think you did hit on something there. There is no reason that every FPGA couldn't ship with 1 or 4 or

12 of them (or for a start, at least more than one family). When I first heard that the 4VFX20 devices were going to have 8 MGT's, I immedately started thinking of all the things I could do not only with those 8 MGT's, but with the 8 hard MAC's they would surely include. Only later did I discover they put only 2 MAC's in the FX20!?! The MAC's obviously aren't designed with networking/telecommunications equipment in mind or they would have included one for each MGT. Instead, they seem to have made it 2x the number of 405 processors. Great idea though!

Have fun,

Marc

Vote

V

Vaughn Betz 21 years ago

The entire benefit would be 19% speed and dynamic power reduction. As I said, we get about 2/3 of that maximum benefit, since not all C is metal C, but most is.

We use low-k to near the top of the metal stack. At the very top, where you're routing power and ground, you don't need (or even want it), since high capacitance on power and ground is beneficial (helps prevent ground bounce & vcc sag). The vast majority of the switching capacitance (clocks, routing, ALMs, MACs, etc.) is in metal surrounded by low-k.

We simulated everything with and without low-K, and got the ~13% improvement I previously mentioned. I am also surprised you got only 5%. That is certainly well below mainstream for the industry -- if everyone were seeing such small gains, I doubt the fabs and semiconductor equipment vendors would be pumping billions into developing low-k (and next generation extra-low-k) dielectrics. Sounds like you may have used low-k for only a few metal layers, so perhaps that explains your disappointing experience.

This is backwards. As metal capacitance has become the dominant capacitance, extracting layouts to obtain all the metal parasitics before running SPICE has become essential to getting accurate answers. Go back enough process generations and this was less true -- you could write up your transistor-level schematic in a SPICE deck, simulate it with no thought of metal, and you wouldn't be that far off for most circuits, since transistor parasitics dominated. Now that metal dominates, you have to extract layouts to get the metal C or you get bad answers.

Vaughn Betz Altera [v b e t z (at) altera.com]

Vote

A

Austin Lesea 21 years ago

Vaughn,

Well, you certainly have been fooled.

See below,

Aust>

I doubt it. The dielectric above the transistors is regular undoped glass (SiO2). K = 4.3. Then comes the lo-K after M1. M1 through M5 is all they can do as lo-K, if they do more, it sufffers major yield and reliability issues. Of maybe you haven't noticed the delamination yet?

Nope. You did not. If you did, you would discover that the layer above the transistors and below metal 1, as well as the upper layers for clocks, etc. leads to less than expected improvements. I am pretty sure your ICDES folks just scaled everything. It would be a major project to develop, and QC spice models for both processes, and I seriously doubt anyone would bother.

which they are.

I doubt the fabs and semiconductor equipment vendors would

The only folks making money on this are the equipment suppliers. No one I know asked for it. Yes, it can be a major benefit to ASIC, uP, and perhaps memories. But, it just isn't doing anything for us. Now, we will get lo-K for free, as they have the equipment and process now, butguess what? We still do not see more than a 5% improvement from V4 without lo-K to V4 with lo-K. Wow, two generations and two sets of side by side lo-K and regular experiments.

Ignorance I guess is bliss.

Sounds like you may have used low-k for only a few metal

Nope,as I described, the only layers alloed to be lo-K for lifetime delamination issues and quality are the ones above M1, and below M5. Anymore than that, and we have see problems with fab process qual (not on our parts, but their test structures).

I can see you really have no clue about where the wire models are going. How thick is the metal, how thick is the dielectric? How close are the wires? There is R there (and lots of it). There is C there, too. There is also side wall C (the sidewalls are regular FSG, or SiO2 -- no lo-K advantage).

Again, you go back and ask if they actually had foundry models for with, and without, and what the actual stack up was. One of the biggest overstatements we have seen recently is all of this nonsense about the superiority of lo-K.

Its nice, don't get me wrong, but don't tout it as a miracle if you have never proven it is. You don't know. We do.

Take the time to do it right, or at least study it right. Get an ICDES wire model expert to talk to you about where the lo-K is, and isn't.

Vote

J

Jim Granville 21 years ago

Carefull Austin, I think you have both agreed Low-K is better, but like the old Oscar Wilde joke, the debate centres on "how much".

Sounds rather like the FPGA speed claims themselves - maybe marketing could just put this under an 'Up to 19%' umbrella ?

-jg

Vote

P

Peter Alfke 21 years ago

I hope everybody here realizes that there is no trade-off between triple gate oxide and low-k dielectric. They reside on different "floors" of the vertical IC structure.

The availability of a third oxide thickness at the transistor level (ground floor) gives the designer the freedom to reduce leakage current in pass transistors (where it does not affect speed) and in configuration memory, where lower speed is actually desirable. We at Xilinx think it is a great tool to reduce leakage current without any performance loss, specially in FPGAs where certain (millions of) transistors would benefit from being slow.

Low K dielectric (at the upper floors) hasnothing to do with the transistors, since it is used only in the layers of interconnect well above the transistors. It is obviously desirable to lower parasitic capacitance, as long as it can be done with good yield and without loss of reliability. Different foundries have different approaches and different attitudes.

Thicker high-K dielectric in the gate oxide (ground floor) would actually be desirable, since it would reduce gate leakage current, but it does not seem to be a mature process yet ( I have been told. I'm not an expert).

We are all chasing the holy grail of high performance at low (or at least reasonable) static and dynamic power consumption.

Peter Alfke

Vote

V

Vaughn Betz 21 years ago

Austin,

Nice bafflegab.

I have the spec for the dielectric and conductor stack for the 90 nm process we're using in front of me. I wrote field solvers for my Master's degree and commercially before I saw the light and switched to FPGAs for my PhD. So I really don't need an "ICDES expert" to explain metal stacks or RC extraction to me.

The metal stack is dominated by low-K.
Lateral capacitance is reduced by low-K, since you use low-K between the wires on a layer. Since lateral capacitance dominates in deep submicron (e.g. 90 nm), without doing this, low-K would be fairly pointless.

Having "regular k" between metal 1 and the substrate still means even metal 1 gains most of the benefit of low-K, since sidewall (lateral) capacitance dominates, and you use low-K between the metal1 wires. Plus you reduce the (smaller) capacitance to metal 2.
Metal resistance does not impact power. You can prove this fairly simply mathematically.

Metal resistance impacts speed, although not that much in FPGAs since the wires are rebuffered so often. However, since delay = RC (lumped approximation), that pesky C is still in there and reducing it gives you a linear speedup on the distributed RC delay of the metal wires.
The simulations showing what we got from low-K vs. high-K were detailed, and agreed with measured data from the sample chips we ran (yes, we run chips on different variants of the process too).

Vaughn Betz Altera [v b e t z (at) altera.com]

Vote

V

Vaughn Betz 21 years ago

Peter,

Pass transistors are timing critical in FPGAs. Using a thicker oxide reduces Cox, and transistor drive strength is linearly proportional to Cox. Much like increasing Vt, you can control leakage, but there is a speed cost to be paid.

Otherwise I agree with everything you said though!

Vaughn Betz Altera [v b e t z (at) altera.com]

Vote

A

Austin Lesea 21 years ago

Vaughn,

I am sorry, but you have never run a spice simulation of midox pass transistor vs thin ox.

I would refrain from opening mounth and removing all doubt.

Aust> Peter,

Vote

A

Austin Lesea 21 years ago

Ahhhh,

So I am dealing with a physicist. They seem to be the only ones that think everything is so simple, and models are so perfect.

No wonder we can't seem to communicate.

I will start listening (and responding) when you design a few IC's. Then you will have some credibility (with me).

Til then, enjoy your fantasy world,

Aust> Austin,

Vote

F

Falk Brunner 21 years ago

"Austin Lesea" schrieb im Newsbeitrag news:cv54b2$ snipped-for-privacy@cliff.xsj.xilinx.com...

Another round in the eternal battle between theory and practive ;-))

What is PI?

Mathematican : "Its the Quotient between a circles's circumference and its diameter with the value 3.1415927blablabla" Physicist: "Its 3.1415927 +/- 0.000001" Engineer: "Something around three"

Regards Falk

Vote

P

Paul Leventis (at home) 21 years ago

I am sorry Austin, but how exactly is it that increasing oxide thickness does not decrease transistor speed? Increased tox = decreased beta = decrease Ids. And the Vt increases with tox too, unless you adjust the implant levels for those transistors (at the expensive of another mask and processing step). If there were truely no speed implications of using thicker oxide transistors, we'd all be using thick oxide transistors everywhere and bragging about are "Single Gate Oxide" technologies!.

There are places where slower transistors (be it longer gates, higher Vt, or thicker oxide) are more tolerable than others. For example, the configuration rams (no impact on speed). Are the pass gates one of those places? Maybe -- depends on speed vs. leakage goals and the exact result you get from your sim. Arguing that there is no speed loss and no complexity increase whatsoever though is silly.

Regards,

Paul Leventis Altera Corp.

Vote

P

Peter Alfke 21 years ago

Paul, Xilinx of course does not use thick oxide willy-nilly. As you would agree, there are many (millions of) transistors in the configuration latches where slowness is goodness (helps SEUs for example). And I am also convinced that thick oxide does not slow down pass transistors that are controlled by static configuration cells, while passing fast signals. These are circuits that do not exist in "normal" ICs, but are prevalent in FPGAs. Thus Xilinx can take advantage of it, to reduce leakage current. Altera has poo-poo'ed it, but that would never stop us.. :-)

The battle of: "I have a PhD, therefore I know better", vs "I have 23 years of experience in telecom" is getting a bit long in the tooth. I could throw in my "over 40 years of digital design experience" as if that would impress anyone. (Although it really does help with some perspective...) We will keep the community interested with additional stories about performance and power consumption. And I will keep fighting marketing BS wherever it comes from.

I might also hint at our next web-seminar about signal integrity. How a fairly clean on-chip signal can get corrupted when it reaches the pc-board, and what Xilinx has done to improve that situation. You can hear that on Tuesday March 1: "Signal Integrity and how it is affected by FPGA packaging". With real-life examples and screen-shots. Knocks your socks off ! Oscilloscopes have come a long way... Peter Alfke

Vote

P

Paul Leventis (at home) 21 years ago

Hi Peter,

I haven't heard any electrical reason expressed as to why this would be so. Besides, simple logic tells me that this cannot be the case -- if there was no speed impact to using thicker oxide transistors, you wouldn't bother with a "medium" oxide device and would instead go thick oxide in these particular circuits.

I agree 100% -- how does one's experience, position or such change the quality and content of the arguments they present? Besides, my lack of (figurative) gray hairs puts me at a distinct disadvantage in this arena!

Regards,

Paul Leventis Altera Corp.

Vote

Updated Stratix II Power Specs & Explanation

Join the Discussion

Didn't find your answer?