Altera introduces Cyclone III devices, ships 65nm

Hello:

Today, Altera announced the Cyclone III device family.

Highlights:

Industry's first 65nm low cost FPGA
Shipping now (yes, really)
Up to 120,000 logic elements, Up to 4 Mbits RAM, 288 18*18 multipliers
Aggressive family plan with 8 packages, 8 devices, 3 speed grades, 3
temp grades, leaded/lead free packaging
Example of power spec: 120K Logic element device static power spec is
~170mW at 85C typical
Device samples, low cost dev kits, documentation and software are all
available today on www.altera.com

Link to main page:
http://www.altera.com/products/devices/cyclone3/cy3-index.jsp
Handbook:
http://www.altera.com/literature/lit-cyc3.jsp

Just a friendly notice to the designer community on this usenet.

Best regards,

Luanne Schirrmeister
Director, Low cost products
Altera


Re: Altera introduces Cyclone III devices, ships 65nm


and the MAX III is due when ?


Re: Altera introduces Cyclone III devices, ships 65nm

Even prototype-friendly PQFP-240 packages are listed...

--
Uwe Bonnes                 snipped-for-privacy@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
Re: Altera introduces Cyclone III devices, ships 65nm


I can't see any comparison specs for NIOS, on Cyclone II / Cyclone III ?

Oh, wait, I found a comment "available in 4Q07" for Nios Kit ?

Clicking on the starter kit also says docs : "(available mid-April)"

Seems "Shipping now (yes, really)" - applies only to selective aspects
of the rollout, not all..

-jg


Re: Altera introduces Cyclone III devices, ships 65nm

Hi jg:

Sorry for being vague - clarification:  The Cyclone III starter kit
can run Nios II.  You can order it today on the Altera E-store,
digikey or any Altera distributor ($199).  The "Cyclone III Nios Kit"
is a beefier version of the same board that adds embedded systems
functionality to round out the platform.  It's available later, as you
noted.

With respect to comparing Cyclone III to Cyclone II, high level looks
like:

Up to 50% lower power, the higher the density, the greater the power
benefit (3C5 is marginal vs. 2C5, 3C120 beats any other FPGA in its
density class on power specs by a pretty wide margin)
CII has 70KLEs, CIII has 120KLEs
CII has 1.1 Mbits of RAM, CIII has 3.9Mbits (huge increase across all
densities)
CIII has much more sophisticated PLLs (dynamic reconfig, cascadable, 5
outputs per)
CII supports 333 Mbps DDR2, CIII supports 400 Mbps DDR2 (fastest to
fastest comparison)
CIII may be very slightly faster than CII
We support both families in their entirety in our free Quartus II web
edition software

On device availability, I will let the shipments speak for
themselves.  I hope this answers your questions.  This is a usenet for
engineers so this is my last post.  Altera engineering can take it
from here.

Lu Schirrmeister
Altera



Re: Altera introduces Cyclone III devices, ships 65nm
Stock: 0

Availability: ?

New definition of "available today," I suppose.

Austin

Re: Altera introduces Cyclone III devices, ships 65nm
Engineering samples available now.  Production I believe August/September




Re: Altera introduces Cyclone III devices, ships 65nm


I found :
EP3C25F324C8NES  324-FBGA      Cyclone III      FPGAs
Shows 5 avail @ $49.14
- other packages (and sizes) seem to still be coming....

but. to be fair, let's go to the Xilinx on-line store, and compare
performances.
Oops, oh right, there IS no Xilinx on line store anymore : just a
shell.....

So, we jump thru the link to Avnet, on a Spartan XC3S50A in a couple
of packages,
and oops, again, No stock, "Call", and worse, no price either!!

At least Altera tell users what the Cyclone III's will cost, when they
are in stock.

-jg


Re: Altera introduces Cyclone III devices, ships 65nm

Austin, I'm saddened.
With all the recent discussion of sincere problems getting Xilinx parts
- no availability at the Xilinx online store and troubles getting a
straight answer from distributors who don't carry inventory - you really
have no place poking the competition in the ribs about their apparent
(lack of) availability.  They're an FPGA vendor after all!

Re: Altera introduces Cyclone III devices, 'ships' 65nm
John,

Well, I just went on-line after reading the thread, clicked on their
button, and it said '0 available'.

I was very surprised.

It seems that some links are not working, and some tables do not supply
the correct information.

OK?

That said, you are correct:  we still do not have a suitable
'onsey-twosey' means of supplying the chips.  A million in one month, no
problem.

One? Two?

At least we were able to get the latest parts stocked on the
distributors shelves recently.

I suppose if the only process that is available (now) from TSMC is the
"low power" 65nm one, that it makes good business sense to make a low
cost FPGA in that process (if you are given lemons, make lemonade).
Seems that they will have to wait for the "high performance" 65nm
process to be developed and debugged, where when we asked two fabs for
65nm triple oxide, we got it right away.

Since I am in the "high performance group" (Virtex), I am enjoying our 9
month lead on them in 65nm.

I wish them luck.

Austin

Re: Altera introduces Cyclone III devices, 'ships' 65nm
Austin,

According to the rattling in the newsgroup, information could not be had
from the distributor websites for ANY quantity from Xilinx for new parts.
This was my disappointment: that you thought the Altera web link information
of "0 available" meant something more than Xilinx's "0 available."

Congrats on your 65 nm lead.  I love that Xilinx is pushing the technology.
Unfortunately I don't have the luxury of using it since our office
marketplace can't afford the high costs associated with performance FPGAs.
If I'm going to touch 65 nm anytime soon, it'll be Altera.  If I'm going for
PCIe anytime soon, it's probably Lattice to get the embedded phy in the low
cost part.  I have yet to swim outside the Xilinx pool in my current role
but I've certainly stuck my toe in others' waters.

I liked the Spartan-3 route of pushing 90 nm technology first but I'm a
little disappointed that the non-performance product line seems a little
stale with incremental changes into S3L, S3E, S3A, S3AN....  I can
understand the difficult experience getting the S3 up to speed for yield,
power, and such on the new process so I'm not disturbed that Xilinx hasn't
hit 65 nm yet for the cost-sensitive products but with the V5 success, I'm a
little surprised that I haven't heard anything concrete yet from Xilinx as
to the 65 nm low-cost route.

I was happy to see lower max power numbers for the S3E family the Friday
before the Altera "low power 65 nm process" announcement but their numbers
still put the triple oxide S3X family numbers to shame.  Damned good lemons,
I guess!

Yay Xilinx!
Yay Altera!
Whatever.
I enjoy the competition because I get to enjoy the fruits of competitive
technology, citrus or otherwise.

I'm looking forward to the next Xilinx advancements, as well.

- John_H





Re: Altera introduces Cyclone III devices, 'ships' 65nm
John,

I just compared their typical power with our typical power in Virtex 5
by using spreadsheets.

Very impressive.

I am surprised that their 65nm part has a Vccint of 1.2 volts, however.
 I might be concerned about gate oxide life.  I notice their process is
specified at 10 years, max, at 85 degrees Tj.

It is still far better than Intel or AMD, which are at 3 to 5 years at 65nm.

That we are at >20 years at 85C at 65nm has surprised (and delighted)
our customers.

1/2 watt for a design in C3, vs 1 watt for same design in V5.  I also
get ~ .85W for that design in Spartan 3E.

19000 LE's, 200 slow 4 mA IOs, all the RAM, 2 PLLs or DCMs, similar DSP
12.5% duty cycle, 100 MHz.  3E doesn't have enough RAM, but then the RAM
 adds a very tiny value to power.

The only problem I have is that once you see how much 65nm varies due to
process (on the same die, let alone the same wafer), 'typical' ends up
being pretty meaningless.

For example, the "maximum" value for Spartan 3E goes from .85W to about
1 watt.  At least that is a guaranteed value for the worst case power
one could get.

Unless they choose to bin so they don't ship the leaky and high static
power parts, they will have to be honest with how much power it might
actually have to use, not just the typical value.

Still, this was our typical vs their typical, and it was half for C3 vs
V5, which is not a big surprise because C3 is the low power process, and
V5 has elements of the highest performance 65nm process (lowest Vt,
thinnest oxide), as well as medium power elements.

Generally, the V5 has a "typical worst case at X degrees C" value for
static power, and I really do not know how the C3 is specified for this
value.

They are honest and say that all that data is presently preliminary, and
being characterized.

Austin

Re: Altera introduces Cyclone III devices, 'ships' 65nm
It is important not to muddle things here by mixing dynamic power,
static power, and I/O power.  All of these are important, but it's
much cleaner to compare things separately.  I will also focus on
Spartan-3E, since Virtex-5 is in a different weight class and
(justifiably) burns a ton of power in comparison.

*** STATIC POWER ***
The below table compares the two families at 85C junction temperature,
for typical static power.  As you point out, we do not have publicly
available worst-case power specs.  These are available upon request
(by customers ;-)) and will be released generally when the family
enters production status in a few months.  In the meantime, I assure
you our typical -> worst-case multiplier is in the same ballpark as
that of Spartan-3E and should be a non-factor in this comparison.

           Typical     # of    Block RAM   18x18
           Static @   4-LUTs    (Kbs)      Mults
              85C     or FFs
XC3S100E    0.037W     1920       72        4
XC3S500E    0.098W     9312      360       20
XC3S1600E   0.249W    29504      648       36

EP3C10      0.048W    10320      414       23
EP3C25      0.086W    24624      594       66
EP3C80      0.135W    81264     2745      244
EP3C120     0.172W   119088     3888      288

At similar density (1600E vs. 3C25), that's 1/3 the static power.  Or
looking at the 1600E vs. 3C120, that's 30% lower static power for 4x
the LUTs & FFs, 6x the RAM, and 8x the multipliers.  Not bad.
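
The ratios quoted above can be checked directly against the table. The following is only a quick sketch: the numbers are copied from the table, while the dictionary layout and the `ratio` helper are illustrative names, not anything from a vendor tool.

```python
# Typical static power at 85C (W) and resource counts, copied from the table above.
devices = {
    "XC3S1600E": {"static_w": 0.249, "luts": 29504, "ram_kb": 648, "mults": 36},
    "EP3C25": {"static_w": 0.086, "luts": 24624, "ram_kb": 594, "mults": 66},
    "EP3C120": {"static_w": 0.172, "luts": 119088, "ram_kb": 3888, "mults": 288},
}


def ratio(a, b, key):
    """Return devices[a][key] divided by devices[b][key]."""
    return devices[a][key] / devices[b][key]


# Compare each Cyclone III device against the XC3S1600E.
for name in ("EP3C25", "EP3C120"):
    print(name, "vs. XC3S1600E")
    for key in ("static_w", "luts", "ram_kb", "mults"):
        print(f"  {key:9s} ratio: {ratio(name, 'XC3S1600E', key):.2f}")
```

The static power ratios come out at roughly 0.35 (3C25) and 0.69 (3C120), consistent with the "1/3" and "30% lower" figures in the post.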

*** DYNAMIC POWER ***
Dynamic power is much harder to compare.  The Altera Early Power
Estimator and the Xilinx XPower Estimator both cannot model "worst-
case" dynamic power.  They model the typical case.  Dynamic power does
not vary much with process (die-to-die variation), however it varies
considerably with the exact implementation of the design and the
vectors applied.  All we can hope is a user does their best to guess
the vectors (which turns into toggle rate), and we must do our best to
represent the typical design.

For example, we must pick *one* number that represents the power of a
LUT as a function of the % toggle of its output.  However, the power
will vary by the LUT mask (function implemented), as well as the
amount of routing (= capacitance) hooked up to the LUT output.

I know how the Altera EPE is made.  We take 100+ designs, compile them
in Quartus (which is correlated to silicon/simulations), and figure
out the average amount of routing that dangles on the output of a LUT.
That's what we put in the EPE.  Some designs will be worse than this,
some will be better, but it is after all an estimate.  For example,
the routing power ascribed to a FF in the EPE represents roughly a C4
+ R4 wire.

Not to pick on the XPE, but it has some dubious results.  If I enter
10000 FFs @ 300 MHz, I get 0.189W.  But this doesn't change with the
fanout -- I would think there should be more routing power?  If I
change toggle rate from 12.5% to 25%, the power stays at 0.189W.
Strange.  And if I reduce the number of FFs to 1, power drops to 0 --
is there no clock in the chip?

That said, let's look at the estimates for a variety of logic types
according to the EPE vs. XPE:

                                        S-3E    C-III
10000 LUTs, 10% Toggle, 200 MHz         0.229W  0.080W
10000 FFs, 10% Toggle, 200 MHz          0.126W  0.111W   <-- Suspicious XPE FF power
50/100 Simple Dual-Port 16-Bit RAMs,
      200 MHz, 50% enable & R/W rates   0.264W  0.129W   <-- 50 BRAMs, 100 M9Ks
100 18x18 200 MHz 50%/"medium" toggle,
      registered multipliers            1.848W  0.346W

Looking good for CIII.  Of course, that shouldn't be too surprising --
65 nm = smaller transistors & wires = lower capacitance.  And Cyclone
II already had considerably lower power than Spartan-3E.
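
To put the estimator numbers side by side, here is a small sketch that computes the Spartan-3E to Cyclone III ratio for each row. The wattages are the ones quoted in the post; the row labels are shortened and the helper name `s3e_to_c3_ratio` is made up for illustration. These are tool estimates, not measurements.

```python
# Dynamic power estimates (W) quoted in the post:
# (description, Spartan-3E from XPE, Cyclone III from EPE).
estimates = [
    ("10000 LUTs, 10% toggle, 200 MHz", 0.229, 0.080),
    ("10000 FFs, 10% toggle, 200 MHz", 0.126, 0.111),
    ("Simple dual-port 16-bit RAMs (50 BRAMs vs. 100 M9Ks)", 0.264, 0.129),
    ("100 registered 18x18 multipliers, 200 MHz", 1.848, 0.346),
]


def s3e_to_c3_ratio(s3e_w, c3_w):
    """How many times more power the Spartan-3E estimate is than the Cyclone III one."""
    return s3e_w / c3_w


ratios = [s3e_to_c3_ratio(s3e_w, c3_w) for _, s3e_w, c3_w in estimates]

for (label, _, _), r in zip(estimates, ratios):
    print(f"{label:55s} {r:.2f}x")
```

The FF row is the outlier (close to 1x), which lines up with the suspicion about the XPE flip-flop figure noted in the table.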

*** I/O POWER ***
I/O power is largely a function of the drive strength of the buffer
and the load / termination network connected to it.  Pin capacitance
differences aside, I/O power should be similar between the families,
so I won't bother crunching the numbers.




Careful -- you're only guaranteeing the static power portion of the
power estimate.  It increases for the XC3S1600E from ~0.249W to 0.386W
@ 85C Tj.  The dynamic power portion of the estimate is not a worst-
case bound, as discussed above.  I wouldn't want people getting the
wrong idea!


Regards,

Paul Leventis
Altera Corp.


Re: Altera introduces Cyclone III devices, 'ships' 65nm
Paul,

Very good post.

I agree with everything (thanks for the fruit basket) except the loading
comment.

The interconnect design is such that all paths are buffered, so the only
way to take more loading into account is to move up to the more complete
tool, which uses the placement, and the resources, and the transition file.

After all, it is not called an "estimator" because it is exact.

Good luck with the roll-out of all the C3 family.

Austin

Re: Altera introduces Cyclone III devices, 'ships' 65nm

I'm not talking about pure electrical loading.  From a topological
perspective, if you have a FF that fans out to 1 destination vs. a FF
that fans out to 5 destinations, I would expect the latter FF to have
more wiring than the former (all other things being equal).  As you
increase the number of sinks on a net, the wirelength of that net
increases (sub-linearly).  Wirelength = power, even with buffers -- if
you need 4 buffers + 4 wires you will consume more routing power than
if you need 1 buffer + 1 wire.  In addition, as you point out, the
loading on each wire will impact the power burned.
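
The fanout argument above can be sketched with the usual CMOS switching-power approximation, P = 0.5 * alpha * C * V^2 * f. This is a toy model only: the per-segment capacitance, the sub-linear exponent, and the function name are all assumptions chosen for illustration, not values from the EPE, XPE, or Quartus. Only the 1.2 V core voltage comes from this thread.

```python
def net_routing_power(fanout, toggle_rate, f_hz,
                      vdd=1.2, c_seg_f=0.2e-12, exponent=0.7):
    """Estimate routing power (W) for one net.

    fanout      -- number of sinks on the net
    toggle_rate -- fraction of clock cycles on which the net transitions
    f_hz        -- clock frequency in Hz
    vdd         -- core voltage (1.2 V, as mentioned in the thread)
    c_seg_f     -- ASSUMED capacitance of one buffered wire segment (F)
    exponent    -- ASSUMED sub-linear growth of wire segments with fanout
    """
    # Wirelength grows with fanout, but less than linearly (assumption: power law).
    segments = fanout ** exponent
    # Each transition dissipates about 0.5 * C * V^2 on average.
    return 0.5 * toggle_rate * segments * c_seg_f * vdd ** 2 * f_hz


p1 = net_routing_power(fanout=1, toggle_rate=0.10, f_hz=200e6)
p5 = net_routing_power(fanout=5, toggle_rate=0.10, f_hz=200e6)
print(f"fanout 1: {p1 * 1e6:.2f} uW, fanout 5: {p5 * 1e6:.2f} uW")
```

Whatever constants are chosen, a model of this shape gives more power at higher fanout and scales linearly with toggle rate, which is the behavior the post says it would expect an estimator to show.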

If you're using the Quartus II PowerPlay Power Analyzer, then all this
stuff is taken into account.  The exact pieces of metal used in each
route are factored into the power analysis -- in fact, Quartus even
looks at the shape of the waveform at each buffer to derive short-
circuit current, in addition to computing the current due to
capacitive charging.

The EPE tool obviously doesn't know the P&R of the design, so it must
guess the # of wires or total power of the routing.  We break our
power estimate into two pieces -- block power and routing power -- to
make clear how much of our estimate is high-confidence (block power)
vs. lower-confidence (routing power).

Returning to the XPE tool, the FF power *should* be including
associated routing, since it's not accounted for anywhere else in the
tool.  However, the power of the FF is low, and doesn't change with
the fanout of the FF.  This just doesn't make sense.

Regards,

Paul Leventis


Re: Altera introduces Cyclone III devices, 'ships' 65nm
Forgot one thing...


Actually, in my experience RAM is one of the largest contributors to
dynamic power.  Why is that?  As HDL designers, none of us really
think of shutting off things we don't care about.  We just don't do
anything with those signals.  RAMs are usually a good example of
this.  Most people don't turn off their clock and/or read enable
signal when they don't need the value of a RAM in a given cycle.  The
result is that the RAM toggles "100%" of the time -- even if you are
reading the same address on consecutive cycles.  Internally, the RAM
must precharge and discharge all the (differential) bit lines on each
access, resulting in a large dynamic power draw.

The PowerPlay Power Optimization feature of Quartus II helps mitigate
this by automatically disabling RAMs when it can.  There's a good
paper from Transactions on CAD on some of the techniques we use
(http://www.ecs.umass.edu/ece/tessier/tcad-rampower.pdf ).  I've seen
some pretty neat results on customer designs.

Regards,

Paul Leventis
Altera Corp.


Re: Altera introduces Cyclone III devices, 'ships' 65nm
Paul,

Perhaps we have different architectures for RAM blocks?

Austin

Re: Altera introduces Cyclone III devices, 'ships' 65nm

We've run our RAM power characterization patterns on various Xilinx and
Altera chips.  While the absolute results vary somewhat with the
organization (width x depth) of the implemented RAMs, both of our
devices behave similarly with RAM data pattern, enables, etc.  Which
makes sense, since at the end of the day, a RAM is a RAM -- if you
read it, you burn power.  If you stop reading it, you burn less power.

Customers who have many RAMs in their design clocked at high
frequencies and don't bother shutting them off at all will burn more
power than they need to, regardless of their vendor.

- Paul


Re: Altera introduces Cyclone III devices, 'ships' 65nm


To help users like me understand what's burning when a RAM with an active
ENA but no change in address or data is left alone clock after clock, what
dynamic power is there?

You mentioned "precharging" which - to me - suggests DRAM as opposed to
SRAM.  The SRAM cells still have the bit lines, still have the decodes,
still have the same contents.  If the address buffers don't change, register
contents don't change, ram contents don't change, where's the dynamic power?

I appreciate the insights,
- John_H




Re: Altera introduces Cyclone III devices, 'ships' 65nm
Hi John,

I'm no RAM designer, but I'll give it a shot...


A common way that SRAMs are designed is to take an MxN array of SRAM
cells (for example, standard 6-T cell).  The address is decoded into a
word line signal, one per row of the SRAM -- when that signal is high,
the SRAM cell outputs are connected to the bit and bit-bar lines (one
for the +ve side of the cell, one for the -ve).  So you have N*2 bit
lines running through the SRAM.

The way a read happens is you first pre-charge the bit lines to some
reference value (Vref).  Then, you strobe the word line for the cells
of interest.  The bit and bit_bar will be pulled in opposite
directions by the cell.  A sense amp listens to the difference on a
bit line pair, and once some threshold is passed, it decides it must
be reading a '1' or a '0'.  What happens after this is not important
-- you could disable the word line and pull the bit lines back to
Vref, or let them continue pulling apart, etc.

Why do you pre-charge?  One reason I can think of is that otherwise
your read could be destructive.  In a simple SRAM, the transistors you
use to read the SRAM cell are the same ones used to write to the
cell.  So if your bit lines are at 1/0, and you open up the access
transistors, you might end up writing 1 into the cell instead of
reading the 0 that was contained in it.  Another reason is for power.
If you let the bitlines swing only a little bit apart, register the
value, and then push them back to Vref, you don't swing the voltage on
the bit lines as much, and thus reduce power.  This is also important
for speed -- you don't have to wait for the long, relatively highly
loaded bit lines to swing fully before figuring out what value is stored
in the memory.

What does all this mean for power?  Well, if you execute a read in the
RAM, of the same location with the same value in it, you are always
pre-charging to Vref and then swinging the signals to the value you
want to read.  So the *core* of the RAM burns constant power.

Of course, there is a lot of other circuitry in the RAM.  The address
decoder, for example, won't burn any additional power if you haven't
changed the address.  The output registers and output logic of the RAM
will not burn any power if the value read from the RAM doesn't
change.  Etc.  All of these effects are modeled in Quartus for our
90nm and 65nm families.  The EPE also makes a good guess at this
(which is why there are the variety of enable %s you need to provide).

In our RAMs, there are a number of clocks and enable signals that go
into the RAM.  Some of these control what data is registered into the
input data & address registers when, what is registered into the
output registers & when, and whether the core of the RAM is clocked.
By intelligently hooking up these signals, you can ensure that only
the bits of the RAM are operating when you need them to, which can
reduce your power significantly.
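
As a rough illustration of the point about enables, the sketch below counts activity in a RAM over a sequence of clock cycles. It is a simplified behavioral model, not a description of any real Altera or Xilinx block: it assumes every enabled cycle costs one core read (precharge plus bit-line swing) regardless of the address, and that the address decoder only switches when the address changes. The function name and the cycle format are invented for the example.

```python
def ram_activity(cycles):
    """Count core reads and address-decoder changes.

    cycles -- iterable of (enabled, address) tuples, one per clock cycle.
    Returns (core_reads, address_changes).
    """
    core_reads = 0
    address_changes = 0
    last_address = None
    for enabled, address in cycles:
        if not enabled:
            # Core is not clocked this cycle: no precharge, no bit-line swing.
            continue
        # Every enabled cycle precharges and swings the bit lines,
        # even if the address is the same as last time.
        core_reads += 1
        if last_address is not None and address != last_address:
            address_changes += 1
        last_address = address
    return core_reads, address_changes


# Same address read on 8 consecutive cycles, enable left high.
always_on = [(True, 0)] * 8
# Same data needed only once; enable dropped for the other 7 cycles.
gated = [(True, 0)] + [(False, 0)] * 7

print(ram_activity(always_on))  # core reads on every cycle, no address changes
print(ram_activity(gated))      # a single core read
```

In this model the always-enabled RAM performs eight core reads to fetch one unchanged value, while the gated version performs one, which is the saving the post describes.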

Things get funkier when you start thinking about FPGA configurable
RAMs.  You have a variety of modes for the RAM -- single port, dual
port, x1, x9, x18, etc.  If you are using a block that is natively
18x512, but only using it in x8 mode, do you burn the read power of x18
mode or is the RAM smart enough to read only part of the array?  If
you're only using half the depth, is the RAM array segmented in some
way to avoid burning some of the power?  We have a variety of
techniques (circuit, architecture, and software) we employ to optimize
speed, area and power in the presence of this configurability.

Hope this helps,

Paul Leventis
Altera Corp.

