Altera introduces Cyclone III devices, ships 65nm

Paul,

The lemons in the basket were especially good, by the way.

The 36K BRAM block in V5 is built as four separate 9K blocks, so if the BRAM is arranged such that it can power down half, or 3/4, of the arrays, it will. This is another trick used to limit power, burning those mA only when they are required.
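To illustrate the idea (not the actual silicon behavior, and with a made-up per-array power figure), a sketch of how active power can scale with the number of 9K sub-arrays a configuration actually needs:

```python
# Hypothetical sketch: a 36K BRAM built from four 9K arrays gates off
# the arrays a configuration does not use, so active power scales with
# the fraction of arrays enabled.  The per-array mW figure is invented
# purely for illustration.

MW_PER_ACTIVE_9K_ARRAY = 1.0  # hypothetical dynamic power per array, mW

def bram36_power_mw(bits_used):
    """Estimate active power for one 36K BRAM when unused 9K arrays
    are powered down (ceiling of bits_used / 9216 arrays enabled)."""
    arrays_active = min(4, -(-bits_used // 9216))  # ceil division, capped at 4
    return arrays_active * MW_PER_ACTIVE_9K_ARRAY

print(bram36_power_mw(4096))   # half of one array -> 1 array active -> 1.0
print(bram36_power_mw(18432))  # 18K bits -> 2 arrays active -> 2.0
```

The point of the sketch is only that a design using 9K bits of a 36K block would burn roughly a quarter of the full-block array power, not the whole amount.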

In addition, the DSP48E blocks have local dedicated interconnect so that a filter can be made with no fabric interconnection, excepting the inputs, and outputs. This also saves a tremendous amount of power.

Regardless of the small details in the estimators, the overall result is very useful for comparing power between devices and product offerings. I would only ask that one try a number of different designs when trying to decide on "best" performance.

My example was just trying to target ~80% of all resources, running at 100 MHz, near a Tj of 85C. As the ad goes, "your actual results may vary."

Austin


Unfortunately, when it comes to comparing dynamic power between vendors, I don't think these tools are too useful. For such a comparison to be valid, one must assume that both companies invested similar engineering resources in the tools, that both strive to faithfully represent the typical design case, and that no one is cheating to make their power look better. Perhaps the lack of an XPE FF power model and the omission of clock power are simple mistakes or poor engineering choices. However, given how long those "bugs" have been in the tool, I can't help but think there are marketing reasons behind them.

Regards,

Paul


Paul,

I would not be the one to throw mud at the other (on power estimation)!

The question of who has more "honesty" in marketing is one on which neither you nor I is likely to get any sympathy in this forum.

All I will say is that, since I run the verification and characterization shop and see the actual power numbers, the time spent on this subject, across all 5 process corners for all three oxide transistors, is immense.

And until you actually have all this data, "typical" is really meaningless. As such, our tendency is initially to choose the fast 1-sigma process corner as "typical."

It has been clear (to me, by using your tools for S2, using actual measurements on the 'Battle-Board') that your policy is different.

That is OK. You had your disclaimer clearly visible. That is being honest.

I also agree that dynamic power hardly varies at all with process or temperature. I will say that estimating dynamic power is also very difficult: the customer not only has to have a vector file from his simulations, but that vector file has to be faithful to the application and cover the actual intended operating situations. The number of customers with the patience to run extensive simulations and check how "real" they are is not large. Most will find the initial estimate from the spreadsheet adequate as a first guess at the power supply and heatsink. After the prototype pcb is built, there is the opportunity to fine-tune the power supplies and cooling (if needed).
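The spreadsheet-style first guess boils down to the standard switching-power sum, P = 0.5 * alpha * C * Vdd^2 * f per net, where alpha is the toggle rate from the vector file. A minimal sketch (the net list, capacitances, and toggle rates below are invented for illustration; real estimators use characterized per-resource models):

```python
# Minimal sketch of spreadsheet-style dynamic power estimation:
# sum over nets of 0.5 * alpha * C * Vdd^2 * f_clk, where alpha is the
# toggle rate in transitions per clock.  All values here are invented.

VDD = 1.0       # core supply, volts (assumed)
F_CLK = 100e6   # clock frequency, Hz

# (net name, effective capacitance in farads, toggle rate per clock)
nets = [
    ("lut_out_a",   20e-15, 0.25),
    ("carry_chain",  5e-15, 0.50),
    ("long_route",  80e-15, 0.10),
]

def dynamic_power_watts(nets, vdd=VDD, f_clk=F_CLK):
    return sum(0.5 * alpha * c * vdd**2 * f_clk for _, c, alpha in nets)

print("%.2e W" % dynamic_power_watts(nets))  # -> 7.75e-07 W
```

This also makes Austin's point concrete: the result is only as good as the toggle rates (alpha) fed in, which is exactly what the vector file has to get right.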

I wish engineers were more disciplined and spent more time on simulation, but FPGA devices are marketed as "fast to market," and many take that as "not much risk in skipping a few steps..."

One more thing: prunes in a fruit basket? I'd like to know what on-line shop you used...

Gotta run,

Austin


If there is any mud you'd like to throw at the EPE, please feel free. Our goal is to make the best tool we can, that is as accurate as possible. Any specific criticism that helps us get closer to that goal is appreciated.

If my comments result in a corrected FF model and the addition of a clock model to the XPE, this will help improve the fairness of comparisons between XPE and EPE. And it will help Xilinx customers avoid bad power surprises. Seems like a win-win situation to me.

Our typical static power value represents the typical device we ship. That the median of a distribution didn't match the couple of sample points you have should not surprise you. I'm sure you don't need a lesson in statistics from me.

There are three separable sources of error here:
(1) Transition densities and signal probabilities (simulated, hand-entered, etc.)
(2) Design implementation details (synthesis, placement, routing)
(3) Quality of the underlying models

(1) is pretty much in the hands of the users. All we (FPGA vendors) can do is give them many ways to get the toggle rates into our tools. In the end, if the user grossly mispredicts the toggling of their design, there's nothing we can do for them.

(2) is somewhere in between. Quartus/ISE should know everything needed to make this a non-factor. But in an EPE/XPE tool, it's a combination of user entry (things like I/O standards, RAM modes, etc.) and reasonable guesses on our part. This is where representing reasonable routing power is important -- the user has no idea what their design will use, so we must make the best guess we can for them.

(3) is completely under our control. Given perfect entry for (1) and (2), our goal is to minimize the error between our estimates and silicon measurements. If a tool supplied with perfect information still can't estimate power correctly, then what hope does a user have?

Yes, it is unfortunate. Most vectors users have lying around are targeted at corner-case coverage and do not represent typical steady-state operation.

Paul Leventis
Altera Corp.


Thanks for taking the time, Paul. I've gotten down to the gate level for many things in my past, but I guess I never had the need to look deeply enough into SRAMs, and I certainly didn't get it in my college work. Though I know about the half-voltage precharge for bit lines, I thought that was for non-SRAM technologies and never saw a need to delve into it. Async SRAMs needed both read and write *strobes*, didn't they? My mindset was along the lines of the cute little distributed RAMs in the other FPGAs - nice multiplexed outputs - no bit lines to worry about, and instant output change for an address change.

I managed to find a couple nice presentations googling for "6 transistor cell" to get the SRAM details. I'm just sad it took me this long to know the innards!

- John_H


Hi John,

Sounds familiar, but then again, we're well past my knowledge of RAMs too :-)

Distributed RAM architectures are interesting. A LUT is after all just an asynchronous ROM, so you get your read-path for free when you want to use a LUT as a memory. The problem is the write path. Specifically, where do you get your write data and address from (since you're using the normal LUT inputs as the read address)? How much hardware do you need to add to the LUT to make the bits dynamically writeable? If you want to have registered access, where do you get the registers from?
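A toy behavioral model makes the asymmetry clear (this is an illustration of the concept, not any vendor's circuit): the read path is the ordinary LUT lookup you already have, while the write port is the hardware you must add.

```python
# Toy model of a 4-input LUT reused as a 16x1 distributed RAM.  The read
# path is "free": a LUT already muxes one of its 16 configuration bits
# onto its output based on its inputs.  The write path is the added
# hardware: a separate write address, write data, and write enable.

class LutRam16x1:
    def __init__(self):
        self.bits = [0] * 16  # LUT configuration bits doubling as storage

    def read(self, addr):
        """Asynchronous read: just the normal LUT lookup."""
        return self.bits[addr & 0xF]

    def write(self, waddr, wdata, we=True):
        """The extra write port that distributed RAM must bolt onto the LUT."""
        if we:
            self.bits[waddr & 0xF] = wdata & 1

ram = LutRam16x1()
ram.write(5, 1)
print(ram.read(5))  # -> 1
print(ram.read(6))  # -> 0
```

Note that `read` and `write` take independent addresses, which is exactly why the extra write-address inputs have to come from somewhere.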

Stratix III introduces the capability to convert some LABs (collections of LEs) into simple dual-port (1R/1W) 10x64 or 20x32 bit RAMs. Half the LABs (the "MLABs") have this capability. Why do we convert an entire LAB? The main reason is to amortize the overhead of the extra circuitry and write-address logic over more LEs. Plus, it is rare for people to need many independent narrow (x1, etc.) RAMs, so in practice converting an entire LAB isn't a big deal. The extra inputs on the Stratix III ALM (8 inputs) meant we didn't need to add more routing area to get the extra RAM signals into the block, which was kind of nice.

One downside of all distributed RAM architectures is that when you assemble larger (logical) RAMs out of smaller (physical) RAMs, you use the programmable routing and other circuitry to do so. This can increase the power per bit compared to a dedicated RAM block.
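The stitching cost is easy to sketch: tiling in width is cheap, but tiling in depth adds output muxing (and its routing) in the fabric. The physical geometry below (32 deep x 20 wide, matching the 20x32 MLAB mode mentioned above) is just an example; the arithmetic, not the numbers, is the point.

```python
# Sketch of building a larger logical RAM out of small physical RAMs:
# tile in depth and width; depth expansion needs a fabric output mux
# whose select width grows with the number of depth tiles.

import math

def blocks_needed(logical_depth, logical_width, phys_depth=32, phys_width=20):
    """Return (physical blocks required, output-mux select bits)."""
    depth_tiles = math.ceil(logical_depth / phys_depth)
    width_tiles = math.ceil(logical_width / phys_width)
    mux_select_bits = math.ceil(math.log2(depth_tiles)) if depth_tiles > 1 else 0
    return depth_tiles * width_tiles, mux_select_bits

# A 128-deep x 36-wide logical RAM from 32x20 physical RAMs:
n, sel = blocks_needed(128, 36)
print(n, sel)  # 4 depth tiles * 2 width tiles = 8 blocks, 2 select bits
```

Every one of those mux select bits and data legs is programmable routing toggling on each access, which is where the extra power per bit comes from.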

Regards,

Paul Leventis
Altera Corp.


So if we could probe inside a recent FPGA, we would have a real chance of seeing a noticeable speed difference across the die?

Any idea what the percentage would be? I suppose it could be checked by building ring oscillators at various locations.

- Tim

Tim,

This is something one has to do in order to know just how "bad" the timing can get. Obviously, one has to specify the worst possible timing in order to have a design that works.

If one logic block (let us be generic here) is slower than you thought it would be when you made your speeds file, then that chip might fail to meet timing on that path, and you get data errors.

How you go about finding the worst of the worst on a die is something that we have to do, so that we can be sure we are OK when we ship the part to the customer.

A ring oscillator is a very useful way to do this, as measuring frequencies is often easier than measuring delays. There are other ways to do this as well.

Whatever the method, we have to do this in a way that meets the quality objectives and the cost-of-testing objectives at the same time. This is yet another reason why EasyPath(tm) devices can be much lower in cost: we only need to test the paths you use to meet timing, not every possible path. It is also a reason why ASICs (structured or otherwise) might have very poor yields all of a sudden (known as a 'yield crash' in the business)! If your ASIC has a yield crash, you may only be affected because your supply of chips disappears, but if you are paying for the wafers, then you are doubly punished.

There are papers on this subject (search IEEE, etc.) as well as papers yet to be published on the subject.

This is one more reason why using an FPGA device takes a lot of the work out of what you would otherwise have to do with an ASIC: we do the hard work so you don't have to. We also remove substantial risk (yield, performance/timing closure, latchup, single-event upsets, etc.).

Every one of the risks above is documented fact for some structured ASIC suppliers. That means each one has already happened to at least one customer! This is one of the reasons why the structured ASIC business has more companies that have tried it and left, than those that are still struggling to make a go of it. Don't forget that Xilinx, too, was once a 'structured ASIC supplier' (HardWire(tm) devices). Been there, done that, learned our lesson.

And yes, as things have gotten smaller, variation has reared its ugly head. Learning to live with the variation, or to prevent it in the structures you care about, is no easy thing to do. Identically drawn devices, right next to each other, can vary by +/- 20% in some cases. Try matching anything with devices like that! The good news is that we IC designers "have ways" to meet our objectives.
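A quick Monte Carlo sketch of the kind of pairwise mismatch Austin describes (the 10%-of-nominal per-device sigma is an assumption, chosen so that a +/-2-sigma pair lands near the +/-20% he mentions; nothing here models a real process):

```python
# Illustration of local variation: two "identically drawn" devices, each
# drawn independently from the same distribution, can easily mismatch by
# tens of percent.  The per-device sigma is an assumed figure.

import random

random.seed(1)                 # deterministic for reproducibility
SIGMA_FRACTION = 0.10          # assumed per-device sigma, 10% of nominal

def worst_pair_mismatch(n_pairs=10000, nominal=1.0):
    """Worst |a - b| / nominal seen over n_pairs of sampled devices."""
    worst = 0.0
    for _ in range(n_pairs):
        a = random.gauss(nominal, SIGMA_FRACTION * nominal)
        b = random.gauss(nominal, SIGMA_FRACTION * nominal)
        worst = max(worst, abs(a - b) / nominal)
    return worst

print("worst pairwise mismatch: %.0f%%" % (100 * worst_pair_mismatch()))
```

Even with a modest per-device sigma, the worst pair in a large population mismatches far more than a single device's spread, which is why matched structures need the special "ways" Austin alludes to.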

Austin

