Cheating the FPGA clock speed


Rob Gaddi 18 years ago

Hey all --

So I've got a design, the very vaguest outlines of which are beginning to gel. But one of the things that's becoming apparent is that it would benefit from real clock rates somewhere between the obscene and the unthinkable.

Throwing lots of money at the problem seems to get me to 500 MHz, yet more can get me up to 550 MHz, but I could get a lot of other things to run much more smoothly if I could get clock rates out into the 650 MHz ballpark. That's for BRAMs and multipliers/DSP slices, not just the flops.

So I get to thinking about how the clock rate specs get figured out, and how they have to accommodate the slow silicon at the maximum operating temperatures. And that thought leads around in circles for a while, and ultimately leads to the following appalling question:

Anyone know anything about using Peltier modules, refrigerant pumping systems, or the like, to cheat up the speed of an FPGA? Is it even feasible to try to get a 20-30% overclock just from the joys of lower temperatures? Or do I just suck it up and deal with the rated clock speeds?
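For a sense of scale, the arithmetic behind those numbers can be sketched as below. This is only an illustration: the function names are invented here, and the inputs are the 500, 550, and 650 MHz figures mentioned above.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def overclock_percent(rated_mhz: float, target_mhz: float) -> float:
    """How far the target clock sits above the rated clock, in percent."""
    return 100.0 * (target_mhz - rated_mhz) / rated_mhz


if __name__ == "__main__":
    target = 650.0  # the "ballpark" clock rate from the post
    for rated in (500.0, 550.0):
        # Time that must be shaved off every register-to-register path.
        shaved = period_ns(rated) - period_ns(target)
        print(
            f"{rated:.0f} -> {target:.0f} MHz: "
            f"{overclock_percent(rated, target):.1f}% overclock, "
            f"period shrinks by {shaved:.3f} ns"
        )
```

Going from 500 MHz to 650 MHz is the full 30% case; starting from 550 MHz the gap is closer to 18%.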

Rob Gaddi, Highland Technology Email address is currently out of order


raghunandan85 18 years ago

Hi, you can try increasing the supply voltage too, I guess. Be careful with sub-zero cooling; it might cause condensation.

The problem is that the speeds are not characterised at those voltages and temperatures, so it's difficult for the tool to work out what speed is, or can be, achieved.

Raghu.

ps.

eds?


Mike Treseler 18 years ago

I would finish a proto design and run some sims before I tried to speed it up.

-- Mike Treseler


Jim Granville 18 years ago

That's not a huge increase, so might be chase-able, if you can tolerate the risks/effort. General comments:

a) Device specs are corner values, so you want to go to the 'other corner'. That means super-precise, super-clean supply designs and aggressive cooling. You will need to measure the temperatures and voltages in the system, and be able to control both. Probably include some test cells inside the fabric that are designed to fail first, and use those to tell you how close you are to total failure.

b) Suppliers bin-select their parts. Look at the speed grades and prices, and plot those on a curve. Some of that is marketing 'what can we get for it', but some is also 'how many do we get' - also if they DO bin, the cheapest grades are rather unlikely to have faster parts in them.

c) if there is some benefit to the vendor, you might be able to ask for 'best in class' devices. (If they have more than one fab, you may be able to squeeze a few % from the 'better fab'.)

d) aggressive floor-planning and speed testing will be needed, but do not overlook 'smarter algorithm' pathways - you can sometimes more than double the speed by a change of attack.

e) or wait for a generation.... ( a new device may release during your development phase)
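The fail-first test cells from point (a) can be read as a crude margin gauge. A minimal sketch, assuming a hypothetical set of canary paths each built to be a known fraction slower than the real critical path (the fractions, the data layout, and the function name are all illustrative, not taken from any vendor tool):

```python
from typing import Dict, Optional, Tuple


def margin_bounds(canary_passes: Dict[float, bool]) -> Tuple[float, Optional[float]]:
    """Bracket the remaining timing margin from canary results.

    canary_passes maps each canary's extra delay (as a fraction of the
    critical path, e.g. 0.05 for 5% slower) to True if it still passes.
    Returns (lower, upper): the margin is at least `lower`, and below
    `upper` if some canary has already failed (None if none has).
    """
    lower = 0.0
    for extra in sorted(canary_passes):
        if not canary_passes[extra]:
            # First failing canary, walking up from the smallest margin.
            return lower, extra
        lower = extra
    return lower, None


if __name__ == "__main__":
    # Hypothetical readout: 5% canary passes, 10% and 15% canaries fail.
    readout = {0.05: True, 0.10: False, 0.15: False}
    print(margin_bounds(readout))
```

In a real system the readout would be sampled alongside the measured temperature and supply voltage, so that a shrinking bracket can trigger more cooling or a lower clock before the real logic fails.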

-jg


HT-Lab 18 years ago

You might want to talk to these guys, not sure if they have any real silicon,

formatting link

Hans


Gabor 18 years ago

At least for Xilinx and Lattice you can tell the tools what your worst-case operating (die) temperature is if you don't want to use the max rated temperature. In ISE you'll find this under the constraints editor. This should give you a feel for how much extra speed you can get by dialing down the temperature.

Peltier coolers require substantial energy, especially if you are cooling a chip that uses a lot of power. An FPGA isn't an image sensor running at a few milliwatts. You may want to look into other refrigeration methods. In any case you would need some good way to extract all the heat from the back side of the Peltier cooler...

Regards, Gabor


austin 18 years ago

Rob,

How many devices do you need to run this fast?

I will assume V5, as that is the only one which could do this.

If it is one, this is probably not too hard: make the design as fast as you can (by the proper architecture; try to use DDR, i.e. both edges of the clock), and then cool the device to keep it at room temperature.

You may also run Vccint 5% high, but no more than that. The reason is that raising Vccint does not raise the pass-gate voltages, so raising Vccint any higher provides no increase in speed.

Our devices since V2P have actually become slower if too cold!

Then go through a dozen -3 (highest/fastest speed grade) devices from different lots until you find the fastest one.

Due to process variations, there is a lot of performance in the devices which is "wasted" because we like to ship all the parts which test "good!"

Along with the slow parts that just barely meet -3, there are at least another 5% to 10% which exceed -3 by one or two more speed grades (even faster).

If you have to do this for more than one device, then it becomes far too difficult to find, make work, and so on (can't go to production like this).

I bring this up, only because there is performance there, and if it is a one-off study, it is do-able.

Of course, since Xilinx won't get rich off one part, I and Xilinx can not endorse this, nor can we support this. This is just here for your educational pleasure.

Austin


Gabor 18 years ago

Any explanation as to the physics of slowing down when colder? Does this make sense for the new process or is it still black magic?

I'm assuming the device also slows down when hotter?

Regards, Gabor


austin 18 years ago

Gabor,

Thanks for asking!

Yes, it also slows down when hotter.

Not modeled very well by foundry spice models, the issue appears to be related to the mid oxide pass gates, whose supply is from a band gap controlled reference from the 2.5V Vccaux, combined with models that didn't have all the 'wiggles' needed (too simple).

The nmos passgates all run at a slightly higher voltage than Vccint (so the interconnect does not have a VT drop from the nmos).

The nearly constant passgate voltage, and a varying temperature, and (constant) Vccint varying leads to some interesting behavior of the speed (which is largely due to the interconnect).

First time we saw it, we were really puzzled, but then we went back and measured the devices in the scribe lines, and further refined our models.

The "problem" with today's advanced technologies is that second, third, and fourth order effects are all beginning to pop up, which leads to needing even better models than we have enjoyed in the past.

Better models take more time, or more effort.

For example, it is well known that the RF models for a process node come out as much as one year after the foundry is making silicon at that node. One year to wait for models is pretty tough, and if you are making cell-phones (or FPGAs with MGTs) you pretty much have to wait for the models, or do test chips, and make them yourself (there is (ain't) no such thing as a free lunch...TANSTAAFL).

An often heard remark between IC Designers "at the bleeding edge" at lunch is "those damn models changed again" (so you have to go back and re-verify everything).

Austin


backhus 18 years ago

Hi Rob, surely higher clocks speed up everything, but since you are working at the physical limit, maybe it is also useful to rethink your problem.

You need a lot of operations done in a specified time slice. The higher your clock, the more operations can be done, true. But there are also other methods to increase the number of operations within a defined time slice. How about massively parallel operation? Is it possible to rearrange your algorithms to make better use of the resources? Can you implement additional operation elements working in parallel? Might it be possible to use more than one FPGA?
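The trade between clock rate and parallelism is simple arithmetic. A rough sketch, assuming the workload splits cleanly into independent lanes that each complete a fixed number of operations per cycle (an idealisation; the function name and the example figures other than 500 and 650 MHz are made up for illustration):

```python
import math


def lanes_needed(required_mops: float, clock_mhz: float,
                 ops_per_lane_per_cycle: float = 1.0) -> int:
    """Parallel lanes needed to sustain required_mops (million ops/s)
    at the given clock, ignoring any cost of splitting and merging data."""
    per_lane_mops = clock_mhz * ops_per_lane_per_cycle
    return math.ceil(required_mops / per_lane_mops)


if __name__ == "__main__":
    # Throughput of one lane at the wished-for 650 MHz, reached at rated clocks.
    for clock in (500.0, 325.0):
        print(f"{clock:.0f} MHz: {lanes_needed(650.0, clock)} lane(s)")
```

Two lanes at a comfortable 325 MHz match one lane at 650 MHz on paper; whether the algorithm actually splits that way is the real question.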

How about your data rates at the inputs and outputs? Are they also in the multi-gigabit range, or significantly lower? In the second case, have you considered using a DSP clocked at some GHz (combined with the FPGA, maybe)?

And if the problem is more in the area of signal detection/generation than algorithmic, have you ever thought about using multiple phase-shifted clocks? Four 500 MHz clocks at 0-90-180-270 degrees give you a resulting 2 GHz resolution. It needs some cunning design but can help a lot.
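The timing resolution of the four-phase scheme can be worked out as below. This is a sketch of the arithmetic only, with hypothetical function names; it models ideal, jitter-free phases and says nothing about how the samples are brought back into one clock domain.

```python
import math
from typing import List


def phase_offsets_ps(clock_mhz: float, phases: int) -> List[float]:
    """Sampling instants within one clock period, in picoseconds,
    for `phases` evenly spaced copies of the same clock."""
    period_ps = 1e6 / clock_mhz
    return [k * period_ps / phases for k in range(phases)]


def effective_rate_mhz(clock_mhz: float, phases: int) -> float:
    """Equivalent single-clock sampling rate of the interleaved phases."""
    return clock_mhz * phases


def first_sample_at_or_after_ps(edge_ps: float, clock_mhz: float, phases: int) -> float:
    """Time of the first sampling instant that can see an input edge."""
    step_ps = 1e6 / clock_mhz / phases
    return math.ceil(edge_ps / step_ps) * step_ps


if __name__ == "__main__":
    print(phase_offsets_ps(500.0, 4))      # offsets of the 0/90/180/270 degree clocks
    print(effective_rate_mhz(500.0, 4))    # equivalent sampling rate in MHz
    print(first_sample_at_or_after_ps(1234.0, 500.0, 4))
```

Each individual flop still runs at 500 MHz; only the spacing between sampling instants shrinks to 500 ps.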

Have a nice synthesis, Eilert
