FPGA as heater

I understand but my point was that regulating temperature to control speed has been done. It's not a strange idea at all.

Reply to
krw

Prop delays get slower.

High temperature is an unfortunate fact of life sometimes. I'm after constant temperature, to minimize delay variations as ambient temperature and logic power dissipation change.

All our critical outputs are registered in the i/o cells. Xilinx tools report almost a 3:1 delay range from clock to outputs, over the full range of process, power supply, and temperature. Apparently the tools assume the max specified Vcc and temperature spreads for the part and don't let us tease out anything, or restrict the analysis to any narrower ranges.

Our output data-valid window is predicted by the tools to be very narrow relative to the clock period. We figure that controlling the temperature (and maybe controlling Vcc-core vs temperature) will open up the timing window. The final analysis will have to be experimental.

We can't crank in a constant delay to fix anything; the problem is the predicted variation in delay.

That's the idea: keep the FPGA core near the max naturally-expected temperature, heating it as needed, and that will reduce actual timing variations to below the worst case predicted by the tools.
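
The control loop itself can be dead simple; a rough sketch, assuming some kind of die-temperature readback and a way to switch a bank of dummy-logic heaters on and off (the names, setpoint, and hysteresis below are invented for illustration):

import time

SETPOINT_C = 85.0     # target die temperature, deg C (illustrative)
HYSTERESIS_C = 0.5    # switching band to avoid chatter

def regulate(read_die_temp_c, set_heater_on, period_s=0.1):
    # Crude bang-bang control: burn extra power in dummy logic (enabled
    # ring oscillators, toggling flops, whatever) whenever the die is
    # below the setpoint, so ambient and logic activity stop mattering.
    heater_on = False
    while True:
        t = read_die_temp_c()
        if t < SETPOINT_C - HYSTERESIS_C:
            heater_on = True
        elif t > SETPOINT_C + HYSTERESIS_C:
            heater_on = False
        set_heater_on(heater_on)
        time.sleep(period_s)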

I expect that the tools are grossly pessimistic. I sure hope so.

--

John Larkin         Highland Technology, Inc 

lunatic fringe electronics
Reply to
John Larkin

Even if it wasn't especially linear, the proportionality is based on degrees Kelvin. So the non-linearity would not be terribly pronounced.

That was part of the reason for the inflate-gate thing a couple of years ago. I remember that, between the pressure being relative rather than absolute and the temperature being Celsius or Fahrenheit rather than Kelvin, the people here took some time to figure out that the reported pressures were easily explained by the difference in temperature between the locker rooms and the playing field.
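
The arithmetic is just the gas law on absolute pressure and absolute temperature; a quick sketch with the usual ballpark numbers (illustrative, not measured values):

P_ATM_PSI = 14.7   # atmospheric pressure; the gauge reads absolute minus this

def gauge_after_cooling(gauge_psi, t_warm_c, t_cold_c):
    # Constant volume, so absolute pressure scales with absolute temperature.
    p_abs_warm = gauge_psi + P_ATM_PSI
    p_abs_cold = p_abs_warm * (t_cold_c + 273.15) / (t_warm_c + 273.15)
    return p_abs_cold - P_ATM_PSI

# 12.5 psig set in a ~21 C locker room, measured on a ~9 C field:
print(round(gauge_after_cooling(12.5, 21.0, 9.0), 1))   # about 11.4 psig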

--

Rick C
Reply to
rickman

The nature of designing synchronous logic is that you want to know the worst-case delay so you can design to a constant-period clock cycle. So the worst case is the design criterion. The timing analysis tools are naturally "pessimistic" in that sense. But that is intentional, so that the design process is a matter of getting all timing paths to meet the required timing rather than trying to compare delays on this path to delays on that path, which would be a nightmare.

When you need better timing on the I/Os, as you have done, the signals can be clocked in the IOB FFs, which give the lowest variation in timing as well as the shortest delays from clock input to signal output. I/O timing typically needs to be designed for worst case as well, because the need is to meet setup timing while hold timing is typically guaranteed by the spec on the I/Os. But if you are not doing synchronous design this may not be optimal. If you are trying to get a specific timing of an output edge, you may have to reclock the signals through discrete logic.
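
As a sketch of what that worst-case I/O check boils down to (the numbers are placeholders, not from any datasheet):

# Worst-case I/O check for a registered output feeding an external flop on
# the same clock: setup uses the slowest clock-to-out, hold the fastest.
# All values in ns and purely placeholders.
def io_margins(period, t_co_min, t_co_max, ext_setup, ext_hold, clk_skew=0.0):
    setup_margin = period - t_co_max - ext_setup - clk_skew
    hold_margin = t_co_min - ext_hold - clk_skew
    return setup_margin, hold_margin

print(io_margins(period=7.0, t_co_min=2.0, t_co_max=5.5,
                 ext_setup=1.0, ext_hold=0.5))   # both must stay positive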

--

Rick C
Reply to
rickman

Our biggest box takes about a kilowatt, which includes 70 W for the fans. We build enough of them, running 24/7, to work out the total cost of ownership; running the box a little bit hotter reduces reliability a bit but saves enough electricity to make it worthwhile.

Colin

Reply to
colin

Foil-sticky thermocouple on the top of the chip. It was an Altera Cyclone 3, clocked internally at 250 MHz.

formatting link

The ring oscillator was divided internally before we counted it, by 16 as I recall.

Newer chips tend to have an actual, fairly accurate, die temp sensor, which opens up complex schemes to control die temp, or measure it and tweak Vccint, or something.
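
Turning the count into numbers is just a couple of ratios; a sketch (the divide-by-16 matches what I recall above, the gate time and counts are made up):

DIVIDER = 16        # internal divide ratio ahead of the counter
GATE_TIME_S = 0.01  # counter gate time in seconds (assumed)

def osc_frequency_hz(count):
    return count * DIVIDER / GATE_TIME_S

def relative_delay(count, reference_count):
    # Gate delay is roughly inversely proportional to ring-osc frequency,
    # so the count ratio gives how much the silicon has slowed or sped up.
    return osc_frequency_hz(reference_count) / osc_frequency_hz(count)

print(osc_frequency_hz(100_000))          # 160 MHz at the reference point
print(relative_delay(95_000, 100_000))    # ~1.05: about 5% slower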

--

John Larkin         Highland Technology, Inc 

lunatic fringe electronics
Reply to
John Larkin

That is exactly what John is talking about, except the heater will be on the FPGA itself.

--

Rick C
Reply to
rickman

On Wednesday, April 12, 2017 at 05:37:07 UTC+2, John Larkin wrote:

that is basically what the IDELAY/ODELAY blocks are for; you instantiate an IDELAYCTRL and feed it a ~200MHz clock and it uses that as a reference to reduce the effects of process, voltage, and temperature on the iodelay

Reply to
lasselangwadtchristensen


I still think the IODELAY could help you. The output goes through an adjustable IODELAY, then you route the output back in through a pin, adjust the input IODELAY to figure out where the incoming edge is, and then use a feedback loop to keep the output delay constant. It's a technique used for deskewing DRAM data. I think the main clock would also have to be deskewed with a BUFG so you have a good reference for the input. Or, if you characterized the delay-vs-temp in the lab, you could run in open-loop mode by adjusting the IODELAY tap based on the temperature you read.
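
The feedback part can be very dumb; a behavioural sketch, where the tap size, target, and update rule are made up for illustration and not a real primitive interface:

TAP_PS = 78            # nominal delay per IODELAY tap (assumed value)
TARGET_EDGE_PS = 2000  # where we want the looped-back edge to sit

def update_output_tap(current_tap, measured_edge_ps, max_tap=31):
    # One iteration of the slow feedback loop: nudge the output delay one
    # tap toward the target so temperature/voltage drift gets tracked out.
    error_ps = measured_edge_ps - TARGET_EDGE_PS
    if error_ps > TAP_PS / 2 and current_tap > 0:
        return current_tap - 1     # edge arriving late: shorten the delay
    if error_ps < -TAP_PS / 2 and current_tap < max_tap:
        return current_tap + 1     # edge arriving early: lengthen the delay
    return current_tap             # within half a tap: leave it alone

tap = update_output_tap(current_tap=16, measured_edge_ps=2210)   # -> 15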

Yes, the tools are definitely pessimistic. They're only useful for worst-case. I'm pretty sure you can put in the max temperature when doing PAR, so you could isolate the effects of just that, but it will still probably be worse variation than in reality.

Reply to
Kevin Neilson

My FPGA guy says that the ZYNQ does not have adjustable delay after the i/o block flops. We can vary drive strength in four steps, and we may be able to do something with that.

--

John Larkin         Highland Technology, Inc 
picosecond timing   precision measurement  

jlarkin att highlandtechnology dott com 
http://www.highlandtechnology.com
Reply to
John Larkin

That's also not adjustable in real time though.

I believe what the others are talking about is a real time adjustable delay that is built into the clocking module. I don't know about the Zynq, but Xilinx has what they call a delay locked loop which sounds exactly like what you need. I believe it works by syncing the output signal to the clock signal. There will be some signal path in the feedback loop which will still cause timing variation with temperature and I suppose voltage, but the variation in process can be compensated.

--

Rick C
Reply to
rickman

In the 7-series what you want is the MMCM, which has the ability to adjust the output phase in steps of 1/56 of the VCO period. This adjustment can be applied to a subset of the MMCM outputs, so you can for example vary the outgoing clock phase while keeping the data phase constant with respect to the clock driving the MMCM.
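
The step size falls straight out of the VCO frequency; for example (the VCO frequencies here are just illustrative):

def mmcm_phase_step_ps(vco_freq_hz):
    # Fine phase shift resolution is 1/56 of the VCO period.
    return (1e12 / vco_freq_hz) / 56

print(mmcm_phase_step_ps(1.0e9))   # ~17.9 ps per step with a 1 GHz VCO
print(mmcm_phase_step_ps(1.4e9))   # ~12.8 ps per step with a 1.4 GHz VCO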

On the other hand, the whole point of a source synchronous interface is to just need low skew between outputs - not low skew between the input clock and the outputs. Typically just placing the outputs in the IOB and using the same clock resource is good enough. Skew between outputs is much lower than the variance in output delay.

--
Gabor
Reply to
Gabor

Yeah, well, it's not like we really know the true and full problem. We just know he doesn't like the timing range reported by the tools.

--

Rick C
Reply to
rickman

On Wednesday, April 12, 2017 at 22:20:19 UTC+2, John Larkin wrote:


you are right, the 7010 and 7020 only have high-range IO, so no ODELAY

are you just trying to keep a fixed alignment between clock and data output?

you can do tricks with DDR output flops: data out with a DDR with both inputs as data, clock out with a DDR with 0,1 as input

-Lasse

Reply to
lasselangwadtchristensen

Hmm. I've used a real-time-adjustable ODELAY block, but that wasn't in a Zynq.

If you can add more hardware to the board, you could re-register the data in some external 74LS flops.

You could use unregistered outputs and make your own delay line with a carry chain, which you can create with behavioral code.

Reply to
Kevin Neilson

We are exactly trying to drive external flops, some 1 ns CMOS parts. They are clocked by the same clock that is going into the ZYNQ, and the FPGA needs to set up their D inputs reliably. We can't use a PLL or DLL inside the FPGA.

So the problem is that the Xilinx tools are reporting a huge (almost 3:1) spread in possible prop delay from our applied clock to the IOB outputs. The tools apparently assume the max process+temperature+power supply limits, without letting us constrain these, and without assigning any specific blame.

I think that has even higher uncertainty, probably more than a full clock period, so we couldn't reliably load those external flops.
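
For what it's worth, here is how that spread eats the data-valid window at the external flops; the numbers are placeholders in the spirit of the almost-3:1 report, not the actual figures:

def data_valid_window_ns(period_ns, t_co_min_ns, t_co_max_ns):
    # The outputs are only guaranteed stable from t_co_max after one clock
    # edge until t_co_min after the next, so the reported spread comes
    # straight off the clock period.
    return period_ns - (t_co_max_ns - t_co_min_ns)

# Placeholder numbers for an almost-3:1 spread on a 7 ns clock:
print(data_valid_window_ns(7.0, 2.5, 7.0))   # 2.5 ns of window left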

--

John Larkin         Highland Technology, Inc 
picosecond timing   precision measurement  

jlarkin att highlandtechnology dott com 
http://www.highlandtechnology.com
Reply to
John Larkin

The way you have constrained the design, I think you will need to design your own chip. I would say you need to find a way to relax one of your many constraints. Not using the PLL/DLL is a real killer. That would be a good one to fix.

I haven't used the Xilinx tools in a long time, but I seem to recall there was a way to work with a single temperature. It may have been the hot number or the cold number, but not an arbitrary value in between. But that may have been the post-layout simulation timing. Simulation is not a great way to verify timing in general, but it could be made to work for your case. I'd say get a Xilinx FAE involved.

--

Rick C
Reply to
rickman

Like Lasse said above, you can adjust the output delay with a half-cycle resolution using ODDRs. This sounds good enough for your application. I used that exact method once for a DRAM (single-data-rate) interface. (I think the training method was to write data to an unused location in DRAM with various phase relationships, read it back, and see which writes were successful.) Your issue sounds a lot like the same issues people have with DRAM. I don't think you'll see a 3:1 variation in reality.

Reply to
Kevin Neilson

I can declare the differential-input clock polarity either way, which would shift things 3.5 ns (out of a 7 ns clock). But the guaranteed data-valid window is less than 2 ns.

I sure hope so.

--

John Larkin         Highland Technology, Inc 

lunatic fringe electronics
Reply to
John Larkin

On Friday, April 14, 2017 at 06:12:52 UTC+2, John Larkin wrote:

the point of using DDR was not to shift the clock but to keep the clock and data aligned

"regenerating" the clock with a DDR means the clock and data get treated the same and both have the same DDR-to-IOB path, so they should track

getting the output clock aligned with the input clock (if needed) might be possible using the "zero-delay-buffer" mode of the MMCM

Reply to
lasselangwadtchristensen
