While I see that you found a solution by recompiling for a faster design, I have a few "time-honored" solutions to this problem as well. They are usually reserved for expensive, low-volume systems that need to ship today (to get a paycheck next month).
1) Check the input voltage at the power supply pins. Sometimes people put a low-pass filter (LPF) on the power supplies, but the tolerance of the series resistor (probably not a precision part) may leave the supply within spec yet lower than normal. I like using an adjustable LDO to drive each FPGA power group (especially Vint). The other nice thing about that approach is that you can "tweak" the supply up toward its upper limit by piggybacking a parallel resistor onto the divider at the ADJ pin. For instance, instead of riding at 2.5V nominal, you can run at 2.6V. The higher supply rail helps compensate for the slowdown at higher temperature.
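As a rough sketch of the arithmetic behind both points, the Python below assumes an LM317-style adjustable regulator (Vout = VREF * (1 + R_LOW / R_HIGH), VREF = 1.25 V, ADJ-pin current ignored) with the piggybacked resistor placed across the upper divider resistor. Many adjustable LDOs use the same divider equation with a different reference voltage, so check your part's datasheet. The resistor values, load current and tolerance are hypothetical.

```python
# Sketch only: LM317-style divider, Vout = VREF * (1 + R_LOW / R_HIGH).
# R_HIGH sits between OUT and ADJ, R_LOW between ADJ and GND (assumed topology).
VREF = 1.25  # volts; assumed reference, check the datasheet of your regulator


def vout(r_high: float, r_low: float) -> float:
    """Regulator output voltage for the given divider (ADJ current ignored)."""
    return VREF * (1.0 + r_low / r_high)


def trim_resistor(r_high: float, r_low: float, v_target: float) -> float:
    """Resistor to piggyback across R_HIGH so the output rises to v_target."""
    r_eff = r_low / (v_target / VREF - 1.0)  # upper resistor needed for v_target
    if r_eff >= r_high:
        raise ValueError("v_target must be above the present output voltage")
    # Solve r_high || r_p == r_eff for r_p.
    return r_high * r_eff / (r_high - r_eff)


def worst_case_pin_voltage(v_supply: float, i_load: float,
                           r_series: float, tol: float) -> float:
    """Pin voltage after a series filter resistor at the top of its tolerance."""
    return v_supply - i_load * r_series * (1.0 + tol)


if __name__ == "__main__":
    r_high, r_low = 240.0, 240.0  # hypothetical divider giving 2.5 V
    print(f"nominal: {vout(r_high, r_low):.3f} V")
    r_p = trim_resistor(r_high, r_low, 2.6)
    print(f"parallel resistor for 2.6 V: {r_p:.0f} ohm")
    print(f"trimmed: {vout(1.0 / (1.0 / r_high + 1.0 / r_p), r_low):.3f} V")
    # Hypothetical filter: 0.2 ohm, 5 % tolerance, 0.5 A load on a 2.5 V rail.
    print(f"pin: {worst_case_pin_voltage(2.5, 0.5, 0.2, 0.05):.3f} V")
```

Because a parallel resistor can only lower the upper leg, this placement can only raise the output, which is the direction you want for a trim like this.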
2) Use the static timing analyzer to look at the slowest paths, then see whether errors in those paths are a likely cause of the symptoms you are seeing. Usually they are, and the recompile solution may work. Unfortunately, in a lot of production systems recompiling triggers a whole lot of requalification effort. In that event, trimming the supply as noted above is usually easier.
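A minimal Python sketch of that triage, assuming you have copied each path's name and total delay out of the timing report by hand. The path names and delays are hypothetical, and slack here is simply clock period minus path delay; a real analyzer also folds in clock skew and setup time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TimingPath:
    name: str        # hypothetical path name, e.g. destination register
    delay_ns: float  # total path delay copied from the timing report


def slack_ns(period_ns: float, path: TimingPath) -> float:
    """Simplified setup slack: period minus path delay (skew/setup ignored)."""
    return period_ns - path.delay_ns


def at_risk(paths: list[TimingPath], clock_mhz: float,
            margin_ns: float) -> list[TimingPath]:
    """Paths whose slack is below margin_ns, worst (smallest slack) first."""
    period = 1000.0 / clock_mhz
    risky = [p for p in paths if slack_ns(period, p) < margin_ns]
    return sorted(risky, key=lambda p: slack_ns(period, p))


paths = [
    TimingPath("filter/mac_acc", 12.69),
    TimingPath("fifo/wr_ptr", 11.80),
    TimingPath("ctrl/state", 9.40),
]

for p in at_risk(paths, clock_mhz=76.8, margin_ns=1.0):
    print(f"{p.name}: {slack_ns(1000.0 / 76.8, p):.3f} ns slack")
```

Raising margin_ns widens the net; the paths it returns are the ones worth checking against the symptoms (for example, whether a late accumulator or FIFO pointer would show up as dropped samples).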
3) Get some air moving. It doesn't take much airflow at all to lower the junction temperature significantly. Touching the device with your hand may not be a great test: if the device has a very high thermal resistance, you won't feel anything, but it'll be cooking inside. (I saw a VAX catch fire this way many years ago ... and it was still booting.)
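The point about touch tests follows from the usual steady-state estimates, Tj = Ta + P * theta_JA and Tj = Tc + P * theta_JC. The Python sketch below uses made-up power and thermal-resistance values purely for illustration; real figures for a given package and airflow come from the vendor's thermal data.

```python
# Sketch: steady-state junction temperature from thermal resistance.
# All numbers below are hypothetical placeholders, not Spartan-3 data.


def tj_from_ambient(t_ambient_c: float, power_w: float, theta_ja: float) -> float:
    """Junction temperature from ambient: Tj = Ta + P * theta_JA (degC/W)."""
    return t_ambient_c + power_w * theta_ja


def tj_from_case(t_case_c: float, power_w: float, theta_jc: float) -> float:
    """Junction temperature from the case surface: Tj = Tc + P * theta_JC."""
    return t_case_c + power_w * theta_jc


POWER_W = 1.5       # assumed device dissipation
T_AMBIENT_C = 40.0  # assumed air temperature inside the enclosure

still_air = tj_from_ambient(T_AMBIENT_C, POWER_W, theta_ja=20.0)   # assumed, no airflow
moving_air = tj_from_ambient(T_AMBIENT_C, POWER_W, theta_ja=14.0)  # assumed, modest airflow
print(f"Tj still air: {still_air:.1f} C, with airflow: {moving_air:.1f} C")

# The case is what a finger feels; the junction sits P * theta_JC above it.
print(f"Tj behind a 50 C case: {tj_from_case(50.0, POWER_W, theta_jc=5.0):.1f} C")
```

Airflow works by lowering the effective theta_JA, which is why even a small fan moves the junction temperature by several degrees at the same dissipation.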
Trevor
We are running into trouble with our current design for a Xilinx Spartan-3 xc3s1500.
It does signal processing, and it seems that samples get lost as the temperature rises. Immediately after power-on everything works well; some minutes later, once the final temperature is reached, some samples are missed. I didn't have a thermometer at hand, but I can keep my finger on the FPGA for a long time, so it may be around 50°C. It runs with a clock of 76.8 MHz, PAR states a maximum frequency of 78.777 MHz, and logic utilization is about 60%.
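Those two frequencies leave very little room. The short calculation below uses only the numbers quoted above and, as a simplifying assumption, treats every delay as scaling by one common factor with temperature; it shows how much uniform slowdown the critical path can absorb.

```python
F_CLK_MHZ = 76.8    # operating clock from the post
F_MAX_MHZ = 78.777  # maximum frequency reported by PAR

period_ns = 1000.0 / F_CLK_MHZ      # clock period
min_period_ns = 1000.0 / F_MAX_MHZ  # critical-path delay implied by PAR
slack_ns = period_ns - min_period_ns

# If every delay scales by factor k, the path fails once k * min_period > period.
max_slowdown_pct = (period_ns / min_period_ns - 1.0) * 100.0

print(f"period {period_ns:.3f} ns, min period {min_period_ns:.3f} ns")
print(f"slack {slack_ns:.3f} ns, tolerable slowdown {max_slowdown_pct:.2f} %")
```

Roughly 0.33 ns of slack, or about a 2.6% slowdown, is all that separates the reported figure from a failing path.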
One board works as expected and two others show the effect described above. The boards have the same layout but are made by different manufacturers. Notably, the failing boards are lead-free.
Just now we had a discussion about the influence of temperature on propagation delay. I don't believe it affects clock lines and other logic resources in (very) different ways. Is that true or not?
I read the thread "Propagation delay sensitivity to temperature, voltage, and manufacturing", but the answers there are mostly about DCMs.
Tom