I compared the V4FX40 speed files in 8.2 and 10.1 using the speedprint utility. A few parameters related to BRAM read have gained a second pair of numbers; otherwise the timing is identical. Since I wasn't sure how and when the tools use this second pair of numbers, I ported the design to 10.1 and rebuilt it. This, however, hasn't fixed my problem: the design works up to 60 C and fails above that. The same design used to be very robust in the FX20. The timing constraints are met with margin, and I've checked all the unconstrained paths; none of them seems to be relevant. The power seems fine as well. The core voltage is 1.185 V, which is a little low but well within spec. If I keep the card under good airflow, so that the die temperature never rises above 52 C, it can work for days without any problem...
The design is centered around a PPC subsystem, which talks to another card through an async serial interface using UART Lite. When the problem happens, I stop receiving messages from the PPC. At this point the debugger doesn't seem to be able to stop the processor properly, so I can't really debug it through software. The clocks in the PPC part of the design aren't fast: the PPC runs at 166 MHz, and the buses run at half of that... I believe the rest of the design, while it might be broken as well, doesn't matter for the purpose of this discussion, as the PPC subsystem shouldn't be dying regardless of it.
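As a quick sanity check on the figures quoted above, here is a small Python sketch that works out the clock periods and the core-voltage headroom. The 166 MHz clock and the 1.185 V reading come from the post. The 1.2 V nominal VCCINT and the +/-5% tolerance are assumptions on my part and should be checked against the data sheet for the actual part and speed grade.

```python
# Figures quoted in the post.
PPC_CLOCK_MHZ = 166.0      # PPC core clock
CORE_VOLTAGE_V = 1.185     # measured core voltage

# ASSUMPTIONS (not from the post): nominal core voltage and tolerance.
# Verify both against the device data sheet before relying on them.
NOMINAL_VCCINT_V = 1.2
TOLERANCE = 0.05


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def voltage_window(nominal_v, tolerance):
    """Return (min, max) allowed voltage for a nominal value and a fractional tolerance."""
    return nominal_v * (1.0 - tolerance), nominal_v * (1.0 + tolerance)


# The buses run at half the PPC clock.
bus_clock_mhz = PPC_CLOCK_MHZ / 2.0

v_min, v_max = voltage_window(NOMINAL_VCCINT_V, TOLERANCE)
margin_mv = (CORE_VOLTAGE_V - v_min) * 1000.0

print(f"PPC: {PPC_CLOCK_MHZ:.0f} MHz, period {period_ns(PPC_CLOCK_MHZ):.2f} ns")
print(f"Bus: {bus_clock_mhz:.0f} MHz, period {period_ns(bus_clock_mhz):.2f} ns")
print(f"Assumed core voltage window: {v_min:.3f} V to {v_max:.3f} V")
print(f"Headroom above assumed minimum: {margin_mv:.0f} mV")
```

Under these assumptions the buses run at 83 MHz, and the 1.185 V reading sits roughly 45 mV above the lower limit. That headroom applies only at the point where the voltage was measured; the sketch says nothing about droop at the die under load or at temperature.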
I would appreciate any ideas on what else to check...
/Mikhail