Virtex-4 FX20 PPC405 Startup Issue

Hello,

I am looking for some help with a particularly nasty problem I have run into.

Out of our 10 prototype Virtex-4 FX20 (CES2 stepping) boards, roughly half exhibit an issue with the PPC405 starting up out of reset. After power-up, the bitstream is loaded, DONE goes high and the current draw rises, but the PPC never boots. Other logic on the chip is running.

When the device boots properly, there are no issues booting from BRAM, loading DDR-DRAM from flash, or executing from DRAM. Everything works fine.

Using ChipScope, I can see the data from address 0xfffffffc being returned correctly on the PPC405 PLB-I-Master side of the PLB arbiter. However, the second address put out is garbage (0x100600), resulting in a bus error. The boot code is held in a BRAM on the PLB. During a successful boot, the second address is 0xffffc000, which is correct. The reset sequence and first PLB bus cycle look identical in the failing and non-failing cases.
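The "correct" second address follows from the instruction returned by the first fetch. A minimal Python sketch, assuming (this is not stated in the post) that the word at the reset vector 0xFFFFFFFC is an unconditional I-form branch, as PPC405 boot code usually places there: it decodes such a branch and shows that a PC-relative `b` encoded as 0x4BFFC004 (a hypothetical example; the actual word was not quoted) lands on 0xFFFFC000. Comparing the decoded target with the captured second address is one way to confirm that the fetch data was right but the core did not act on it. The helper names are illustrative.

```python
# Sketch: decode a PowerPC I-form branch ("b", "ba", "bl", "bla") and work out
# the address the core should fetch next. Assumes (hypothetically) that the word
# returned for the reset-vector fetch at 0xFFFFFFFC is such a branch.

RESET_VECTOR = 0xFFFFFFFC
MASK32 = 0xFFFFFFFF
OPCODE_B = 18  # primary opcode of the I-form branch


def branch_target(insn: int, pc: int = RESET_VECTOR) -> int:
    """Return the 32-bit target of the I-form branch `insn` located at `pc`."""
    opcode = (insn >> 26) & 0x3F
    if opcode != OPCODE_B:
        raise ValueError(f"0x{insn:08X} is not an I-form branch (opcode {opcode})")
    disp = insn & 0x03FFFFFC          # LI field, low two bits implicitly zero
    if disp & 0x02000000:             # sign-extend the 26-bit displacement
        disp -= 1 << 26
    absolute = (insn >> 1) & 1        # AA bit: 1 = absolute, 0 = PC-relative
    return (disp if absolute else pc + disp) & MASK32


def encode_relative_branch(pc: int, target: int) -> int:
    """Build a PC-relative 'b target' word (AA=0, LK=0) placed at `pc`."""
    disp = (target - pc) & MASK32     # 32-bit wrap-around distance
    if disp & 0x80000000:
        disp -= 1 << 32               # interpret as a signed displacement
    if disp % 4 or not -(1 << 25) <= disp < (1 << 25):
        raise ValueError("target not reachable with a relative I-form branch")
    return (OPCODE_B << 26) | (disp & 0x03FFFFFC)


if __name__ == "__main__":
    insn = encode_relative_branch(RESET_VECTOR, 0xFFFFC000)
    # prints: b at reset vector: 0x4BFFC004 -> 0xFFFFC000
    print(f"b at reset vector: 0x{insn:08X} -> 0x{branch_target(insn):08X}")
```

Neither 0x00100600 nor the other bad addresses reported later in this thread can be reached from this example word, which is consistent with the core mishandling a correctly fetched instruction rather than receiving bad data.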

Observations:

  • Freeze spray (now known around here as 'FPGA programming spray') makes this problem go away without exception. (This suggests a timing or power issue?)
  • Warm resets (through the EDK reset controller) have no effect. The only way to make this problem go away is to reload the device.
  • Reloading the device does not always work. Some boards will always boot fine on the second try, while others will only boot once cooled.
  • The emulator (I tried both XMD and a Green Hills probe) cannot talk to the processor when it is in this state.
  • Clocks, DCM locks, reset signals, debug/jtag signals, all look normal.
  • The PPC is in an unrecoverable state, which is a little disturbing regardless of how it got there.

What else I have tried (none of these made a difference):

  • Clocking the PPC405 slower (same clock as the PLB)
  • JTAG loading vs. SelectMAP loading
  • Boot from the OCM bus instead of the PLB.
  • Removed all other logic from the design except the PPC and an OCM BRAM
  • Looked closely at the power supplies / grounding.
  • I have already successfully played 'Stump the Xilinx FAE/factory'.
  • Spent hours in Timing Analyzer looking for unconstrained nets
  • Looked closely at the errata

Angles still left to explore:

  • I am 95% convinced this is either the result of an external condition, or a chip defect.
  • So I am working up a power-supply change to delay VCCO relative to VCCINT. I don't believe that is the cause, but I am running out of things to try.

...................

Has anyone ever seen an issue like this (V4 or 2VPro)? I have done many FPGA designs over the years (although this is our first PPC-based design) and have rarely been this stumped.

Any and all advice is welcome. Email me or post here.

Thanks, Chris

Chris

Hello Chris,

What is the PPC clock? If it is over 200 MHz, have you looked at Xilinx Answer Record #21820?

Best regards Florian


Florian

I have been running the PPC at the slow PLB clock (77 MHz) for now until I get the startup issue fixed.

However, I checked my EDK project and that parameter had not been added. Interestingly, I did not run into any problems with DFS_MODE = Low when running the PPC at 224 MHz. Thanks for pointing that out.

Regards,

- Chris

Chris

I have seen the exact same problem on our Virtex-4 FX20 ES1 boards. It has consumed a lot of debugging time. Xilinx has not acknowledged a problem.

I have also replicated the problem on FX12 ES1 dev boards with the most basic project possible booting out of BRAM.

You've pretty much gone through all the steps, including chilling the chip to subzero temperatures with a can of inverted dusting spray (which briefly solves the problem until it warms back up to room temperature). I've also tried changing the power sequencing and using insanely long external power-on resets, without luck.

I see a similar thing on the PLB bus except the garbage address that occurs after the first correct jump is either 0x00100800 or 0x00000800.


wiggs

That's not as rare as users might hope. Quite a few devices have Reset lines that would be better called 'ResetRequest', and need a hard power cycle to recover from such states.

If freeze spray _always_ fixes it, then that is a chip margin issue (though it may be aggravated by external conditions).

Can you measure the chip temperature (sense diode?) and get an approximate temperature threshold? (If you warm the 'good' ones, do they then fail too?)

Is the design 'being pushed' in the die-temperature sense?

A nudge of Vcc should have an effect similar to a temperature change.

Have you tried different date codes?

-jg

Jim Granville

I have noted that the problem seems to vary with temperature but not completely depend on it.

It does not seem to be possible to eliminate occurrences of the problem on our worst samples no matter how much they are blasted with cold spray.

Likewise, on our best boards, where the problem rarely (or seemingly never) occurs under normal operating conditions, it will usually show up at higher temperatures.

We did not connect the temperature diodes on our boards, but during tests we attached a temperature sensor to the metal casing of the FPGA package to get a rough idea of the temperature. Some of our boards seem to have a threshold above which the problem starts occurring. It's not a binary work/not-work situation: even at high temperatures the board will still work from time to time; it just fails more frequently.
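Because failures become more frequent with temperature rather than switching on at one point, Jim's "approximate temperature threshold" may be easier to see as a failure rate per temperature band. A minimal Python sketch, assuming each power-up attempt has been logged as a (case temperature in °C, booted OK?) pair; the function names and the 5 °C bin width are illustrative choices, and case temperature is only a rough proxy for die temperature.

```python
# Sketch: bin repeated power-up trials by measured case temperature and report
# the failure rate per bin. Each trial is (case_temp_c, booted_ok).

from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

Row = Tuple[float, int, int, float]  # (bin_start_c, attempts, failures, rate)


def failure_rate_by_bin(trials: Iterable[Tuple[float, bool]],
                        bin_width: float = 5.0) -> List[Row]:
    """Group trials into fixed-width temperature bins, lowest bin first."""
    counts = defaultdict(lambda: [0, 0])  # bin_start -> [attempts, failures]
    for temp_c, booted_ok in trials:
        bin_start = (temp_c // bin_width) * bin_width
        counts[bin_start][0] += 1
        if not booted_ok:
            counts[bin_start][1] += 1
    return [(b, n, f, f / n) for b, (n, f) in sorted(counts.items())]


def first_failing_bin(rows: List[Row], max_rate: float = 0.0) -> Optional[float]:
    """Return the lowest bin whose failure rate exceeds max_rate, if any."""
    for bin_start, _attempts, _failures, rate in rows:
        if rate > max_rate:
            return bin_start
    return None
```

With only a handful of attempts per bin the rates are noisy, so a threshold found this way is a rough indication per board, which fits the observation that it varies widely from chip to chip.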

As indicated by our 'best' and 'worst' boards, there is no magic temperature number that exists across different chips, it seems to vary quite widely.

We have at least two date codes for FX20s. I've seen the problem on both. As I mentioned on my reply to Chris I've also seen the problem on an FX12 development board which I presume is a different die altogether.

I have tried bumping up Vcc a little. No notable change in the problem.

wiggs

Our Virtex-4 FX20 10C-ES1 devices also exhibit this problem.

We used ChipScope to look at the PLB bus and, as in one of the other posts, the address on the bus after things go awry is 0x????0800; the upper 16 bits can vary a little.
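When comparing many boards or design variants, it can help to tally ChipScope captures by the second instruction-side address. Two reports in this thread end in 0x0800 while the first report's 0x00100600 does not. A minimal Python sketch, assuming each capture has already been reduced to that one 32-bit address (the export step is not shown). The expected value 0xFFFFC000 is the good-boot address from the first post and depends on that project's boot code; pass a different `expected` for other projects. The labels and function names are hypothetical.

```python
# Sketch: triage ChipScope captures by the second instruction-side PLB address.
# Assumes each capture has already been reduced to that one 32-bit address.

from collections import Counter
from typing import Iterable

EXPECTED_SECOND_FETCH = 0xFFFFC000  # good-boot address from the first post


def classify_second_fetch(addr: int, expected: int = EXPECTED_SECOND_FETCH) -> str:
    """Label one capture: good boot, 0x????0800 signature, or other address."""
    if addr == expected:
        return "ok"
    if (addr & 0xFFFF) == 0x0800:     # upper 16 bits ignored, as reported
        return "bad-0800"
    return "bad-other"


def summarize(addrs: Iterable[int], expected: int = EXPECTED_SECOND_FETCH) -> Counter:
    """Count captures per label."""
    return Counter(classify_second_fetch(a, expected) for a in addrs)
```

Keeping the signature separate from "other" failures makes it easier to tell whether a change removes the failure entirely or only shifts it to a different garbage address.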

Our supplier has been fairly helpful, since we also noticed the problem on a Virtex-4 evaluation board we purchased from them.

Xilinx was dismissive until recently. It has been suggested that we set the C_APU_CONTROL parameter in the MHS file to 0b0000_0000_0000_0001. I've tried it on one bad board and the problem has not happened since. It would be interesting to see if others have the same result.

rwightman

Please open a case with the Xilinx hotline if you encounter an intermittent problem while booting the PowerPC after initial power-on in Virtex-4 FX FPGAs. The hotline engineers will assist you to identify the problem and once isolated provide you with an appropriate design parameter.

- Peter

Peter Ryser

After some investigation, Xilinx recommended the C_APU_CONTROL = 0b0000_0000_0000_0001 setting, and the problem has since gone away on all of our boards. I have not seen any noticeable side effects in software. At least in the short term, this seems like a reasonable solution.

- Chris

Chris

Answer Record #22179 is now online.


- Peter


Peter Ryser
