Until recently I believed in good faith that all decent FPGAs have bitstream integrity checks and do not start up after a configuration loading error.
This seems not to be the case, at least for Xilinx Virtex2 FPGAs.
I have a design and an FPGA evaluation system where I constantly see bitstreams that start but behave erratically. The only explanation I can see is that there were errors during download, yet impact (JTAG download) does not report an error and the FPGA starts as if everything were OK. After power-off and reconfiguration the error is gone.
1) From Xilinx answers: if the prog_b pin is pulsed during JTAG download, the FPGA loses configuration sync, which leads to garbage being loaded into the FPGA and the FPGA starting up with that garbage, with no errors reported during configuration. My system has a button and a pull-up resistor on the prog pin, and nobody is pushing it during download.
2) Xilinx Virtex2 FPGAs have a new feature called AutoCRC, which is more reliable than the CRC used in older FPGAs. The normal CRC check (RCRC command and write to the CRC register) is still used unless it is a debug bitstream! -- Good god, but why does impact generate bitstreams with the CRC value fixed at 0x5F57 for all Virtex2/p/s3 devices?? The whole point of a CRC is that it is not constant but calculated. OK, the AutoCRC is written, but shouldn't the AutoCRC operate only on frame data? How are the other config writes protected if the normal CRC check seems to be bypassed???
Antti
PS 0x0000DEFC !!!
For those who do not know the meaning of 0xDEFC: it is the DEFault Crc value written to the CRC register when the CRC check is disabled. When the CRC check is enabled the CRC is 0x5F57, but the meaning of that, sorry, I cannot decode! It must be a magical value that matches any good CRC value (a calculated value!).
PPS Xilinx: where is the algorithm for AutoCRC ???
This erratic behaviour only happens on some downloads of a known good, working bitstream. The whole system (a 1M gate system with MicroBlaze) is working, but the soft-core microcontroller sees some hard-wired registers return random data (not the pre-programmed constant). The bad register value is consistent for one download attempt and persists after hardware reset as well.
You have a hard-wired register that should always read as 0xAA, but on some download attempts it reads, let's say, 0xE1 every time you do a hardware reset. The next download is OK again.
It's not fun to simulate a full 1M gate design with MicroBlaze! And you cannot simulate a badly configured FPGA anyway, can you?
Hm, but checking the clocks and reset, that may be a good thing to do. The system has 2 clock domains running from 2 different external clock inputs, and 3 DCMs, so the reset of the system is not simple. And yes, the register that returns bad data is in a different clock domain from the system SoC.
But the fixed checksum doesn't seem possible; there are 2 checksum locations:
1 AutoCRC after the frame data: this is calculated and OK
2 normal CRC: this is fixed to 5F57
No way can the AutoCRC be the correct CRC for the previous data and also force the next CRC to a constant value!!
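
For readers following the argument about the two checksum locations, here is a minimal Python sketch of one way a final CRC could come out constant even though it is calculated. It rests on an assumption that nothing in this thread confirms: that the running CRC register is cleared after the frame-data check, and that the bytes written after that point are identical in every bitstream. The CRC-16 polynomial 0x1021 and the `TRAILER` bytes are stand-ins chosen for illustration; they are not the Virtex2 algorithm or real configuration commands.

```python
def crc16(data: bytes, crc: int = 0x0000) -> int:
    """Bitwise CRC-16, polynomial 0x1021 (illustrative, NOT the Xilinx CRC)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

# Hypothetical trailing bytes, assumed identical in every bitstream.
TRAILER = bytes([0x30, 0x00, 0x80, 0x01, 0x00, 0x00, 0x00, 0x05])

def final_crc(frame_data: bytes, clear_after_frames: bool) -> int:
    """CRC seen at the second checksum location."""
    crc = crc16(frame_data)       # plays the role of the frame-data check
    if clear_after_frames:
        crc = 0x0000              # ASSUMPTION: register cleared after that check
    return crc16(TRAILER, crc)    # continue over the constant trailer

# Two "designs" whose frame data differ by a single bit.
design_a = bytes(range(64))
design_b = design_a[:-1] + bytes([design_a[-1] ^ 0x01])

same_when_cleared = final_crc(design_a, True) == final_crc(design_b, True)
same_when_running = final_crc(design_a, False) == final_crc(design_b, False)
```

Under that assumption `same_when_cleared` is true and `same_when_running` is false: the frame data would be protected only by the first checksum, and the second would cover only the constant trailer. Whether the device actually behaves this way is exactly the open question.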
I don't know what this means. Are you getting erratic behavior in 1 out of 100 JTAG downloads? Or 100% of JTAG downloads?
Resets do not reset everything. They do not, for example, re-initialize block RAM. If you are depending on the initial contents of a block RAM for proper operation, and your circuit occasionally stomps on block RAM shortly after start-up, your circuit may not work until you reconfigure.
As I said in my previous post, you haven't proved that configuration is the problem. And I'm not, repeat, NOT, suggesting that you somehow simulate the configuration process. But it would be interesting to know if there's a way resources like block RAMs could be corrupted shortly after you come out of reset, perhaps due to problems with interfaces between mutually asynchronous clock domains.
I can't rule out the possibility that you are occasionally loading a corrupted bitstream, but it seems very unlikely. Doctors have a saying: when you hear hoofbeats, think horses, not zebras. If I had a design that I didn't simulate, and configuration seemed to complete successfully, I'd start looking somewhere other than configuration for my problem.
Yes, it means that the 1M gate design with 32K of application code for MicroBlaze has not been simulated. All the custom IP cores connected to MicroBlaze have of course been simulated.
I have a bitstream that always starts OK when loaded from configuration memory, and starts with erratic behaviour in 1 out of 100 JTAG configuration attempts (even though the JTAG configuration did not show any error during download). When the bitstream starts badly it also behaves badly after reset; only a full new reconfiguration gets the system working again. So I do assume it is possible that the CRC check is not sufficient in Virtex2 devices and that they sometimes do start even after a bad download.
You suggested that this erratic behaviour when loading from JTAG could be found by running simulations?! Well, I really can't see how any simulation model could take into account errors that happened during download. Or what was it that I could possibly find in simulation?
I'm coming a little late to this conversation, but perhaps this has not been considered. I sincerely doubt it is a configuration problem. Much more likely, you are not coming out of reset at the end of configuration cleanly. The global reset must be considered asynchronous to the clock. Most likely, you are occasionally getting a situation where one or more flip flops are seeing the end of the configuration reset a clock cycle before or after other flip flops in a critical area of your design. Simulation usually won't catch this, so you need to do a careful examination of the start up of your design. I can't tell you the number of designs I've seen that make this common mistake, even from FPGA board vendors with much experience that really should know better.
Check the state machines in your design. The resets for them should come from a flip-flop in the design that feeds all the reset inputs to the state machine. You can't depend on global reset going away on all flip-flops during the same clock cycle.
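
The start-up hazard described above can be sketched with a toy cycle-based model in Python (not HDL). It models a two-flop one-hot ring whose flip-flops are held at their initial values until each one individually sees reset released. The `release_cycle` values are hypothetical; they stand in for the global reset reaching different flip-flops on different clock edges.

```python
def run_ring(release_cycle, n_cycles=8):
    """Simulate a 2-flop one-hot ring: q0 <= q1, q1 <= q0.

    release_cycle[i] is the first clock edge at which flop i is out of reset.
    While in reset, flop 0 holds 1 and flop 1 holds 0 (the one-hot start state).
    """
    init = [1, 0]
    q = list(init)
    for cycle in range(n_cycles):
        nxt = [q[1], q[0]]  # next-state logic of the ring
        q = [nxt[i] if cycle >= release_cycle[i] else init[i] for i in range(2)]
    return q

def is_one_hot(q):
    """A legal state has exactly one flop set."""
    return sum(q) == 1

clean = run_ring([0, 0])        # both flops released on the same edge
skewed_a = run_ring([0, 1])     # flop 1 released one edge late
skewed_b = run_ring([1, 0])     # flop 0 released one edge late
```

In this model `clean` stays one-hot, while `skewed_a` collapses to `[0, 0]` and `skewed_b` to `[1, 1]`, and neither illegal state ever recovers on its own. Driving every reset input of the state machine from one flip-flop corresponds to forcing all `release_cycle` entries to be equal.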
Antti Lukats wrote:
--Ray Andraka, P.E. President, the Andraka Consulting Group, Inc.
If it is a V2, and you only experience problems when downloading from JTAG...Keep in mind that the V2 supports partial reconfiguration. As such, the JTAG bit-banger from IMPACT doesn't invoke the global clear when re-configuring, so BRAMs will not initialize, and FFs _may_ not. I think I saw a Xilinx solution record with information on how to invoke the global initialization through JTAG manually...the Chipscope tool does this currently. Or, you can manually short the PROG pin to ground.
By the way, this is documented in the V2 design guide.
If I read this right, you are saying that read-back does show the error, and that the error persists over many read-backs until re-config? That does sound like a config-write error. Have you tried multiple devices (ideally with differing date codes)? If this persists across device/date code boundaries, I would say it shows a serious blind spot. In general, any device programming includes a verify step, and on FPGA devices skipping verify has probably become the norm, for 'saving time' reasons. If the CRC is not sufficiently reliable, then that would make config something of a lottery. [just maybe they do not CRC the whole bitstream?]
Perhaps someone from Xilinx could clarify more what AutoCRC is, and does ?
Good question. Antti, when you say that "readback" is consistent, are you referring to the MicroBlaze's readback of that one register, or are you saying that you are seeing an error when you perform a bitstream readback?
I didn't notice that there were more replies to my post, thanks!
Well, let me explain the situation again:
It's a Virtex2 with a MicroBlaze and 32k of BRAM, and I am using both impact and Chipscope to download the bitstreams. The bitstream is known good, but in some cases after download one hard-coded register returns a wrong value when read by MicroBlaze. The value read back is constant for a given configuration attempt, and the wrong value persists after any number of hardware resets. It only goes away after a new reconfiguration. The wrong value comes from a Verilog wire (that has an assigned constant value). I still do not see how a clocking or reset problem could do that. If the bitstream is loaded again the problem disappears. If the same bitstream is loaded from configuration memory there is never a problem.
BRAMs are initialized, flip-flops are initialized OK, or they are not relevant to the current problem. If the FPGA really could not start after errors during the actual configuration download, I would say this problem should never have occurred.
Ray - as you may notice, my plea for information about Xilinx AutoCRC has gone unanswered. The Virtex2 bitstream does not include the normal CRC as it used to be in Spartan-II/E. It has been replaced with AutoCRC. But there is no information on how it is calculated anywhere in any public document!
Xilinx says that the old CRC was not good enough and did not catch all errors during configuration!! But I bet the new one is not much better!
MicroBlaze starts, i.e. the DCM works, BRAMs init OK, etc... I press HW reset and the (hard-wired) RTL revision register reads, for example, 23.27 instead of 1.21 as it is wired to return. This wrong readback of 23.27 persists after any number of hardware resets (reset to MicroBlaze and all registered logic). After reconfiguring, the problem is gone. Some other time the wrong readback may be wrong in a different way, but again it remains constant until reconfiguration.
And yes, it looks like there is a chance that a V2 bitstream can start even if it had errors during download. And yes, I would like Xilinx to document the AutoCRC function ;)
In a recent design we came across a "configuration initialization problem" that sounds similar to what you are noticing. But in our case it is a Spartan-IIE device and the failure is incorrect initialization of the SRL.
In our design we used SRL16E to create a divide-by-16 counter: essentially a 16-bit circular shift register loaded with 0x0001. We have a chain of these to create a 250ms tick from a 66MHz free-running clock. After successful configuration we expect the 250ms tick to be free-running. The 250ms tick works most of the time but fails once in a while. We couldn't explain the failure and couldn't solve the problem. Fortunately we could work around it by replacing the SRL-based counters with FF-based counters. Note that SRLs do not have any reset pin, hence no reset dependency.
The question I couldn't answer was: if there was a corruption in the bitstream and the FPGA CRC logic didn't catch it, why did the failure only affect the SRL INIT value? Why didn't it affect, say, a SLICE configuration? Why didn't some other logic in the FPGA misbehave?
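
As a rough illustration of why the INIT value matters for the counter described above, here is a toy Python model of a 16-bit circular shift register. It captures only the shift-with-feedback behaviour; a real SRL16E also has address inputs and a clock enable, which are ignored here. The six-stage chain length in `tick_period_s` is an assumption, since the post does not say how many stages were cascaded.

```python
class Srl16:
    """Toy 16-bit circular shift register (output fed back to input)."""

    def __init__(self, init: int):
        self.bits = init & 0xFFFF

    def clock(self) -> int:
        """Advance one clock and return the bit shifted out of the top."""
        out = (self.bits >> 15) & 1
        self.bits = ((self.bits << 1) & 0xFFFF) | out
        return out

def pulses(init: int, n_clocks: int) -> int:
    """Count how many output pulses appear in n_clocks clock cycles."""
    srl = Srl16(init)
    return sum(srl.clock() for _ in range(n_clocks))

def tick_period_s(stages: int, f_clk_hz: float) -> float:
    """Period of a cascade of ideal divide-by-16 stages (stage count assumed)."""
    return (16 ** stages) / f_clk_hz

good = pulses(0x0001, 160)   # intended INIT: one pulse every 16 clocks
dead = pulses(0x0000, 160)   # INIT lost: the divider never pulses
stuck = pulses(0xFFFF, 160)  # INIT all ones: output high on every clock
```

With the intended INIT the model pulses 10 times in 160 clocks; with an all-zero INIT it never pulses, which matches the "tick stops" symptom, and because there is no reset input nothing short of reconfiguration restores it. Six ideal stages at 66 MHz would give a period of about 0.254 s, close to the 250ms tick mentioned.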
Here are more details:
- Target FPGA device is XC2S50E-6FT256C
- FPGA configuration mode is Master Serial (M2, M1, M0 == 0,0,0). The CCLK is driven by the FPGA only.
------------------------------------------------------------------------------
In our design we have multiple instantiations of SRL similar to what is shown above. We also use the UART macro from Xilinx listed in XAPP223. The macro also uses an SRL to implement a divide by 16. The source clock to the SRL is from a free-running oscillator (66MHz) present on the board.
I confirmed that the code was correctly implemented by checking the INIT value of the SRL in fpga_editor.
Experiments we carried out:
Ex1 : Turn on the power. Ensure the FPGA is configured (DONE=1, INIT# = 1). Check whether the free-running clock from the SRL is running. If the clock is running, power cycle, else stop. Result : We saw failures where the SRL in our part of the design didn't oscillate. We also saw instances where the SRL in the UART macro didn't oscillate. Sometimes it took 30 tries, sometimes 500 tries.
Ex2 : Turn on the power. Pull the PROG# pin of the FPGA low. Pull the PROG# pin of the FPGA high. Ensure the FPGA is configured (DONE=1, INIT# = 1). Check whether the free-running clock from the SRL is running. If the clock is running, reconfigure the FPGA by toggling PROG#. Result : We saw failures where the SRL in our part of the design didn't oscillate. We also saw instances where the SRL in the UART macro didn't oscillate. Sometimes it took 30 tries, sometimes 250 tries.
Ex3 : We replaced the SRL-based "divide by 16" counters with FF-based counters in our part of the design. The UART macro still contained the SRL-based divide by 16. Repeat "Ex2". Result : We saw failures only in the SRL macro of the UART. The divide-by-16 counters implemented using FFs never had any failures.
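
Since failures in these experiments showed up anywhere between 30 and 500 tries, it may help to estimate how many clean runs are needed before concluding that a change really removed the problem. The Python sketch below assumes every configuration attempt is independent and fails with the same fixed probability, which real hardware may not respect. The failure rates passed in are illustrative, not measured values.

```python
import math

def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one failure in n tries) >= confidence.

    Assumes independent tries with a constant per-try failure probability.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))

def p_at_least_one(failure_rate: float, n: int) -> float:
    """Probability of seeing one or more failures in n independent tries."""
    return 1.0 - (1.0 - failure_rate) ** n

n_1_in_100 = runs_needed(0.01)  # e.g. a 1-in-100 failure rate
```

For a 1-in-100 failure rate this gives 299 consecutive tries for 95% confidence, so a handful of clean runs after a change says very little about whether the failure is gone.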
The problem is seen on multiple boards.
I'm glad I found someone seeing symptoms similar to those we were struggling with for a couple of weeks.