async clk input, clock glitches

Hi Brian,

the story gets even more fascinating (or frustrating)...

First, about Actel: their logic cell is NOT LUT based, so the LUT input transition glitches cannot happen (no LUT!). But the Actel cell has NO DEDICATED flip-flop. Flip-flops are configured out of logic and have severe restrictions on configuration and routing. FFs with enable and clear are made out of 2 cells; if the CLR signal is not driven by a global clock, sometimes 4 cells are used to make 1 single flip-flop.

As for the use of global clocks in Actel: sometimes it's not an "order of magnitude" difference but a simple OK vs FAIL 100% issue.

Case: 64-bit shift register (no enable, just shift), clocked with a CLEAN clock at 4 MHz.

Works 100% when the FPGA is almost empty (no matter what signal drives the shift register clock); fails 100% when the FPGA is >90% full and the shift clock is routed with local connects.
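A rough way to picture why local clock routing can break a plain shift register: if the clock reaches a stage later than it reaches the previous stage by more than the data delay between them, that stage captures the already-updated value and the bit skips a stage. The sketch below is only a behavioural model with made-up timing numbers (the arrival times and the 1 ns data delay are assumptions, not Actel figures).

```python
def shift_once(stages, data_in, clk_arrival_ns, data_delay_ns):
    """Apply one clock edge to a shift register with per-stage clock arrival times.

    Stage i normally captures the OLD output of stage i-1. If the clock reaches
    stage i later than stage i-1 by more than data_delay_ns (clock-to-Q plus
    routing), stage i sees the NEW value instead: a hold violation.
    """
    new = list(stages)
    for i in range(len(stages)):
        if i == 0:
            new[i] = data_in
            continue
        skew = clk_arrival_ns[i] - clk_arrival_ns[i - 1]
        if skew > data_delay_ns:
            new[i] = new[i - 1]      # upstream stage has already updated
        else:
            new[i] = stages[i - 1]   # normal shift
    return new


def run(clk_arrival_ns, n_clocks=4, data_delay_ns=1.0):
    """Shift a single 1 followed by zeros through an 8-stage register."""
    stages = [0] * len(clk_arrival_ns)
    for k in range(n_clocks):
        bit = 1 if k == 0 else 0
        stages = shift_once(stages, bit, clk_arrival_ns, data_delay_ns)
    return stages


# Hypothetical arrival times: a low-skew global net vs. a local route
# where stages 3..7 get the clock 3 ns late.
print(run([0.0] * 8))
print(run([0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 3.0, 3.0]))
```

With the skewed arrival times the single 1 ends up one stage further along than it should after four clocks, which is the kind of 100% repeatable failure described above.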

PROVEN. This is also explainable, and Actel has a special appnote for this (clock skew handling). Now to my clocking issue.

After taking a break I tried to eliminate things from the problem, so I stopped shifting data into the SPI shift register. This however made no difference, and when I looked again it was also explainable, as my test counter value was decoded in the external AVR MCU but the bytes holding the counter value never got back into the FPGA!

so to summarize again:

1) ARM9 SW has a counter that is used to generate an 8-byte encrypted token
2) those 8 bytes go into the FPGA over the A1 parallel interface; they are written into dual-port BRAM
3) those 8 bytes are read by the AVR over the SPI interface
4) AVR decodes those bytes; it also decodes the counter from step 1, NO ERROR ever here
5) AVR writes 3 bytes to the FPGA SPI shift register
6) AVR strobes to load the LFSR, and sets a one-time flag that prevents further LFSR loads
7) ARM9 sends packets in a loop (encrypted counter)

Now, the counter value NEVER exists inside the FPGA. The data passed via the FPGA is not decoded there, and it is not used by the logic responsible for the load/enable of the LFSR.

Still, at certain counter values the LFSR does get corrupted!! Those values are repeatable, not random: each time the ARM9 counter is restarted, the LFSR is corrupted at exactly the same counter values. Those values are however different for different PCB+bitstream combinations.

8) AVR puts the FPGA into main mode
9) ARM9 sends any packets for any number of hours

LFSR corruption no longer happens! Not a single error seen.

Errors happen at [7] at constant counter values; errors NEVER happen at [9] while streaming random data for hours.

So there are no different transitions on the load input of the LFSR at all; still there is a repeatable dependency on the data stream that only passes through the FPGA.

eeee... [8] what happens there?

Well, after main mode the AVR no longer collects data from the FPGA and no longer decrypts it. But that should not have any influence, as the decrypt result wasn't passing into the FPGA anyway.

There is one more difference: after [8] the LFSR is enabled for 8 clocks longer for each packet from the A1 interface.

but the LFSR enable/load signals do not share any logic with any of the data lines !!

OK, when I found out that the counter data is not passing into the load input at all, I took a walk and a coffee break and wrote a list of options that can be tested. The list has 6 items, so I will work through them one by one.

My bet is that I need to add a delay on the LFSR enable signal. There is no explanation why it should be needed, as the enable already comes from a FF in the same clock domain as the LFSR clock, but all other items on my list of possible problems are even less likely (or harder to test).

Antti

status: ALL ISSUES SOLVED

I wonder how many would have guessed what the problem really was? :)

I checked one item from my list, which made no difference. Luckily I did not continue with my list, as the real problem was not there anyway.

Sure, pure human error: there WAS a dependency on the data being pushed into the FPGA. There should not have been, but there was, because some signal had the wrong name.

The same design error was already in the initial Xilinx prototype a year ago, but it was never seen as it only happened 1:256 at startup.

While converting to Actel, I first reduced the decode from 8 to 4 bits, getting 1:16 errors, and later to 3 bits, getting 1:8 errors.
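Those ratios are what you would expect if the stray decode fires on one fixed pattern of otherwise random bits. A tiny check, assuming the encrypted bytes are uniformly random and only a single bit pattern triggers the fault (both are assumptions that happen to fit the numbers above):

```python
from fractions import Fraction


def match_probability(decode_bits):
    """Chance that decode_bits uniformly random bits equal one fixed pattern."""
    return Fraction(1, 2 ** decode_bits)


for bits in (8, 4, 3):
    print(bits, "bits ->", match_probability(bits))
```

So narrowing the decode from 8 to 3 bits made the same bug 32 times more likely to show up, which is probably the only reason it became visible at all.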

When I made my test with the counter and got errors at some counter values, I did not print out the data bytes (8 bytes, encrypted) as I assumed them to be random. But if I had printed them out anyway, I would have seen that any time an error happened, 3 bits of the 8 bytes had a constant value.
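For what it's worth, this kind of check is easy to automate once the failing tokens are logged. The sketch below looks for bit positions that have the same value in every failing 8-byte token; the sample tokens are made up for illustration and are not data from the real system. With only a handful of failures some bits will look constant by chance, so it needs a decent number of samples.

```python
def constant_bits(tokens):
    """Return {bit_index: value} for bits identical across all tokens.

    tokens is a list of equal-length bytes objects; bit 0 is the LSB of the
    last byte (big-endian interpretation).
    """
    width = 8 * len(tokens[0])
    and_all = (1 << width) - 1
    or_all = 0
    for token in tokens:
        value = int.from_bytes(token, "big")
        and_all &= value
        or_all |= value
    result = {}
    for bit in range(width):
        if (and_all >> bit) & 1:
            result[bit] = 1          # set in every token
        elif ((or_all >> bit) & 1) == 0:
            result[bit] = 0          # clear in every token
    return result


# Made-up "failing" tokens: upper 61 bits vary, low 3 bits are always 0b101.
upper_parts = (0, (1 << 61) - 1, 0x0123456789ABCDE)
failing = [((u << 3) | 0b101).to_bytes(8, "big") for u in upper_parts]
print(constant_bits(failing))
```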

that 3 bit value caused the LFSR enable to be of wrong duration.
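To illustrate why a single wrong enable duration is fatal: once the LFSR in the FPGA has been clocked a different number of times than the reference expects, the two sequences never line up again. This is only a model; the 16-bit polynomial, the seed and the nominal 8 clocks per packet are hypothetical stand-ins, not the values of the real design.

```python
TAPS = 0xB400        # hypothetical 16-bit Galois LFSR taps, not the real design's
NOMINAL_CLOCKS = 8   # hypothetical enable duration per packet


def lfsr_step(state):
    """Advance a 16-bit Galois LFSR by one clock."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def run_enable(state, clocks):
    """Clock the LFSR for as long as enable is high."""
    for _ in range(clocks):
        state = lfsr_step(state)
    return state


def first_mismatch(seed, dut_clocks_per_packet):
    """Return the index of the first packet after which the modelled FPGA LFSR
    disagrees with a reference that always gets NOMINAL_CLOCKS, or None."""
    ref = dut = seed
    for index, clocks in enumerate(dut_clocks_per_packet):
        ref = run_enable(ref, NOMINAL_CLOCKS)
        dut = run_enable(dut, clocks)
        if ref != dut:
            return index
    return None


print(first_mismatch(0xACE1, [8, 8, 8, 8]))   # enable always correct
print(first_mismatch(0xACE1, [8, 8, 9, 8]))   # one packet with enable 1 clock too long
```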

But fixing it still wasn't so easy. After seeing the problem I fixed it, then made the BIT file, and 2 different PCB boards did not work at all :(

!?

After that I managed to get the micro-SD card with the test program corrupted so badly that if you insert that SD into a card reader on Vista, Vista Explorer HANGS (needs power off!!); on XP it just can't access the card.

ok, I get another sd card, put the test program onto it,

and I again see the same errors?? After really, really fixing the VHDL.

I reran Synplify and Designer many times to be sure they really generated a new bit file; still the same error was there!

Guessed already? Sure, the Actel Flash programming tool had remembered the wrong location for the bit file, so I used the wrong one.
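A cheap guard against this kind of mix-up is to compare a hash of the file the build just produced with the file the programmer is actually pointed at. A minimal sketch; the function names and paths are hypothetical, and it only uses the Python standard library, nothing from the Actel tools:

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, as a hex string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_bitstream(built_path, programmer_path):
    """True if the freshly built file and the one the programmer will load
    have identical contents."""
    return file_digest(built_path) == file_digest(programmer_path)


# Example use (hypothetical paths):
#   if not same_bitstream("designer/impl1/top.bit", "prog/top.bit"):
#       print("programmer is pointing at a stale bit file!")
```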

When the fixed bit file really got into the FPGA, it worked as it was supposed to.

Antti

hope the thread wasn't so entirely boring :)

So we were all looking in the wrong place...

it happens!

Not at all ... Glad you got it sorted.

- Brian Drummond
