async clk input, clock glitches

Antti 18 years ago

Hi

FPGA has

1) 50mhz system clock from ext oscillator
2) 4Mhz clk that is async to the 50mhz

Problem: the 4MHz clk input sees a double clk pulse, with an error rate of approximately 1 in 10,000,000. Unfortunately the 4MHz clock needs to be used inside without phase delay, so oversampling and filtering with 50MHz is not an option, unless using a very clever no-delay glitch suppression filter.

An external small R/C circuit on the 4MHz doesn't change the error rate much. Currently the 4MHz is clocked once with the 50MHz; this seemed to give better results than using the 4MHz clock directly.

Any ideas how to really clean the 4MHz clock? Or any rule-of-thumb guess at the likelihood of seeing double clk edges when sampling 4MHz with an async 50MHz? Could the "error rate" of such sampling be the 1:10M that I am seeing?

I assume the 4MHz clock is rather good: it comes from an ASIC and has a total wire length from ASIC to FPGA of maybe 20 mm (but over a PCB edge connector). I find it hard to believe that the clock edge is so slow or noisy that 50MHz sampling could ever see double/wrong edges, but I guess I am wrong.

It doesn't seem to be crosstalk either, as there aren't many IOs toggling at all.

Hm, it looks like in rare cases the error is also one clock pulse missing!

Any good suggestions on how to troubleshoot the issue are welcome :)

Unfortunately the FPGA is Actel, so I can't use any on-chip logic analyzer core, and the chip is rather full too. Some internal signals could be routed out to an external logic analyzer if badly needed, but so far I am trying to fix the issue by thinking, and error-retry...

Antti

Icky Thwacket 18 years ago

If you are sure the 4MHz tr and tf are within the device clock spec (as is the voltage) then try putting the 4MHz clock into an onboard PLL configured for zero delay and narrow bandwidth, then use the PLL output for internal clocking. (Being a low life Altera user, I don't know if this is available in Xilinx parts.)

Frank Buss 18 years ago

How much delay is allowed? I assume you have already latched the clock once with every 50MHz edge, to avoid metastability problems. To avoid double clock pulses, you could feed a shift register with this latched signal and compare for "0111" and "1000" to detect edges more cleanly, maybe with an additional holdoff after detection to avoid detecting spikes.
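The shift-register comparison described above can be modelled in software to see how it behaves on a bouncing edge. This is only a behavioral sketch in Python, not HDL: the function names, the list-of-samples representation (one entry per 50MHz clock) and the holdoff length of 3 cycles are all assumptions made for illustration.

```python
def detect_edges(samples, holdoff=3):
    """Model of a 4-bit shift register fed once per 50MHz clock.

    An edge is reported only when the register reads 0111 (rise) or
    1000 (fall), oldest bit first. After a detection, further matches
    are ignored for `holdoff` clocks (an assumed value).
    """
    window = [0, 0, 0, 0]          # oldest sample first
    blank = 0                      # remaining holdoff clocks
    events = []
    for i, s in enumerate(samples):
        window = window[1:] + [s]  # the register shifts on every clock
        if blank > 0:
            blank -= 1
            continue
        if window == [0, 1, 1, 1]:
            events.append((i, "rise"))
            blank = holdoff
        elif window == [1, 0, 0, 0]:
            events.append((i, "fall"))
            blank = holdoff
    return events


def naive_rises(samples):
    """Count rising edges with a plain previous/current comparison."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a, b) == (0, 1))


clean = [0] * 6 + [1] * 6 + [0] * 6
bouncy = [0] * 6 + [1, 0] + [1] * 6 + [0] * 6   # one bounce on the rising edge

print(detect_edges(clean))    # [(8, 'rise'), (14, 'fall')]
print(naive_rises(bouncy))    # 2, the double clock
print(detect_edges(bouncy))   # [(10, 'rise'), (16, 'fall')]
```

In this model the rising edge is reported two samples after the input first settles high, so the filtering costs a few 50MHz cycles of latency on top of the input register. Whether that is acceptable depends on the answer to the "how much delay is allowed" question.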

Frank Buss, fb@frank-buss.de http://www.frank-buss.de, http://www.it4-systems.de

Brian Drummond 18 years ago

however that doesn't stop you building analyzers (clocked on 50MHz) to diagnose the problem

series-term (22-50R) at source may be the best bet ... but with 20mm trace length, that's unlikely to be the problem.

Also establishes that some sort of resynch is acceptable.

Metastability : about once in recorded human history :-) Other causes : e.g. crosstalk during the linear window : much higher.

Build a trivial analyzer clocked at 50MHz. count "high time" and "low time" in 50MHz cycles; log the max and min "high time" counts (update the log after each rising edge); ditto low time.

Good results would be 6 cycles min, 7 cycles max (+/-1 from asynchronicity)

Classic edge bounce will show min high (or low) time of 1 cycle; glitches away from edges (e.g. from major crosstalk) may show e.g. min high 1, min low 3 cycles. Missing pulses will show large "max" times.
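As a rough illustration of the analyzer idea, the Python sketch below models the counting rather than the hardware. It samples an ideal square wave every 20ns and logs min/max run lengths per level. The names are hypothetical, and the 251ns period (instead of exactly 250ns) is a deliberate assumption so that the sampling phase drifts the way it would between two truly asynchronous clocks.

```python
def sample_clock(n, sample_ns=20, period_ns=251, high_ns=125):
    """Return n samples of a square wave, one sample every sample_ns.

    period_ns=251 is an assumed, slightly-off value so the phase drifts.
    """
    return [1 if (k * sample_ns) % period_ns < high_ns else 0 for k in range(n)]


def high_low_stats(samples):
    """Return {level: (min_run, max_run)} counted in sample clocks.

    The first and last runs are ignored because they are partial.
    """
    stats = {0: None, 1: None}
    level, length, seen_edge = samples[0], 1, False
    for s in samples[1:]:
        if s == level:
            length += 1
            continue
        if seen_edge:  # only log runs that started on an observed edge
            lo, hi = stats[level] or (length, length)
            stats[level] = (min(lo, length), max(hi, length))
        level, length, seen_edge = s, 1, True
    return stats


clean = sample_clock(5000)
print(high_low_stats(clean))      # {0: (6, 7), 1: (6, 7)}

# Inject a single-sample glitch in the middle of a low period.
glitched = list(clean)
i = next(k for k in range(50, len(clean) - 1)
         if clean[k - 1] == clean[k] == clean[k + 1] == 0)
glitched[i] = 1
print(high_low_stats(glitched))   # min high time drops to 1
```

The clean case reproduces the 6 min / 7 max figure, and a single injected spike shows up immediately as a min high time of 1, which is the signature to look for in the logged values.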

etc

My guess would be min times of 1, i.e. double-clocking edges, from (small amplitude) crosstalk which coincides with a slow 4MHz edge.

Oh. And does this occur on one board, or on several identical ones?

- Brian

Symon 18 years ago

Hi Antti,

I guess your problem is a slow edge rate on your 4MHz clock. Does this fix it?

process(clk_50M) begin if rising_edge(clk_50M) then four_meg_d

job 18 years ago

I am almost sure your 4MHz frequency is clean on your board. But verify!

If your 4MHz is stable then the trouble is coming from inside your FPGA.

First of all, verify you are using a global buffer for the 50MHz.

Then you have to register all inputs with your 50MHz, and register the 4MHz input with 50MHz too (f4mhz_reg). Then verify after Place & Route that you are using IOB registers (registers in the pads)! The use of IOB registers is very important. When this is done and verified, return to your code and add a new flip-flop (50MHz) after the registered 4MHz input (f4mhz_reg2), and detect the rising edge of the 4MHz by

f4mhz_rise_pulse

KJ 18 years ago

Not quite sure what you mean by a 'double clk pulse' but I'm assuming that you're feeding the 4MHz clock into 'something' in the FPGA that is clocked by 50 MHz. If that's the case, then what you need to verify in the final technology map is that the 4MHz signal comes into exactly one flip flop and the output of that exactly one flip flop is what you use to clock anything else...and to mitigate metastability a bit more, that 'anything else' logic might consist of again exactly one flop, the output of which is fed into the real logic (i.e. you're constructing a two flop synchronizer). You'll also want to add synthesis attributes to any of these synchronizing flops to ensure that the logic doesn't get optimized in a way that ends up replicating the flops. In any case, verify the final routed result brings 4MHz into exactly one flop (and if adding a second sync flop that it too is the only load on the flop that captures the 4MHz).

4MHz input pin -----> Flop -----> Flop -----> Logic that uses the '4MHz'

Since you haven't stated just how you're using the 4MHz clock inside the FPGA, you should probably clarify that, but with a failure every 0.2 seconds (10 M cycles of the 50MHz clock) it's quite apparent that the most likely cause of the failure is one of the following:

- Simple timing (which will be fixed by what I outline in the previous paragraph).

- Signal quality
  1. Measured at the FPGA, are the rising and falling edges of the 50 MHz monotonic?
  2. Measured at the FPGA, the 4 MHz clock doesn't have to be absolutely monotonic since it doesn't appear to be used to sample anything, but if it dips and comes back up and the dip is low enough to appear to be a logic low, then the 50 MHz could (and eventually will) sample it at precisely that bad point.

- Could be power as well; you're not dipping out of spec at the FPGA, are you?

Might want to elaborate on the reason for the 'without phase delay' requirement, but assuming that to be the case then a different solution that would minimize the phase delay would be to feed the 4MHz into an onchip PLL (if you have one) to create a 48 MHz and use that instead of the 50 MHz. That way, the two clocks would maintain a fairly accurate phase relationship to one another thus avoiding violation of setup/hold time windows.

If the reason for 'without phase delay' requirement is because of other FPGA inputs that are synchronized to the 4MHz, then another solution might simply be a dual clock fifo to move those inputs from the 4MHz to the 50 MHz clock domain.

This sounds like you are using the 4MHz to drive logic...big mistake (see first paragraph for the solution).

Unless you've scoped the 4MHz clock, why do you think it's not 'clean'?

Violating timing, inadequate power at the point of use and signal quality....when it comes right down to it those are the ONLY reasons. In the end, that's what you'll find here as well.

Sure, it simply depends on precisely what the setup/hold timing window of the actual part is. If you have freeze spray, a hot air gun and a simple way to quickly get your error rate measurement then try hitting your FPGA with the hot, measure your error rate (repeat with the cold) and see if it has a temperature dependency. If it clearly does, then you have a timing problem (see first paragraph for the solution), if it doesn't or is not clearly temp dependent, then you likely have a power problem. Based on the bits of info you've provided, I'm leaning towards the timing. This is simply science experiment stuff and is not needed to engineer the solution, for that you need static timing analysis, proper clock domain crossing design technique and proper power supply distribution.

No need to guess or speculate, unless you don't have a scope to simply measure.

Timing problems produce those symptoms as well.

Hopefully you'll find the above useful....to reiterate though, I can dang near guarantee the fundamental reason for the failure will be

- Violating setup/hold time in the 50 MHz clock

- Signal quality (50 MHz or 4 MHz)

- Power

What you need to do is measurement or analysis to either eliminate causes or turn up the design error.

You shouldn't need to bring out anything, static timing analysis and a scope will get you to the root cause.

Good luck.

Kevin Jennings

Peter Alfke 18 years ago

Antti, see xl34_54.pdf, which shows two different ways to avoid the effect of double-edges on a clock. I wrote that many years ago, and published it in Xilinx XCell magazine #34.

Peter Alfke

Antti 18 years ago

Hi all and thanks for all suggestions! some additional info

failing circuit

ASIC output > 15mm trace > connector > 5mm trace > 27 ohm > 3mm trace > FPGA input >

now, this 2 bit counter sees

double clock from the ASIC in about 1:10M pulses
missing clock from the ASIC in about 1:100M pulses

The ASIC clock is known to be perfect: many other devices can receive it and have 0 error rate (I have not seen an error ever!). As for the 50MHz clock signal quality, well, it doesn't matter, as whatever could be wrong with it could not explain the double and missing pulse counts?

Using a PLL on the 4MHz is not an option, as it is not a free-running clock but a byte strobe with 4MHz pulses.

So what is failing is a really simple circuit! It also looks like when double pulses are seen the FPGA is not changing any of its outputs, so it's not SSO noise.

I could understand power supply noise to cause double pulses, but how to explain the missing pulses?

I don't have a scope here now, but I have tried to troubleshoot the clock problem before and have looked at the signals with a scope without seeing anything that helped get to the problem. I will do it again if I don't get it working this weekend.

Timing analysis with Actel FPGAs is so-so. I have seen a shift register clocked at 4MHz working 100% when FPGA utilization was below 90% and failing 100% when FPGA utilization was over 90%, without any problem reported by the timing tools or post-place simulation. I couldn't believe my eyes when I saw that, but so it was. Later I found an Actel appnote about methods of dealing with such cases. I had hoped that the Actel tools take care of such situations, but they do not.

So I have little hope that some more detailed timing analysis gives the solution to the problem.

At the moment the 4MHz strobe should have only a small internal delay, so 2 FFs at 50MHz is already too much. So I need either a deglitch with no phase delay, or I need to change my other logic to tolerate the delay.

I have some other options too, but none of them is simple and easy to implement, so I am hoping for some magic hint to fix the problem :) I will try using the 4MHz directly and measure the error rate with the 50MHz running and disabled. As a last resort I can use another free-running clock entering the FPGA from a different side and a PLL, and disable the 50MHz oscillator completely; this should hopefully decrease the overall power and crosstalk noise.

Antti

Antti 18 years ago

Hi Peter,

Thanks, I think I do know those things already, but I still printed the XCell pages out :) Now, in a Xilinx FPGA I don't see the problem :) but the final target is an Actel FPGA (because: cost+security+package) and in Actel I see both double and missing clocks.

So I am still puzzled; there must be some very basic bad thing somewhere.

Antti

mk 18 years ago

I think it's highly humorous for a Xilinx employee to post a link to a university in India for a document which Xilinx has created but is apparently unable to archive.

mk 18 years ago

Actually the whole magazine is tucked away at a directory on Xilinx FTP site but google doesn't seem to like FTP sites too much so a search doesn't get you there.

KJ 18 years ago

OK, so it appears that the 4 MHz input is being synchronized to the 50 MHz clock through a single flip flop, but have you verified that the final routed design uses only one flip flop?

Are you using the now synchronized 4 MHz signal as a clock? Do you know for sure that the Actel device will properly generate an internal clock signal from the output of a flip flop? You can't see clock signal quality internal to a device but that doesn't imply that it doesn't matter. If "output of a flip flop to the clock input of another" is not something that Actel handles then maybe you shouldn't be doing that.

That's good to know.

Signal quality on the 50 MHz clock does matter...what if the osc is bad or flaky? It's not my first suspect either but again worth verifying at the input to the FPGA with a scope (when you get access to one) so that it can be eliminated as a cause.

That would also tend to lower power supply noise as a culprit too in my mind...again, it can only be eliminated as a cause by verification with a scope when you get a chance.

Bad power is worse than a bad clock, all sorts of bad things can happen.

From here it really smells like the problem is not properly synchronizing the 4 MHz signal or that the internally generated clock is not a good clock for whatever reason.

Also, is the output of the two bit counter directly observable to you and is that the reason you say that you miss a clock or get two every now and then or is it because of some other downstream logic output? The reason for asking is because no matter what you do, if you use the 'synchronized 4 MHz' to actually clock a flip flop the output of that two bit counter will be skewed a bit from the 50 MHz clock because of the unavoidable clock to output delay of the flip flop (plus possible additional skew from differences in the clock distribution between the 50 MHz and the 'synchronized 4 MHz' internal to the device).

I've found and fixed many other designers' errors by getting rid of internally generated clocks because, no matter how well you do, there is inherent design skew that can not be eliminated and yet can not be tolerated either.

Is there an option in the timing analysis that causes it to not analyze clock domain transfers? I know with Altera, the default is to not analyze paths that start in one clock domain (i.e. the 'synchronized 4 MHz' domain) and end in another (i.e. the 50 MHz domain). When you turn that option on, it does do the analysis. Check with the Actel stuff to see how it analyzes (or doesn't) stuff in between two domains.

In my experience, post route simulation generally does not catch any timing problems....it just takes a long time to do and really doesn't catch hardly anything.

Not sure what you're referring to here.

Given that you're (likely) operating this in a non-radioactive, Earth bound environment, the only things that will cause intermittent failures though as I stated earlier are:

- Timing

- Signal quality

- Power

Nothing else.

Since signal quality and power don't appear to be the most likely suspects based on what you've posted, that leaves timing as the lurking design issue so timing analysis will find the problem.

Since you're only driving a two bit counter, I'd suggest instead that you get rid of the synchronizing flop on the 4 MHz and instead use it as the direct input to a two or three bit shift register that is clocked by 50 MHz. Inherently then you're building in the proper synchronization, and the outputs of the shift register will all be in the 50 MHz clock domain, not something that you hope to be close to it (i.e. the 'synchronized 4 MHz' domain). Except for the very first shift reg output those other outputs can be combined however needed to emulate whatever it is that you were really doing with the 2 bit counter.

I'm suspecting that it shouldn't take too much effort to make the 2 bit counter run off of the 50 MHz with a clock enable signal that is computed by combinatorially combining the last two taps of the shift register (i.e. Tap(2) and not(Tap(3))).
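A behavioral sketch of that arrangement, in Python rather than HDL, may make the enable logic easier to follow. Everything here is an assumption for illustration: the function name, the three-tap register, and the mapping of `taps[0]`, `taps[1]`, `taps[2]` to Tap(1), Tap(2), Tap(3). Each list entry stands for the 4 MHz pin level at one 50 MHz clock edge.

```python
def count_strobes(samples):
    """2-bit counter clocked at 50 MHz, enabled by Tap(2) and not Tap(3).

    taps[0] is the first flop (the only one that sees the async pin),
    so it is deliberately not used in the enable term.
    """
    taps = [0, 0, 0]
    counter = 0
    for s in samples:
        # Like a real clock edge, the enable uses the register values
        # that were present before this edge.
        enable = taps[1] == 1 and taps[2] == 0
        if enable:
            counter = (counter + 1) % 4   # 2-bit counter wraps at 4
        taps = [s] + taps[:2]             # shift: pin -> Tap(1) -> Tap(2) -> Tap(3)
    return counter


three_pulses = ([0] * 6 + [1] * 6) * 3 + [0] * 6
five_pulses = ([0] * 6 + [1] * 6) * 5 + [0] * 6
print(count_strobes(three_pulses))   # 3
print(count_strobes(five_pulses))    # 1, because the counter wraps
```

In this model the enable is high for exactly one 50 MHz cycle per rising edge. Note that it only moves the counter into the 50 MHz domain; a spike that is one sample wide (for example `[0, 0, 0, 0, 1, 0, 0, 0, 0]`) still produces one count, so it does not by itself filter glitches.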

Magic won't help ;)

Except that you haven't measured anything that indicates that power supply noise or crosstalk is the issue.

Kevin Jennings

Mike Treseler 18 years ago

I would try soldering on one of those little Schmitt trigger packs. Sometimes a low slew rate will clock both edges once in a while.

-- Mike Treseler

Antti 18 years ago

Yes, a Schmitt trigger input could be the cure.. but I still can't understand the missing clock pulses!

Antti

KJ 18 years ago

Your post never mentioned anything about having measured a slow edge on the 4MHz signal either. If the edge rate is within spec, adding a Schmitt trigger will have no effect. Have you taken a scope to that signal and measured the edge rate?

Kevin Jennings

Peter Alfke 18 years ago

Not so strange when the Xilinx employee (and incidentally also the author of the article) is at home and in a hurry, and just googles for the first hit. Who cares about geography in the age of the internet...

Peter Alfke

Jim Granville 18 years ago

Hi Antti, if you have a clean design using Xilinx, and the EXACT SAME (?) design fails in Actel, then that does seem to exclude the ASIC.

Does Actel have pin hysteresis?

Can you move the 4MHz pin about on the device, and check the failure stats?

Try a tiny-logic Schmitt right at the Actel device?

And also try a tiny-logic D-FF as an external synchroniser.

Can you simplify the Actel logic to the very lowest to verify clock integrity (even some test code that is a FreqDetector, looks for 2MHz or 8MHz timeframes, and sticky latches (or counts))? It should be only a few FFs. Then try this with/without the rest of the chip alive.
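One way to picture such a frequency detector is the Python model below, which is a sketch and not test code for any real device. It measures the spacing between rising edges in 50MHz cycles and sets sticky flags. The function name and the two thresholds are assumptions, chosen to sit between the nominal spacing (12 or 13 cycles at 4MHz) and the fault spacings (about 6 cycles for an 8MHz timeframe, about 25 for a 2MHz one).

```python
def period_flags(samples, short_limit=9, long_limit=19):
    """Return sticky (too_short, too_long) flags for rising-edge spacing.

    samples holds the pin level once per 50MHz clock. short_limit and
    long_limit are assumed guard values, in 50MHz cycles.
    """
    too_short = too_long = False
    prev, since_edge, armed = samples[0], 0, False
    for s in samples[1:]:
        since_edge += 1
        if prev == 0 and s == 1:            # rising edge seen
            if armed:                       # skip the first, partial interval
                if since_edge <= short_limit:
                    too_short = True        # sticky: never cleared
                if since_edge >= long_limit:
                    too_long = True
            since_edge, armed = 0, True
        prev = s
    return too_short, too_long


pulse = [1] * 6 + [0] * 6
clean = pulse * 10
missing = pulse * 3 + [0] * 12 + pulse * 3   # one pulse dropped
extra = list(clean)
extra[32] = 1                                # spike in a low period

print(period_flags(clean))     # (False, False)
print(period_flags(missing))   # (False, True)
print(period_flags(extra))     # (True, False)
```

In hardware this would correspond to one small counter, two comparators and two sticky flip-flops, which fits the "only a few FFs" estimate, and the two flags distinguish double clocks from missing clocks.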

The only time I've seen double clocking is on slowish edges, on pins without hysteresis (oh, and when I've gone looking for it, with test waveforms :)

Cables are NOT nice, as they have significant series inductance.

Floating pins have done strange things too, as I suspect the buffer stages go into transition oscillation at very high frequencies, and the internal ground inductance is simply too high at those frequencies and things get 'quite confused'.

-jg

Jim Granville 18 years ago

1:10M is at the 4MHz rate, so a double clock every 2.5s, and a missing clock every 25 secs?
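The arithmetic behind those figures is simply pulses per error divided by pulse rate. The short Python check below uses a hypothetical helper name and assumes the strobe runs continuously at 4MHz, which may not hold since it was described as a byte strobe rather than a free-running clock.

```python
def seconds_between_errors(pulses_per_error, pulse_rate_hz):
    """Mean time between errors, assuming pulses arrive continuously."""
    return pulses_per_error / pulse_rate_hz


print(seconds_between_errors(10_000_000, 4_000_000))    # 2.5  (double clocks)
print(seconds_between_errors(100_000_000, 4_000_000))   # 25.0 (missing clocks)
print(seconds_between_errors(10_000_000, 50_000_000))   # 0.2  (if counted in 50MHz cycles)
```

The last line shows where the earlier 0.2 second figure comes from: it results from counting the 10M in 50MHz cycles instead of 4MHz pulses.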

Can you clarify the double-clocking? Do you mean one 4MHz edge causes two INCs, 20ns apart (rather than the expected INC @ 250ns +/- 20ns)?

What if you set up two of these, one on each 4MHz edge?

- are the error stats the same on rising/falling?

-jg

Jim Granville 18 years ago

That may not be a 'tools' problem at all, but could be a real hardware issue. It is unlikely their test coverage is that great in such 'corner cases' - most vendors expect you to go to a larger device when you hit 90% ! :)

As more and more of the device is active, the ground noise and crosstalk have to get steadily worse - what external loads is the device driving, and what package?

I get the impression this was part of the driving force behind the flip-chip-bga and all-die-bondpads

-jg
