Redundant clock switching

I'd like to implement a clock distribution system with clock source (24 MHz) switchover in case of failure. Not exactly a trivial problem, especially when a smooth transition between the clocks is required. It is best delegated to a ready-made chip, which... barely exists. The only two chips I was able to find are the ICS581-2 (perfect match for my needs) and the ICS580 (3-cycle stall, don't know yet if it's a problem or not). The first one turns out to be unobtainium, and Mouser labels it as obsolete. So what is the modern way of doing this? I don't think I'm having a sudden attack of dysgooglia; there just doesn't seem to be much out there about the issue.

On a similar note: what oscillators should be used? Two exact copies of the same part, or a combo of a crystal oscillator and a MEMS? Is there any wisdom handed down over the ages about the long-term reliability of MEMS oscillators?

Best regards, Piotr

Reply to
Piotr Wyderski

Maybe I'm missing something here. But the use of an oscillator as the missing chip seems the obvious option. Your 2 other oscillators phase lock it, but if their input ceases to exist it oscillates on its own, preserving output.

NT

Reply to
tabbypurr

The secondary stage can die and then you are left without the signal, despite the first oscillator still working correctly.

Doing this properly is tricky, so I wanted to pay somebody else to do it for me. But it turns out there are only 2 chips that can do that (meaning, they have built-in loss-of-signal detectors), one of which is not available even as a sample. What's going on?

Best regards, Piotr

Reply to
Piotr Wyderski

A few more words about what sort of failure you're guarding against would be useful. With all the possible failure modes of a complex system, why pick on the poor oscillators?

Failure of an external clock, that I get. But kitty's approach seems like it would cover that one.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs 
Principal Consultant 
Reply to
Phil Hobbs

Even the Apollo guidance computer didn't have a redundant clock source AFAIK...

Another unusual but plausible situation I could think of is when there's not necessarily a permanent failure of the "master" clock but some kind of temporary out-of-bounds condition? Like an extreme temperature situation where the fast clock becomes unreliable somehow so you switch down to a slow clock that's designed to be better behaved at 130 C or -50 until hopefully the extreme condition passes and things return to normal.

Reply to
bitrex

Might be helpful:

Reply to
bitrex

You could probably implement the structure in the last figure in about 15 minutes using the manufacturer's drag-n-drop GUI in one of the low-end Silego/Dialog "GreenPak" OTP mixed-signal gate arrays; they're mad cheap, like 35 cents in quantities of a hundred.

No financial affiliation I just think they're cool products :-)

Reply to
bitrex

idk how you have a "smooth transition" between two clocks one of which has failed entirely, you need some kind of watchdog or timeout circuit to detect the total clock loss in the first place and by the time you've detected it there's nothing to "transition" between now is there.

"E's not pinin'! 'E's passed on! This clock is no more! He has ceased to be! 'E's expired and gone to meet 'is maker! 'E's a stiff! Bereft of life, 'e rests in peace!"

Figure it must be some other kind of problem less than total failure if OP is talking about "smooth transitions"

Reply to
bitrex

At least not if both clocks are the same frequency. If the backup clock were significantly slower, it could trigger the watchdog to check on the fast clock during its own off time. If the fast clock has gone t*ts up during that period, the logic could hopefully figure out whether there's enough time left in the slow clock's off period to set up a switchover without a glitch, or whether it needs to hold off a cycle.

Reply to
bitrex

Using an oscillator as I suggested would get a smooth transition. The 3rd osc is only locked by osc 2 when the phase of the 2 is close. Until that point, osc 3 runs on its own.

Osc 1: feeds a bit into osc 3
Osc 2: feeds a lower level into osc 3

Now osc 1 rules until it dies, then osc 2 rules until that dies, then osc 3 rules. And all switchovers are smooth.

NT

Reply to
tabbypurr

Cart-before-horse engineering: trying to perfectly synchronize transitions between a bunch of flakey clocks that for whatever reason in the designer's estimation have a good chance of being the primary point of failure

Horse-before-cart engineering: design whatever the flakey clocks are feeding to be glitch-tolerant so it works OK being fed whatever trashy clock happens to be available at the time. It'll probably require sacrificing performance to do that. Sucks brah but likely no way around it if the requirements involve operating the widget inside a nuclear reactor or on the surface of the Sun or w/e

Reply to
bitrex

I'm having trouble thinking about this rig without a diagram but it sounds like that's still vulnerable to a single failure, what's the situation if it's oscillator 3 that fails first and not the other two?

Reply to
bitrex

I think having three clocks is getting into "three-and-four-engine-aircraft-can-be-less-fault-tolerant-than-two" territory, my proposal would be to have two clocks and some kind of synchronizer where under "normal" operation they're each gating each other's half-cycles.

Set up the gating so if you lose either one everything continues on immediately as normal just at half-speed, without having to do any timeout or skipped-cycle detection stuff.

Reply to
bitrex

Crystal aging and all the cumulative mechanical stress-induced failures.

Many other problem sources can be TMRed and the entire modules isolated from their buses with a bunch of low-resistance analog switches, but without a clock the entire system stops, hence I think it'd be wise to have failover hardware here.

There will be no external clock.

Best regards, Piotr

Reply to
Piotr Wyderski

But it was designed for just several days of continuous operation. In my case it is exactly the opposite: no high temperatures, no vibrations, "infinite" endurance.

Best regards, Piotr

Reply to
Piotr Wyderski

I think the usual way is to have your internal osc drive the clock....and when an external clock is present it is used to discipline the internal.

If the external clock goes away, the internal continues to run disciplined.
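As a rough illustration of "disciplined, then holdover", here is a toy Python model. Everything in it is an assumption made for the sketch: the reference is treated as perfect, the frequency error is measured exactly each step, and the names `run_disciplined_oscillator`, `gain`, and `drift_ppm_per_step` are invented. While the reference is present the correction is steered toward zero error; when it disappears the last correction is simply frozen.

```python
def run_disciplined_oscillator(free_error_ppm, ref_present, gain=0.2,
                               drift_ppm_per_step=0.0):
    """Return the output frequency error (ppm) after each step.

    free_error_ppm: free-running error of the local oscillator.
    ref_present: iterable of bools, True while the external reference exists.
    """
    correction_ppm = 0.0
    history = []
    for present in ref_present:
        free_error_ppm += drift_ppm_per_step      # slow aging of the local osc
        output_error = free_error_ppm + correction_ppm
        if present:
            # Reference available: nudge the correction toward zero error.
            correction_ppm -= gain * output_error
        # Reference missing: holdover, correction stays frozen.
        history.append(free_error_ppm + correction_ppm)
    return history


timeline = [True] * 100 + [False] * 50
locked_then_holdover = run_disciplined_oscillator(50.0, timeline)
with_drift = run_disciplined_oscillator(50.0, timeline, drift_ppm_per_step=0.01)

print(locked_then_holdover[99], locked_then_holdover[149])
print(with_drift[99], with_drift[149])
```

Without drift the error in holdover stays where the loop left it; with drift it grows step by step, which is the usual limit on how long holdover remains useful.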

IDT probably makes what you need

formatting link

mark

Reply to
makolber

Plus if you use an analog phase detector, if Osc 1 falls out of the lock range, osc 2 takes over. You do get a bit of a beat note though.

Crystal aging is tough to spot unless you have at least three oscillators, preferably of different makes. A hard failure is easier to detect, but who knows what the output is doing while it's failing? Of course you could send it through a kilometer of coax so you'd have more time to react before the failure arrived at the far end. ;)
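The point about needing three oscillators can be shown with a small Python sketch. It assumes some counter hardware (not described in this thread) delivers each oscillator's cycle count over a common gate interval; `find_outliers` and `tol_ppm` are hypothetical names. Each count is compared with the median, so only relative disagreement matters.

```python
from statistics import median


def find_outliers(counts, tol_ppm):
    """Return indices of oscillators deviating from the median by more than tol_ppm."""
    mid = median(counts)
    return [i for i, c in enumerate(counts)
            if abs(c - mid) / mid * 1e6 > tol_ppm]


# Three nominally 24 MHz oscillators counted over a one-second gate (made-up numbers).
three = [24_000_000, 24_000_010, 24_000_480]
two = [24_000_000, 24_000_480]

print(find_outliers(three, tol_ppm=5))
print(find_outliers(two, tol_ppm=5))
```

With three counts the drifting unit stands out against the other two. With only two, both sit equally far from the midpoint, so the check can say that they disagree but not which one has aged.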

My money would go on a hi-rel oscillator if I determined that that was a likely failure mode. MEMS ones are pretty bulletproof, I think.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs 
Principal Consultant 
Reply to
Phil Hobbs

"infinite" endurance is one of the hardest fault-tolerant design problems of all, I think. Guess you could say that the problem by nature is "underspecified."

I think trying to design something for "infinite" endurance that has at all graceful or stylish failure behavior, or even maintains stock performance after things have started going wrong, is going to be an exercise in frustration.

To make the problem tractable I think you have to accept that when shit inevitably breaks, performance is going to degrade in some proportion to how much has gone wrong. The challenge is to try to make it a somewhat linear degradation rather than a cliff.

As an analogy how does a badly damaged ship make it to port? Most ships don't have three backup boilers and watertight compartments designed such that they can take on 500,000 gallons of water and run at full speed, it's too expensive and just more stuff to go wrong!

But badly damaged ships do manage to struggle along for very long periods of time half full of water, badly listing to port at three knots with all but one boiler flooded or destroyed, toss all the dead weight overboard even the anchor, start ripping up the floorboards and toss the furniture in the burner, tear the roof off and knock the funnels out to get more airflow, whatever it takes, damn the torpedoes full steam ahead, whatever it takes!

That's how I think of "fault tolerance" I guess

Reply to
bitrex

That's the terminology IDT uses in their papers and I find it completely legitimate. You have two clock sources of about the same frequency and unknown phase relationship. When you detect the first source is dead, and it happens on a cycle-by-cycle basis, you immediately switch to the other clock. But not in a dumb, 74HC4066 way, because this can produce a glitch and your 100MHz circuit gets an effective 1GHz pulse. You need a synchronizer, e.g. like this:

formatting link

And it is more or less what the ICS580-1 does. But it stalls for 3 cycles when a clock switch occurs, so it is good to add a PLL-based post-switching cleanup block. Then you will not even notice the switchover; the edges will get nicely interpolated. And that is the ICS581-2. All my clock "loads" appear to have a PLL, so the stall probably won't be a problem, and the 580-1 is obtainable.
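To make the difference between a "dumb" switch and a synchronized one concrete, here is a small Python simulation. It models a textbook glitch-free mux (one enable flip-flop per clock, updated on that clock's falling edge and cross-blocked by the other enable); this is an assumed structure for illustration, not a claim about the internals of the ICS parts. One tick is taken as 0.1 ns, so a 100-tick period is a 100 MHz clock, and all function names are invented.

```python
def clk(t, period, phase):
    """Ideal 50% duty square wave sampled at integer tick t."""
    return ((t - phase) % period) < period // 2


def simulate(n_ticks, t_switch, a=(100, 0), b=(100, 37)):
    """Return (naive, safe) output waveforms for a switch from clock A to clock B."""
    naive, safe = [], []
    en_a, en_b = True, False          # start on clock A
    prev_a = prev_b = False
    for t in range(n_ticks):
        ca, cb = clk(t, *a), clk(t, *b)
        sel_b = t >= t_switch         # asynchronous select request
        naive.append(cb if sel_b else ca)
        # Each enable changes only on its own clock's falling edge,
        # and only once the other path has been disabled.
        if prev_a and not ca:
            en_a = (not sel_b) and (not en_b)
        if prev_b and not cb:
            en_b = sel_b and (not en_a)
        safe.append((ca and en_a) or (cb and en_b))
        prev_a, prev_b = ca, cb
    return naive, safe


def min_pulse(samples):
    """Shortest complete high or low run, ignoring the partial first and last runs."""
    runs, length = [], 1
    for prev, cur in zip(samples, samples[1:]):
        if cur == prev:
            length += 1
        else:
            runs.append(length)
            length = 1
    return min(runs[1:])


naive, safe = simulate(n_ticks=1200, t_switch=520)
print("naive mux shortest pulse (ticks):", min_pulse(naive))
print("synchronized mux shortest pulse (ticks):", min_pulse(safe))
```

The naive mux chops a pulse well below the nominal 50-tick half period, while the synchronized version never emits anything shorter than a full half period and instead pauses low for a while during the handover. Note the sketch assumes clock A is still toggling when the switch is requested; if A has stopped dead, its enable never sees a falling edge, so a loss-of-signal detector has to force it off.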

Best regards, Piotr

Reply to
Piotr Wyderski

Thought they did have a secondary way to compute the critical burns in case the master system was unable to perform. ISTR whilst the LEM was in flight they had a fair few minor system glitches under load too. Thankfully none of them affected the trajectory or control systems sufficiently badly to abort the landing on Apollo 11.

formatting link

(this is a lot more detailed than the report I had intended to find)

Even if the clock ends up being the only thing left alive? I have known systems that were supposed to be ultra reliable where due to a strange architectural quirk the only thing still working was the watchdog timer.

--
Regards, 
Martin Brown
Reply to
Martin Brown
