All,
The problem being described is a very practical one. Many lifetimes ago, in a different job, I designed telecom synchronization systems.
These systems looked at references from two separate redundant feeds (east-west, or north-south, as they are usually called in the literature) and tried to keep a local oscillator locked in phase to one reference or the other, but usually not both.
The local oscillator might be pretty good (Stratum 4, just a plain old crystal), or might be really quite good (a rubidium-steered quartz oscillator).
It was inevitable that somewhere, sometime, the phase detectors, frequency counters, etc. would get to just the right point, and metastable outputs would happen (which led to bad counts or bad phase measurements). When the phase was "just right", the naturally occurring jitter would rock you back and forth across the metastable threshold.
The result would be that the clock system would suddenly think it was terribly out of sync, and slew like crazy to go find phase lock (with what it thought was the right phase now).
This was, of course, unacceptable, as all systems sync'd to this one would dump their buffers (overflow or underflow). As well, alarms were turned in, references switched, and so on. Not pretty.
A two-stage solution was used initially, and found to be not good enough. We then tried a series of what we thought were clever arrangements of flip-flops and clock edges, all of no help (as noted above in the other thread, it is easy to make it worse, not better).
Finally, a simple software patch solved the whole problem: if you suddenly think you are WAY OFF, stop, throw the result out, and check again. If you still think you are WAY OFF, then just call the new spot your reference (if it really is WAY OFF, then all the damage is done, and why should you continue to make all sync'd equipment suffer), and turn in a "notification" of poor sync quality (so if someone at the network control center feels we are doing the wrong thing, they can override and take over control).
As long as this little patch executed about once every 6 months, no one cared.
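For anyone who wants to see the shape of that patch, here is a minimal sketch in Python. It is not the original code. The names (check_phase, read_phase, WAY_OFF_THRESHOLD) and the threshold value are made up for illustration, and it assumes the phase is a plain integer in 1/256-clock units with wraparound ignored.

```python
from typing import Callable, Tuple

# Hypothetical threshold: how large a phase error (in 1/256-clock units)
# counts as "WAY OFF". The real value is not given in the post.
WAY_OFF_THRESHOLD = 0x0800


def check_phase(
    read_phase: Callable[[], int],
    reference: int,
    threshold: int = WAY_OFF_THRESHOLD,
) -> Tuple[int, int, bool]:
    """Return (phase_error, reference, notify).

    read_phase is a hypothetical callable that returns one raw phase
    measurement each time it is called.
    """
    error = read_phase() - reference
    if abs(error) <= threshold:
        return error, reference, False  # normal case: use the reading

    # Suddenly WAY OFF: stop, throw the result out, and check again.
    second = read_phase()
    error = second - reference
    if abs(error) <= threshold:
        return error, reference, False  # first reading was a bad capture

    # Still WAY OFF: call the new spot the reference rather than slewing,
    # and flag poor sync quality so the control center can override.
    return 0, second, True
```

A caller would run this once per measurement interval, steer the oscillator with the returned error, keep the returned reference for the next pass, and raise the "notification" whenever the third value is True.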
So, the solution to this little problem: build the best synchronizer you can, and also have another means to deal with the inevitable "failure" to get a valid result.
Another solution that we never tried was to build three separate detection packages for phase and frequency, each offset by a portion of the period (like 0, 1/3, and 2/3), and then vote. If two out of three of the detectors said the phase was 0x1234 (yes, it was a 16-bit phase detector, 8 bits for the raw number of clocks, and 8 bits for the fraction of a clock), then we might go ahead and use that number. If there was no majority, the value would be ignored, and the reading taken again.
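Since we never built the voting scheme, the following is only a sketch of how the vote itself might look in Python. The function names and the retry limit are hypothetical. It assumes each detector's known offset (0, 1/3, 2/3 of the period) has already been removed from its reading, and that agreement means an exact match of the 16-bit value. A real design might well need a tolerance instead.

```python
from typing import Callable, Optional, Sequence


def vote_two_of_three(readings: Sequence[int]) -> Optional[int]:
    """Return the value at least two of three detectors agree on, else None."""
    a, b, c = readings
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None  # no majority: ignore this set of readings


def read_voted_phase(
    read_all: Callable[[], Sequence[int]],
    max_attempts: int = 3,  # hypothetical retry limit
) -> Optional[int]:
    """Take readings again until a majority appears or attempts run out.

    read_all is a hypothetical callable returning the three
    offset-corrected detector readings for one measurement.
    """
    for _ in range(max_attempts):
        result = vote_two_of_three(read_all())
        if result is not None:
            return result
    return None
```

The idea behind the offsets is that a phase that puts one detector right at its metastable point should leave the other two well clear of theirs, so a single bad capture gets outvoted.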
Austin