I'm not sure exactly what David's referring to, but if you have a clock with a bit count that's higher than the word width of your machine, and a clock read function that's guaranteed to take a small fraction of the time it takes for your clock to run through the bottom-most machine word of its count, then you can make a dependable clock read that works in just three reads.
For example, if you have a 32-bit clock in a 16-bit machine, and it takes less than 32767 ticks to execute your function, you read the upper word, then the lower word, then the upper word again. Like this:
upper_first_read = upper; lower_read = lower; upper_second_read = upper;
Then you look at the two upper word reads. If they're the same, then you construct your clock reading from a concatenation of either of them with the lower read. If they're different, then you look at the lower word: if it's small (less than 32768), you know that the clock rolled over between the first upper word read and the lower word read, so you concatenate the _second_ upper word with the lower word. If they're different and the lower word is large (32768 or more), you know that it rolled over _after_ the lower word was read, and you concatenate the _first_ upper word with the lower word.
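Here's roughly how that decision could look in C. Treat it as a sketch, not tested on real hardware: the names combine_reads, read_clock32 and read_word_fn are made up for illustration, and I'm assuming you supply two accessor functions that each return one 16-bit half of the clock.

```c
#include <stdint.h>

/* Hypothetical accessor type: returns one 16-bit half of the clock. */
typedef uint16_t (*read_word_fn)(void);

/* Pick the upper word that belongs with the lower word, then
 * concatenate the two into a 32-bit reading. */
uint32_t combine_reads(uint16_t upper_first, uint16_t lower,
                       uint16_t upper_second)
{
    uint16_t upper;

    if (upper_first == upper_second) {
        upper = upper_first;    /* no rollover seen: either one works */
    } else if (lower < 0x8000u) {
        upper = upper_second;   /* small lower: rolled over before the lower read */
    } else {
        upper = upper_first;    /* large lower: rolled over after the lower read */
    }
    return ((uint32_t)upper << 16) | lower;
}

/* The three reads, in order: upper, lower, upper. */
uint32_t read_clock32(read_word_fn read_upper, read_word_fn read_lower)
{
    uint16_t upper_first  = read_upper();
    uint16_t lower        = read_lower();
    uint16_t upper_second = read_upper();

    return combine_reads(upper_first, lower, upper_second);
}
```

Keeping the decision in its own function means you can check it against hand-picked values without the hardware. For instance, upper reads of 1 and 2 with a lower read of 0x0005 should come out as 0x00020005, and the same upper reads with a lower read of 0xFFF0 should come out as 0x0001FFF0.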
I assume that this makes perfect sense to you now, and that you can see how, as long as the overall function doesn't get stalled for a good long time, this is going to be dead reliable, forever. Or, you'll remember doing it.