I am trying to find some literature on how to design an all-digital PLL that would extract the clock from an NRZ signal. The main problem here is that a zero crossing is not present in every bit. Also, does anyone know how to set the digital filter parameters in practice, based on loop bandwidth, tracking range and parameters like that? Thanks.
I think there is a nice book on this subject written by Best.
Here is its full reference: R. E. Best, Phase-Locked Loops: Design, Simulation, and Applications, McGraw-Hill, New York, 1999.
What is the maximum NRZ data rate? If it's low enough to be oversampled by a higher-frequency clock, there is an alternative implementation: you can use a self-resetting NCO whose accumulator is reloaded with a seed FTW on each NRZ posedge/negedge.
"Vladislav Muravin" wrote in message news:YYwwe.8234$ firstname.lastname@example.org...
Nice book, but it does not really provide ideas about clock recovery, "only" PLL design.
Have a look at the USB specs; there is a paper on their homepage which describes a DPLL for CDR (clock/data recovery) at USB full speed (12 Mbit/s) using a 48 MHz clock. I tried to copy their FSM and it didn't work for me; maybe I messed something up. But my own CDR works fine on USB.
BTW, plain NRZ will not work reliably; you need a minimum transition density in your data stream. So at least scrambling is necessary (SDH), or NRZI (USB), or whatever.
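As a minimal sketch of the USB-style approach mentioned above (NRZI plus bit stuffing, which together guarantee a transition at least every seven bit times): this is only an illustration of the line coding, not a full USB encoder (no sync pattern, no EOP), and the function names are mine.

```python
# USB-style line coding sketch: bit stuffing forces a 0 after six
# consecutive 1s, and NRZI encodes a 0 as a line transition and a
# 1 as "no change". Names and framing are illustrative only.

def stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 6:
            out.append(0)   # forced transition for the receiver's PLL
            run = 0
    return out

def nrzi(bits, level=1):
    """USB convention: a 0 toggles the line, a 1 leaves it unchanged."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out

# Even an all-ones payload now produces a transition:
line = nrzi(stuff([1] * 8))
print(line)  # [1, 1, 1, 1, 1, 1, 0, 0, 0]
```

Without the stuffing step, eight 1s would leave the line flat for eight bit times, which is exactly the transition-density problem being discussed.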
It's time for some basic explanation: NRZ stands for Non-Return-to-Zero, which means High for 1 and Low for 0.
And no clock! How can you recover the clock? It's impossible if the data can be an endless string of either 0s or 1s. So there must be transitions on which to re-adjust the clock recovery circuit.

One way is to scramble the data, which creates additional transitions (the very low probability of the scrambler actually eliminating transitions can be statistically ignored). Or you re-code the data with additional bits, like 8B10B, where 8 bits of arbitrary data are sent as a 10-bit stream with several guaranteed transitions (or 64B66B in a similar way). After that decision, you know how long the clock recovery circuit has to coast without re-synchronization, and you can design an analog or digital PLL to do the clock recovery.

UARTs rely on a start bit before, and usually also one or two stop bits after, the data byte. The start bit is oversampled, usually 16 times, and the local clock is supposed to be accurate enough to recover the 8 data bits until the next start bit provides re-synchronization.

Peter Alfke
I apologize for the long answer, but I hope it's helpful.
I have read Peter Alfke's response. He's right (as always).
Let's distinguish between clock recovery and the engine/logic which keeps lock on a certain clock (the PLL).

Clock recovery is done using some reference information, which could be a timestamp followed by dedicated processing logic, or even a 0-->1 / 1-->0 transition, as in our case.
So... let's go into UART a little bit. Say an 18.432 MHz clock oversamples an asynchronous input. Assume the data rate is 115200 bps, which is exactly 18.432 MHz / 160. Once the falling edge of the start bit is detected, you reset a counter to 159 or 0 (here: the seed) and count down or up, respectively. Once the counter reaches zero or 159, respectively, that is your sampling point. So you can sample as much data as you'd like.
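The divider arithmetic above can be sanity-checked with a few lines of Python. Note the half-bit offset for sampling at the bit centre is my addition (the usual UART refinement), not part of the scheme as described:

```python
# UART oversampling arithmetic: 18.432 MHz clock, 115200 bps
# gives exactly 160 clock ticks per bit.

CLK_HZ = 18_432_000
BAUD = 115_200
CLKS_PER_BIT = CLK_HZ // BAUD  # = 160, with no remainder

def sample_points(num_bits):
    """Clock-tick index at the centre of each data bit, counting
    from the falling edge of the start bit (tick 0). The 1.5-bit
    offset reaches the middle of data bit 0."""
    first = CLKS_PER_BIT + CLKS_PER_BIT // 2
    return [first + i * CLKS_PER_BIT for i in range(num_bits)]

print(CLKS_PER_BIT)      # 160
print(sample_points(3))  # [240, 400, 560]
```

Because 18.432 MHz divides evenly by 115200, the sampling point never drifts within a character; with a non-integer ratio you would accumulate phase error, which is what motivates the NCO approach below.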
Ok, if we understand so far, let's move to the next step. Assume there is an asynchronous serial interface with any data rate up to, say, 20 Mbps.
Now, an NCO (Numerically Controlled Oscillator) is fed with an FTW (Frequency Tuning Word), which is basically the value added to an N-bit counter each clock cycle. Take a look at Analog Devices' DDS chips, such as the AD9954; they have a very nice explanation of this concept.
Now, going further with the NCO and FTW, let's take an example. Say you have a 160 MHz clock and a 32-bit NCO, which gives approximately 37.25 millihertz resolution (160 MHz / 2^32). Assume an 11.5 Mbps data rate, which corresponds to an FTW of 0x12666666 after truncation of the fractional part. On each positive and negative edge of the incoming data the NCO accumulator is reset to zero and starts counting. The MSB of the NCO is the recovered clock! Isn't this nice?
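The FTW arithmetic and the edge-reset behaviour can be sketched directly. This is a behavioural model of the scheme described above, not RTL; the function names are mine:

```python
# NCO-based clock recovery sketch: 160 MHz clock, 32-bit accumulator,
# 11.5 Mbps data. FTW = data_rate / clk * 2^32, fractional part truncated.

CLK_HZ = 160_000_000
DATA_RATE = 11_500_000

# Integer math avoids any float rounding in the truncation.
FTW = (DATA_RATE << 32) // CLK_HZ

print(hex(FTW))  # 0x12666666, matching the value quoted above

def recovered_clock(edge_ticks, num_clocks):
    """Simulate the accumulator. `edge_ticks` is the set of clock
    ticks at which a data transition resets the accumulator to zero.
    Returns the MSB stream, i.e. the recovered clock."""
    acc, out = 0, []
    for t in range(num_clocks):
        acc = 0 if t in edge_ticks else (acc + FTW) & 0xFFFFFFFF
        out.append(acc >> 31)  # bit 31 (the MSB) is the recovered clock
    return out
```

With an edge at tick 0, the MSB first goes high at tick 7, i.e. about half of the ~13.9-clock bit period (160 MHz / 11.5 MHz), which is exactly the half-period of the recovered clock.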
This is a mechanism I implemented with a ~135 MHz clock, recovering data rates up to 11 Mbps.
But the problem arises when your reference is applied only once in a long period of time, or when there is a "bad" relationship between the data rate and the higher-speed clock, so the sampling point can drift. That's why there are scramblers, or other mechanisms, that ensure there are no more than X consecutive ones or zeros, so that the distance between consecutive posedges and negedges is short enough to keep tracking the recovered clock.
This depends on the scrambler polynomial. The original POS (Packet Over SONET) spec, RFC1619, relied on the 7th order SONET polynomial for scrambling. This was unfortunately easy to 'spoof' with specially created packets. An additional 43rd order scrambler was added in RFC2615 to reduce the chance that this denial of service attack could succeed.
"The major change from RFC 1619 is the addition of payload scrambling ... RFC 1619 was operationally found to permit malicious users to generate packets with bit patterns that could create SONET/SDH-layer low-transition-density synchronization problems, ...
The use of the x^43 + 1 self-synchronous scrambler was introduced to alleviate these potential security problems. Predicting the output of the scrambler requires knowledge of the 43-bit state of the transmitter as the scrambling of a known input is begun. This requires knowledge of both the initial 43-bit state of the scrambler when it started and every byte of data scrambled by the device since it was started. The odds of guessing correctly are 1/2**43, with the additional probability of 1/127 that a correct guess will leave the frame properly aligned in the SONET/SDH payload, which results in a probability of 9e-16 against being able to deliberately cause SONET/SDH-layer problems. This seems reasonably secure for this application."
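The x^43 + 1 self-synchronous scrambler quoted above is simple to sketch: each output bit is the input bit XORed with the scrambler's own output 43 bits earlier, so the descrambler resynchronizes after 43 bits regardless of its initial state. A bit-level Python sketch (the all-zero initial state is illustrative; RFC 2615 does not mandate one):

```python
# x^43 + 1 self-synchronous scrambler/descrambler, bit by bit.
# The 43-element deque holds the last 43 bits on the line.

from collections import deque

def scramble(bits, state=None):
    st = deque(state or [0] * 43, maxlen=43)
    out = []
    for b in bits:
        s = b ^ st[0]        # XOR with the bit sent 43 positions ago
        st.append(s)         # feed the scrambled bit back into the state
        out.append(s)
    return out

def descramble(bits, state=None):
    st = deque(state or [0] * 43, maxlen=43)
    out = []
    for b in bits:
        out.append(b ^ st[0])
        st.append(b)         # receiver state is just the received stream
    return out
```

The self-synchronizing property falls out of the structure: for i >= 43 the descrambler computes rx[i] XOR rx[i-43] from the received bits alone, so any wrong initial guess of the 43-bit state flushes out after 43 bits, while an attacker who wants to force a chosen line pattern needs the full state, as the RFC argues.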
For one thing, the PLB runs at 100 MHz maximum frequency, even if the processor runs much faster. So that's one limitation: only a third of the CPU rate. Another is the data width of your SDRAM: is it really 64 bits, or only 32? On most eval boards it's 16 or 32 bits, so that's another factor of 2 or 4 less (if it's SDR DRAM).
Then you have to think about refresh and all that. Even if you do bursts, the controller has to refresh the memory regularly, so that takes away even more performance.
Then you have to consider that the PLB is a bus structure. So even if there's only one component on the bus, there's always a few clock cycles for bus arbitration when you start a transfer.
Then it depends on your program. For example, if you use a for-loop to write to and read from every address one after another, then what you actually do is single 32-bit accesses, meaning that for each 32-bit data word you read there's bus arbitration, DRAM latency, and maybe a few wait cycles because the DRAM is currently refreshing, and so on. Added up, this may cost as much as 8 clock cycles (at 100 MHz) to read a mere 4 bytes; multiplied by 3 that gives 24 CPU clock cycles (at 300 MHz) for one single access if all goes really badly, and there you have your ~50 MB/s, assuming your RAM is a 32-bit-wide SDR DRAM.
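The arithmetic in that worst-case estimate, written out (the 8-cycle overhead per access is the same rough assumption made above, not a measured figure):

```python
# Back-of-the-envelope throughput for single-word PLB accesses:
# 100 MHz bus, 32-bit SDR DRAM, ~8 bus clocks of overhead per access.

PLB_HZ = 100e6
CLOCKS_PER_ACCESS = 8    # arbitration + DRAM latency + refresh stalls
BYTES_PER_ACCESS = 4     # one 32-bit word
CPU_TO_BUS_RATIO = 3     # 300 MHz CPU vs. 100 MHz PLB

throughput = PLB_HZ / CLOCKS_PER_ACCESS * BYTES_PER_ACCESS
cpu_cycles = CLOCKS_PER_ACCESS * CPU_TO_BUS_RATIO

print(throughput / 1e6)  # 50.0  (MB/s)
print(cpu_cycles)        # 24    (CPU cycles per 4-byte access)
```

This is why bursting matters so much: the 8-cycle overhead is paid once per transfer, so amortizing it over an 8-beat burst instead of a single word would raise the ceiling considerably.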
I don't know how bursts are handled in the DDR core for the PLB, so I can't comment on that. But in DDR SDRAM a burst can only be as long as 8 beats; then the whole SDRAM-latency thing starts again.
Neither can I offer any useful guesses as to why writing should be slower than reading; it could be some issue with caching.

In cases like this it can also be useful to disassemble the generated ELF file to see how your code maps to assembly instructions.
I have no knowledge of this implementation, but common sense should tell you that reads are inherently slower than writes when both are optimally implemented. For a write, you can throw the address and data at the RAM at the same time and let the RAM and controller get on with it while you go do something else (other than reading from the same RAM). For a read, you have to wait until the RAM gives you the data you asked for.
e.g. in a processor running from a local cache, the write will be happening in parallel with the fetching of the next instruction(s), but the read must wait for the data to come back from the RAM.