serial protocol specs and verification

Hi all,

I have the following specs for the physical level of a serial protocol:

> For the communication with the Frontend an asynchronous LVDS connection is used.
> The bitrate is set to 20 Mbps.
> Data encoding on the LVDS line is NRZI:
> - bit '1' is represented by a transition of the physical level,
> - bit '0' is represented by no transition of the physical level,
> - insertion of an additional bit '1' after 6 consecutive bits '0'.
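To make sure I'm reading the encoding right, here is how I would model it
(a minimal Python sketch; the function names are mine, not from the spec):

def stuff_bits(bits):
    """Insert a '1' after every run of 6 consecutive '0' bits."""
    out, zeros = [], 0
    for b in bits:
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
        if zeros == 6:
            out.append(1)   # stuffed bit forces a transition on the line
            zeros = 0
    return out

def nrzi_encode(bits, level=0):
    """NRZI: a '1' toggles the line level, a '0' leaves it unchanged."""
    out = []
    for b in bits:
        if b == 1:
            level ^= 1
        out.append(level)
    return out

# e.g. encode a long run of zeros to see where the stuffed bits land:
print(nrzi_encode(stuff_bits([0] * 20)))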

Isn't there a missing requirement on the reset condition of the line? The system clock is implicitly defined in a different section of the specs and is set to 40 MHz.

At the next layer there's a definition of a 'frame' as a sequence of 16-bit words, preceded by a 3-bit sync pattern ('111') and a 16-bit header defining the type of the packet and its length (in words).

I'm writing a test bench for it and I was wondering whether there's any recommendation you would suggest. Should I take care to randomly select the phase between the system clock and the data?

Any pointers are appreciated. Cheers,

Al

--
A: Because it fouls the order in which people normally read text. 
Q: Why is top-posting such a bad thing? 
A: Top-posting. 
Q: What is the most annoying thing on usenet and in e-mail?
Reply to
alb

Async, eh? At 2x clock to data? Not sure I would want to design this. I assume you have to phase lock to the data stream somehow? I think that is the part I would worry about.

In simulation I would recommend that you jitter the data clock both at a high bandwidth and with something fairly slow. The slow variation will test the operation of your data extraction with a variable phase, and the high-bandwidth jitter will check for problems from having only two samples per bit. I don't know how this can be expected to work, myself.

I did something similar where I had to run a digital phase-locked loop on standard NRZ data (no encoding) and used a 4x clock, but I think I proved to myself I could do it with a 3x clock; it just becomes impossible to detect when you have a sample error... lol.

--

Rick
Reply to
rickman

[]

Currently they are experiencing a large loss of packets as well as many corrupted packets (CRC errors). I'm not sure the current implementation is doing phase locking.

Since the modules are far apart and likely to be at different temperatures, I would certainly expect a phase problem. Your idea of having a slow and a high-frequency variation in the phase generation might bring out some additional info.

What do you mean by saying 'it becomes impossible to detect when you have a sample error'?

Reply to
alb

You don't need to specify a reset state, as either level will work. At reset the line will be toggling every 7 bit times due to the automatic insertion of a 1 after 6 0s.

I would be hard pressed to use 40 MHz as a system clock, unless I was allowed to use both edges of the clock (so I could really sample at a 4x rate).

For a test bench, I would build something that could be set to run slightly "off frequency" and maybe even with some phase jitter in the data clock. I am assuming that the system clock does NOT travel between devices, or there wouldn't be as much need for the automatic '1' bit, unless it is just for bias leveling, but it isn't really great for that.

Reply to
Richard Damon

[]

Uhm, since there's a sync pattern of '111' I have to assume that no frame is transmitted when only zeros are flowing (with the '1' stuffed every 6 zeros).

I'm thinking about having the system clock multiplied internally via a PLL and then going for 4x or 8x in order to center the bit sampling properly.

Rick was suggesting phase jitter with a high and a low frequency component. This may even be a more realistic case since it models slow drifts due to temperature variations... I do not know how critical it would be to simulate *all* jitter components of a clock (they may depend on temperature, power noise, ground noise, ...).

Your assumption is correct. No clock distribution between devices.

Reply to
alb

I was assuming that perhaps you were doing something I didn't quite understand, but I'm pretty sure I am on target with this. You *must* up your sample rate by a sufficient amount so that you can guarantee you get a minimum of two samples per bit. Otherwise you have no way to distinguish a slipped sample due to clock mismatch. Clock frequency mismatch is guaranteed, unless you are using the same clock somehow. Is that the case? If so, the sampling would just be synchronous and I don't follow where the problem is.

It is not just a matter of phase, but of frequency. With a 2x clock, seeing a transition 3 clocks later doesn't distinguish one bit time from two bit times.

I'm having trouble expressing myself I think, but I'm trying to say the basic premise of this design is flawed because the sample clock is only 2x the data rate. I say you need 3x and I strongly encourage 4x. At 4x the samples have four states: expected timing, fast timing, slow timing, and "error" timing, meaning the loop control isn't working.

Data     ____----____----____----____----____----____----____
SmplClk  --__--__--__--__--__--__--__--__--__--__--__--__--__
SmplData -----____----____----____----____----____----____----

This is how you expect it to work. But if the data is sampled slightly off it looks like this.

Data     ____---____----____----____----____----____----____
SmplClk  --__--__--__--__--__--__--__--__--__--__--__--__--__
SmplData -----________----____----____----____----____----___

You can't use a locked loop like this because you have no info on whether you are sampling fast or slow.

The sample clock does not need to be any particular ratio to the data stream if you use an NCO to control the sample rate. Then the phase detection will bump the rate up and down to suit.
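In rough behavioural terms, something like this (a toy Python model, not my real design; all the constants here are arbitrary):

# Toy NCO-controlled sampler. The phase accumulator wraps once per bit;
# the rollover is the "take a bit" strobe, and detected edges bump the
# increment up or down so that edges settle half an accumulator away
# from the sample point (i.e. the sample lands mid-eye).

ACC_BITS = 16
WRAP = 1 << ACC_BITS

def nco_sample(samples, inc, gain=200):
    """samples: the line sampled at the fixed system clock rate.
    inc: nominal increment, about WRAP * bit_rate / sample_rate."""
    acc, prev, bits = 0, samples[0], []
    for s in samples:
        if s != prev:
            prev = s
            # Edges should land near acc == WRAP/2; earlier means we
            # run slow (speed up), later means we run fast (slow down).
            inc += gain if acc < WRAP // 2 else -gain
        acc += inc
        if acc >= WRAP:        # rollover: emit one recovered bit
            acc -= WRAP
            bits.append(s)
    return bits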

Do you follow what I am saying? Or have I mistaken what you are doing?

--

Rick
Reply to
rickman

Just to be clear, my suggestion of simulating with both fast and slow clock frequency variations is not intended to match real-world conditions so much as to exercise the circuit in two ways that I would expect to detect failures.

If the clock is sampling the data on the edge, it is random which level is measured. This can be simulated by a fast jitter in the clock. A slow noise component in the clock frequency would provide for simulation of mismatched clock frequencies in both the positive and negative directions. Another way of implementing the slow drift is to just simulate at a very slightly higher frequency and at a very slightly lower frequency. That might show errors faster and more deterministically.
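Something along these lines is what I have in mind for the stimulus side (a Python sketch; the numbers are arbitrary):

import random

# Render an already-NRZI-encoded level sequence as receiver-clock
# samples, with a static frequency offset (the slow component) and
# per-sample timing jitter (the fast component).

def sampled_line(levels, samples_per_bit=2, ppm=100.0, jitter=0.05):
    """levels: one line level per bit time. Returns receiver samples."""
    out, t = [], 0.0
    step = (1.0 / samples_per_bit) * (1.0 + ppm * 1e-6)  # in bit times
    while t < len(levels):
        tj = t + random.uniform(-jitter, jitter) / samples_per_bit
        idx = min(max(int(tj), 0), len(levels) - 1)
        out.append(levels[idx])
        t += step
    return out

# Run the recovery logic once at +100 ppm and once at -100 ppm; that
# tends to show slip errors faster than a random walk does.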

--

Rick
Reply to
rickman

(snip)

Everyone's old favorite asynchronous serial RS232 usually uses a clock at 16x, though I have seen 64x. From the beginning of the start bit, it counts half a bit time (in clock cycles), verifies the start bit (and not random noise), then counts whole bits and decodes at that point. So the actual decoding is done with a 1x clock, but with 16 (or 64) possible phase values. It resynchronizes at the beginning of each character, so it can't get too far off.
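As a toy model (Python; the structure is mine, not from any real UART):

# 16x oversampled async receive, the classic RS232 scheme: hunt for the
# start edge, move half a bit to mid-bit, verify, then take every 16th
# sub-sample. Idle-high line, LSB first, stop bit not checked.

def uart_rx(sub, nbits=8, oversample=16):
    """sub: the line sampled at 16x the bit rate, as 0/1 values."""
    i = 0
    while i < len(sub) and sub[i] == 1:
        i += 1                        # hunt for the falling start edge
    i += oversample // 2              # middle of the start bit
    if i >= len(sub) or sub[i] != 0:
        return None                   # a noise spike, not a start bit
    bits = []
    for _ in range(nbits):
        i += oversample               # advance one whole bit time
        if i >= len(sub):
            return None
        bits.append(sub[i])
    return bits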

For 10Mbit ethernet, on the other hand, as well as I understand it the receiver locks (PLL) to the transmitter. Manchester coding is wasteful of bandwidth, but allows for a simpler receiver. I believe it is usual to feed the transmit clock to the PLL to keep it close to the right frequency until a signal comes in. Speeds up the lock time.

Seems to me that it should depend on how far off you can get. For async RS232, you have to stay within about a quarter bit time over 10 bits, so even if the clock is 2% off, it still works. But as above, that depends on having a clock of the appropriate phase.

-- glen

Reply to
glen herrmannsfeldt

Yes, that protocol requires a clock matched to the sender's clock to at least 2.5% IIRC. The protocol the OP describes has much longer character sequences, which implies much tighter clock precision at each end, and I'm expecting it to use a clock recovery circuit... but maybe not. I think he said they don't use one but get "frequent" errors.

Not sure why you mention phase. In 232-type character async you have *no* phase relationship between clocks. There is no PLL so you aren't phase locked to the data either. I guess you mean a clock with enough precision?

I've never analyzed an async design with longer data streams so I don't know how much precision would be required, but I'm sure you can't do reliable data recovery with a 2x clock (without a PLL). I think this would contradict the Nyquist criterion.

In my earlier comments when I'm talking about a PLL I am referring to a digital PLL. I guess I should have said a DPLL.

--

Rick
Reply to
rickman

(snip, I wrote)

(snip)

The reason for the 16x clock is that it can then clock the bits in one at a time with any of 16 different phases. That is, the actual bits are only looked at once (usually).

If you start from the leading edge of the start bit, choose which cycle of the 2x clock is closest to the center, and count from there, seems to me you do pretty well if the clocks are close enough. Also, the bit times should be pretty close to correct.

I was thinking of an analog one. I still remember when analog (PLL based) data separators were better for floppy disk reading. Most likely by now, digital ones are better, possibly because of a higher clock frequency.

-- glen

Reply to
glen herrmannsfeldt

Doesn't sound so different from USB (full speed).

Usually done by sampling the 12 Mbit/s stream with a 48 MHz clock, or with the rising and falling edges of a 24 MHz clock.

-Lasse

Reply to
langwadt

That is the point. With a 2x clock there isn't enough resolution to "pick" an edge. The clock that detects the edge is somewhere in the first *half* of the start bit, and the following clock is somewhere in the second half of the start bit... which do you use? It doesn't matter: if the clock detecting the start bit is close enough to the wrong point, one or the other will be far too close to the next transition to guarantee that you are sampling data from the correct bit.

If you have an analog PLL then you just need to make sure your sample clock is *faster* than 2x the bit rate. Then you can be certain of how many bits are between adjacent transitions. But if at any time due to frequency error or jitter you sample on the wrong side of a transition you will get an unrecoverable error.

When it comes to analog media like disk drives where the position of the bit pulse can jitter significantly I would expect a significantly higher clock rate would be very useful. It all comes down to distinguishing which half of the bit time the transition falls into. With a run of six zeros (no transition) between 1 bits (transition) it becomes more important to sample with adequate resolution with a DPLL or to use an analog PLL.

I did a DPLL design for a data input on an IP circuit-to-packet card. It worked well in simulation and in product test and verification. I'm not sure they have used this feature in the field though. It was added to the product "just in case", and that depends on the customer needing the feature.

--

Rick
Reply to
rickman

(snip, I wrote)

The easy way is to use the opposite edge of the clock. I suppose that really means that the clock is 4x, though, so maybe that doesn't count. Say you clock on the falling edge. If the clock is currently high, the next falling edge will be less than half a cycle away. If it is currently low, then it will be more. Using that, you can find the falling edge closest to the center.

The hard way is to have the receive clock slightly faster or slightly slower. That is, the speed such that if the first edge is in the first half, later edges will be later in the bit time, and not past the 3/4 mark. Now, having different receive and transmit clocks is inconvenient, but not impossible.

(snip)

It is interesting in the case of magnetic media. The read head reads changes in the recorded magnetic field. For single density (FM) there is a flux transition at the edge of the bit cell (clock bit), and either is or isn't one in the center (data bit). So, including jitter, the data bit is +/- one quarter bit time from the center, and the clock bits are +/- one quarter from the cell boundary. The data rate is half the maximum flux transition rate. The time between transitions is either 1/2 or 1 bit time.

For the usual IBM double density (MFM), the data bits are again in the center of the bit cell, but clock bits only occur on bit cell boundaries between two zero (no transition) bits. The data rate is then equal to the maximum flux transition rate. The time between transitions is then either one or 1.5 bit times. The result, though, as you noted, is that it is more sensitive to jitter.

In the case of magnetic media response, though, there is a predictable component to the transition times. As the field doesn't transition infinitely fast, the result is that as two transitions get closer together, when read back they come slightly farther apart than you might expect. Precompensation is then used to correct for this. Transitions are moved slightly earlier or slightly later, depending on the expected movement of the read pulse.

One way to do the precompensation is to run a clock fast enough such that you can move the transition one cycle early or late. The other way is with an analog delay line.

The early magnetic tape used NRZI coding, flux transition for one, no transition for zero. Odd parity means at least one bit will change for every character written to tape. Even parity means at least two will change, but you can't write the character with all bits zero. Both were used for 7 track (six bit characters) and odd parity was used for 800 BPI 9 track tapes. There can be long runs of zero (no transition) for any individual track, but taken together there is at least one.

For 1600 BPI tapes, IBM changed to PE, which is pretty similar to that used for single density floppies. The flux transition rate can be twice the bit rate (3200/inch) but each track has its own clock pulse. It is fairly insensitive to head azimuth, unlike 800 BPI NRZI. There are no long periods without a transition on any track. Reading tapes is much more reliable, especially on a different drive than the data was written on.

IBM 6250 tapes use GCR, with more complicated patterns of bit transitions, and more variation in time between transitions. Again, much more reliable than its predecessor.

-- glen

Reply to
glen herrmannsfeldt

My assumption for the protocol would be that between frames an "all zero" pattern is sent (note that this is on the layer above the raw transport level, where every time 6 zeros are sent, a 1 is added). Thus all frames will begin with three 1s in a row as a signal for start of frame (which also gives a lot of transitions to help lock the clock if using a PLL).

I would think that sampling at 4x the data rate is a minimum; faster will give you better margins for frequency errors. So with a 20 MHz data rate, you need to sample the data at 80 MHz; faster can help, and will cause less jitter in your recovered data clock.

Note that the first level of processing will perform data detection and clock recovery, and this might be where the 40 MHz came from: a 40 MHz processing system can be told to take data every other clock cycle most of the time, but has the bandwidth to take data on two consecutive clocks at times when the data is coming in slightly faster. You don't want to make this clock much faster than that, as then it becomes harder to design for no benefit. Any higher-speed bit-detection clock needs to have its results translated to this domain for further processing. (You could also generate a recovered clock, but that starts you down the road to an async design, as the recovered clock isn't well related to your existing clock, being a combinatorial result of registers clocked on your sampling clock.)

Reply to
Richard Damon

Hi Rick,

On 29/07/2013 17:19, rickman wrote: []

There's no clock distribution, therefore each end has its own clock on-board. We are certainly talking about the same nominal oscillator frequency, but how well they match is certainly something we *do not* want to rely on.

I agree with you, the 2x clock is not fine enough to adjust for phase shifts and/or frequency mismatch.

Uhm, I didn't quite follow what you mean by 'fast timing' and 'slow timing'. With perfect frequency matching I would expect a bit to have a transition on cycle #2 (see graph). If the bit is slightly shifted I would notice the transition in either cycle 2 or cycle 3, depending on whether it is slightly earlier or slightly later than the clock edge.

              bit center
                  ^
                  |
cycles   2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
Data     ________--------________--------________--------_____
SmplClk  -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
SmplData __________--------________--------________--------___

On perfect frequency match SmplData will be 1 clock delayed.

Uhm, this graphic shows a clock frequency which is 1x the clock frequency of the data... Am I missing something??? This will never work, of course...

I might use the internal PLL to multiply the clock frequency to 4x the data frequency (= 80 MHz) and then phase lock on the data just by looking at the transitions. If for some reason I see a transition earlier or later, I would adjust my recovered clock accordingly.

I'm sure this stuff has been implemented a gazillion times.

I follow partially... I guess you understood what I'm saying, but I'm losing you somewhere in the middle of the explanation (especially with the graph representing a 1x clock rate...).

Reply to
alb

On 29/07/2013 19:40, glen herrmannsfeldt wrote: []

I believe that with 4x or 8x you could easily resync at the bit level. The first transition comes into a shift register (4 FFs or 8 FFs); when the shift register has half of its bits set and half reset, you generate a clock to sample the data. The second transition comes in and the same mechanism repeats. The recovered clock is adjusted so that the transition lands in the middle of the shift register.
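Something like this toy model is what I have in mind (Python, assuming a 4x sample stream; it simply restarts the bit-phase counter on every edge rather than nudging it):

# Free-running divide-by-4 bit-phase counter, re-centred on every edge
# so that the sample strobe lands two quarter-bits after a transition
# (roughly mid-eye). A toy model only.

def recover_4x(samples):
    cnt, prev, bits = 0, samples[0], []
    for s in samples:
        if s != prev:
            prev = s
            cnt = 0                 # edge seen: restart the bit phase
        if cnt == 2:                # two quarter-bits after the edge
            bits.append(s)          # sample strobe, one bit per period
        cnt = (cnt + 1) % 4
    return bits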

Since the protocol is bit stuffed, it won't get too far off.

[]

IMO a phase shift does not matter too much, while a frequency mismatch will accumulate timing differences and lead the transmitter and receiver to drift apart. But if you lock on the phase shift it means you lock on the frequency as well.

Reply to
alb

Ok, we are on the same page then.

No point in even discussing the "perfect" frequency match.

Yes, you are right, still your diagram above shows a 4x clock. That will work all day long. It is the 2x clock that doesn't work well. A 3x clock will work but can't provide any info on whether it is sync'd or not. A 4x clock can tell if the data has slipped, giving an error.

What I meant further up by the timing is that your circuit will detect the data transitions and try to sample near the middle of the stable portion. So with a 4x clock, if it sees a transition where it expects one, it is "on time". If it sees a transition one clock early it knows it is "slow"; if it sees a transition one clock late it knows it is "fast". When it sees a transition in the fourth phase, it should assume that it is out of sync and needs to go into hunt mode. Or you can get fancier and use some hysteresis for the transitions between "hunt" and "locked" modes.

I designed this with an NCO-controlled PLL. With your async protocol you should be able to receive a packet based on the close frequency matching of the two ends. This would really just be correcting for the phase of the incoming data and not worrying about the frequency mismatch... like a conventional UART. This circuit can realign every 7 pulses max. That would work, I think.

I was making this a bit more complicated because in my case I didn't have matched frequency clocks; the match was specified in the software to maybe 1-2%, and the NCO had to PLL to the incoming data to get a frequency lock. I also didn't have bit stuffing, so a long enough string without transitions would cause a lock slip.

Yes, that is it exactly. The bit stuffing will give you enough transitions that you should never lose lock. It is trying to do this at 2x that won't work well, because you can't distinguish early from late.

Sorry. If this is not clear now, I'll try the diagram again... lol

I would give you my code, but in theory it is proprietary to someone else. Just think of a state machine that outputs a clock enable every four states, then either adds a state or skips a state, but only when it sees data transitions, to stay in alignment. If it sees a transition in the fourth state, it is not in alignment. If there is no transition, the FSM just counts...
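A toy model of that idea might look like this (Python; my sketch here, not the proprietary code):

# 4-state DPLL sampler: transitions are expected at state 0 and the
# mid-eye sample (clock enable) is taken at state 2. Edges one state
# early or late re-align the counter by one; an edge at state 2 is the
# "error" phase and sends us back to hunting. A sketch only.

def dpll_4x(samples):
    state, prev, bits, errors = 0, samples[0], [], 0
    for s in samples:
        if s != prev:
            prev = s
            if state == 2:          # mid-eye edge: lost alignment
                errors += 1
            if state != 0:          # early (3) or late (1): re-align,
                state = 0           # equivalent to add/skip one state
        if state == 2:
            bits.append(s)          # clock enable, once per bit
        state = (state + 1) % 4
    return bits, errors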

A timing diagram is worth a thousand words.

--

Rick
Reply to
rickman

On 29/07/2013 22:14, rickman wrote: []

At the physical level the bit stuffing allows resyncing continuously, therefore I'm not concerned as long as there is a clock recovery circuit.

We are using 40 MHz (0.5 ppm stability), but after a few seconds you can already see how many cycles apart two clocks can drift.

The Nyquist criterion has nothing to do with being able to sample data. As a matter of fact, your internal clock is perfectly capable of sampling data flowing in your FPGA without needing to be 2x the data rate.

Why bother? If you have a PLL on your FPGA you can profit from it; otherwise you need something fancier.

Reply to
alb

On 30/07/2013 06:45, Richard Damon wrote: []

A frame is defined as follows:

- sync :'111'

- header: dtype (4) - n.u.(2) - length (10)

- data : (16) * length

In principle, between frames there can be any number of zeros (with bit stuffing). An 'all zero' pattern in this sense might be of any number of bits.
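For reference, here is how I'd pack/unpack the header (Python; the MSB-first field order is my assumption, the spec only gives the widths):

# Header word layout assumed here: dtype(4) | unused(2) | length(10),
# MSB first. The bit positions are my guess from the listed widths.

def pack_header(dtype, length):
    assert 0 <= dtype < 16 and 0 <= length < 1024
    return (dtype << 12) | (length & 0x3FF)

def unpack_header(word):
    return (word >> 12) & 0xF, word & 0x3FF   # (dtype, length in words)

assert unpack_header(pack_header(dtype=3, length=5)) == (3, 5)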

[]

I also agree with you; there's no way a 2x clock would be sufficient to recover a phase shift.

A 40 MHz clock would be sampling at 2x, which is clearly not sufficient.

The deframed data (the data portion of the above-mentioned frame structure) go into a FIFO; I think I can rework it to be a dual-clock FIFO to cross the clock domain.

Reply to
alb

I don't know what you are talking about. If you sample asynchronously, you very much do have to satisfy the Nyquist criterion. A 2x clock, because it isn't *exactly* 2x, can *not* be used to capture a bitstream so that you can find the transitions and know which bit is which. Otherwise there wouldn't be so many errors in the existing circuit.

Not sure of your context. You can't use the PLL on the FPGA to recover the clock from an arbitrary data stream. It is not designed for that and will not work because of the gaps in data transitions. It is designed to allow the multiplication of clock frequencies. A DPLL can be easily designed to recover the clock, but needs to be greater than 3x the data rate in order to distinguish the fast condition from the slow condition.

You can use the FPGA PLL to multiply your clock from 2x to 4x to allow the DPLL to work correctly.

--

Rick
Reply to
rickman
