Phase relationship management

- Chuck McManis

Sun, May 16, 2004 11:59 PM

So here is a "simple" thing that clearly isn't as simple as one would like.

I'm constructing a PWM unit for a robotics application. Given the shortage of pins on my microprocessor, I'm serially clocking in my data.

My "module" has sdata_in, sdata_out, sclock_in, sclock_out, clock_in, pwm_out.

As I'm doing this in an inexpensive CPLD, I'd like to be able to gang a couple together and just tie the sclock_out, sdata_out, to the next chip in series and then create a chain of these things. However, inside my CPLD I'm using code like

process (sclock_in, sdata_in) is begin if rising_edge(sclock_in) then data_reg

- Hal Murray

Mon, May 17, 2004 12:52 AM

Buffers like that are kludges. They are asking for trouble with CPLDs or FPGAs, since the software is likely to eat them and/or the routing delay may be much larger than the delay of a buffer. (They can be made to work if you don't have any other choices and are willing to hand route and hand check that area.)

Can you clock data_out on the falling edge? You really want to do something like that. (both in your repeater logic and at the source)

Another approach, which takes more logic, is to have a local clock that runs much faster than the bit rate and just watch for transitions on the "clock" line. Then you can insert delays in whole clocks (including negative delays).
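A minimal sketch of the fast-local-clock approach, under stated assumptions: the entity name, the clk_fast port and the rise_tick output are hypothetical names chosen for illustration, and clk_fast is assumed to run many times faster than the serial bit rate. The serial clock is treated as ordinary data, passed through a two-stage synchronizer, and a one-cycle pulse is produced when a rising transition is seen.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical edge detector: sclock_in is sampled by a much faster local
-- clock instead of being used as a clock itself.
entity sclk_edge_detect is
  port (
    clk_fast  : in  std_logic;  -- assumed local clock, much faster than the bit rate
    sclock_in : in  std_logic;  -- slow serial clock, asynchronous to clk_fast
    rise_tick : out std_logic   -- high for one clk_fast cycle per rising edge
  );
end entity sclk_edge_detect;

architecture rtl of sclk_edge_detect is
  -- sync(0) is the newest sample, sync(2) the oldest.
  -- The initial value is for simulation; real parts may need a reset instead.
  signal sync : std_logic_vector(2 downto 0) := (others => '0');
begin
  sample : process (clk_fast) is
  begin
    if rising_edge(clk_fast) then
      sync <= sync(1 downto 0) & sclock_in;
    end if;
  end process sample;

  -- Rising edge seen when the older sample is '0' and the newer one is '1'.
  -- Only the synchronized stages (1 and 2) are compared.
  rise_tick <= '1' when sync(2 downto 1) = "01" else '0';
end architecture rtl;
```

The rest of the logic would then run on clk_fast and use rise_tick as a clock enable. The pulse appears a couple of clk_fast cycles after the real edge, and further whole-cycle delays could be added by extending the sample register.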

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's.  I hate spam.

- Jim Granville

Mon, May 17, 2004 3:31 AM

Besides Hal's comments of 'Care and watch the tools like a hawk', I'd suggest: look at the 4094 / HC4094 data, which show a cascade scheme, but with a common clock.

Buffering the clock can sometimes be necessary (e.g. cascaded opto-isolation), and if you do that, it is good practice to derive the SHIFT CLOCK from the OUTPUT pin; that avoids creeping race conditions. Also, if ANY form of additional buffering is being used, you will need to latch your cascade data on the opposite edge (this gives 1/2 CLK for Tsu and Th).

If you want to cascade a lot of these, you will also need to watch skew degradation.

-jg

- Philip Freidin

Mon, May 17, 2004 7:08 AM

A good example of how this is solved is the serial daisy chain configuration of the Xilinx FPGAs. It even uses one less pin than what you are thinking of, since there is no need for a clock out pin. The clock goes to all devices.

What Xilinx does is sample the data coming into each chip on the rising edge, and clock the daisy-chained data out on the falling edge.

For your application with a 2.5 us cycle time, that gives you a setup and hold window of 1.25 us, which should work regardless of logic family, and the size of your PCB.

Depending on the length of the shift register within each device, the latency is N + 0.5 clock cycles per device. But this does not accumulate across devices, as the .5 cycle of latency is treated as part of the device to device delay.

Example of devices with 5 bits of SR.

DEV1: rising edge clocks in data bit 0.

5th rising edge clocks bit 0 into the last position of the SR in device 1. (DEV1 now has bits 0..4.)

Following falling edge (5.5 cycles after start), serial out is updated with bit 0.

DEV2:

6th rising edge clocks bit 0 into the second device (and data bit 5 goes into DEV1).

This scheme (at your clock rates) is extremely tolerant of variations in both clock and data routing. It can tolerate hundreds of ns of skew.

Obviously it would be best if the original data source follows this protocol too, changing the source data at about the same time as the falling edge of the clock source. This is easily done even in a bit-banged micro interface.

Philip Freidin

Philip Freidin Fliptronics

- Hal Murray

Mon, May 17, 2004 7:59 AM

Good description. Thanks.

Unless the chain gets too long and the signal integrity on the clock isn't good enough.

- Chuck McManis

Tue, May 18, 2004 4:32 AM

An excellent example Philip. I've been re-reading Chang[1] on shift registers as well. Chang's code for the "generic" shift register is :

    entity SHIFTR is
      port (
        CLK, RSTn, SI : in std_logic;
        SO : out std_logic
      );

    architecture RTL of SHIFTR is
      signal FF8 : std_logic_vector(7 downto 0);
    begin
      posedge: process (RSTn, CLK) is
      begin
        if (RSTn = '0') then
          FF8

Depending on the length of the shift register within each device,

This is where I get confused. Given that the clock is in parallel, regardless of the length of the register, should the delay be simply 'x', where x is the propagation delay of the flip-flop from D to Q? I'm thinking that on any clock the data that is going into the next flip-flop is already sitting on the Q output of its predecessor in the chain.

[elided]

That I think I can manage :-) The driver is probably some 8 bit micro like a PIC or AVR chip. My next challenge is to figure out how to infer a transparent latch so that I can clock in new data "behind" the old data and then "expose" it all at once (keeps my PWM units in sync which is important in some cases).
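One common way a transparent latch gets inferred is a process that assigns its output only while an enable is high and has no else branch. The sketch below is illustrative only: the entity name and the expose, shift_q and duty ports are hypothetical, and the 8-bit width is an assumption borrowed from the FF8 register quoted above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical holding stage between the shift register and the PWM logic.
entity pwm_hold is
  port (
    shift_q : in  std_logic_vector(7 downto 0);  -- shift register contents
    expose  : in  std_logic;                     -- assumed latch enable
    duty    : out std_logic_vector(7 downto 0)   -- value seen by the PWM logic
  );
end entity pwm_hold;

architecture rtl of pwm_hold is
begin
  latch : process (expose, shift_q) is
  begin
    if expose = '1' then
      duty <= shift_q;  -- transparent while expose is high
    end if;
    -- No else branch: duty keeps its value when expose is low,
    -- which is what makes a synthesizer infer a latch.
  end process latch;
end architecture rtl;
```

New data can be shifted in while expose is low without disturbing duty. Tying the same expose signal to every chip in the chain would update all PWM units together. Many tools warn about inferred latches, so an edge-triggered holding register clocked by a load strobe is an alternative worth considering.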

--Chuck

-----

[1] "Digital Systems Design with VHDL and Synthesis", K.C. Chang, Chapter 6, Basic sequential circuits. Pb IEEE Computer Society, ISBN 0-7695-0023-4

- Philip Freidin

Tue, May 18, 2004 9:49 AM

That is correct, SO will be updated on the rising edge of the clock. It has 1 full clock cycle to get to the input of the first flipflop of the next shifter in the next chip.

This is not a good assumption, because you are going across a PCB. The clock could arrive early or late with respect to the destination of SO.

Given your 2.5 us cycle time, the half clock trick burns half of the cycle to make this a non-issue, as the data changes happen 1.25 us away from the clock rising edge, thus easily meeting any setup and hold requirements at the destination, regardless of any reasonably conceivable clock and data skew.

So the last FF in the SR (FF8(0)) changes on the rising edge. Take its output to another FF, clocked on the falling edge. The output of this FF is the new, 1/2 cycle shifted SO signal.

The delay for any flop is CLK to Q, not D to Q.

That is right. This is even true of the SO FF I am describing, as it has had 1/2 a cycle (1.25 us) to get to the output FF.

The latency I was describing is what you see while you are debugging with your Tek 465. If you look at the SO pin of (for example) a 5 bit shift register, you will see the data coming out 5.5 cycles after it went in, but you only use 1/2 a cycle to get to the next FF in the next chip, so overall, the shifter as seen from the sw point of view is oblivious to the extra SO FF and the 1/2 cycle delay.
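The extra output flip-flop described above can be sketched as follows. This is an illustration under assumptions, not Chang's listing: the entity name shiftr_half is hypothetical, the port and signal names are borrowed from the SHIFTR fragment quoted earlier in the thread, and the shift direction (SI entering bit 7, FF8(0) as the last stage) is assumed from the reference to FF8(0) as the last FF.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity shiftr_half is
  port (
    CLK, RSTn, SI : in  std_logic;
    SO            : out std_logic
  );
end entity shiftr_half;

architecture rtl of shiftr_half is
  signal FF8 : std_logic_vector(7 downto 0);
begin
  -- Shift register proper: samples SI on the rising edge.
  posedge : process (RSTn, CLK) is
  begin
    if RSTn = '0' then
      FF8 <= (others => '0');
    elsif rising_edge(CLK) then
      FF8 <= SI & FF8(7 downto 1);  -- assumed direction: in at bit 7, out at bit 0
    end if;
  end process posedge;

  -- Extra output flip-flop: re-times the last stage onto the falling edge,
  -- so SO changes half a cycle away from the next chip's sampling edge.
  negedge : process (RSTn, CLK) is
  begin
    if RSTn = '0' then
      SO <= '0';
    elsif falling_edge(CLK) then
      SO <= FF8(0);
    end if;
  end process negedge;
end architecture rtl;
```

Whether a given CPLD can clock one macrocell on the inverted clock without extra skew depends on the device, so the fitter report is worth checking.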

Philip

- Hal Murray

Thu, May 20, 2004 8:33 AM

The rules of the game are that you have to meet both setup and hold times.

If you have a string of FFs like a shift register, you are depending on the clock-out delay of the source FF to cover the hold time of the target FF. You also have to leave room for clock skew.

If the manufacturer promises that things like that will work in their silicon, the tools don't have to bother checking for hold times. Xilinx works that way. I assume others do, but I'm not familiar with the details.

When you go between chips, you can't ignore hold time anymore. You have to check them just like you check the setup times.

The case you describe will probably work, but it's possible to make things like that fail. Clock skew is probably the easiest way.

The classic clock problem with (really) old CMOS logic was a clock feeding a long string of DIPs. The capacitive load of the clock pins turns the clock trace into a loaded transmission line which is quite a bit slower than the speed of light. Unloaded data bits could beat the clock and cause hold problems.

Or the input thresholds could be slightly different on a slowly rising clock so one chip of an adjacent pair clocks ahead of its neighbor.

Using the other edge of the clock avoids all that nonsense. It gives you a half cycle of setup and a half cycle of hold. It will work with horrible clock skew between chips.
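The half-cycle claim can be written out as a rough set of inequalities. The symbols are assumptions introduced here, not taken from the thread: T is the clock period, t_co the clock-to-output delay of the source chip, t_pd the board delay of the data trace, t_su and t_h the setup and hold times at the destination, and t_skew the arrival time of the destination clock minus that of the source clock. A 50% duty cycle is assumed, and the block needs amsmath.

```latex
% Same-edge transfer: launch and capture both on rising edges.
% Opposite-edge transfer: launch on the falling edge, capture on the rising edge.
\begin{align*}
\text{same edge, setup:}     \quad & t_{co} + t_{pd} + t_{su} \le T + t_{skew} \\
\text{same edge, hold:}      \quad & t_{co} + t_{pd} \ge t_{h} + t_{skew} \\
\text{opposite edge, setup:} \quad & t_{co} + t_{pd} + t_{su} \le \tfrac{T}{2} + t_{skew} \\
\text{opposite edge, hold:}  \quad & \tfrac{T}{2} + t_{co} + t_{pd} \ge t_{h} + t_{skew}
\end{align*}
```

In the same-edge case only t_co + t_pd, typically a few ns, stands between the hold check and the skew. In the opposite-edge case the extra T/2 term (1.25 us for the 2.5 us cycle mentioned earlier in the thread) sits on both sides, so the skew would have to approach a microsecond in either direction before a check fails.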