Preventing optimization in cross clock domain logic

In a situation where it is necessary to cross between two clock
domains within an FPGA, I might use logic that produces an output
toggle (toggle_out) on the 2nd clock in response to a single-cycle
pulse (pulse_in) on the 1st clock, using two processes and double
buffering to mitigate metastability. In VHDL, it might look like this:

signal t1, t2, toggle : std_logic;

process(first_clk)
begin
  if rising_edge(first_clk) then
    if pulse_in = '1' then -- detect pulse on first_clk
      t1 <= not(t1); -- toggle signal t1 (on first_clk)
    end if;
  end if;
end process;

process(second_clk)
  variable t1_old, t2_old : std_logic;
begin
  if rising_edge(second_clk) then
    if t1 = not(t1_old) then -- detect toggle on t1 (domain cross here)
      t2 <= not(t2); -- then toggle t2 (on second_clk)
    end if;
    if t2 = not(t2_old) then -- detect toggle of t2 (double buffer)
      toggle <= not(toggle); -- toggle pulse
    end if;
    t1_old := t1;
    t2_old := t2;
  end if;
end process;
toggle_out <= toggle;

The reset logic has been left out for clarity. In the second process,
t1 is compared to its previous value (t1_old) to see if it has
toggled. However, since the lines:
a) if t1 = not(t1_old)
b) t1_old := t1
occur sequentially, and t1 is asynchronous to second_clk (since the
domain crossing occurs here), t1 could change between (a) and (b).
I could insert an additional variable at the start of the process to
register t1 so that it is only read once, but even that could be
removed by the synthesizer during optimization. Alternatively, the
synthesizer could decide to replicate logic anyway.
Is there some way to tell the synthesizer (I use Xilinx XST) not to do
any optimization or replication just on this module, or some other way
to prevent this potential problem?

Many thanks

Tom

Re: Preventing optimization in cross clock domain logic
I can't say that I have thoroughly analyzed your code, but I think I
can suggest some better code.  It is only slightly different from
yours, but I am sure it does not have any problems as it was given to
me some years ago and has worked for myself and everyone else I know
who has used it.

  ClkDtctReg: process (RT, SysReset) begin
    if (SysReset = '1') then
      RTClkDetect <= '0';
    elsif (rising_edge(RT)) then  -- Add an IF (strobe) here for your case
      RTClkDetect <= not RTClkDetSync;
    end if;
  end process ClkDtctReg;

  ClkSyncReg : process (GenClk, SysReset) begin
    if (SysReset = '1') then
      BackupClk <= '0';
      ClkSel <= '0';
    elsif (rising_edge(GenClk)) then
      BackupClk <= not BackupClk;  -- 24.576 MHz in, 12.288 MHz out
      RTClkDetSync <= RTClkDetect;
      RTClkDetSync_D <= RTClkDetSync;
      if (CTPClkSel = '1') then
        ClkSel <= '1';    -- board clock selected
      elsif ((RTClkDetSync xor RTClkDetSync_D) = '1') then  -- did we see an RT clock?
        ClkSel <= '0';    -- RT clock selected
      else
        ClkSel <= '1';    -- clock failure detected, wait for board reset
      end if;
    end if;
  end process ClkSyncReg;

This is code I am currently writing and have not tested, plus it is
for a slightly different application than yours.  So there may be
typos.

In the second clock domain the expression "RTClkDetSync xor
RTClkDetSync_D" generates a pulse in the second clock domain in
response to a pulse in the first clock domain.  This can be registered
or used directly if the clock is slow enough (adequate slack in the
path) to resolve the metastability.  The basic circuit is one FF on
the first clock side, two more on the second clock side and the loop
is just between the one on the first clock side and the first FF on
the second clock side with a single inverter in the return path.

I think a schematic is simpler to understand, but ascii art doesn't
convey well...

Rick



Re: Preventing optimization in cross clock domain logic

Thanks very much for the code. I've interpreted it a bit for my case,
which seems to work (using numeric_std):

signal t1, t2, t3 : std_logic;
-----
process (in_clk_in, rst_in) -- on input clock
begin
  if rst_in = '1' then
    t1 <= '0';
  elsif rising_edge(in_clk_in) then
    if pulse_in = '1' then
      t1 <= not(t2); -- when detect input pulse, set t1 to inverse of t2
    end if;
  end if;
end process;

process (out_clk_in, rst_in) -- on output clock
begin
  if rst_in = '1' then
    t2 <= '0';
    t3 <= '0';
    pulse_out <= '0';
  elsif rising_edge(out_clk_in) then
    t2 <= t1;
    t3 <= t2;
    if ((t3 xor t2) = '1') then
      pulse_out <= '1';
    else
      pulse_out <= '0';
    end if;
  end if;
end process;

I had forgotten about the use of the xor gate - it is explained a bit
here too:
http://www.chipdesignmag.com/print.php?articleId=32?issueId=5
The loopback of t2 between the two domains seems to avoid the issue of
reading the same variable at different times on an asynchronous clock.
As long as the second process is not replicated for some reason (which
could cause t1 to have different values in each replication), this
would seem to work fine.
If out_clk_in is fast, an additional register (e.g. t4) could be added
and then do xor between t4 and t3 if it were felt necessary to extend
the allowable metastability settling time further, but in general this
version would seem ok if the t2 path length is short to the xor gate.
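
A minimal sketch of the variant with the extra register, for illustration only. The entity name pulse_sync_3ff is hypothetical, t4 follows the naming suggested above, and the rest mirrors the code earlier in this post. The loopback still comes from t2, but the pulse is taken from t4 xor t3, so t2 never feeds the xor gate directly.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Toggle-based pulse synchronizer with one extra settling stage.
-- Entity name is hypothetical; port and signal names follow the post.
entity pulse_sync_3ff is
  port (
    in_clk_in  : in  std_logic;  -- source clock
    out_clk_in : in  std_logic;  -- destination clock
    rst_in     : in  std_logic;  -- asynchronous reset, active high
    pulse_in   : in  std_logic;  -- single-cycle pulse on in_clk_in
    pulse_out  : out std_logic   -- single-cycle pulse on out_clk_in
  );
end entity pulse_sync_3ff;

architecture rtl of pulse_sync_3ff is
  signal t1, t2, t3, t4 : std_logic;
begin
  process (in_clk_in, rst_in) -- on input clock
  begin
    if rst_in = '1' then
      t1 <= '0';
    elsif rising_edge(in_clk_in) then
      if pulse_in = '1' then
        t1 <= not t2;  -- loopback from the first destination register
      end if;
    end if;
  end process;

  process (out_clk_in, rst_in) -- on output clock
  begin
    if rst_in = '1' then
      t2 <= '0';
      t3 <= '0';
      t4 <= '0';
      pulse_out <= '0';
    elsif rising_edge(out_clk_in) then
      t2 <= t1;                -- first stage, may go metastable
      t3 <= t2;                -- second stage, samples t2 one period later
      t4 <= t3;                -- delayed copy used for edge detection
      pulse_out <= t4 xor t3;  -- registered pulse, one cycle per toggle
    end if;
  end process;
end architecture rtl;
```

The cost is one more out_clk_in cycle of latency compared with the t3 xor t2 version.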

Any further comments appreciated.

Tom

Re: Preventing optimization in cross clock domain logic

I think you've got it.  I don't see why you have the inner-most if
statement conditioned on the XOR.  That could be a simple assignment,
i.e. pulse_out <= t3 xor t2;

Keep in mind that putting this in the process creates an extra FF.
This FF is only needed if the clock is fast compared to the settling
time of the FFs.  Peter Alfke has posted here many times that with
current technology the settling time is very short, pretty much a
couple of nanoseconds.  So unless you are running clocks over 100 MHz
(or just don't have any slack time), the extra FF is not needed,
which gives you one clock less delay.  I can't imagine that you would
need a fourth FF in any event.
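
To illustrate the point about the extra FF, here is a rough sketch with the XOR moved out of the clocked process into a concurrent assignment, so that pulse_out is combinational. The entity name pulse_sync_comb is hypothetical; the signal names are the ones from the code under discussion. Whether this is safe depends on having enough slack on the t2 path, as described above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Toggle-based pulse synchronizer with an unregistered XOR output.
-- Entity name is hypothetical; port and signal names follow the thread.
entity pulse_sync_comb is
  port (
    in_clk_in  : in  std_logic;
    out_clk_in : in  std_logic;
    rst_in     : in  std_logic;
    pulse_in   : in  std_logic;
    pulse_out  : out std_logic
  );
end entity pulse_sync_comb;

architecture rtl of pulse_sync_comb is
  signal t1, t2, t3 : std_logic;
begin
  process (in_clk_in, rst_in) -- on input clock
  begin
    if rst_in = '1' then
      t1 <= '0';
    elsif rising_edge(in_clk_in) then
      if pulse_in = '1' then
        t1 <= not t2;
      end if;
    end if;
  end process;

  process (out_clk_in, rst_in) -- on output clock
  begin
    if rst_in = '1' then
      t2 <= '0';
      t3 <= '0';
    elsif rising_edge(out_clk_in) then
      t2 <= t1;
      t3 <= t2;
    end if;
  end process;

  -- Concurrent assignment: no flip-flop is inferred for pulse_out,
  -- so the pulse appears one out_clk_in cycle earlier.
  pulse_out <= t3 xor t2;
end architecture rtl;
```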

Rick

Re: Preventing optimization in cross clock domain logic

Good point: pulse_out <= t3 xor t2 is the same.

Also, in relation to my last point, I think if you add the attribute:

attribute MAX_FANOUT of t1 : signal is "1";

that would make certain that the second process cannot be replicated.
That may be overkill as it is unlikely the synthesizer would replicate
logic in this case anyway, but I guess you can never be sure.
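
For reference, a sketch of how such attributes might sit in the architecture declarative part. Only MAX_FANOUT comes from this thread; ASYNC_REG and SHREG_EXTRACT are added here as assumptions about what XST accepts, so check the constraints guide for your tool version before relying on them.

```vhdl
-- Fragment for the architecture declarative part, next to the signals.
signal t1, t2, t3 : std_logic;

-- A user-defined attribute must be declared before it is applied.
attribute max_fanout : string;
attribute max_fanout of t1 : signal is "1";

-- Assumption: marks the synchronizer registers as receiving async data.
attribute async_reg : string;
attribute async_reg of t2, t3 : signal is "TRUE";

-- Assumption: keeps the register chain out of shift-register (SRL) primitives.
attribute shreg_extract : string;
attribute shreg_extract of t2, t3 : signal is "no";
```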

Many thanks.

-Tom

Re: Preventing optimization in cross clock domain logic
TLDR

Hi Tom,

From the archive, Rick Collins once said --->
Anyone had big problems with similar async circuits?  BTW, here is the
simple sync circuit to generate a single pulse in the target clock
domain regardless of the relative speed of the clocks.

   |------- Metastable -------|
            __________
           |          |                        _____
   |------O| inverter |-------|---------------|     | Pulse
   |       |__________|       |               | XOR |---->
   |    ______       ______   |   ______   |--|_____| Out
   |   |      |     |      |  |  |      |  |
   |---| D  Q |-----| D  Q |--|--| D  Q |--|
Strobe |      |     |      |     |      |
/Clock |      |     |      |     |      |
-------|>     |  ---|>     | |---|>     |
       |______|  |  |______| |   |______|
                 |           |
                 |___________|___________  Output Clock

The pulse out should be clean by the next clock edge as long as the
routing is kept short.  Or if the clock period is very short another FF
can be added to feed the other leg of the XOR gate and assure a clean
output.
HTH, Syms.



Re: Preventing optimization in cross clock domain logic

Thanks for that Symon -

checking the RTL schematic synthesized from my VHDL, the above seems
to be what I've ended up with, albeit with extra registers on input
and output, which are probably superfluous and just adding delay!

Best, Tom.

