How To Synchronize FPGAs

Hello newsreaders,

For a while I have been confronted with the following task, which I find
quite challenging and, unfortunately, haven't managed to solve yet.
What I want to do is use 2-4 FPGAs (Xilinx Virtex-II Pro) together on one
printed circuit board (PCB). They are used to process a large amount of
incoming serial data (data rates of several GHz). My idea is to handle
that data in parallel across the 2-4 FPGAs. But now the problem arises of how to
adequately split the data and, in particular, how to synchronize the FPGAs with one
another.
Is it possible, or even a realistic idea, to synchronize multiple
FPGAs in the GHz range? How can this be done without much protocol
overhead? I would like to do it without applying an extra transfer protocol
among the FPGAs just for that purpose! So far I haven't found a
proper solution.
Maybe someone can give me a hint? Any ideas how to solve that problem?

Regards,    Leroy Tanner

Re: How To Synchronize FPGAs
Maybe I am missing something, but wouldn't you just drive all the chips with
one onboard clock, then in your code trigger the processes on the rising edge?



Re: How To Synchronize FPGAs

It gets tricky when you have multiple FPGAs clocked at hundreds of MHz. I
don't have any direct experience there, but I think it's worth looking for appnotes on
vendor sites that address "Board Level De-skew" (using FPGA clocking
resources to account for clock distribution headaches) and, specifically for
Xilinx, "Channel bonding" (using multiple RocketIO transceivers to receive
data in parallel). The RocketIO transceivers are difficult beasts, at
least if you're not using a standard protocol. I'm not sure if channel
bonding can span multiple V2Pro devices, but I know it can span multiple

Not sure about your budget or application requirements, but it may be
worthwhile going to a single, larger part that contains the resources you
need. It at least partially removes the headache of high-speed PCB

--Josh Model

Re: How To Synchronize FPGAs
...or at least take all the high-speed serial stuff into one FPGA and
distribute it from that one to the others at a slower parallel rate. Also,
it looks like the V4 could take care of this with its ChipSync thingy for
source-synchronous applications.
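To put rough numbers on the "slower parallel rate" idea: the bus feeding the other FPGAs needs enough pins that pins x clock (x2 for DDR) covers the serial rate. The sketch below is illustrative only; the helper name and the 3 Gb/s figure are assumptions, and it ignores line-coding overhead such as 8b/10b.

```python
import math


def parallel_bus_width(serial_rate_bps: float, bus_clock_hz: float, ddr: bool = False) -> int:
    """Hypothetical helper: minimum number of data pins needed to carry
    serial_rate_bps over a parallel bus clocked at bus_clock_hz.
    With DDR, each pin carries two bits per clock cycle."""
    bits_per_pin_per_second = bus_clock_hz * (2 if ddr else 1)
    return math.ceil(serial_rate_bps / bits_per_pin_per_second)


# Example: an assumed 3 Gb/s payload forwarded at 300 MHz
print(parallel_bus_width(3e9, 300e6))            # 10 pins, single data rate
print(parallel_bus_width(3e9, 300e6, ddr=True))  # 5 pins, double data rate
```

The trade-off is pin count versus clock rate: halving the bus clock doubles the number of lines that have to be length-matched on the board.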
Cheers, Syms.

Re: How To Synchronize FPGAs


OK, I agree, and it might be a good approach to minimize skew in
the first stage. But I must nevertheless synchronize the other FPGAs to
each other, not at several GHz but at, say, ca. 300 MHz. In my
opinion a central clock isn't an appropriate solution!?

Re: How To Synchronize FPGAs
Think about what a central clock entails from purely a routing perspective.
Let's assume you're an SI wizard, and have no issues there.

300 MHz would be ~3.3 ns per clock cycle. If I remember my rule of thumb,
an electrical signal travels about 6 inches per 1 ns in
FR-4 material. So the worst-case match between all your data lines and all
clock lines for all FPGAs will be the skew that eats into your timing

Just as an example (I'm not really a layout person, so it's my posterior
speaking), matching all lines to 4 FPGAs within +/- 3 inches seems relatively
tricky, but not completely unreasonable. That 6-inch worst-case spread is ~1 ns
of skew, so now ~1/3 of your entire clock cycle is wasted (more, if you were
assuming DDR) before you even get to the FPGA fabric. It makes laying out your
design that much more tricky.
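The skew arithmetic above can be reproduced in a few lines of Python. This is a back-of-the-envelope sketch only: it assumes the ~6 inches/ns FR-4 rule of thumb quoted in the post, reads "+/- 3 inches" as a 6-inch worst-case spread between any two lines, and treats the DDR data window as half a clock period. The function names are made up for illustration.

```python
# Rule-of-thumb propagation speed in FR-4, as quoted in the post (an approximation).
INCHES_PER_NS = 6.0


def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def skew_ns(length_spread_in: float) -> float:
    """Skew produced by a worst-case trace-length spread, in nanoseconds."""
    return length_spread_in / INCHES_PER_NS


def fraction_of_window_lost(freq_mhz: float, length_spread_in: float, ddr: bool = False) -> float:
    """Fraction of the data-valid window consumed by trace skew alone.
    With DDR, data changes on both edges, so the window is half a period."""
    window_ns = clock_period_ns(freq_mhz) / (2 if ddr else 1)
    return skew_ns(length_spread_in) / window_ns


spread = 6.0  # +/- 3 inches -> 6 inches between the shortest and longest line
print(f"period = {clock_period_ns(300):.2f} ns, skew = {skew_ns(spread):.2f} ns")
print(f"SDR: {fraction_of_window_lost(300, spread):.0%} of the cycle lost")
print(f"DDR: {fraction_of_window_lost(300, spread, ddr=True):.0%} of the half-cycle lost")
```

With these assumptions roughly 30% of a 300 MHz cycle goes to routing skew, and about twice that with DDR, before jitter, reflections, or crosstalk are counted.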

Now, in the slightly more real world you've got to throw in the jitter
present on a 300 MHz clock, impedance mismatches causing reflections,
crosstalk on your board with all that data zipping around (because GHz and
even 300 MHz lines are really antennae)  and you've got a lot to deal with.

Anyhow, synchronizing dataflow at those speeds on a PCB is not nearly as
simple as just plopping down a clock. It's a hard design, but you get to
choose where to place the burden. If you've got really good PCB people,
maybe they can match and terminate the lines really well. If you've got the DCM/
DLL (or the Altera, or "insert brand", counterpart) hardware to de-skew the
board clock, you could let the FPGA do it (though I don't recall at what
frequencies the DCMs top out). If you've got neither, you might want to
consider going to a single-chip serial interface, because you're going to
get into trouble otherwise.



Re: How To Synchronize FPGAs
Hi Leroy,
Say you've got 4 FPGAs A, B, C & D. Each gets fed the 300 MHz clock, so in
the fabric of each FPGA there is CLK_A, CLK_B, etc. When you send data from (say)
FPGA B to FPGA D, send a clock with the data, generated by FPGA B from its
internal CLK_B, called (say) CLK_B_TO_D. Use this source-synchronous clock
with a DCM in FPGA D to get the data into a BRAM FIFO inside FPGA D. Get the
data out of this FIFO into the fabric of FPGA D using CLK_D. Repeat for
all the other paths. Any good?
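The FIFO in this scheme is what absorbs the unknown phase between CLK_B_TO_D and CLK_D. A common way to build such a dual-clock FIFO is to pass each side's pointer to the other side in Gray code through a two-flop synchronizer, so only one bit changes per increment. The Python model below is a behavioral sketch of that technique under stated assumptions: it is not a Xilinx primitive, it does not model metastability, the names (`AsyncFifo`, `simulate`) are hypothetical, and it converts the synchronized Gray pointers back to binary for readability (RTL versions often compare the Gray values directly).

```python
def bin_to_gray(n: int) -> int:
    """Binary -> Gray code: consecutive values differ in exactly one bit."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Gray code -> binary (XOR of all right shifts of g)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


class AsyncFifo:
    """Behavioral model of a dual-clock FIFO (hypothetical sketch).

    Pointers carry one extra wrap bit so "full" and "empty" can be told
    apart. Each side sees the other side's pointer only after it has
    passed, Gray-coded, through a two-stage synchronizer."""

    def __init__(self, depth_log2: int = 4):
        self.depth = 1 << depth_log2
        self.mask = (1 << (depth_log2 + 1)) - 1  # pointer width = depth_log2 + 1
        self.mem = [None] * self.depth
        self.wptr = 0            # binary write pointer (write-clock domain)
        self.rptr = 0            # binary read pointer (read-clock domain)
        self.rptr_sync = [0, 0]  # Gray read pointer as seen by the writer
        self.wptr_sync = [0, 0]  # Gray write pointer as seen by the reader

    def write_clock(self, data=None) -> bool:
        """One rising edge of the write clock (CLK_B_TO_D in the example).
        Returns True if data was stored; None means nothing to write."""
        self.rptr_sync = [bin_to_gray(self.rptr), self.rptr_sync[0]]
        rptr_seen = gray_to_bin(self.rptr_sync[1])  # stale, so "full" is conservative
        full = ((self.wptr - rptr_seen) & self.mask) == self.depth
        if data is None or full:
            return False
        self.mem[self.wptr & (self.depth - 1)] = data
        self.wptr = (self.wptr + 1) & self.mask
        return True

    def read_clock(self):
        """One rising edge of the read clock (CLK_D). Returns a word, or None if empty."""
        self.wptr_sync = [bin_to_gray(self.wptr), self.wptr_sync[0]]
        wptr_seen = gray_to_bin(self.wptr_sync[1])  # stale, so "empty" is conservative
        if wptr_seen == self.rptr:
            return None
        data = self.mem[self.rptr & (self.depth - 1)]
        self.rptr = (self.rptr + 1) & self.mask
        return data


def simulate(n_words=50, write_period_ps=3333, read_period_ps=3333, read_phase_ps=1200):
    """Push n_words through the FIFO with two independently phased clocks."""
    fifo = AsyncFifo()
    next_word, received = 0, []
    t_w, t_r = 0, read_phase_ps
    while len(received) < n_words:
        if t_w <= t_r:  # the write-clock edge comes first
            word = next_word if next_word < n_words else None
            if fifo.write_clock(word):
                next_word += 1
            t_w += write_period_ps
        else:           # the read-clock edge comes first
            word = fifo.read_clock()
            if word is not None:
                received.append(word)
            t_r += read_period_ps
    return received
```

`simulate()` returns the words in the order they were sent; passing a shorter `write_period_ps` than `read_period_ps` exercises the "full" back-pressure path as well.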
Cheers, Syms.


Re: How To Synchronize FPGAs
Yes, you *are* missing something...  ;)



Rick "rickman" Collins
Re: How To Synchronize FPGAs

 >  But now there arises the problem how to

I believe the most important thing is to latch the signals in the IOB first
to minimize clock skew problems. Otherwise, use an external shift
register to generate bit-parallel signals for input to the FPGA.
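The external shift-register option is a serial-in, parallel-out conversion: bits are clocked in at the line rate and a word is handed to the FPGA every `width` bits, so the FPGA logic runs at 1/width of the bit rate. The Python model below is a sketch under two assumptions not stated in the post: bits arrive MSB first, and the word boundary is already known (finding alignment on a real link is a separate problem). The helper name is hypothetical.

```python
from typing import Iterable, Iterator


def deserialize(bits: Iterable[int], width: int = 8) -> Iterator[int]:
    """Hypothetical model of a serial-in, parallel-out shift register.
    Bits are shifted in MSB first; after every `width` bits the register
    contents are presented as one parallel word."""
    mask = (1 << width) - 1
    shift_reg = 0
    count = 0
    for bit in bits:
        shift_reg = ((shift_reg << 1) | (bit & 1)) & mask
        count += 1
        if count == width:
            yield shift_reg  # word latched into the FPGA at 1/width of the bit rate
            count = 0


stream = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print([hex(w) for w in deserialize(stream)])  # ['0xa5', '0xf0']
```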

-- glen

Re: How To Synchronize FPGAs


There are two ways to approach this problem: (1) have each FPGA perform a
part of the process on the entire data stream, or (2) have each FPGA
perform the entire process on part of the data stream. We once
implemented (2) for a bandwidth expander where each chip did the complete
process (one-clock-cycle Huffman decoding, translation of the code to a
value, then arithmetic processing) for a portion of the incoming data
stream. Each chip was provided a chunk of the incoming data (e.g., in a
two-chip system, chip one processed chunks 1, 3, 5, ... of the data and
chip two processed chunks 2, 4, 6, ...). We actually used two chips
on the board because of I/O bandwidth limitations, but the chip was
designed to allow for 1-, 2-, 4-, or 8-chip operation.
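Approach (2) amounts to round-robin distribution of chunks and re-interleaving of the results. The Python sketch below shows only that bookkeeping: chunk size, how chunk boundaries are found, and the per-chunk processing are assumptions left out, and the function names are hypothetical. `square` merely stands in for the real per-chip processing.

```python
from typing import List, Sequence


def distribute(chunks: Sequence, n_chips: int) -> List[list]:
    """Round-robin split: chip k (0-based) gets chunks k, k+n, k+2n, ..."""
    if n_chips not in (1, 2, 4, 8):
        raise ValueError("this sketch assumes 1, 2, 4 or 8 chips, as in the post")
    return [list(chunks[k::n_chips]) for k in range(n_chips)]


def merge(per_chip_results: List[list]) -> list:
    """Re-interleave per-chip outputs back into the original stream order."""
    n = len(per_chip_results)
    total = sum(len(r) for r in per_chip_results)
    return [per_chip_results[i % n][i // n] for i in range(total)]


def square(x):
    """Stand-in for the real per-chunk processing done inside each chip."""
    return x * x


chunks = list(range(1, 9))           # chunks numbered 1..8 as in the post
per_chip = distribute(chunks, 2)     # chip one: 1, 3, 5, 7; chip two: 2, 4, 6, 8
results = [[square(c) for c in chip] for chip in per_chip]
print(merge(results))                # squares of 1..8, back in stream order
```

Because every chip runs the same complete pipeline, scaling from 1 to 8 chips only changes the stride, which fits the "1, 2, 4, or 8 chip" design described above.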

