Gero,
As noted by others, you need a high-quality clock to each FPGA. The clocks do not need to be phase-matched, and the traces do not need to be equal length. Each FPGA just needs a good, very-low-jitter clock from a clock distribution device. The ICS quad LVDS clock buffer with a jitter-cleanup PLL is a good choice that I use (I think it is an 8745?):
formatting link
or equivalent. The nice thing about some of these parts is that you can use a lower-frequency, less expensive oscillator: the part multiplies the frequency and removes jitter, too (as opposed to buying a much more expensive higher-frequency oscillator).
This clock is then the basis for all timing in each FPGA.
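To make the "multiply a cheap, low-frequency oscillator" idea concrete, here is a small sketch of the arithmetic a PLL-based clock generator performs: f_out = f_ref × M / N. It searches integer multiplier/divider pairs for the closest match to a target frequency. The function name `pick_pll_ratio` and the `max_m`/`max_n` limits are hypothetical; a real part's datasheet defines which M/N values and VCO ranges are legal, and this does not model the jitter-cleanup side at all.

```python
from fractions import Fraction


def pick_pll_ratio(f_ref_hz, f_target_hz, max_m=64, max_n=16):
    """Find integer M (multiplier) and N (divider) so that f_ref * M / N is as
    close as possible to f_target. The M/N limits are hypothetical placeholders
    for whatever a real clock generator allows."""
    best = None
    for n in range(1, max_n + 1):
        for m in range(1, max_m + 1):
            f_out = Fraction(f_ref_hz) * m / n  # exact rational arithmetic
            err = abs(f_out - f_target_hz)
            # Strict '<' keeps the first (smallest-N, then smallest-M) best match.
            if best is None or err < best[0]:
                best = (err, m, n, f_out)
    _, m, n, f_out = best
    return m, n, float(f_out)


if __name__ == "__main__":
    # A 25 MHz crystal multiplied up to a 200 MHz system clock.
    m, n, f_out = pick_pll_ratio(25_000_000, 200_000_000)
    print(m, n, f_out)  # 8 1 200000000.0
```

The exhaustive search is deliberately naive; with only a few hundred M/N combinations it is instant and easy to check by hand.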
To get from one FPGA to another, or from one FPGA to an ASIC/ASSP:
formatting link
You would use a source-synchronous interface, where you send the data, together with a clock, from one device to the other. Because the data jitter will be exactly the same as the clock jitter (they came through the same paths, close to each other), system jitter and clock jitter can almost be ignored. The receiving side uses the forwarded clock to register the incoming data. Often a DCM is used to phase-shift the sample point to exactly the center of the "eye" so you have the best timing margin. The data and clock paths must all be delay-matched (all signals must leave and arrive in sync with each other). We have tables of the delay from the silicon to the pad, and your PCB designer needs to take these into account.
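The delay-matching and eye-centering steps above boil down to simple timing arithmetic. The sketch below is a simplified budget, not a Xilinx timing model: each signal's arrival is its silicon-to-pad delay plus PCB trace delay, skew is the spread among clock and data arrivals, and the ideal sample point for an edge-aligned DDR bus is a quarter clock period (90 degrees) after the forwarded clock edge. The propagation constant `PS_PER_MM`, all delay numbers, and the margin formula (bit period minus skew, uncorrelated jitter, and setup + hold, split evenly across both sides) are assumptions for illustration.

```python
# Simplified source-synchronous timing budget. All numbers are hypothetical.
PS_PER_MM = 6.7  # assumed PCB propagation delay (roughly 170 ps/inch, FR-4 stripline)


def arrival_ps(pad_delay_ps, trace_mm):
    """Arrival time of one signal: silicon-to-pad delay plus PCB trace delay."""
    return pad_delay_ps + trace_mm * PS_PER_MM


def offsets_vs_clock(clock, data):
    """Each data signal's arrival relative to the forwarded clock (positive = late)."""
    t_clk = arrival_ps(*clock)
    return {name: arrival_ps(*path) - t_clk for name, path in data.items()}


def total_skew_ps(clock, data):
    """Spread between the earliest and latest arrival among clock and data."""
    times = [0.0] + list(offsets_vs_clock(clock, data).values())
    return max(times) - min(times)


def center_shift_ps(clock_period_ps, ddr=True):
    """Delay that moves an edge-aligned clock edge to the middle of the data eye.
    With DDR a bit lasts half a clock period, so this is a quarter period (90 deg)."""
    bit_period = clock_period_ps / 2 if ddr else clock_period_ps
    return bit_period / 2


def margin_ps(bit_period_ps, skew_ps, jitter_ps, setup_ps, hold_ps):
    """Margin on each side of a centered sample point after skew, uncorrelated
    jitter, and the receiver's setup + hold window are removed from the bit period."""
    eye = bit_period_ps - skew_ps - jitter_ps
    return (eye - setup_ps - hold_ps) / 2


if __name__ == "__main__":
    clock = (1200.0, 50.0)  # (pad delay in ps, trace length in mm), hypothetical
    data = {"d0": (1210.0, 50.0), "d1": (1190.0, 51.0)}
    print(offsets_vs_clock(clock, data))
    skew = total_skew_ps(clock, data)
    print(f"total skew {skew:.1f} ps")
    print(f"center shift {center_shift_ps(3200.0):.0f} ps")
    print(f"margin {margin_ps(1600.0, skew, 100.0, 300.0, 300.0):.1f} ps per side")
```

Note that only jitter that is *not* shared by clock and data belongs in `jitter_ps`; jitter common to both paths cancels, which is the whole point of forwarding the clock.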
There is also a source-synchronous IO (SSIO) block built into V4 (Virtex-4), and you can do things like use one forwarded clock to frame 4 bits on each wire (the forwarded clock runs at 1/4 the system clock). The SSIO block takes care of the serialization and deserialization (serial-to-parallel conversion) on each IO.
formatting link
Chapter 8.
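The 4-bits-per-forwarded-clock idea can be modeled behaviorally in a few lines. This is a sketch, not the SSIO hardware or its HDL interface: the function names are hypothetical, bits are assumed to go out LSB first, the "system clock" is taken to be the serial bit clock, and each rising edge of the forwarded clock (high for two bit times, low for two) is assumed to mark the first bit of a 4-bit word.

```python
def serialize(words, width=4):
    """Transmit-side model: each parallel word is sent LSB first (assumed order)
    as `width` consecutive bits on one wire, alongside a forwarded clock that
    runs at 1/width of the bit rate."""
    bits, fclk = [], []
    for w in words:
        for i in range(width):
            bits.append((w >> i) & 1)
            fclk.append(1 if i < width // 2 else 0)  # high for first half of each word
    return bits, fclk


def deserialize(bits, fclk, width=4):
    """Receive-side model: each rising edge of the forwarded clock marks the
    first bit of a word; regroup `width` bits per edge into a parallel word.
    Assumes capture starts at a word boundary or while the clock is low.
    A trailing partial word is dropped."""
    starts = [i for i, level in enumerate(fclk)
              if level == 1 and (i == 0 or fclk[i - 1] == 0)]
    words = []
    for s in starts:
        if s + width <= len(bits):
            words.append(sum(bits[s + k] << k for k in range(width)))
    return words


if __name__ == "__main__":
    words = [0x3, 0xA, 0xF, 0x0]
    bits, fclk = serialize(words)
    print(bits)
    print(fclk)
    print(deserialize(bits, fclk))
```

Because word boundaries come from the forwarded clock rather than from counting bits, a receiver that starts capturing mid-stream (while the clock is low) still lines up on the next full word.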
Also look at the SPI-4.2 (POS-PHY Level 4) IP core and how it is specified for inter-chip communication (this is an industry standard for wide, fast DDR buses).
So, in review: use system-synchronous clocking (one clock to all chips) for each chip's internal logic and for generating its forwarded clocks, AND source-synchronous clocking (a forwarded clock) for each data bus between devices.
Searching on "source synchronous" and "SPI-4.2", and reading the user guides on the SSIO blocks, will get you up to speed.
Austin