Take a look at XAPP802 (overview) and XAPP702 (memory interfaces) and XAPP700 (network interfaces) Just enter these names in the upper right hand corner search window on the Xilinx website. Here is a (not so short) description of a powerful approach:
Capturing the Input Data Valid Window.
Let's assume a continuously running clock and a 16-wide data input bus. Let's assume the clock is source-synchronous, i.e. its rising transition is aligned with the data transitions, and all these transitions have little skew.
The user faces the problem of aligning the clock with respect to the data in such a way that set-up- and hold-time specs are obeyed and (hopefully) data is captured close to the center of the data valid window. Given the fairly wide spread between worst-case set-up- and hold-time as specified by the IC manufacturer, a carefully worst-cased design will achieve only modest performance, since the designer is forced to accomodate the specified extreme set-up and hold time values of the input capture flip-flops. Typical values are positive 300 ps set-up time, negative 100 ps hold time, which implies a 200 ps window. The actual capture window is only a small fraction of a picosecond, but, depending on temperature, supply voltage or device processing, it might be positioned anywhere inside the specified wide window.
Here is a self-calibrating design approach that achieves much better performance by largely eliminating the uncertainty of the flip-flop characteristics.
This approach assumes reasonable tracking of the input flip-flops driven by the data and clock inputs, and assumes programmable delay elements at each input buffer.
The incoming clock is buffered and used to clock all data input flip-flops. The incoming clock is also used as if it were data, run through its own delay element X, then driving the D input of a clocked flip-flop. Its output is then used to control a state machine that manipulates X to find the two edges of the valid window, where the flip-flop output changes. Note that changing X has no impact on the bus data capture operation, it only affects the control flip-flop. Once both edges are found, the state machine calculates the center value, and applies this in common to all data input delays.
This auto-calibration circuit can run continuously (or non-continuously), since it does not interfere with normal operation. It means that the user can completely ignore the flip-flop set-up and hold time specifications, the spread between set-up and hold-times, and their possible variation with temperature and Vcc. This circuit does not compensate for skew between data lines, or any skew between data and clock, and it assumes good tracking between all input flip-flops, and relies on a reasonably fine granularity in the delay adjustments. Fundamentally, this auto-calibration reduces the data capture uncertainty from a first-order problem, to a second order issue, thus permitting substantially higher data rates and/or higher reliability of operation. Virtex-4 programmable input delays have 75 picosecond granularity. A low-skew data bus can thus be captured at bus data rates in excess of
1Gbps, even when the data valid window is smaller than 200 ps. Peter Alfke 3-31-05