I have the standard sort of circuit from the Xilinx app note for driving an off-chip clock:
The main clock comes onto the chip through an IBUFG to CLKIN of the DLL.
CLK0 from the DLL is fed off the chip through an OBUFT.
The output of the OBUFT, which is on a global clock pin, is fed back in via an IBUFG to form CLKFB of the DLL.
This seems to work fine.
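
For anyone following the loop, here is a rough sketch of the deskew arithmetic as I understand it. It is a toy model only: the function names and all the delay figures are hypothetical (not taken from any data sheet), routing delay is lumped into the buffer delays, and it assumes the DLL simply adds whatever delay makes the CLKFB edge line up with the CLKIN edge, modulo one clock period.

```python
# Toy model of the DLL feedback loop described above.
# All delays are in ns and are made-up illustration values.

def dll_inserted_delay(period, t_obuf, t_ibufg_fb):
    """Delay the DLL adds so that CLKFB lines up with CLKIN (modulo one period).

    With the input pad edge at t = 0:
      CLKIN edge at the DLL = t_ibufg_in
      CLKFB edge at the DLL = t_ibufg_in + d + t_obuf + t_ibufg_fb
    Lock therefore requires d + t_obuf + t_ibufg_fb to be a whole
    number of periods.
    """
    return (-(t_obuf + t_ibufg_fb)) % period


def output_pad_skew(period, t_ibufg_in, t_obuf, t_ibufg_fb):
    """Skew of the output clock pad relative to the input clock pad."""
    d = dll_inserted_delay(period, t_obuf, t_ibufg_fb)
    skew = (t_ibufg_in + d + t_obuf) % period
    # Fold the result into the range (-period/2, +period/2].
    if skew > period / 2:
        skew -= period
    return skew


if __name__ == "__main__":
    # Matched IBUFG delays: the output pad lines up with the input pad.
    print(output_pad_skew(10.0, 1.0, 3.0, 1.0))
    # Mismatched IBUFG delays: the mismatch appears as pad-to-pad skew.
    print(output_pad_skew(10.0, 1.0, 3.0, 1.5))
```

In this model the OBUFT delay drops out entirely, and the residual pad-to-pad skew is just the difference between the two IBUFG delays.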
Main clock to output clock delay is constrained to