I've heard this rumor many times from different sources: that Virtex-4 TBUFs don't actually contain tri-state buffers, and are instead implemented as a set of wires and muxes that is logically (though not electrically) equivalent to anything you could do with real tri-state buffers.
Has Xilinx ever gone on record with a complete or partial answer to this question? I can't find anything in the app notes.
If this is true, I'm pretty baffled as to how long lines could work. To implement an equivalent structure without tri-state buffers, you'd need to replace each long line with N wires, where N is the number of "possible drivers", which is a pretty large number. Isn't that rather inefficient?
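To make the "logically but not electrically equivalent" idea concrete, here is a small Python model of a single shared line. It's only a sketch, not anything Xilinx has documented, and all the function names are hypothetical. `tristate_bus` resolves the line the way real tri-state drivers would. `and_or_mux` computes the same value by ANDing each driver's data with its enable and ORing the results, which is one common mux-style replacement. The two agree whenever exactly one driver is enabled. They differ only in the cases a correct tri-state design avoids: no driver enabled (the real line floats) and contention. Note that `and_or_mux` takes one input per possible driver, which is the N-input structure I'm worried about above.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

# Hypothetical model: each driver is (enable, data_bit).
Driver = Tuple[bool, int]


def tristate_bus(drivers: Sequence[Driver]) -> Optional[int]:
    """Resolve a shared tri-state line.

    Returns the value driven by the enabled driver(s), or None ("Z")
    if nothing is enabled. Raises if enabled drivers disagree.
    """
    active = [v for en, v in drivers if en]
    if not active:
        return None  # high impedance: a pull-up/keeper would set the level
    if len(set(active)) > 1:
        raise ValueError("bus contention: enabled drivers disagree")
    return active[0]


def and_or_mux(drivers: Sequence[Driver]) -> int:
    """Mux-style equivalent: OR over (enable AND data) for every driver.

    Every possible driver feeds its own input into the OR.
    """
    result = 0
    for en, v in drivers:
        result |= int(en) & v
    return result


def check_equivalence(n: int) -> bool:
    """Exhaustively compare both models for n drivers, restricted to the
    legal tri-state case of exactly one enabled driver."""
    for bits in product((0, 1), repeat=2 * n):
        drivers = [(bool(bits[2 * i]), bits[2 * i + 1]) for i in range(n)]
        if sum(en for en, _ in drivers) == 1:
            if tristate_bus(drivers) != and_or_mux(drivers):
                return False
    return True


if __name__ == "__main__":
    print(check_equivalence(4))
```

With no driver enabled, the AND-OR version reads 0 where the tri-state line would float, and with conflicting drivers it behaves like a wired-OR instead of a short. That is the "not electrically" part of the equivalence.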