People who work on synthesis optimization seem to like registers at I/O. In particular, the Xilinx manual says:
"The synthesis tools will not optimize across the Partition interface. If an asynchronous timing critical path crosses Partition boundaries, logic optimizations will not occur across the Partition boundary. To mitigate this issue, add a register to the asynchronous signal at the Partition boundary."
I like having registers all over the design. But they talk as if it were a game to inject a register at an arbitrary place.
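To make my concern concrete: as far as I understand, a register on a boundary signal delays that signal by one clock cycle, so everything downstream sees it one cycle later. Below is a minimal behavioral sketch in Python, not HDL and not anything from the Xilinx flow. The function name `simulate`, the reset value of 0, and the stimulus are my own assumptions, chosen only for illustration.

```python
def simulate(inputs, registered):
    """Return the per-cycle outputs of a boundary signal.

    registered=False: the output follows the input in the same cycle.
    registered=True: a D flip-flop sits on the boundary, so the output
    shows the value captured at the previous clock edge.
    """
    reg = 0  # assumed reset value of the flip-flop (hypothetical)
    outputs = []
    for value in inputs:
        if registered:
            outputs.append(reg)  # output is what was captured last edge
            reg = value          # capture the current input at this edge
        else:
            outputs.append(value)
    return outputs


stimulus = [1, 0, 1, 1, 0]  # arbitrary example input, one value per cycle
direct = simulate(stimulus, registered=False)
delayed = simulate(stimulus, registered=True)
print(direct)   # [1, 0, 1, 1, 0]
print(delayed)  # [0, 1, 0, 1, 1]
```

The registered output is the same sequence shifted by one cycle, with the assumed reset value in front. Any logic that relies on the original cycle alignment would have to be adjusted, which is why adding a register does not look like a free move to me.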