I've done it in a Xilinx FPGA, and it's not fast. First you have to go across the routing fabric and through a set of LUTs to get onto the carry chain. The carry chain itself is pretty fast; getting on and off it is slow. After you get off the carry chain, you have to go through the general routing fabric again, and this is where most of your clock cycle gets eaten up. Remember, with dedicated hardware this would be a dedicated route. Next you get into a second set of LUTs, where you AND the output of the carry chain with the original number to get a one-hot bus with only the leading 1 set. Finally you have to encode that one-hot bus into a number you can use for your shifter. You may be able to do this with the same set of LUTs; I can't remember.
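To make the data flow concrete, here is a rough bit-level model in Python. It is a sketch under assumptions: the post doesn't say exactly what the carry chain computes, so this assumes the common `x & -x` trick (isolate the lowest set bit, with the `+1` of the two's-complement negation rippling along the carry chain) applied to the bit-reversed word, which turns the leading 1 into the lowest one. The reversal is only wiring in an FPGA; the Python loops just model it. The 16-bit width and all function names (`leading_one_onehot`, `onehot_to_index`, `normalize_shift`) are hypothetical.

```python
WIDTH = 16  # hypothetical mantissa width, chosen only for illustration


def bit_reverse(x: int, width: int) -> int:
    """Reverse the bit order of x (free in hardware: it is just wiring)."""
    r = 0
    for i in range(width):
        if (x >> i) & 1:
            r |= 1 << (width - 1 - i)
    return r


def leading_one_onehot(x: int, width: int = WIDTH) -> int:
    """Return a one-hot word with only the most significant 1 of x set.

    Assumed carry-chain trick: reverse so the leading 1 becomes the lowest
    set bit, compute r & -r (the negation is what uses the carry chain),
    then reverse back. This final AND is the "second set of LUTs" step.
    """
    mask = (1 << width) - 1
    r = bit_reverse(x & mask, width)
    lowest = r & (-r & mask)  # two's-complement negate, then AND
    return bit_reverse(lowest, width)


def onehot_to_index(onehot: int, width: int = WIDTH) -> int:
    """Encode a one-hot word as the position of its set bit.

    Bit k of the result is the OR of every one-hot bit whose position has
    bit k set: a plain OR-tree encoder, valid because the input is one-hot.
    """
    index = 0
    for k in range(max(1, (width - 1).bit_length())):
        for pos in range(width):
            if (pos >> k) & 1 and (onehot >> pos) & 1:
                index |= 1 << k
    return index


def normalize_shift(x: int, width: int = WIDTH) -> int:
    """Left-shift amount that moves the leading 1 to the top bit (x != 0)."""
    return width - 1 - onehot_to_index(leading_one_onehot(x, width), width)
```

For example, `normalize_shift(0b101100)` returns 10: the leading 1 sits at bit 5, so a 16-bit word needs a 10-place left shift to bring it to bit 15. The OR-tree encoder only works because the bus is guaranteed one-hot; without the carry-chain step you would need a full priority encoder instead.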