Actually, you can still estimate the time a routine takes by cycle counting. Getting a very accurate number now takes a lot more work, but for a rough estimate you can assume that both the cycles to fetch the instructions and the cycles to perform the operations have to happen, and take the larger of the two totals over spans of about 10 instructions. If the processor can execute instructions in parallel, just divide by the number it can do at once.
Why would anyone bother doing this, you may ask? It lets you quickly throw away the obviously bogus ideas for making a routine faster before actually testing them. These days, table lookup almost always loses to calculation on the basis of bus cycles.
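The rule of thumb above can be sketched in a few lines of Python. This is only a rough illustration, not a real timing model: the function name `estimate_cycles`, the `(fetch_cycles, exec_cycles)` tuple layout, and the sample cycle counts are all made up for the example, and real numbers would have to come from the data sheet of whatever processor you are looking at.

```python
def estimate_cycles(instructions, span=10, issue_width=1):
    """Rough cycle estimate for a routine.

    instructions: list of (fetch_cycles, exec_cycles) pairs, one per
                  instruction (hypothetical numbers, taken from a data sheet).
    span:         how many instructions to group together (about 10).
    issue_width:  how many instructions the processor can do in parallel.
    """
    total = 0.0
    for start in range(0, len(instructions), span):
        chunk = instructions[start:start + span]
        fetch = sum(f for f, _ in chunk)    # cycles spent fetching this span
        execute = sum(e for _, e in chunk)  # cycles spent executing this span
        # Both have to happen; assume they overlap, so the larger one dominates.
        total += max(fetch, execute)
    # Crude adjustment for a processor that issues several instructions at once.
    return total / issue_width


# Ten fetch-bound instructions followed by ten execute-bound ones.
routine = [(4, 2)] * 10 + [(1, 3)] * 10
print(estimate_cycles(routine))                 # 70.0
print(estimate_cycles(routine, issue_width=2))  # 35.0
```

The first span is limited by fetching (40 cycles against 20) and the second by execution (30 against 10), which is why the estimate is 70 rather than the 50 you would get by counting only execution cycles.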
I had a machine with, IIRC, the NEC V20, the 8088 lookalike. I put it in the first "PC" I had, and it did make the machine just a little faster. I had written a program that took about an hour to finish; the NEC chip got it done in about 10% less time.
That machine had a 9Meg hard drive in it. It was supposed to be 10Meg but only about 90% of the tracks worked.