Help Determining Machine Cycles

Anyone know where there might be a table similar to what is shown at

formatting link
to determine the number of machine cycles for a given assembly language instruction for Intel processors newer than the regular Pentium? I'm interested in Pentium II, III, Pro, etc.

Adam

Reply to
Adam Elbirt

I'm not sure such a table would mean much, what with 2 levels of cache and pipelining, out-of-order execution, etc...

Reply to
Jim Stewart

True but there must be some way to say something to the effect of "a register to register XOR instruction requires NNN machine cycles" as a basic thing for each machine. Does that make sense?

Adam


Reply to
Adam Elbirt

You're looking for some kind of unit that you can add up to get total execution time? That would be simpler if there were a single pipeline. Later Pentia have overlapped and out-of-order execution.

So it's generally one ALU cycle to XOR, in one of the ALUs. But it may stall if the input registers aren't ready. Except that the result can be forwarded directly from another ALU operation (perhaps only if it's the same ALU). And the whole operation can be killed if the register (as seen in the instruction stream) is overwritten before it's read.

This makes it hard to produce any kind of scalar number. If you really want to know, take a look at the finite-state-machine model GCC uses to schedule instructions.
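To make that concrete, here is a toy sketch in Python. It is not a model of any real Pentium: the two ALUs, the one-cycle latency, the perfect register renaming, and the names `Instr` and `total_cycles` are all assumptions made up for illustration. It only shows that the same four register-to-register XORs finish in a different number of cycles depending on whether they depend on each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical two-operand instruction such as `xor dest, src`."""
    dest: str
    srcs: tuple


def total_cycles(program, num_alus=2, latency=1):
    """Cycles until the last result is ready in a toy out-of-order model.

    Assumptions (illustrative only): every instruction takes `latency`
    cycles, up to `num_alus` ready instructions issue per cycle (oldest
    first), and only true read-after-write dependencies cause stalls.
    """
    # For each instruction, find the earlier instructions that produce its inputs.
    last_writer = {}
    deps = []
    for i, ins in enumerate(program):
        deps.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i

    done_at = {}  # instruction index -> cycle at which its result is available
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        issued = 0
        for i in list(pending):
            if issued == num_alus:
                break
            # An instruction can issue only once all of its producers have finished.
            if all(d in done_at and done_at[d] <= cycle for d in deps[i]):
                done_at[i] = cycle + latency
                pending.remove(i)
                issued += 1
        cycle += 1
    return max(done_at.values()) if done_at else 0


# Four XORs on unrelated registers: pairs can run side by side.
independent = [
    Instr("a", ("a", "b")),
    Instr("c", ("c", "d")),
    Instr("e", ("e", "f")),
    Instr("g", ("g", "h")),
]

# Four XORs that all accumulate into register "a": each waits for the previous one.
chained = [
    Instr("a", ("a", "b")),
    Instr("a", ("a", "c")),
    Instr("a", ("a", "d")),
    Instr("a", ("a", "e")),
]

print(total_cycles(independent))  # 2 in this toy model
print(total_cycles(chained))      # 4 in this toy model
```

Each XOR is "one cycle" in both programs, yet the totals differ, so a per-instruction number cannot simply be summed. A real core adds forwarding, cache behaviour, and much more on top of this.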

All this ignores issues of cache and virtual memory translation, which can affect the XOR by stalling nearby instructions.

And that's why people avoid Pentia in real-time systems.

--
	mac the naïf
Reply to
Alex Colvin

I'm actually looking for cycle counts, not execution time, since execution time depends on out-of-order execution, caching, etc. Regardless of these effects, the number of cycles an instruction takes should be fixed for a given addressing mode. I haven't been able to find anything past the Pentium, though John Hudak pointed me to some good info.

Adam


Reply to
Adam Elbirt

There is no meaningful difference between cycle counts and execution time. Cycles are just measurement units of execution time, nothing more.

Sorry, but you're deluding yourself. There is no such thing as a "cycle count for a given instruction" in any post-Pentium IA32 CPU. The closest you can get to such a thing is an algorithm to compute such a number from a large set of inputs (including practically random ones like cache hit/miss, pipeline stalls, activity on the simulated second CPU if "Hyperthreading" is in use, ...). Fitting that algorithm onto a single page of paper might already prove tricky; condensing it to a single number that could be called the "cycle count" of that instruction would yield a meaningless number.

If you insist on having that number: it's 1 for almost all the original x86 instructions except the FPU ones, these days.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Reply to
Hans-Bernhard Broeker

He'll probably find what he's looking for at

formatting link
But it won't be in a table. "You have much to learn, grasshopper..."

Regards, -=Dave

--
Change is inevitable, progress is not.
Reply to
Dave Hansen

Snip..

Near coffee spew and LOL.

Reply to
Jim Stewart

In general, this is true. Too much nondeterminism in data latencies. Mostly true if you are designing hard RT systems that must be completely deterministic in their execution time. Firm and soft RT systems have contingencies (various exception handlers) built into the OS if a thread exceeds WCET. There are RT systems built that allow preemption and threads exceeding their deadline. Scheduling analysis that takes this into account uses Average Execution Time (AET).

The short answer to the original question is that there is a minimum execution time, a maximum execution time, and an average execution time, dependent on data locality and workload. As with any engineering analysis, there are tradeoffs to deal with the uncertainties. So, do computations on 1) best case (clean cache, efficient compiler-loop unrolling, maximum code locomotion), and then 2) worst case (I&D execute out of cache, dumb compiler, etc.)

John
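As a rough sketch of the best/worst/average idea, the Python below times a routine many times and reports the three figures. The helper names `measure` and `summarize` are made up for this example, and wall-clock nanoseconds from `time.perf_counter_ns` stand in for machine cycles. Note that the largest value observed in a run is only the worst case seen so far, not a guaranteed WCET bound.

```python
import statistics
import time


def measure(func, runs=1000):
    """Time `func` repeatedly; return one elapsed-nanoseconds sample per run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        func()
        samples.append(time.perf_counter_ns() - start)
    return samples


def summarize(samples):
    """Reduce timing samples to best, worst, and average observed values."""
    return {
        "best": min(samples),
        "worst": max(samples),
        "average": statistics.fmean(samples),
    }


def workload():
    """Stand-in routine: XOR-accumulate a small range of integers."""
    acc = 0
    for i in range(1000):
        acc ^= i
    return acc


if __name__ == "__main__":
    print(summarize(measure(workload)))
```

The spread between "best" and "worst" on a desktop machine is usually large, which is the nondeterminism the thread is talking about.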

Reply to
John Hudak
