Anyone know where there might be a table similar to what is shown at
Adam
I'm not sure such a table would mean much, what with 2 levels of cache and pipelining, out-of-order execution, etc...
True but there must be some way to say something to the effect of "a register to register XOR instruction requires NNN machine cycles" as a basic thing for each machine. Does that make sense?
Adam
Jim Stewart wrote:
You're looking for some kind of unit that you can add to get total execution time? That would be simpler if there was a single pipeline. Later pentia have overlapped and out-of-order execution.
So it's generally one ALU cycle to XOR, in one of the ALUs. But it may stall if the input registers aren't ready. Except that the result can be forwarded directly from another ALU operation (perhaps only if it's the same ALU). And the whole operation can be killed if the register (as seen in the instruction stream) is overwritten before it's read.
This makes it hard to produce any kind of scalar number. If you really want to know, take a look at the finite-state-machine model GCC uses to schedule instructions.
All this ignores issues of cache and virtual memory translation, which can affect the XOR by stalling nearby instructions.
And that's why people avoid Pentia in real-time systems.
-- mac the naïf
I'm actually looking for cycle counts, not execution time, since execution time depends on out-of-order execution, caching, etc. Regardless of those effects, the number of cycles an instruction takes should be fixed for a given addressing mode. I haven't been able to find anything past the Pentium, though John Hudak pointed me to some good info.
Adam
Alex Colv wrote:
>> True but there must be some way to say something to the effect of "a register to register XOR instruction requires NNN machine cycles"
There is no meaningful difference between cycle counts and execution time. Cycles are just measurement units of execution time, nothing more.
Sorry, but you're deluding yourself. There is no such thing as a "cycle count for a given instruction" in any post-Pentium IA32 CPU. The closest you can get to such a thing is an algorithm to compute such a number from a large set of inputs (including practically random ones like cache hit/miss, pipeline stalls, activities of the other simulated CPU if "Hyperthreading" is in use, ...). Fitting that algorithm onto a single page of paper might already prove tricky
--- condensing it to a single number that could be called "cycle count" of that instruction would yield a meaningless number.
If you insist on having that number: it's 1 for almost all the original x86 instructions except the FPU ones, these days.
-- Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de) Even if all the snow were burnt, ashes would remain.
He'll probably find what he's looking for at
Regards, -=Dave
-- Change is inevitable, progress is not.
Snip..
Near coffee spew and LOL.
In general, this is true. Too much nondeterminism in data latencies. It is mostly true if you are designing hard RT systems that must be completely deterministic in their execution time. Firm and soft RT systems have contingencies (various exception handlers) built into the OS for a thread that exceeds its WCET. There are RT systems built that allow preemption and threads exceeding their deadline; scheduling analysis that takes this into account uses Average Execution Time (AET).

The short answer to the original question is that there is a minimum execution time, a maximum execution time, and an average execution time, depending on data locality and workload. As with any engineering analysis, there are tradeoffs to deal with the uncertainties. So, do the computations on 1) the best case (clean cache, efficient compiler, loop unrolling, maximum code locomotion), and then 2) the worst case (I&D execute out of cache, dumb compiler, etc.).

John