Does the same instruction take a different length of time if accessing another variable? Likewise, does the same instruction on the same variable take as long if done whilst NOT in an interrupt routine?
Is the variable somewhere odd, or is there a second instance of it, so that you are not accessing what you think you are? Have you viewed the actual machine code to see that nothing odd has been added by your assembly phase?
The 8259 is a legacy interrupt controller interface; the APIC is the modern standard.
But modern processors execute instructions out of the order specified in the program! It is quite possible that the out instructions have not completed execution when the second rdtsc executes, unless the "inc dword [_running_irq]" is there.
If this is on a multi-processor machine, cache-line ping-pong is a possible cause.
That is irrelevant. Cyril is quite correct. I/O port accesses take about a microsecond, regardless of bus speed or where the port is located. Part of this comes from the need to maintain historical compatibility.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
Because the i8259 is still an LPC (the bus that replaced ISA) device working at the slow ISA bus speed.
This is really strange and intriguing. I assume you've made sure that your timing code is correct (i.e. uses synchronizing instructions as Intel recommends). Can you try incrementing a different address, does it make a difference? An address off SS segment (e.g. inc dword ss:[-4])? Replace the increment with a mov?
    cpuid
    rdtsc
    mov subtime, eax
    cpuid
    rdtsc
    sub eax, subtime
    mov subtime, eax    ; Only the last value of subtime is kept
    ; subtime should now represent the overhead cost of the
    ; MOV and CPUID instructions
...other instructions...
;Test 1: the single inc instruction
    cpuid                       ; Serialize execution
    rdtsc                       ; Read time stamp to EAX
    mov time_1, eax
    ; Time for this instruction
    inc dword [_running_irq]    ; Taken from your code
    cpuid                       ; Serialize again for time-stamp read
    rdtsc
    sub eax, time_1             ; Find the difference
    mov time_1, eax
...other instructions...
; Now time_1 minus subtime should give the length of test 1
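The subtraction described above can be sketched in C, assuming only the low 32 bits of the time stamp (EAX after rdtsc) are kept, as in the listing. The helper names tsc_delta32 and net_cycles are hypothetical and not from the original code; this is an illustration of the arithmetic, not a full timing harness.

```c
#include <stdint.h>

/* Elapsed ticks between two reads of the low 32 bits of the TSC.
   Unsigned subtraction is modulo 2^32, so a single wrap of the low
   word between the two reads still yields the correct difference. */
static uint32_t tsc_delta32(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Net cost of the measured code: raw delta minus the measurement
   overhead (subtime in the listing). Clamped at zero, because jitter
   can make the overhead sample exceed a very short measurement. */
static uint32_t net_cycles(uint32_t raw_delta, uint32_t overhead)
{
    return raw_delta > overhead ? raw_delta - overhead : 0u;
}
```

For example, net_cycles(tsc_delta32(t0, t1), subtime) gives the length of test 1, where t0 and t1 stand for the two EAX values read around the inc.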
(As you know the above clobbers eax, ebx, ecx, edx each time cpuid is run so you need to push/pop to protect them if needed.)
If the time is still of the order of 2000 cycles, maybe try splitting your inc instruction into

    mov esi, _running_irq
    inc esi
    mov _running_irq, esi

where esi is used because eax will be trashed. If it is still as long (unlikely, for the reasons you mentioned) then you could split the measurement points.
Don't know if this helps, but on my 1GHz P3 laptop, using just the following code,

    RDTSC...
    mov al,20h
    out 20h,al
    RDTSC...

produces 450 cycles and

    RDTSC...
    mov al,20h
    out 20h,al
    out 0a0h,al
    RDTSC...

produces 900 cycles.
So it would appear as 450 ns and 900 ns respectively, since one cycle is 1 ns at 1GHz. Since you're running at 2.8GHz, 900 ns would take 2520 cycles. Serializing didn't seem to make much difference, nor did running in real mode or flat 32-bit protected mode. One interesting note was that looping through 5 or more times and taking the final result was 20 cycles slower than just a single run: 470 and 920. Chapter 13, Input/Output, in the IA-32 Intel Architecture Software Developer's Manual, Volume 1, seems to indicate that "I/O writes to control system hardware cause the hardware to be set to its new state before any other instructions are executed". I'm not a programmer so I may have misunderstood and it's all just coincidence; hopefully someone will put me right. Thanks for giving me some code to play with, I'm sad to say it was fun. :)
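As a quick check of the cycle arithmetic in the post above, here is a minimal C sketch. It assumes the time-stamp counter ticks at the nominal core clock, and the function names ns_to_cycles and cycles_to_ns are made up for illustration.

```c
#include <stdint.h>

/* Cycles elapsed in `ns` nanoseconds at a clock of `mhz` megahertz.
   cycles = ns * MHz / 1000; integer math, rounds down. */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t mhz)
{
    return ns * mhz / 1000u;
}

/* Inverse conversion: nanoseconds taken by `cycles` at `mhz` megahertz.
   The caller must pass a non-zero clock rate. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t mhz)
{
    return cycles * 1000u / mhz;
}
```

With these, cycles_to_ns(450, 1000) is 450 and ns_to_cycles(900, 2800) is 2520, matching the figures quoted for the 1GHz laptop and the 2.8GHz machine.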
Just catching flies...: are you sure that no exception code (trap or whatever, it's a long time since I assembled x86 code) is being called by your inc instruction? Imagine, for instance, an access violation (bad GDT)