x86 High Precision Event Timers support - Page 2

Re: x86 High Precision Event Timers support
[Quoted text elided]

Stock NTP is only useful at the 10-100 us level unless you have a Pulse
Per Second (PPS) source available to every system. Getting into the ns
domain requires much more heroic efforts, i.e. stuff like replacing the
motherboard crystal with a Rb or Cs atomic clock, and then phase-locking
this setup to UTC with a timing-optimized GPS like the now discontinued
Motorola Oncore UT+.

Even though this gets you a system clock with maybe 10-15 ns RMS offset
from true UTC, you still need a relatively slow syscall to get at it,
unless the OS itself responds to such requests with a user-level library
function that uses RDTSC to extrapolate from the last system clock update.

[Quoted text elided]

Right. You really want the fastest/cheapest possible timing source,
which means TSC on x86 cpus, which also means independent clocks in each
cpu/core.

If the OS can present the illusion of 'a single shared TSC counter', and
do it well enough that no user-level program ever notices, then it would
be a Good Thing (TM).

Terje
--
"almost all programming can be viewed as an exercise in caching"

Re: x86 High Precision Event Timers support
[Quoted text elided]

You shouldn't use multiple clocks as a synchronization mechanism or arbiter.
If you do, you will either get occasional errors or incur additional
overhead, because the get-time code performs synchronization that the user
could have done more efficiently.  E.g. you can implement a much more
efficient getticket() function than gettimeofday() could ever be as a
getticket function.

Even hardware-based solutions such as the IBM mainframe TOD clock couldn't
guarantee synchronization when multiple hardware clocks were present.
The architecture guaranteed it, but the hardware could not.  For the
64-bit TOD clock, the hardware checked, on the carry out of bit 32, whether
the clocks were in sync and, if not, queued an external interrupt.  This
happened approximately once per second, so if the clock drift was bad enough
the clocks could fall far enough out of sync within that interval for
erroneous computation to occur.


--
Joe Seigh

When you get lemons, you make lemonade.
Re: x86 High Precision Event Timers support
[Quoted text elided]

Sure, and so what?

Programmers would still like to be able to use the fastest/most precise
clock available.

If the cost of providing that is a (very) small chance of sometimes
giving less accurate results, then so be it.

I.e., let's assume I'm using this as a way to pace a sending queue: if I
get a small glitch I might incur an extra lost packet/retransmit, but
that's OK.

Using the same type of best-effort timer to directly control radiation
dosages would be criminal, right?

Terje

--
"almost all programming can be viewed as an exercise in caching"

Re: x86 High Precision Event Timers support
[Quoted text elided]

Yes, you are not using it as a synchronization arbiter.

--
Joe Seigh

When you get lemons, you make lemonade.
Re: x86 High Precision Event Timers support

[Quoted text elided]

On a Core Duo, the OS X call "mach_absolute_time()" takes ~132 clocks.
With 3 RDTSCs and the triple-width scaling, I suppose that about fits.

If the implementation is the general one that doesn't rely on a fixed
frequency, that could explain why the result is scaled to nanosecond
resolution. (A companion call to mach_absolute_time() provides a fraction
for scaling, so if you want nanosecond resolution you end up doing a
superfluous second scaling.) If a fixed frequency were assumed, the raw
resolution could have been used in the result, saving a scaling
operation.

[Quoted text elided]

If Intel could have spared an extra pin, they could have added a proper
timebase register incrementing asynchronously on an external timebase
signal. At a modest frequency like 33 MHz, there should be no problem
distributing a timebase signal to multiple CPUs.

--
Mvh./Regards,    Niels Jørgen Kruse,    Vanløse, Denmark

Re: x86 High Precision Event Timers support
nospam@ab-katrinedal.dk (Niels Jørgen Kruse) writes:

[Quoted text elided]

We found that the 10 MHz used for this purpose on some SPARC processors
is actually not fast enough; one tick is perhaps several hundred CPU clock
cycles, which makes using this for precise accounting difficult.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Re: x86 High Precision Event Timers support

[Quoted text elided]

The frontside bus clock should be sufficiently synchronous and identical on
all CPUs used within one box, at least now. Xeons have a "real" frontside
bus, and Opterons have a common HyperTransport clock base (200 MHz). This
frequency scales with processor performance, so it should not be too far
off. That's 10-15 cycles of resolution on current CPUs, less than the rdtsc
instruction itself takes.

BTW, regarding clock skew: note that for all practical purposes, the only
requirement for a distributed timer is that no signal propagates faster
than the timer.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
Re: x86 High Precision Event Timers support
nospam@ab-katrinedal.dk says...
[Quoted text elided]
And another pin to synchronize (reset) all the counters?

I think the problem is that the TSC has two definitions:  1) number of
clock ticks, and 2) absolute time that has passed.  Unfortunately, TSC is a
system-level counter.  What I would really want is four different counters:
2 for each thread, and 2 for the system.  When the OS starts a new thread,
the counters for that thread would be loaded.  The 2 counters are one to
count clock ticks (so if the processor clock changes, the counter rate
changes; this is good for getting (somewhat) consistent execution time),
and one counter that follows real-world execution time (wall-clock time).
This counter, IMO, doesn't need to be completely accurate; say 100 MHz (10 ns).

It also would be nice if there were compare registers (e.g. MIPS), so that
external hardware wasn't needed for timeslicing.

            - Tim

NOT speaking for Unisys.


Re: x86 High Precision Event Timers support
On Fri, 23 Jun 2006 09:54:35 +0200, Terje Mathisen wrote:

[Quoted text elided]

At least with previous dual processor boards, the problem was that the
reset was not performed exactly at the same time. So even if each
processor was clocked from the same clock source, you could get
inconsistent timing, if the RDTSC instruction was sometimes executed
on processor 1, while in some cases it was executed on processor 2.

If you can figure out which processor is executing the RDTSC
instruction, this is not a problem. In Windows NT you could set the
thread affinity to a specific processor and execute all RDTSC
instructions in that thread to get consistent timing.
 
[Quoted text elided]

The 1.19 MHz timer is used only by the QueryPerformanceCounter service on
single processor systems. On multiple processor systems, this service
returns the TSC count.

Paul


Re: x86 High Precision Event Timers support
[Quoted text elided]
You use per-processor TSC scaling factors and offset corrections.  You
need a count of per-thread context switches, which you read before and
after reading the CPU id, TSC, scale, and offset.  If the before and after
context switch counts match, then everything you've read is for the
same processor.  This is ancient mainframe technology.  I don't know
if Linux has discovered it yet, though.  Maybe it's still a problem.


--
Joe Seigh

When you get lemons, you make lemonade.
Re: x86 High Precision Event Timers support

[Quoted text elided]

The problem is that these context switch counters are, in most operating
systems, in the kernel-mode address space only, thus requiring a switch
from user mode to kernel mode to read the thread context switch count.
This requires validation of parameters and other time-consuming things.

Paul


Re: x86 High Precision Event Timers support
[Quoted text elided]

There's no technical reason this information can't be in user space
as well.  There may be a problem, but it's certainly not technical
in nature.


--
Joe Seigh

When you get lemons, you make lemonade.
Re: x86 High Precision Event Timers support

[Quoted text elided]

Such a context switch indicator/counter would be an extremely useful
tool for many other purposes as well. Think of restartable algorithms,
particularly restartable pseudo-atomic instruction sequences, such as
the TO-lock synchronization primitive.

    Best regards
    Piotr Wyderski

Re: x86 High Precision Event Timers support
[Quoted text elided]

I dispute that.  Once the chips come out of reset, they're both being fed
by the same clock signal.  I don't believe it is possible for them to drift
out of sync.

The old Windows NT used to synchronize the TSCs on multiprocessor systems
(the TSC is a writable register), but Windows XP does not seem to do that,
so the TSCs can start out small integer millions of cycles apart.  But,
once that delta is set, the delta should stay constant.

[Quoted text elided]

Actually, Windows makes one of two choices.  On a uniprocessor machine, it
chooses the motherboard timer chip.  That used to be 1.193 MHz, as you say,
although XP now runs it at 3x that frequency.  But on a multiprocessor
machine, it uses the cycle counter.
--
- Tim Roberts, snipped-for-privacy@probo.com
  Providenza & Boekelheide, Inc.

Re: x86 High Precision Event Timers support
[Quoted text elided]

Did you read the AMD paper that someone posted a link to a couple of
days ago?

It goes into specifics describing exactly how this can happen as a
result of frequency throttling, something that can happen independently
on each cpu/core.

Terje
--
"almost all programming can be viewed as an exercise in caching"

Re: x86 High Precision Event Timers support

[Quoted text elided]

I think you are referring to:
http://groups.google.com/group/fa.linux.kernel/msg/c8ae85a08ebd3aa4

Re: x86 High Precision Event Timers support

[Quoted text elided]

Which reminds me of Rich Brunner's excellent article:
http://groups.google.com/group/fa.linux.kernel/msg/c8ae85a08ebd3aa4

[Quoted text elided]

I'm playing with the hrtimers infrastructure:
http://www.tglx.de/hrtimers.html

I *think* they use HPET, if they find it.
http://www.tglx.de/projects/hrtimers/2.6.16/patch-2.6.16-hrt6.patch


I'm also wondering: Are there x86-based systems where a card equipped
with several PITs (e.g. ADLINK's PCI-8554) is a necessity?

http://www.adlinktech.com/PD/web/PD_detail.php?pid27

Re: x86 High Precision Event Timers support
[ Followup-To: set to comp.arch, feel free to disagree ]


[Quoted text elided]

Could anyone comment?

When is one Programmable Interval Timer not enough?

Re: x86 High Precision Event Timers support
Concerning this, I've always wondered why desktop and server CPUs like
the x86s don't generally support a single timer that counts down to zero from
a given value at core frequency (on CPUs with clock stepping, at the highest
frequency of course) and then generates an interrupt.
Based on this, the CPU could do everything from scheduling to multimedia
timers: items being scheduled are dispatched to a queue, and the interrupt
is only generated for the first item in the queue.
Schedulers could be improved by such a timer, for example: based on a minimum
scheduling frequency, this frequency could be raised if the number of processes
or threads rises above a certain limit. Or scheduling slices could have
different lengths based on the priority of the thread or process.
