x86 High Precision Event Timers support

Hello,

As far as I understand (which is not very far; please do point out any inaccuracies), there is an effort in the x86 world to replace the legacy x86 timer infrastructure:

o The PIT (Programmable Interval Timer) such as Intel's 8253 and 8254

formatting link
formatting link

o The RTC (Real-Time Clock)

o The local APIC timer (I didn't find much information on this timer.)

o The ACPI timer, also known as the PM clock (Any pointers?)

Microsoft provides a rationale for the new infrastructure:

formatting link

Intel provides a spec:

formatting link

As far as I understand, the HPET hardware is provided by the southbridge chipset (for example, Intel's ICH5)?

(Would the VIA VT82C686B provide an HPET block?)

My understanding is that the BIOS is supposed to map the HPET registers into memory and advertise them through an ACPI table at boot time? If the BIOS does not initialize the HPET hardware, the OS remains unaware that it is available.

formatting link
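For the curious, here is a rough C sketch of the ACPI description table involved, as I read the spec; the struct and field names are my own paraphrase, so check the actual documents before relying on the layout. The OS finds the table in the RSDT/XSDT by its "HPET" signature:

#include <stdint.h>

/* ACPI Generic Address Structure */
struct acpi_gas {
    uint8_t  address_space_id;     /* 0 = system memory */
    uint8_t  register_bit_width;
    uint8_t  register_bit_offset;
    uint8_t  reserved;
    uint64_t address;              /* physical base the BIOS mapped */
} __attribute__((packed));

/* The "HPET" table: standard ACPI header, then the HPET specifics */
struct acpi_hpet_table {
    char     signature[4];         /* "HPET" */
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    uint32_t creator_id;
    uint32_t creator_revision;
    uint32_t event_timer_block_id; /* vendor ID, revision, comparator count */
    struct acpi_gas base_address;  /* typically 0xFED00000 */
    uint8_t  hpet_number;
    uint16_t minimum_tick;         /* minimum periodic tick, counter units */
    uint8_t  page_protection;
} __attribute__((packed));

If the BIOS never builds this table, the timer block may still be sitting in the chipset, but the OS has no architected way to discover it.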

Is there, somewhere, a list of hardware with HPET support?

Are there implementations that support more than 3 comparators?

Regards.

Reply to
Spoon

In comp.os.linux.development.system Spoon wrote in part:

You forgot the venerable and still extremely precise RDTSC instruction, available since the original Pentium, which reads the CPU's cycle counter. Typical overhead is about 30 clocks, versus an interrupt latency of at least 100 clocks.

Accuracy still depends on the clock generator. AFAIK, nanosleep(), gettimeofday() and friends use RDTSC to interpolate between other clocks (the APIC timer preferred over the PIT).
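For reference, reading the counter from C is a one-liner of inline assembly under GCC; this is a generic sketch, not any particular library's implementation:

#include <stdint.h>

/* RDTSC leaves the 64-bit time-stamp counter in EDX:EAX. Note that
   RDTSC is not a serializing instruction, so when timing very short
   sequences you may want a CPUID fence in front of it. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}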

-- Robert

Reply to
Robert Redelmeier

RDTSC is nice as long as you stay away from Geode processors, which seem to enter SMM in more or less unpredictable ways. Any processor that dynamically changes its clock frequency in various power-saving modes will also cause problems.

The CPU clock frequency is quite temperature-dependent. Unless you can check the time at least once a day against some reliable source, such as the CMOS clock, NTP, or a GPS clock, quite significant cumulative errors will occur.

Paul

Reply to
Paul Keinanen

Recent Intel CPUs run the RDTSC cycle counter at a fixed frequency, regardless of temporary reductions in core frequency. Eventually, I suppose, AMD will do the right thing too.

--
Mvh./Regards,    Niels Jørgen Kruse,    Vanløse, Denmark
Reply to
Niels Jørgen Kruse

Which reminds me of Rich Brunner's excellent article:

formatting link

I'm playing with the hrtimers infrastructure:

formatting link

I *think* they use HPET, if they find it.

formatting link

I'm also wondering: Are there x86-based systems where a card equipped with several PITs (e.g. ADLINK's PCI-8554) is a necessity?

formatting link

Reply to
Spoon

It would be nice if they got around to supporting a high-resolution timing interface that doesn't require a syscall, works in an SMP environment, and supports virtual timing as well as real wall-clock timing. It's a known technique and has been around for decades.
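One known shape for that is a time structure the kernel keeps in a page mapped read-only into every process and updates under a sequence counter, so readers never trap into the kernel. A minimal sketch, with invented field names and the memory barriers elided:

#include <stdint.h>

struct time_page {
    volatile uint32_t seq;  /* even = stable, odd = update in progress */
    volatile uint64_t ns;   /* e.g. nanoseconds since boot */
};

/* Retry until we read a consistent snapshot; no syscall involved.
   Real code needs read barriers between the loads. */
static uint64_t read_time(const struct time_page *tp)
{
    uint32_t s;
    uint64_t ns;
    do {
        s  = tp->seq;
        ns = tp->ns;
    } while ((s & 1) || s != tp->seq);
    return ns;
}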

Also, Intel and AMD need to think about how these things virtualize before they put them in, rather than five years after the fact. But that's only important if Intel and AMD think virtualization is an important part of their business strategy.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Reply to
Joe Seigh

But since an OS or library that provides timing services cannot rely on running on a processor where the RDTSC frequency is fixed, this won't simplify any such OSes or libraries until, at some point, it becomes practical to ignore older processors.

--
David Hopwood

Reply to
David Hopwood

In comp.os.linux.development.system David Hopwood wrote in part:

This depends very much on the software quality requirements. Not everything is a big system that will be used for critical purposes. Everything is a compromise: RDTSC is very fast and usually good; OS calls are almost always accurate, but slower and usually less precise.

Horses for courses.

-- Robert

Reply to
Robert Redelmeier

Unless the OS makes up for it. If the OS fixes these things up in the other cases (hard; I've tried it), then not having to do so on some systems is a bonus.

Casper

Reply to
Casper H.S. Dik

Currently, Mac OS X can assume that. Granted, Marklar was started before there were fixed-frequency RDTSC processors, so there may be some workaround still in there.

--
Mvh./Regards,    Niels Jørgen Kruse,    Vanløse, Denmark
Reply to
Niels Jørgen Kruse

There are two main problems here:

a) The TSC might not run at a fixed frequency, but an OS can know when the changes happen and still use it to provide a fast return value. It needs a user-level library routine that simply takes the current TSC count, multiplies by the current scale factor (producing a triple-width result), shifts down by the current shift value, and adds the current base count (a concrete sketch follows point b below). The total time for this operation is not much more than that of the RDTSC opcode itself, which can easily take 20-30 cycles on some CPUs.

Intuitively, you would like to either reset the TSC count or store the current value and subtract it out before the multiplication, but the subtraction can instead be included in the base value to be added in after the scaling multiplication.

The OS must of course update the base value and the scale factor each time the TSC frequency changes, but as long as there is only a small number (two?) of base frequencies to support, the needed scale factors can be calculated up front, and you might even get away with just a shift if the slow frequency is a binary fraction of the high one.

b) On a multi-CPU/multi-core system, it is quite possible for the TSC counts to get out of sync, and this is a much harder problem to fix while still delivering sub-microsecond precision and latency.

Windows punts by using the best available external counter, which might fall back all the way to the horrible 1.19 MHz timer chip/RAM-refresh counter designed into the original 1981 model PC. :-(
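To make point a) concrete, here is one way the user-level conversion could look in C, using GCC's unsigned __int128 for the triple-width product. The (scale, shift, base) names and layout are mine, not any real OS's ABI, and a real version must read the triple consistently (e.g. under a sequence lock) across frequency changes:

#include <stdint.h>

struct tsc_conv {
    uint32_t scale;  /* fixed-point ns per tick: (10^9 << shift) / f_tsc */
    uint32_t shift;  /* how far to shift the wide product back down */
    uint64_t base;   /* absorbs the epoch offset, as described above */
};

static inline uint64_t tsc_to_ns(uint64_t tsc, const struct tsc_conv *c)
{
    /* 64x32 multiply giving a 96-bit ("triple-width") product,
       then shift down and add the base count. */
    unsigned __int128 prod = (unsigned __int128)tsc * c->scale;
    return (uint64_t)(prod >> c->shift) + c->base;
}

The OS rewrites scale and base at each frequency change; everything else stays in user space.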

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

The number of frequencies can be higher, actually; a typical AMD CPU can only do smallish frequency steps, and that makes for quite a few frequencies (four or five on typical systems around here).

Yeah, I didn't do multi-CPU/multi-core; the cores of a multi-core Opteron will all need to run at the same frequency (though I'm not sure whether setting the voltage/frequency of one core affects the other at the same time, or whether these actions need to be done in lockstep); multi-socket adds additional challenges.

Ugh.

Casper

--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Reply to
Casper H.S. Dik

Which one of the problems is that?

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Reply to
Joe Seigh

On a Core Duo, the OS X call "mach_absolute_time()" takes ~132 clocks. With 3 RDTSCs and the triple-width scaling, I suppose that about fits.

If the implementation is the general one that doesn't rely on a fixed frequency, that could explain why the result is scaled to nanosecond resolution. (A companion call to mach_absolute_time() provides a fraction for scaling, so if you want nanosecond resolution you end up doing a superfluous scaling.) If a fixed frequency were assumed, the raw resolution could have been used in the result, saving a scaling operation.
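The companion call is mach_timebase_info(), which hands back the numer/denom fraction for converting mach_absolute_time() ticks to nanoseconds. A minimal usage sketch; on a fixed-rate timebase the fraction can be 1/1, making the multiply/divide exactly the superfluous scaling mentioned above:

#include <mach/mach_time.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);

    uint64_t t0 = mach_absolute_time();
    uint64_t t1 = mach_absolute_time();

    /* raw ticks * numer / denom = nanoseconds */
    uint64_t ns = (t1 - t0) * tb.numer / tb.denom;
    printf("call-to-call delta: %llu ns (timebase %u/%u)\n",
           (unsigned long long)ns, tb.numer, tb.denom);
    return 0;
}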

If Intel could have spared an extra pin, they could have added a proper timebase register incrementing asynchronously on an external timebase signal. At a modest frequency like 33 MHz, there should be no problem distributing a timebase signal to multiple CPUs.

--
Mvh./Regards,    Niels Jørgen Kruse,    Vanløse, Denmark
Reply to
Niels Jørgen Kruse

We found that the 10 MHz timebase used for this purpose on some SPARC processors is actually not fast enough; one tick is perhaps several hundred CPU clock cycles, which makes using it for precise accounting difficult.

Casper

--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Reply to
Casper H.S. Dik

The front-side bus clock should be sufficiently synchronous and identical on all CPUs used within one box, at least nowadays. Xeons have a "real" front-side bus, and Opterons have a common HyperTransport clock base (200 MHz). This frequency scales with processor performance, so it should not be far off. That's a resolution of 10-15 cycles on current CPUs, less than the RDTSC instruction itself takes.

BTW, on clock skew: note that for all practical purposes, the only requirement for a distributed timer is that no other signal propagates faster than the timer signal does.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
Reply to
Bernd Paysan

And another pin to synchronize (reset) all the counters?

I think the problem is that the TSC has two definitions: 1) the number of clock ticks, and 2) the absolute time that has passed. Unfortunately, the TSC is a system-level counter.

What I would really want is four different counters: two for each thread, and two for the system. When the OS starts a new thread, the counters for that thread would be loaded. Of each pair, one counter counts clock ticks (so if the processor clock changes, the counter rate changes, which is good for getting (somewhat) consistent execution times), and the other follows real-world execution time (wall-clock time). The latter, IMO, doesn't need to be completely accurate; say 100 MHz (10 ns resolution).

It also would be nice if there were compare registers (as on MIPS), so that external hardware wasn't needed for timeslicing.

- Tim

NOT speaking for Unisys.

Reply to
Tim McCaffrey

On the Dual-Opteron 270 system we have, the two cores in the same socket always have the same voltage and the same frequency, but the other two in the other socket can be at a different speed.

We have seen some instability on that system, maybe related to speed-changing (the system sometimes crashed when the load (and thus the speed) changed, and this went away when we used a kernel that does not change speeds).

Followups set to comp.arch.

- anton

--
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
Reply to
Anton Ertl

The second, as in (b), was my intention. Sorry if I was unclear!

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

Well, I assume you're using something like NTP to keep them "in sync". You can't actually keep them in absolute sync, only within a certain accuracy, with a given precision or certainty. You cannot use separate clocks for synchronization the way you can use a single clock, unless you accept that synchronizing with multiple clocks will occasionally fail and allow erroneous results.

Is the "problem" that you can't use multiple clocks to synchronize with, or is it something else?

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Reply to
Joe Seigh
