x86 High Precision Event Timers support

At least with previous dual-processor boards, the problem was that the reset was not performed at exactly the same time. So even if each processor was clocked from the same clock source, you could get inconsistent timing if the RDTSC instruction was sometimes executed on processor 1 and sometimes on processor 2.

If you can figure out which processor is executing the RDTSC instruction, this is not a problem. In Windows NT you could set the thread affinity to a specific processor and execute all RDTSC instructions in that thread to get consistent timing.
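On NT that means SetThreadAffinityMask; a minimal sketch of the same idea on Linux, using pthread_setaffinity_np and the __rdtsc intrinsic (the function names here are my own):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <x86intrin.h>

/* Pin the calling thread to one CPU so every RDTSC in this thread reads
 * the same TSC.  (On Windows NT the equivalent is SetThreadAffinityMask.) */
static void pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

/* Cycles elapsed across some timed work, both reads on a single, known CPU. */
static uint64_t timed_cycles(void)
{
    pin_to_cpu(0);
    uint64_t t0 = __rdtsc();
    /* ... work being timed ... */
    uint64_t t1 = __rdtsc();
    return t1 - t0;
}
```

Because both reads come from the same core's TSC, the difference is monotonic regardless of how far apart the two processors' counters drifted at reset.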

The 1.19 MHz source is used only by the QueryPerformanceCounter service on single-processor systems. On multi-processor systems, this service returns the TSC count.

Paul

Reply to
Paul Keinanen

You use per processor TSC scaling factors and offset corrections. You need a count of per thread context switches which you read before and after reading the cpuid, TSC, scale, and offset. If the before and after context switch counts match then everything you've read is for the same processor. This is ancient mainframe technology. I don't know if Linux has discovered it yet though. Maybe it's still a problem.
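A runnable sketch of that read sequence. The per-CPU calibration structure, the user-visible per-thread switch count, and the CPU/TSC readers are all invented names, stubbed out here so the retry logic can be shown end to end:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU calibration the OS would export (names invented). */
struct tsc_calib { uint64_t offset, scale_num, scale_den; };

/* Stubs standing in for OS-provided data, so the sketch is runnable. */
static volatile uint64_t thread_cswitch_count;     /* per-thread switch count */
static struct tsc_calib per_cpu_calib[2] = {
    { 1000, 1, 3 },   /* CPU 0: 3 GHz TSC -> ns, offset from reset skew */
    { 5000, 1, 3 },   /* CPU 1: same rate, different reset offset       */
};
static unsigned current_cpu(void) { return 0; }    /* e.g. via RDTSCP/CPUID */
static uint64_t read_tsc(void)    { return 4000; } /* e.g. via RDTSC        */

/* Retry until no context switch happened between the reads, so the CPU id,
 * TSC value, and calibration data all refer to the same processor. */
static uint64_t consistent_time(void)
{
    for (;;) {
        uint64_t before = thread_cswitch_count;
        unsigned cpu    = current_cpu();
        uint64_t tsc    = read_tsc();
        struct tsc_calib c = per_cpu_calib[cpu];
        if (thread_cswitch_count == before)
            return (tsc - c.offset) * c.scale_num / c.scale_den;
        /* switched mid-sequence: the readings may mix CPUs, try again */
    }
}
```

The loop almost never retries in practice, since a context switch landing inside those few instructions is rare.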

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Reply to
Joe Seigh

I dispute that. Once the chips come out of reset, they're both being fed by the same clock signal. I don't believe it is possible for them to drift out of sync.

The old Windows NT used to synchronize the TSCs on multiprocessor systems (the TSC is a writable register), but Windows XP does not seem to do that, so the TSCs can start out small integer millions of cycles apart. But, once that delta is set, the delta should stay constant.

Actually, Windows makes one of two choices. On a uniprocessor machine, it chooses the motherboard timer chip. That used to be 1.193MHz, as you say, although XP now runs it at 3x that frequency. But on a multiprocessor machine, it uses the cycle counter.

--
- Tim Roberts, timr@probo.com
  Providenza & Boekelheide, Inc.
Reply to
Tim Roberts

The problem is that in most operating systems these context-switch counters live in the kernel-mode address space only, so reading the thread's context-switch count requires a switch from user mode to kernel mode. That entails parameter validation and other time-consuming work.

Paul

Reply to
Paul Keinanen

There's no technical reason this information can't be in user space as well. There may be a problem, but it's certainly not technical in nature.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Reply to
Joe Seigh

Stock NTP is only useful at the 10-100 us level unless you have a Pulse Per Second (PPS) source available to every system. Getting into the ns domain requires much more heroic efforts, i.e. stuff like replacing the motherboard crystal with a Rb or Cs atomic clock, and then phase-locking this setup to UTC with a timing-optimized GPS like the now discontinued Motorola Oncore UT+.

Even though this gets you a system clock with maybe 10-15 ns RMS offset from true UTC, you still need a relatively slow syscall to get at it, unless the OS itself responds to such requests with a user-level library function that uses RDTSC to extrapolate from the last system clock update.
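That extrapolation scheme can be sketched as follows; the clock_page layout and field names are hypothetical, standing in for whatever shared page the kernel would export (Linux's vDSO clock_gettime works on the same principle):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page the kernel updates on each clock tick and maps
 * read-only into every process (names invented for illustration). */
struct clock_page {
    uint64_t sys_time_ns;    /* system time at the last kernel update */
    uint64_t tsc_at_update;  /* TSC value captured at that moment     */
    uint64_t tsc_hz;         /* calibrated TSC frequency              */
};

/* Extrapolate current time from the last kernel update -- no syscall,
 * just one RDTSC plus a little arithmetic in user space. */
static uint64_t user_time_ns(const struct clock_page *p, uint64_t tsc_now)
{
    uint64_t dt = tsc_now - p->tsc_at_update;
    return p->sys_time_ns + dt * 1000000000ull / p->tsc_hz;
}
```

The syscall cost is paid only by the kernel's periodic update; every user-level read is a handful of cycles.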

Right. You really want the fastest/cheapest possible timing source, which means TSC on x86 cpus, which also means independent clocks in each cpu/core.

If the OS can present the illusion of 'a single shared TSC counter', and do it well enough that no user-level program ever notices, then it would be a Good Thing (TM).

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

Did you read the AMD paper that someone posted a link to a couple of days ago?

It goes into specifics describing exactly how this can happen as a result of frequency throttling, something that can happen independently on each cpu/core.

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

wrote in part:

Yes, a lot of software is of very poor quality ;-)

--
David Hopwood
Reply to
David Hopwood

As I understand it, this is true of the Opteron but NOT the AMD Athlon64 X2: the X2 allows the cores to be clocked individually (presumably to cut power usage on desktops).

At least that's the explanation I've heard for why you can see some really funky effects in Windows with X2's (but not Opterons!) unless you install a new enough "AMD Athlon 64 X2 Dual Core Processor Driver" (available from AMD, but for some reason not from Windows Update).

Linux also had problems with X2's and CnQ for a while, probably because a lot of this was tested on Opterons... IIRC people produced test programs which showed that this effect was real before the patch was accepted.

There are rumors that a future X2 revision is going to run the same TSC for all cores in a physical package/socket (they should have plenty of stable clocks to run it off, perhaps the HT clock).

Reply to
Torbjorn Lindgren

You shouldn't use multiple clocks as a synchronization mechanism or arbiter. If you do, either you will get occasional errors or you will incur additional overhead as the get-time code performs synchronization that the user could have done more efficiently. E.g. a dedicated getticket() function can be made far more efficient than gettimeofday() used as a ticket source could ever be.
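For instance, a dedicated ticket dispenser is just one atomic increment, with uniqueness and monotonicity guaranteed by construction rather than by hoping two clocks agree (a sketch with C11 atomics; the function name follows the post's getticket()):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One shared counter: tickets are unique and strictly increasing no matter
 * which CPU the caller runs on, with no clock reads and no clock skew. */
static atomic_uint_fast64_t next_ticket;

static uint64_t getticket(void)
{
    /* returns the previous value, so the first ticket handed out is 0 */
    return atomic_fetch_add(&next_ticket, 1);
}
```

A timestamp-based ticket would instead have to serialize the clock read and still deduplicate ties; the atomic does both in a single instruction.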

Even hardware-based solutions such as the IBM mainframe TOD clock couldn't guarantee synchronization when multiple hardware clocks were present. The architecture guaranteed it but the hardware could not. For the 64-bit TOD clock, the hardware checked on the bit-32 carry-out whether the clocks were in sync and, if not, queued an external interrupt. That carry occurs approximately once per second, so if the clock drift was bad enough, the clocks could get far enough out of sync within that interval for erroneous computation to occur.
--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Reply to
Joe Seigh

Concerning this, I've always wondered why desktop and server CPUs like the x86s don't generally support a single timer that counts down to zero from a given value at core frequency (on CPUs with clock stepping, at the highest frequency of course) and then generates an interrupt. Based on this, the CPU could do everything from scheduling to multimedia timers: items being scheduled are dispatched to a queue and the interrupt is generated only for the first item in the queue. Schedulers could be improved by such a timer too, e.g.: starting from a minimum scheduling frequency, the frequency could be raised if the number of processes or threads rises above a certain limit, or scheduling slices could be given different lengths based on the priority of the thread or process.
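A toy sketch of that dispatch-queue idea: deadlines kept sorted, with the countdown register (simulated here by a plain variable) reprogrammed only when a new earliest item arrives. All the names are invented:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ITEMS 16

/* Pending deadlines, kept sorted ascending; the hardware countdown timer
 * is (re)programmed only for items[0], the earliest one. */
static uint64_t items[MAX_ITEMS];
static int      nitems;
static uint64_t programmed;   /* stand-in for the countdown register */

static void program_timer(uint64_t now)
{
    if (nitems > 0)
        programmed = items[0] - now;  /* counts down to zero, then IRQ */
}

static void add_deadline(uint64_t now, uint64_t when)
{
    /* insertion sort is fine for a handful of pending timers */
    int i = nitems++;
    while (i > 0 && items[i - 1] > when) { items[i] = items[i - 1]; i--; }
    items[i] = when;
    if (i == 0)
        program_timer(now);   /* new earliest item: reprogram the timer */
}
```

On expiry the interrupt handler would pop items[0], dispatch it, and reprogram for the new head; everything else (multimedia timers, scheduling slices) multiplexes onto the one hardware counter.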

Reply to
Elcaro Nosille

My understanding is that the Athlon64 X2s still run the two cores at the same frequency. It's the laptop parts which can run the cores at different frequencies.

Phil

--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
Reply to
Phil Armstrong

My understanding is that the Athlon64 X2 and the Opteron parts are basically the same.

(The latest socket 939 Opteron and the Athlon64 X2 cannot be told apart by software; they return the same CPUID values.)

Casper

--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Reply to
Casper H.S. Dik

Agreed. Sometimes due to a misguided effort at high quality!

-- Robert

Reply to
Robert Redelmeier

Could anyone comment?

When is one Programmable Interval Timer not enough?

Reply to
Spoon

I think you are referring to:

formatting link

Reply to
Spoon

Sure, and so what?

Programmers would still like to be able to use the fastest/most precise clock available.

If the cost of providing that is a (very) small chance of sometimes giving less accurate results, then so be it.

I.e. let's assume I'm using this as a way to pace a sending queue, if I get a small glitch I might incur an extra lost packet/retransmit, but that's OK.

Using the same type of best-effort timer to directly control radiation dosages would be criminal, right?

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

Yes, you are not using it as a synchronization arbiter.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Reply to
Joe Seigh

Such a context-switch indicator/counter would be an extremely useful tool for many other purposes as well. Think of restartable algorithms, particularly restartable pseudo-atomic instruction sequences, such as the TO-lock synchronization primitive.

Best regards Piotr Wyderski

Reply to
Piotr Wyderski
