x86 High Precision Event Timers support - Page 2

Re: x86 High Precision Event Timers support
[Quoted text elided]

Stock NTP is only useful at the 10-100 us level unless you have a Pulse
Per Second (PPS) source available to every system. Getting into the ns
domain requires much more heroic efforts, i.e. stuff like replacing the
motherboard crystal with a Rb or Cs atomic clock, and then phase-locking
this setup to UTC with a timing-optimized GPS like the now discontinued
Motorola Oncore UT+.

Even though this gets you a system clock with maybe 10-15 ns RMS offset
from true UTC, you still need a relatively slow syscall to get at it,
unless the OS itself responds to such requests with a user-level library
function that uses RDTSC to extrapolate from the last system clock update.

[Quoted text elided]

Right. You really want the fastest/cheapest possible timing source,
which means TSC on x86 cpus, which also means independent clocks in each
cpu/core.

If the OS can present the illusion of 'a single shared TSC counter', and
do it well enough that no user-level program ever notices, then it would
be a Good Thing (TM).

Terje
--
"almost all programming can be viewed as an exercise in caching"

Re: x86 High Precision Event Timers support
[Quoted text elided]

You shouldn't use multiple clocks as a synchronization mechanism or arbiter.
If you do, you will either get occasional errors or incur additional
overhead, because the get-time code performs synchronization that the user
could have done more efficiently.  E.g. you can implement a much more
efficient getticket() function than gettimeofday() could ever be as a
getticket function.

Even hardware-based solutions such as the IBM mainframe TOD clock couldn't
guarantee synchronization when multiple hardware clocks were present.
The architecture guaranteed it, but the hardware could not.  For the
64-bit TOD clock, the hardware checked, on the carry out of bit 32, whether
the clocks were in sync and, if not, queued an external interrupt.  This
happened approximately once per second, so if the clock drift was bad enough
the clocks could fall far enough out of sync within that interval for
erroneous computation to occur.


--
Joe Seigh

When you get lemons, you make lemonade.
Re: x86 High Precision Event Timers support
[Quoted text elided]

Sure, and so what?

Programmers would still like to be able to use the fastest/most precise
clock available.

If the cost of providing that is a (very) small chance of sometimes
giving less accurate results, then so be it.

I.e., let's assume I'm using this as a way to pace a sending queue: if I
get a small glitch I might incur an extra lost packet/retransmit, but
that's OK.

Using the same type of best-effort timer to directly control radiation
dosages would be criminal, right?

Terje

--
"almost all programming can be viewed as an exercise in caching"

Re: x86 High Precision Event Timers support
[Quoted text elided]

Yes, you are not using it as a synchronization arbiter.

--
Joe Seigh

When you get lemons, you make lemonade.
Re: x86 High Precision Event Timers support

[Quoted text elided]

On a Core Duo, the OS X call "mach_absolute_time()" takes ~132 clocks.
With 3 RDTSCs and the triple-width scaling, I suppose that about fits.

If the implementation is the general one that doesn't rely on a fixed
frequency, that could explain why the result is scaled to nanosecond
resolution. (A companion call to mach_absolute_time() provides a fraction
for scaling, so if you want nanosecond resolution you end up doing a
superfluous second scaling.) If a fixed frequency were assumed, the raw
resolution could have been used in the result, saving a scaling
operation.

[Quoted text elided]

If Intel could have spared an extra pin, they could have added a proper
timebase register incrementing asynchronously on an external timebase
signal. At a modest frequency like 33 MHz, there should be no problem
distributing a timebase signal to multiple CPUs.

--
Mvh./Regards,    Niels Jørgen Kruse,    Vanløse, Denmark

Re: x86 High Precision Event Timers support
nospam@ab-katrinedal.dk (Niels Jørgen Kruse) writes:

[Quoted text elided]

We found that the 10 MHz used for this purpose on some SPARC processors
is actually not fast enough; one tick is perhaps several hundred CPU clock
cycles, which makes using this for precise accounting difficult.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Re: x86 High Precision Event Timers support

[Quoted text elided]

The frontside bus clock should be sufficiently synchronous and identical on
all CPUs used within one box, at least now. Xeons have a "real" frontside
bus, and Opterons have a common HyperTransport clock base (200 MHz). This
frequency scales with processor performance, so it should not be too far
off. That's 10-15 cycles of resolution on current CPUs, less than the rdtsc
instruction itself takes.

BTW, regarding clock skew: note that for all practical purposes, the only
requirement for a distributed timer is that no signal propagates faster
than the timer.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
Re: x86 High Precision Event Timers support
nospam@ab-katrinedal.dk says...
[Quoted text elided]
And another pin to synchronize (reset) all the counters?

I think the problem is that the TSC has two definitions:  1) number of
clock ticks, and 2) absolute time that has passed.  Unfortunately, TSC is a
system-level counter.  What I would really want is four different counters:
2 for each thread, and 2 for the system.  When the OS starts a new thread,
the counters for that thread would be loaded.  The 2 counters are one to
count clock ticks (so if the processor clock changes, the counter rate
changes; this is good for getting (somewhat) consistent execution time),
and one counter that follows real-world execution time (wall-clock time).
This counter, IMO, doesn't need to be completely accurate; say 100 MHz (10 ns).

It also would be nice if there were compare registers (e.g. MIPS), so that
external hardware wasn't needed for timeslicing.

            - Tim

NOT speaking for Unisys.


Re: x86 High Precision Event Timers support
On Fri, 23 Jun 2006 09:54:35 +0200, Terje Mathisen wrote:

[Quoted text elided]

At least with previous dual processor boards, the problem was that the
reset was not performed exactly at the same time. So even if each
processor was clocked from the same clock source, you could get
inconsistent timing, if the RDTSC instruction was sometimes executed
on processor 1, while in some cases it was executed on processor 2.

If you can figure out which processor is executing the RDTSC
instruction, this is not a problem. In Windows NT you could set the
thread affinity to a specific processor and execute all RDTSC
instructions in that thread to get consistent timing.
 
[Quoted text elided]

The 1.19 MHz timer is used only by the QueryPerformanceCounter service on
single processor systems. On multiple processor systems, this service
returns the TSC count.

Paul


Re: x86 High Precision Event Timers support
[Quoted text elided]
You use per-processor TSC scaling factors and offset corrections.  You
need a count of per-thread context switches, which you read before and
after reading the CPU id, TSC, scale, and offset.  If the before and after
context switch counts match, then everything you've read is for the
same processor.  This is ancient mainframe technology.  I don't know
if Linux has discovered it yet, though.  Maybe it's still a problem.


--
Joe Seigh

When you get lemons, you make lemonade.
Re: x86 High Precision Event Timers support

[Quoted text elided]

The problem is that these context switch counters are, in most operating
systems, in the kernel-mode address space only, thus requiring a switch
from user mode to kernel mode to read the thread context switch count.
This requires validation of parameters and other time-consuming things.

Paul


Re: x86 High Precision Event Timers support
[Quoted text elided]

There's no technical reason this information can't be in user space
as well.  There may be a problem, but it's certainly not technical
in nature.


--
Joe Seigh

When you get lemons, you make lemonade.
Re: x86 High Precision Event Timers support

[Quoted text elided]

Such a context switch indicator/counter would be an extremely useful
tool for many other purposes as well. Think of restartable algorithms,
particularly restartable pseudo-atomic instruction sequences, such as
the TO-lock synchronization primitive.

    Best regards
    Piotr Wyderski

Re: x86 High Precision Event Timers support
[Quoted text elided]

I dispute that.  Once the chips come out of reset, they're both being fed
by the same clock signal.  I don't believe it is possible for them to drift
out of sync.

The old Windows NT used to synchronize the TSCs on multiprocessor systems
(the TSC is a writable register), but Windows XP does not seem to do that,
so the TSCs can start out small integer millions of cycles apart.  But,
once that delta is set, the delta should stay constant.

[Quoted text elided]

Actually, Windows makes one of two choices.  On a uniprocessor machine, it
chooses the motherboard timer chip.  That used to be 1.193 MHz, as you say,
although XP now runs it at 3x that frequency.  But on a multiprocessor
machine, it uses the cycle counter.
--
- Tim Roberts, snipped-for-privacy@probo.com
  Providenza & Boekelheide, Inc.

Re: x86 High Precision Event Timers support
[Quoted text elided]

Did you read the AMD paper that someone posted a link to a couple of
days ago?

It goes into specifics describing exactly how this can happen as a
result of frequency throttling, something that can happen independently
on each cpu/core.

Terje
--
"almost all programming can be viewed as an exercise in caching"

Re: x86 High Precision Event Timers support

[Quoted text elided]

I think you are referring to:
http://groups.google.com/group/fa.linux.kernel/msg/c8ae85a08ebd3aa4

Re: x86 High Precision Event Timers support

[Quoted text elided]

Which reminds me of Rich Brunner's excellent article:
http://groups.google.com/group/fa.linux.kernel/msg/c8ae85a08ebd3aa4

[Quoted text elided]

I'm playing with the hrtimers infrastructure:
http://www.tglx.de/hrtimers.html

I *think* they use HPET, if they find it.
http://www.tglx.de/projects/hrtimers/2.6.16/patch-2.6.16-hrt6.patch


I'm also wondering: Are there x86-based systems where a card equipped
with several PITs (e.g. ADLINK's PCI-8554) is a necessity?

http://www.adlinktech.com/PD/web/PD_detail.php?pid27

Re: x86 High Precision Event Timers support
[ Followup-To: set to comp.arch, feel free to disagree ]


[Quoted text elided]

Could anyone comment?

When is one Programmable Interval Timer not enough?

Re: x86 High Precision Event Timers support
Concerning this, I've always wondered why desktop and server CPUs like
the x86s don't generally support a single timer that counts down to zero from
a given value at core frequency (on CPUs with clock stepping, at the highest
frequency of course) and then generates an interrupt.
Based on this, the CPU could do everything from scheduling to multimedia
timers: items being scheduled are dispatched to a queue, and the interrupt
is only generated for the first item in the queue.
Schedulers could be improved by such a timer, for example: based on a minimum
scheduling frequency, this frequency could be raised if the number of processes
or threads rises above a certain limit. Or scheduling slices could have
different lengths based on the priority of the thread or process.
