"Efficient" timer implementations

Hi,

In my latest RTOS, I have timers on every service (i.e., every service invocation can return ERROR_TIMEOUT in addition to SUCCESS or FAILURE). But, the common approaches to providing timers (not just timeOUTs) don't "feel right" when applied consistently like this. I'm wondering if there is something I can exploit to "cut a corner" in my implementation...

E.g., allocating a "pool" of timers requires careful sizing of that pool -- lest a service end up having to support a NO_TIMER_AVAILABLE error code (frown). OTOH, dynamic allocation (potentially) ends up with *lots* of timers having to be managed (overhead).

Consider, most timers will (presumably) NOT expire in normal use so the cost of maintaining them should be low -- foolish to allocate lots of resources to something that you hope not to "use".

The idea of a "Timer per Service" doesn't work because a service can be multithreaded (in "reality" or in "effect"). I.e., that would serialize all accesses to the service.

A "Timer per Thread" is a potential solution -- *if* you also allow "general purpose" timers to coexist with that "service timer". (i.e., a thread could want to start a general purpose timer, *then* invoke a service with another "timeout" WITHOUT having to explicitly share that "general purpose" timer with the service provider).

[N.B. this is an option but adds a lot of complexity to timer management -- and would limit the number of such general purpose timers that the thread could create] [Also, note that this could complicate the implementation of *layered* services]

I could potentially trade away the ability to infer *elapsed* time from the examination of a (running) timer -- but, it is imperative that a timer be an independent entity in its own right (e.g., so you can pass the "time remaining" from one *completed* service call to the *next* service called).

Or, is this just one of those areas where there is no "clever" solution?

Thx,

--don

Reply to
D Yuniskis

I don't quite understand your constraints. Is it the overhead of managing the very large set of timers? The usual priority queue is O(log n) in the number of timers for inserting, deleting and firing them.

An alternative, if you have a fixed tick (say 1ms), is to use a series of groups of exponentially increasing buckets for timers. For example, have 128 buckets for timers at 1ms resolution, then 128 at 64ms resolution, 128 at 4096ms... Structure each tier as a circular list of buckets. Insert a new timer in the linked list for the best bucket (IOW, if the new timer expires in the next 128ms, insert into the appropriate 1ms bucket; if it expires in more than that, but less than 8192ms, use the 64ms buckets). Every tick you just fire all the timers in the next 1ms bucket. Every 64ms redistribute the timers in the next 64ms bucket into the 1ms buckets; repeat every 4096ms for the next bucket in the third (4096ms) tier, etc.

Basically that does one linked list insertion and deletion for timers firing in the next 128ms, 2 for those in the next 8192ms, 3 for 524s, etc. A handful of tiers gets you to timers as long as you practically need (for the above parameters, seven tiers gets you to 278 years).

The simple design has one flaw, in that you get big bursts of redistribution work at the 2**6, 2**12, 2**18... ticks, but you can distribute that without too much added complexity by redistributing on each tick, but only ceil(1/64th) (etc.) of the contents of the bucket being redistributed.

Obviously the length of the clock tick and the number of buckets in each tier can be varied (although reducing the size of the tiers increases the number of redistributions done over the life of a timer).
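
For what it's worth, here is a minimal two-tier sketch of that scheme in C. The names, the fixed 128-bucket tiers, and the tick handling are my own illustrative assumptions, not a drop-in implementation; anything further out than the second tier is simply "parked" and re-examined, where a real version would add coarser tiers:

#define NBUCKETS 128                      /* buckets per tier              */
#define T2_RES    64                      /* tier-2 resolution, in ticks   */

struct wtimer {
    struct wtimer *prev, *next;
    unsigned long  expires;               /* absolute tick of expiration   */
    void         (*fire)(struct wtimer *);
};

static struct wtimer tier1[NBUCKETS];     /* list heads (circular lists)   */
static struct wtimer tier2[NBUCKETS];
static unsigned long now;                 /* current tick                  */

static void list_add(struct wtimer *head, struct wtimer *t)
{
    t->next = head->next;  t->prev = head;
    head->next->prev = t;  head->next = t;
}

static void list_del(struct wtimer *t)
{
    t->prev->next = t->next;  t->next->prev = t->prev;
}

void wheel_init(void)
{
    int i;
    for (i = 0; i < NBUCKETS; i++) {
        tier1[i].prev = tier1[i].next = &tier1[i];
        tier2[i].prev = tier2[i].next = &tier2[i];
    }
}

void wheel_insert(struct wtimer *t)       /* O(1): pick a bucket, link in  */
{
    unsigned long delta = t->expires - now;

    if (delta < NBUCKETS) {               /* due within the next 128 ticks */
        list_add(&tier1[t->expires % NBUCKETS], t);
    } else if (delta < (unsigned long)(NBUCKETS - 1) * T2_RES) {
        list_add(&tier2[(t->expires / T2_RES) % NBUCKETS], t);
    } else {
        /* beyond this 2-tier sketch: park in the farthest tier-2 bucket;
         * it is simply re-inserted when that bucket is next visited      */
        list_add(&tier2[(now / T2_RES + NBUCKETS - 1) % NBUCKETS], t);
    }
}

void wheel_tick(void)                     /* called once per 1ms tick      */
{
    struct wtimer *head, *t, *next;

    now++;

    if (now % T2_RES == 0) {              /* redistribute next 64ms bucket */
        head = &tier2[(now / T2_RES) % NBUCKETS];
        for (t = head->next; t != head; t = next) {
            next = t->next;
            list_del(t);
            wheel_insert(t);              /* now lands in a tier-1 bucket  */
        }
    }

    head = &tier1[now % NBUCKETS];        /* fire everything due this tick */
    for (t = head->next; t != head; t = next) {
        next = t->next;
        list_del(t);
        t->fire(t);
    }
}

Cancelling an abandoned timeout is just list_del() on the timer, wherever it happens to be sitting.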

Reply to
robertwessel2

This is similar to my current implementation -- however, I don't rigidly define the "intervals" in each "bucket".

E.g., currently, a thread defines a Timer, then passes that timer to a service invocation and blocks on the result. If the service completes successfully, you can choose to examine the Timer (i.e., to determine how long you have blocked), can discard it, or can pass the "time remaining" on to some subsequent service invocation (typically some OTHER service). If the service failed due to a TIMEOUT, you implicitly know that the Timer has "0" left on it. If the service failed for other reasons, then you address those problems.

[Note that the timer presently exists in user space!]

The RTOS keeps all "running" Timers on a queue sorted by expiration time. It separately manages a (hidden) list of pointers *into* that queue to expedite insertion, etc. The size of this "list of pointers" varies as a function of the number of active Timers. This lets the maximum "work" performed for an insertion be bounded by limiting the length of the longest (sub)queue that has to be parsed.

E.g., if there are 45 active timers, I could opt to mark the list of 45 timers in 4 groups of ~12 timers each -- note that the time interval represented in each such "subqueue" varies based on the times present *on* the timers in that "subqueue". If the number of timers suddenly increases to 136, these 4 groups would mean the length of each "subqueue" has now increased to ~35 from that ~12 -- and the average time for parsing a queue would increase threefold. In this case, the OS can lengthen the "hidden list"... maybe opting for 10 groups of timers.

Maintaining these lists incrementally is relatively inexpensive -- if a timer is deleted/inserted, at most one timer shifts into/out of each group (i.e., the whole group need not be reexamined/reshuffled).

I opted for the "cut into N timers" instead of "N intervals" as most timers will tend to be similar time intervals. I.e., there won't be any in the "278 year" tier. :> And, "steady state", threads will keep recreating the same timers over and over again as they keep accessing the same *services* over and over again, etc. [this is actually a falsehood but the details are unimportant, here]

I store only deltas in each "timer" so scheduling the next "timer expiration" is easy -- just look at the delta in the timer at the head of the queue. OTOH, finding the expiration time (i.e., "time remaining") of the Nth timer in the queue requires examining all of the timers expiring *before* it. [This shows the bias towards short intervals and the fact that most timers will be used as timeOUTs.]

In my scheme, you have to do a "redistribution" for each G timer "inserts + deletes" (where G is the number of groups in the hidden list).
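
As an aside, the delta-storage part of that queue might look roughly like this in C. A sketch only: the "hidden list" of group markers that bounds the search is left out, and the names are mine, not the actual implementation:

struct dtimer {
    struct dtimer *next;
    long           delta;     /* ticks after the entry ahead of it        */
};

/* Insert t into the delta-sorted list; t->delta arrives as an
 * absolute interval "from now" and is converted to a delta here.        */
void delta_insert(struct dtimer **head, struct dtimer *t)
{
    struct dtimer **pp = head;

    while (*pp != NULL && (*pp)->delta <= t->delta) {
        t->delta -= (*pp)->delta;           /* consume time ahead of us   */
        pp = &(*pp)->next;
    }

    if (*pp != NULL)
        (*pp)->delta -= t->delta;           /* successor re-based on t    */

    t->next = *pp;
    *pp = t;
}

/* Per tick: only the head is touched; fire every entry that hits zero.  */
void delta_tick(struct dtimer **head, void (*fire)(struct dtimer *))
{
    if (*head == NULL || --(*head)->delta > 0)
        return;

    while (*head != NULL && (*head)->delta <= 0) {
        struct dtimer *t = *head;
        *head = t->next;
        fire(t);
    }
}

The head's delta is also exactly the value you'd program into a one-shot hardware timer to schedule the next expiration.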

Having said all that... :>

The bigger question: is there a way of bounding the number of timers so that I can determine implementation costs at design time instead of run-time? E.g., my suggestion of a "Service Timer" (per thread) seems like it does that (ASSUMING ALL SERVICES ARE SYNCHRONOUS!). However, I think it would require each service to execute on the invoking thread's stack -- or, force constraints on the services regarding what they can use, etc. (or otherwise constrain the design of the service, e.g., thread per client)

[as I said before, this doesn't address the "general purpose timers"]
Reply to
D Yuniskis

You could make all the threads responsible for managing their own memory resources. Assuming that you're working in C, a timer object could be a struct that includes the linked-list pointers. For example:

struct timer {
    struct timer *prev;
    struct timer *next;
    int period;
    // whatever else you need for the timer object
};

In the thread, you then have the option of allocating timers statically, dynamically, or even on the stack:

f( )
{
    struct timer x;

    init_timer( &x, ... );
    wait_timer( &x );
    del_timer( &x );
}

The RTOS kernel would then just use the prev, next pointers of the timer struct, so you can avoid any additional memory allocation in the kernel.
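
A sketch of the kernel side of that idea, assuming a single circular list of active timers (the names enqueue_timer/dequeue_timer are placeholders, and a real kernel would presumably keep the list sorted by expiration):

/* The caller owns the struct; the kernel only links it in and out.      */
static struct timer active = { &active, &active, 0 };   /* circular head */

void enqueue_timer(struct timer *t)       /* no allocation, just relink   */
{
    t->next = &active;
    t->prev = active.prev;
    active.prev->next = t;
    active.prev = t;
}

void dequeue_timer(struct timer *t)
{
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->prev = t->next = t;                /* mark as "not queued"         */
}

Since the list nodes live inside the callers' timer structs, the kernel never runs out of timers and never allocates; the cost is that the caller must not free or reuse the struct while it is queued.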

Reply to
Arlet Ottens


The scheme with the fixed buckets has a couple of advantages. First, there's no real computation done at the tick - you just grab the linked list of timers in the next 1ms bucket (and the list heads are a circular structure, so there's never any allocation of new list heads), fire them all, and you're done. A timer that's abandoned before being fired (as you'd expect for most timeouts) is just a deletion from its current linked list. Second, the cost of managing the timers is basically fixed based on the interval set, and is basically always a small integer number of linked list insertions and deletions (no more than seven for the aforementioned 278 years). The only time you compute the time is at the initial insertion of the timer or at one of its redistributions (FWIW, the redistributions are marginally simpler, since you don't have to decide which tier you're going to insert into - always the next finer one). The number of timers in the system does not impact the cost per timer at all.

If many of your timers are timeouts which do not need fine resolution or a wide range, you could have a second set of buckets at (say) 1s resolution, and enough of them to cover almost all of your timeouts, and then you don't even do any redistributions, although they'll fire in bigger clumps. OTOH, if most timers do not actually fire, some judicious picking of tier sizes can ensure the same thing - in the above, for example, if most timers are between 8s and 262s, and are abandoned at least 8192ms before they expire, they're never redistributed (and ones that do live long enough to fire would be redistributed twice).

You could embed a timer object in each object that might need one. They're not necessarily very big. A pair of linked list pointers, a pointer to the fire routine, a word of user data and a ~64 bit time. So ~24 bytes on a 32 bit system. Obviously you might have timers unrelated to a specific object, and embedding them somewhat assumes a single address space OS. Of course that just moves the problem elsewhere, but presumably you're having to deal with the allocation of those other objects anyway.

Reply to
robertwessel2

In my own embedded OS, which evolved over many years and several platforms, I found the following the most simple yet flexible approach:

Depending on the hardware, I identify the smallest time granularity. This could be CPU cycles for platforms with a cycle counter, or the timer register readable state (after prescaler). This I define as my software timer granularity, and choose a type big enough to last "forever". 64 bits have been enough for my projects so far, and fortunately a 64 bit "unsigned long long" is supported by most compilers for register passing and function results. This type I define as "tim64".

The HAL provides a function timGet() which assembles the current time as a tim64 (sometimes just from hardware registers, sometimes through race resolution between hardware timers and ISR software state). Note that this is not wallclock time, but just an ever increasing value that starts at 0LL during power up. There are also functions to add and subtract NS, US, MS and seconds, which implicitly contain the conversion between hardware cycles and human readable time scales.

Based on this, I create a "polled" API. It is basically timGet() + timElapsed() for knowing how much time has passed since some moment. And timSet(ms) + timIsOver() for polled loops (timSet consists of timGet + timAdd).
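
A sketch of what that polled layer could reduce to, using the names from this post; the signatures, the cycle-counter hook and the conversion factor are my guesses at what the HAL might provide, not the actual API:

typedef unsigned long long tim64;        /* free-running, 0 at power-up   */

extern tim64 hal_read_cycles(void);      /* hypothetical HAL hook         */
extern tim64 cycles_per_ms;              /* platform conversion factor    */

tim64 timGet(void)                   { return hal_read_cycles(); }
tim64 timAddMS(tim64 t, unsigned ms) { return t + (tim64)ms * cycles_per_ms; }
tim64 timElapsed(tim64 since)        { return timGet() - since; }
tim64 timSet(unsigned ms)            { return timAddMS(timGet(), ms); }
int   timIsOver(tim64 target)        { return timGet() >= target; }

/* Typical polled use: wait for a flag, but no longer than 50ms.          */
void wait_flag_50ms(volatile int *flag)
{
    tim64 limit = timSet(50);

    while (!*flag && !timIsOver(limit))
        ;                                /* spin, or yield to the OS here  */
}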

For asynchronous waiting, my world consists of signal producers and signal sensors. A timer can produce a signal (when time is over), and the application can sleep until its sensor receives such a signal. My timer signal is a small struct which combines the standard signal sensor and the tim64 target time. Note that this is an extra API, layered on top of the polled one.

The API provides timSigCreate() to create a signal, in memory provided by the application (usually an autovar). With timSigAttach() it is queued into a sorted list of active timers, and timSigDetach() reclaims ownership of the memory by cancelling/removing from the list.

The hardware ISR reprograms a hardware timer to trigger when the next queued timer signal wants to be awakened, and signals the corresponding tasks as their events happen. timSigAttach() and timSigDetach() resolve the potential races, and reprogram the hardware timer too. Often, the hardware timer has a different granularity than tim64, so conversion is necessary here, too.

The signal sensor stuff is common to many OS services, and a task can wait on multiple signal sensors. This enables the timer signals to be used as timeouts for services that are themselves asynchronous.

The philosophy of this is to push the storage burden to the application that requests a service. This helps scaling a lot, and there is almost no "unjustified" overhead.

For the polled API just CPU registers or an autovar is enough. Time handling can often consist of just a few hardware register reads with no or little conversion. By not using the other API, a statically linked image can be tiny.

On the other hand, there is absolutely no limit for asynchronous timers. Yet they can still be efficiently mapped into the .bss section. They can live as autovar in a "function with timeout", or as member of a handle that your service provides. There is no dynamic memory allocation involved (unless you do), and no "out of memory" handling.

I found that this approach is so much lighter on resources that it outweighs the disadvantages (which are the memory ownership transfer in timSigAttach/Detach and the programming discipline it therefore requires).

Let me know (here) if you found this useful.

Best regards

Reply to
Marc Jet

I *used* to use a similar approach -- though much less fine grained (i.e., I *set* the granularity of my "time" to the finest that I would need -- almost always in bogounits). "Time" was always an integer (of varying size), etc. Similarly, "forever" was simply the longest time in which I was interested -- for "system" time (ignore wall clock for this discussion).

In *this* application (media clients), I have no need for "system time". All time is timeouts and (elapsed) timers. I.e., nothing ever happens *at* a particular time but, rather, some time "from now" (so, the whole "set time", "wait for time", etc. API goes away -- along with the problems it brings).

I have several "clocks" (cf. Mach) that can be used to drive timers. Most are virtual clocks derived from real-time "events". The precision of these varies as appropriate to their source. E.g., if a DHCP client wants to wait 30 seconds for a timeout, it *probably* doesn't care if that is 30.0000000000 seconds, 29.9099325 seconds or "31" seconds. (i.e., that is how I apply the various clocks).

OTOH, the audio streamer very much cares how precisely *this* next 44KHz sample is pushed out to the D/AC. At the same time, it probably has no interest in anything as far distant as "30 seconds hence".

The application is carefully structured so that individual aspects of it need not be concerned with more than one sense of time. This eliminates/minimizes synchronization and conversion errors.

Time is *not* an integral data type, however (despite being implemented as an "integer"). I used fixed point representations as they allow me to adjust the various FLL's and PLL's more sanely (simplifies the math). This allows me to keep the clock(s) in question syntonous with those on other clients while simultaneously giving me a lot of flexibility on phase control (it's just an offset in the math).

As a result of all this, I can shrink "time"s to 32b datatypes (I am struggling to get them down to 16 -- probably an unnecessary optimization). This is effectively "forever" in terms of any of the application components' usage of them (e.g., 2G seconds? 2G milliseconds? 2G video frames? etc.)

Again, I have no need for "human readable" time units. Just like an engine controller cares little about "seconds" or "time of day" but, rather, how much BTDC it has to fire the injectors and detonate the plug. Bogounits prevail.

Do you put the Scheduler into the same set of signal consumers? (that would be cool as it would let scheduling be treated as just another "thread")

My (current) API provides a "create timer" hook. The user can optionally provide a pointer to a "timer struct" that it would like used for the timer's representation. However, the system can ignore this -- regardless, a handle for the timer is returned (if created). This lets me keep the timer "precious" if I want to (the API is common to a variety of different RTOS's I've designed over the years -- it's just a handy abstraction). It also lets me change the representation of the timer without the application being concerned (or bothered!) by that change.

My timeouts live outside the scope of the function (service) invoked. E.g., I can do:

Timer foo;
Timer handle = Create_Timer(&foo, SOME_AMOUNT_OF_TIME, ...);

if (TIMEOUT == Get_IP_Address(&foo, ...)) {
    /* didn't get IP address in time available; bad DHCP server? */
}

if (TIMEOUT == Get_Program_Image(&foo, IPaddr, ...)) {
    /* didn't get program image in remaining time available; die */
}

if (TIMEOUT == Initialize_System(&foo, ...)) {
    /* resources not available in time remaining; gasp! */
}

log("startup completed with %d remaining", Get_Remaining(&foo));

If I wrap this in a routine (e.g., "System_Startup"), then foo could be autovar (if the RTOS complies) -- assuming the rest of the system cares nothing about foo outside of this scope. Furthermore, if System_Startup had some tear-down times associated with it and was, itself, constrained by an imported timer, then it could deduct those times from the timer prior to invoking the constituent services so that a failure would leave it with sufficient time remaining to complete its tear-down actions and still return to its caller within the prescribed "time limit".

[this is intended as an alternative to supporting deadline handlers in the RTOS itself]
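
In the same sketchy style as the snippet above, that wrapper could look something like this; Sub_Time, TEARDOWN_TIME and tear_down are invented names for illustration, and the "..." arguments are elided just as they are above:

/* budget is the imported Timer constraining System_Startup as a whole.  */
int System_Startup(Timer *budget)
{
    Timer foo;

    /* reserve enough of the budget to unwind cleanly on a failure       */
    Create_Timer(&foo, Sub_Time(Get_Remaining(budget), TEARDOWN_TIME), ...);

    if (TIMEOUT == Get_IP_Address(&foo, ...)
     || TIMEOUT == Get_Program_Image(&foo, IPaddr, ...)
     || TIMEOUT == Initialize_System(&foo, ...)) {
        tear_down();          /* still completes within the budget        */
        return FAILURE;
    }

    return SUCCESS;
}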

It still seems to avoid the (to me) obvious chances for optimization/exploitation -- e.g., these timeouts aren't "general purpose timers" that can be used to initiate events. They are *always* used in the same way -- to unblock a thread waiting on some service/resource. If all services are synchronous, then a thread can ONLY be blocking on one at a time. I.e., there is only need for one "service timer" per thread. And, that the use of that timer is preordained.

I think I'll see what the cost of including it as part of the thread state and an implicit part of the scheduler gives me in terms of performance and cost.

Reply to
D Yuniskis

Yes, see the description of my "Create_Timer" API (in a reply to Marc Jet). I allow threads to *try* to manage the memory associated with the timer(s) (the RTOS reserves the right of managing them internally, if it chooses). But, doing so *implicitly* makes all timers the same (i.e., it precludes any optimization based on the narrowly defined use expected of these "service timers"). It's "just another timer" that never uses the full capabilities of a "general purpose timer".

I'm looking for "tricks" that are inherent in this particular type of usage that can be exploited for performance/space gains. E.g., I'm competing with using deadline handlers *in* the RTOS. Obviously, I am hoping to come up with a "cheaper" solution. (for some definition of "cheap").

Reply to
D Yuniskis

(sigh) I don't think I'm following you. :-/

Hopefully restating your environment:

- the tick happens at 1ms intervals

- all timers for a given "duration" (blech) are linked into a "duration-specific" list (i.e., any timer set for 3ms is linked into the "3ms list" -- this is just an efficiency hack to make it easy to grab *all* of the timers expiring at a given "instant"/tick, right?) You call each such list a "bucket" (??)

- these lists are grouped into "tiers": the first tier spans the interval (0 - 128ms) and contains 128 lists (i.e., a list of 1ms timers, a list of 2ms timers, a list of 3ms timers, etc.); the second tier spans the interval (128 - 8192ms) and also contains 128 lists (i.e., the list of timers expiring in 128-191ms, the list of 192-235ms, etc.); the third tier spans (8192- .... [I hope my arithmetic is correct at this hour of the morning]

- within a list, the individual timers are sorted by duration. For the lists in the finest grained tier, no sorting is necessary as each timer has the *same* duration (i.e., N ms). For higher tiers, the durations may vary due to a coarser grading (e.g., the 128ms list contains timers expiring in 128, 129, 130... 191ms) [sorting could be done at insertion or postponed to "redistribution"]

Is this roughly correct? (I haven't checked the actual tiers; your values change between, e.g., 8192 and 4096 later)

It seems like all the "tiers" buys you is convenient places to peek into what would otherwise be a single (sorted) list of timer expirations (?). If so, it loses its appeal when the range of timeout *values* is reduced -- yet the number of timers remains high.

E.g., if I'm seeing *lots* of timers at a *few* different "values", then those might all end up in a single "tier". Or, a "few" tiers. So, the improvement the tiers give you is marginal compared to just having timers in "identical valued" lists. (?)

I, for example, just have a single list with "markers" at particular points *into* the list (to cut search time). For expiration, I just pull timers off the head of the list having "0" time on them until I find one that has a nonzero (positive) "delta time". That time defines the next time that the "timers" will need to be examined (i.e., nothing expires now and that "delta time").

If all equal valued timers were placed in explicit lists, then I wouldn't have to examine subsequent timers in my *single* list -- I could just pull *the* list of "expiring timers" off the head of the queue.

But, I would then need to walk through *that* list so I still have to process each individual timer... :-/

Timers are finer grained than that. I'm processing audio/video so ms and fractional ms are typical event times (though I am processing things in large bunches, etc.). "seconds" are "an eternity" :>

Yes, that's my idea of creating a "service timer" for each thread. With synchronous services, you would only ever need *one* such service timer. And, it could "belong" to that thread. Since you *know* how it will be used, you could exploit that knowledge in your design of the scheduler, etc. instead of treating it as "yet another timer".

Size, as always, is relative. :> I am trying to prune this application down to "nothing" to make its application ubiquitous. E.g., we've gone from "speakers" to "powered speakers"... I want to go to "smart speaker" (plug an ethernet cable in the back and you're listening to "whatever"). Ditto for video.

Reply to
D Yuniskis

n"]

No. Each bucket contains timers expiring at a certain time; nothing to do with their durations.

So the first tier is a set of 128 buckets, the first of which holds timers expiring 1ms from now (IOW, at the next tick), the second 2ms from now (second tick), etc. You'd obviously implement that as a circular structure, and inserting a timer with an expiration 5ms in the future would add that timer to the linked list in the fifth bucket from the current start. Then at each tick you just grab the next bucket, dispatch all those timers, and step to the next position in the circular structure.

The obvious problem with that is that in general it's impractical to have enough buckets to cover all the possible timer expirations (you'd need 3.6 million buckets to cover an hour). So instead you have additional tiers of buckets at exponentially lower resolutions. So if you're adding a timer firing in 500ms, which doesn't fit in the 128 1ms buckets of the first tier, you add it to the second tier, in the seventh bucket. The second tier is composed of 64ms buckets, the first of which contains timers that will expire at least 64ms from now. Periodically (every 64ms), you take the timers in the first bucket in the second tier (which will be expiring in 64-127ms), and redistribute them into the first (1ms) tier, from where they eventually get dispatched. The second tier is also a circular structure, and gets stepped around after the redistribution.

Repeat with additional coarser grained tiers as needed.

So to set a timer expiring 10,000ms from now, you'd add it to the second bucket of the third tier. Somewhere between 4096 and 8191 ticks before that timer is due to fire, the first bucket in the third tier (now containing that timer) will be redistributed to the second tier. Let's say the redistribution happens 5000ms before expiration, then the timer would end up in the 78th bucket in the second tier. Later, somewhere between 64 and 127ms before firing, the first bucket in the second tier (which now contains the timer in question) gets redistributed to the first tier. Assuming my arithmetic is correct, and the redistributions are synced on the 64th/4096th/etc. ticks, the timer would be moved to the 72nd bucket in the first tier. 72 ticks later, the timer would be dispatched.

As each "bucket" is just the head of a doubly linked list, adding or removing a timer from a bucket is a trivial linked list operation. In the case of the initial insertion, you need to determine in which tier to initially place the timer (timer expires in less than 128ms,

8192ms, 524,288ms=85), and then it's a subtract and divide (by a power of 64, in this example) to select the bucket into which the timer is inserted. The redistribution between tiers is even simpler, you just run through the linked list of timers that need to be redistributed (from the first bucket in the tier), and you just insert each of them in the appropriate bucket in the next finer tier.

So the total processing of a timer consists of the initial insertion, zero or more redistributions (based on the log64() of how far in the future the timer expires), and the final dispatch. All of which are simple linked list insertions or just a step through a linked list. The total storage required is several tiers of 128 buckets, the buckets basically being just the list heads. And as I mentioned, seven tiers gets you a couple of centuries of timer. Plus, of course, the timers themselves.
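
Since each tier is 64x coarser than the last, the "subtract and divide" reduces to shifts and a modulo; a hedged sketch of just that selection step (variable names are mine, and a count-leading-zeros instruction could replace the loop, as noted later in the thread):

#define NTIERS   7
#define NBUCKETS 128

/* Pick the tier and bucket for a timer expiring at "expires" (ticks).
 * Tier k has a resolution of 64^k ticks and covers deltas below
 * 128 * 64^k, where 64^k is (1 << (6*k)).                               */
void pick_bucket(unsigned long long expires, unsigned long long now,
                 int *tier, int *bucket)
{
    unsigned long long delta = expires - now;
    int k = 0;

    while (k < NTIERS - 1 &&
           delta >= (unsigned long long)NBUCKETS << (6 * k))
        k++;

    *tier   = k;
    *bucket = (int)((expires >> (6 * k)) % NBUCKETS);
}

For the 10,000ms example above this selects the third tier (4096ms buckets), matching the walkthrough.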

Reply to
robertwessel2

You know your application and I haven't been reading much of the thread, waiting to see how it develops. I'm still kind of ignorant because I've skimmed. But have you read Comer's XINU book and taken into account his chapter on delta queues? It almost sounds like you are struggling in that direction to me, but only you'd know if that's so. Mostly, I'm curious if you'd taken the time to read his book and are familiar with the idea enough to discuss it in your context.

Jon

Reply to
Jon Kirwan

I notice there is some overlap between what we came up with :-)

I don't understand the question. Maybe the comments below resolve it, otherwise please elaborate. What can be said is that both the ISR and Attach/Detach work at the same abstraction level (they reprogram the timer hardware). On the task switching level, the ISR executes asynchronously and interacts with the scheduler, while Attach/Detach always execute in the context of the calling task. Due to a potential race, Detach sometimes also interacts with the scheduler.

Not necessarily. In the async API described in my previous post, the "timer signal" describes a moment in time and triggers when that moment is reached. The trigger mechanism is implemented using my abstract concept of signal producers (which in this case represents the timer ISR side), and the trigger event manifests as signal on the sensor (which always represents a task).

A task can block on one or multiple sensors. The obvious use case for blocking on a single sensor is the sleep() function. A use case for multiple sensors, is to wait for IO to finish plus a timeout.

At first sight it doesn't make sense to simultaneously block on multiple timer signal sensors, because there is always one to time out first. But even that is a possible use case! Specifically when a function implements its own timeouts and ALSO honors timeouts from the upper layers (e.g. given as a function argument). If it wasn't allowed to block on multiple timeouts simultaneously, it would have to deep-inspect them all and find the first one to expire.

Note that my signal sensors and producers are generic entities, while the timer signal is a specific implementation using them. Although it's the most prominent one that I made and use, the producer / sensor concept is generic IPC and removes the task scheduling aspects from drivers and ISRs.

BTW your example code snippet, ported to my polled API, would look almost identical. Except there is no handle + struct/object, but rather the handle IS the struct/object.

This is a very important thing to understand. There can be as many independent time references as the application desires. They can be duplicated, passed to other modules, and dropped without caring about storage ownership.

From the multitasking point of view, with the polled API there's no effect beyond the current task. With the async API on the other hand, there's a common (sorted) list for all timer signals that are "attached" to the global producer (which basically is the timer ISR). "Attaching" DOES have effects beyond the scope of the current application (CPU-wise, not memory-wise). More timer signals on the list translates to slower attaching for everybody, and also more ISR activity.

Merging my polled API into the thread would not be beneficial, because it would prohibit using several timeouts at once. They are useful for nested timeouts, e.g. when a subfunction decides to implement a timeout without exposing it on the API. Also, in more complex state machines I often find myself using several time references simultaneously (e.g. last character received, last some_event, last lcd update).

If you referred to merging the async timer signals into the thread, it may have less negative impact - but still has. It would reduce them to really just be "an absolute timeout" for "the current activity". It would mean that there (suddenly) must be a strict boundary between OS functionality (which unconditionally aborts all action upon timeout) and application functionality (which initiates retry mechanisms and sane error handling, instead of just passing the timeout error upstream). Also, to a lesser degree the previous paragraph also applies here.

Best regards

Reply to
Marc Jet

If you do this sort of thing long enough, you end up touching on damn near every possible implementation strategy (unless, of course, you take a pedestrian approach to it and "settle" for "the first thing that works" :> )

I *think* that's close to what I am suggesting (asking) above (barring terminology).

I've taken several approaches to the *basic* structure of timer/scheduler over the years (here, I am ignoring implementation of the actual timers themselves but, rather, how the "timing service" relates to the "scheduling").

By far, the most common approach has been to create a periodic interrupt ("tick") that the hardware uses to signal a *timing* service. This directly or indirectly updates the "timers" (how is not important). A "special" timer is the "scheduling timer" that defines the time left on the executing thread's timeslice. When that timer expires, the scheduler is invoked to "preempt" the current thread [there are lots of caveats, here, that can be ignored... important thing is "scheduler sits atop timing service"]. Again, the mechanism by which the scheduler is invoked is not important (and can be varied).

[Note that the "tick" can thus be semantically different from the "jiffy". Further, that threads can have different size timeslices -- e.g., this allows finer partitioning of the CPU *within* a priority level]

Also common with this scheme, time doesn't exist outside of the scope of the "tick". A practical consequence of this is that a thread that yields/blocks just prior to its timeslice ending (to be pedantic: "just prior to the next tick") can *cheat* the next thread to execute out of a portion of its timeslice. This is because the scheduler runs (because of the yield) AS IF it had run at the preceding tick. So, the next thread is activated... and a tick comes along (coincidentally) immediately thereafter! :>

You can work around this by crediting the thread with a partial tick (in effect, increasing its timeslice by [0,1) tick) -- but that just adds a bias in the other direction (i.e., instead of getting cheated out of a tick, it *gets* some portion of a free tick).

To "fix" this problem, you can have the scheduler restart the "tick interval" as it activates the next thread. In effect, resynchronizing the tick with the jiffy at each thread activation. In that case, a thread that does NOT block or yield is guaranteed its entire timeslice.

But, now the tick is running at a varying frequency. (and, the scheduler is having to muck with hardware!) This complicates use of the tick (which drives the jiffy) for reliable timing.

You can "fix" this problem by running the timing services independant of the scheduler (e.g., another "clock"). Or, you can "do the math" (keep track of residuals) to track the "jitter" in this "clock" so that scheduling/timing are unaffected (i.e., compensated).

Or, just live with the problem. (i.e., pretend it doesn't happen often and/or that it happens "equally" to all threads)

What I was suggesting was letting the timing service run the IRQ exclusively. It peeks at an ordered queue of "timers" and programs the hardware timer (clock) with the period of the next timer (at head of queue). I.e., this is the soonest "thing" (temporal event) that is going to happen so any "clock" IRQ's prior to that would be "uneventful".

Now, have the scheduler feed events (which should be at most one?) into that same queue. So, coincident with activating the next thread to execute, the scheduler injects a timer event into that queue with an expiration time of "now + timeslice" -- and *remembers* a handle for that event!

If the scheduler gets invoked prior to that time, it has the responsibility of deleting that (no longer pertinent) event and reinserting a *new* one for the thread that it is activating *now* (which may have a different timeslice!).

The problem with this approach is there is nothing that limits the granularity of events in that queue. E.g., if you allow time to be specified in "timer clock cycles", then you could conceivably have the two events in the queue differing by only a few clock cycles -- too close temporally to be handled as distinct events. (so, you want to restrict the values that can be used for "times" -- which brings you back to the first implementation -- effectively)
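
A sketch of that scheduler/timing-service interaction; every name here (timer_q_insert, program_hw_timer, etc.) is invented to illustrate the flow, not an actual API:

struct thread { unsigned long timeslice; /* ... rest of thread state ... */ };
struct event;                            /* opaque handle from timing svc */

extern struct event *timer_q_insert(unsigned long when,
                                     void (*fn)(void *), void *arg);
extern void          timer_q_remove(struct event *e);
extern unsigned long timer_q_head_expiry(void);
extern unsigned long time_now(void);
extern void          program_hw_timer(unsigned long delta);
extern void          preempt(void *arg);        /* re-enters scheduler    */
extern void          switch_to(struct thread *t);

static struct event *slice_event;     /* the one event the scheduler owns */

void activate(struct thread *t)
{
    if (slice_event != NULL)          /* stale slice from previous thread */
        timer_q_remove(slice_event);

    /* "preempt t at now + its (possibly thread-specific) timeslice"      */
    slice_event = timer_q_insert(time_now() + t->timeslice, preempt, t);

    /* timing service reprograms the clock for whatever is soonest now    */
    program_hw_timer(timer_q_head_expiry() - time_now());

    switch_to(t);
}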

Correct. I allow any number of "general purpose" timers. Note that I also allow timers to be inspected. I.e., a consumer can "time" how long something takes (and that thing can, in turn, time how long parts of *it* take, etc.).

This is essential if you want your code to adapt to the temporal constraints of its environment (instead of just breaking -- missed deadline -- as load increases). It makes more sense to implement "timers" (as in "measure elapsed time") this way and have the consumer track the "initial value" than to have the timing service do it.

Understood.

Yes -- but then you can't drag the timer out of the user's address space. I.e., the user can always directly access the contents of the timer struct in your implementation. My API allows the OS to "hide"/protect the timer *from* the "user" (a bug in your code overwriting some key part of the struct could crash the timer service, for example).

Yes, though see above.

Yes, the current task (thread) essentially acts as its own timing service. OTOH, there are no guarantees that it will promptly "see" an expired timer (e.g., the entire thread may be blocked when the timer expires; or, the thread may not be executing that code when the timer actually expires).

The first MTOS that I designed used a similar timing service except the timers were updated by a "timer thread" (i.e., timers were no different from any other "task"/service). The timer thread, in turn, polled a global variable updated by the tick ISR (i.e., nothing happened in the tick ISR other than incrementing a counter). Timers were arranged in an array (that the timer thread managed). A consumer could directly reference the value *in* a timer (e.g., by accessing "timer[i]").

This made for an incredibly powerful programming model. Since checking the timer was very efficient, you wrote code like:

if (DATA_AVAILABLE != check_uart()) {
    /* nothing here, yet */
    if (timer[UART_TIMEOUT] > 0) {
        /* timer hasn't yet expired */
        blink_light();

        /* keep waiting */
        yield();
    } else {
        /* timeout waiting for data */
    }
} else {
    /* received data; process it */
}

I.e., it was *faster* to just peek at the timer's current value and "yield" than it was to implement an API to "wait_for_timer()". It also let you *do* things while waiting which, otherwise, required a separate thread *or* a heavier implementation.

(I still use this approach on *tiny* designs)

Exactly. The "current activity" is a service invocation. Since service invocations are synchronous (my choice), there can be only one (for each thread). Note that this does not preclude having general purpose timers -- as many as you want -- in addition to this "service timer". I'm just exploiting the fact that the "service timer" has a fixed and inflexible role. It need not have the same capabilities of a general purpose timer (e.g., it *belongs* to a particular thread; something that isn't necessarily a requirement to be imposed on general purpose timers).

I think this will let me make it lightweight enough that it ends up a net savings over using a general purpose timer for the same purpose (both in terms of space *and* time).

No. The OS doesn't have to "abort" anything. I.e., if I were to implement deadline handlers to do this same sort of thing, then they *would* (typically) abort pending/blocked operations (service calls). [this makes coding the application simpler but adds a lot of machinery in the OS -- as well as forcing the application to handle the asynchronous nature of aborted services]

If, OTOH, I adopt the *practice* that every service supports the concept of a timeout and is *coded* with this in mind, then they can assume (at some cost) the responsibility of watching -- and compensating -- for their own missed deadlines.

Reply to
D Yuniskis

Sorry, that's what I meant: "X seconds from now" (now being the entry just before the head of the list).

So, some "buckets" can be "empty".

Are you discarding "resolution" (bad choice of word) as the interval increases? E.g., do timers specified for 8190 and 8191 ms (from now) *both* expire at the same ACTUAL instant?

Too much chance of my math and yours not agreeing so I won't follow the numbers. Rather, the answer to my above question should clarify whether *your* math is tracking "all 10,000 of those milliseconds" *or* is just rounding it off to the nearest X.

I.e., is the time stored in the struct as well as implicitly in the list in which it resides (regardless of which tier)? If so, then each "redistribution" requires you to examine every timer in the first bucket of "the second tier" and disperse them to potentially different buckets in the *first* tier -- based on their actual values (in effect, "time remaining").

OTOH, if you are using coarser "resolutions" for each successive tier, then 10000 ms is the same as 10001 ms (i.e., you are throwing away some portion of your specified time)

I can't comment on that without clarification of the above. :-/

Reply to
D Yuniskis

Sure. And an empty bucket is trivial to deal with at the tick.

No. They'll probably move from the second to first tier in the same batch, but end up in adjacent tier 1 buckets, and then fire at successive ticks.

That's correct. The timer stores its actual expiration time. Neighboring timers "declump" as they move towards the finer resolution tiers.

No, the other way...

Reply to
robertwessel2

Hi Robert,

OK, the point I was trying to identify was that you *do* track the "actual" interval specified (and don't just "round it off" to coarser granularity as it increases)

So, in all but the first tier (assuming the first tier *always* has the granularity of the tick), the "redistribution" requires examining each timer as you "promote" (demote?) it to the finer granularity of the tier above (below? I guess it depends on which way you look at things :> ) to REDISTRIBUTE the individual timers into the buckets on that new tier.

However, you should only have to do this for (at most) the first "bucket" of each tier when that time comes (?)

It seems like this approach is geared towards lots of timers of wide dynamic range. The overhead seems geared towards letting you ignore most/many of them until close to their expiration time (though you have an ongoing, sporadic effort to *maintain* them as they move to finer grained "buckets" -- i.e., you have to revisit them several times before they expire).

Reply to
D Yuniskis

Pretty much. Although I would consider the timers to be basically wholly ignored between initial insertion and dispatch, except for any required redistributions.

It has the advantage of scalability, as the amount of work done per timer does not increase as the number of timers in the system increases (which is not the case for most schemes), and not placing any limits on the range of the timers (other than being tick-grained).

The number of redistributions (each of which is very low cost), is always small (no more than six in the configuration we're discussing), and occurs over the lifetime of the timer. The total work expended for a timer over its lifetime is of a fairly small range - on the order of 3:1 for a very long duration timer initially deposited into the highest tier vs. a short one initially placed in the lowest. The most expensive operation being the initial tier selection (for which a good Count Leading Zeros instruction is very helpful). Certainly the simple redistribution algorithm is bursty, but smoothing that out is not hard (you have 64 ticks to complete the redistribution of a bucket from the second tier to the first, and you can just do 1/64th of the work each tick). The initial insertion and final dispatch are also lower cost than most schemes (for example, the traditional priority queue), for even a modest number of timers.

Obviously not counting the actual work of processing each timer when it fires.

As applied to your original post, you could attach timers to things pretty much at whim, and other than storage costs, have them add very little overhead to the system, without the complexity of having to share timers.

Reply to
robertwessel2

Yes -- though you would also have to deal with new timer activity while doing that processing.

I think this is worth "filing away" for use in a "richer" environment. But, I think (for this project), I can hack a "service timer" into each thread and fold the support for it into the scheduler as if it was a core system feature (which it will be) -- just like any other aspect of the system. I think the distinguishing characteristic will be the range of "intervals" and their temporal distribution (e.g., something that effectively sits in a loop "steady state" *probably* won't benefit from the extra structure?)
Reply to
D Yuniskis
