Priority inversion and counting semaphores

Hi,

Sorry if this is a dumb question... but, is there a "textbook" solution for handling priority inversion with *counting* (i.e., non-binary) semaphores?

I've come up with a few approaches but none are very "economical" :-/

Thx,

--don

Reply to
D Yuniskis

I haven't looked at implementations in a long time, but wouldn't priority inheritance work?

When a process does a P operation on a semaphore that is in use, the priority of the process owning the semaphore is raised to the priority of the requesting process. On the V operation, the owning process is returned to its normal priority and the highest-priority waiting process acquires the semaphore.

This assumes you can modify the kernel.
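A minimal sketch of that inheritance step, for the single-owner (binary) case. All of the structures and names here (task_t, bsem_t, the priority fields) are invented for illustration; blocking and the wait queue are not modeled:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel structures -- all names invented for illustration.
 * Convention here: larger number = higher priority. */
typedef struct task {
    int base_prio;   /* priority assigned at design time     */
    int eff_prio;    /* possibly-elevated effective priority */
} task_t;

typedef struct bsem {
    int     count;   /* 1 = free, 0 = taken (binary case) */
    task_t *owner;   /* current holder, NULL if free      */
} bsem_t;

/* P: if the semaphore is held, lend the waiter's priority to the owner. */
static void bsem_P(bsem_t *s, task_t *self)
{
    if (s->count > 0) {
        s->count--;
        s->owner = self;
    } else {
        if (s->owner != NULL && s->owner->eff_prio < self->eff_prio)
            s->owner->eff_prio = self->eff_prio;  /* inheritance */
        /* ... block self until the owner calls bsem_V() ... */
    }
}

/* V: restore the holder's normal priority; the highest-priority waiter
 * (queue not modeled here) would acquire the semaphore next. */
static void bsem_V(bsem_t *s, task_t *self)
{
    self->eff_prio = self->base_prio;
    s->count++;
    s->owner = NULL;
}
```

The whole point of the thread, of course, is that the single `owner` field is exactly what a counting semaphore doesn't give you.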

--
Thad
Reply to
Thad Smith

I don't think so. At least not the way it is typically implemented for *binary* semaphores (mutexes). See below.

With a *counting* semaphore, this gets considerably more complicated.

E.g., there isn't a single "owner" ("holder?") of the semaphore. If the semaphore supports a maximum value of "3", then there can be 0, 1, 2, or 3 "current holders" of that semaphore (parts thereof). So, now the kernel has to potentially track as many owners as the semaphore can support -- since some *new* higher priority task may wait on that semaphore (which means all of the current holders must have their priority RE-elevated to this newest priority level so *that* "waiter" isn't victimized).

Of course, any of these artificially elevated "holders" now can go on to wait on *other* semaphores (binary or otherwise). This forces their artificially elevated priority to be *propagated* to the current holder(s) of those semaphores, etc.

I.e., you (potentially) end up with lots of tasks running at the same priority -- that of the highest priority task still unable to acquire their desired semaphore.
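To make the bookkeeping concrete, here is a sketch of what "elevate ALL current holders" looks like. Everything here (csem_t, the holder array, the priority convention) is invented for illustration; blocking and the transitive propagation to *other* semaphores are deliberately omitted:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COUNT 3   /* maximum value of this counting semaphore */

/* Hypothetical structures -- names invented. Larger number = higher prio. */
typedef struct task {
    int base_prio;
    int eff_prio;
} task_t;

typedef struct csem {
    int     available;            /* units still free     */
    task_t *holders[MAX_COUNT];   /* every current holder */
    int     nholders;
} csem_t;

/* When a waiter arrives and no units are free, EVERY current holder must
 * be raised to the waiter's level -- otherwise any ONE of them can still
 * victimize the waiter.  (Propagation to semaphores that the holders are
 * themselves waiting on is omitted here.) */
static void csem_wait(csem_t *s, task_t *self)
{
    if (s->available > 0) {
        s->available--;
        s->holders[s->nholders++] = self;
        return;
    }
    for (int i = 0; i < s->nholders; i++)
        if (s->holders[i]->eff_prio < self->eff_prio)
            s->holders[i]->eff_prio = self->eff_prio;
    /* ... block self ... */
}
```

Note the cost: the kernel now carries up to MAX_COUNT owner records per semaphore, versus one for a mutex.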

There are many other, equally insidious problems that creep in:

What if a task is waiting for "2" of that semaphore -- but only "1" is available? Do you grant the "1" and have it pend for "1" more? Or, do you have it wait for both to be available? Can a task currently holding "any" of that semaphore come and subsequently request *more* of it? Do you limit requests to "one per customer" (forcing the task wanting more than one to adopt artificial means of working around that limitation)?

Binary semaphores are easy to deal with -- even the priority propagation aspects (beyond inheritance -- considerably more housekeeping). E.g., if I hold *the* semaphore and re-request it (this is commonly the case when you bury locks inside objects), the kernel can recognize this and re-grant ownership -- the equivalent of saying "check your wallet, stupid... you've already GOT it!" (The kernel can even enforce paired re-requests and re-releases... think about it.)

But, if you allow an owner to "request more", how does the kernel know if your request for "1" is a repeat request for the *same* "1" or an *additional* request -- for *another* "1"? I.e., buried locks become very problematic to handle. If you assume it is a request for the *same* "1", then the task must make a point of asking for everything it needs in its initial request -- since the kernel will, thereafter, treat new requests for that semaphore as "re-requests" for what the task currently holds (hmmm... what if it holds "2" and requests "1"? Or vice versa?). OTOH, if you assume it is a request for a *new* one, then the *task* has to be aware of any buried activities concerning that semaphore so that it doesn't ERRONEOUSLY repeat them.

I suspect there isn't a "general" solution. So, I'm looking for a CONSTRAINED solution that I can potentially adopt.

Never a problem, there! ;-)

Reply to
D Yuniskis

OK. I hadn't thought about multiple successful grants distributed to different processes.

Given that, my initial thought was to elevate the priority of the process with the oldest grant; without other information, that process is likely to release first. What if it hits a block on another semaphore? You have a choice: propagate the elevated priority to try to drive the oldest process toward release, or switch to another granted process. The better strategy may depend on application specifics.

If the process needs two, it makes sense to me to wait for two. Asking for one, then one more runs the risk of an intervening request for one from elsewhere. It would also make it easier for a higher priority task to request and get one while the lower priority is waiting for two (a good thing).

That seems like a policy decision for the overall application, possibly enforced by the task manager.

That's true, but I would prefer to flag those events as design errors, probably caught with an assert or equivalent. It should be possible to come up with assertions for errors in the multiple allocation allowed case, but probably harder.

I agree. Perhaps you can request "my nth allocation", proving that the task and task manager are in sync. This basically argues that each task potentially requesting multiple allocations has its own manager for requesting semaphore allocations, rather than going directly to the task manager.

--
Thad
Reply to
Thad Smith
[attributions elided]

Or, multiple requests from "current holders" :<

Rewarding the oldest just because "he got there first" doesn't seem like there's much promise there. :<

As you said, I think this can't be solved in the general case but will require some "bending" to fit the application. Or, worse than that... the *implementation* :(

But consider that the requesting task may not -- yet -- know that he *wants* "2". Or, he may not be aware that something he is about to do has buried requests embedded within it (e.g., consider a library that makes calls on those semaphores -- either it must publish those needs and "propagate" them out to its API so that the caller can factor them into consideration, *or* the implementation must support them inherently).

See above.

Actually, I have found it makes coding and design *much* easier if you support this ability. I.e., add a counter to the mutex such that it counts how *many* times it has been "redundantly-acquired" (an assert on that counter can protect against rollover). This allows you to affordably implement the "feature" while protecting against abuse. Otherwise, not only does a task have to remember not to reacquire the mutex but it also has to remember to *defer* releasing it until the "outermost" bit of code would release it.
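A sketch of that counter-in-the-mutex idea. The names (rmutex_t, depth) are invented for illustration, and blocking is not modeled; the asserts play the role of the rollover/abuse guards described above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recursive-mutex sketch: a counter records how many times
 * the owner has "redundantly" acquired the lock, so every acquire can be
 * paired with a release and only the OUTERMOST release frees the lock. */
typedef struct rmutex {
    void *owner;    /* opaque task id, NULL when free */
    int   depth;    /* nesting count                  */
} rmutex_t;

static void rmutex_lock(rmutex_t *m, void *self)
{
    if (m->owner == self) {
        m->depth++;                /* "check your wallet, stupid" */
        assert(m->depth > 0);      /* guard against rollover      */
        return;
    }
    /* ... block until free (not modeled) ... */
    m->owner = self;
    m->depth = 1;
}

static void rmutex_unlock(rmutex_t *m, void *self)
{
    assert(m->owner == self);      /* releasing what you don't hold */
    if (--m->depth == 0)
        m->owner = NULL;           /* outermost release frees it    */
}
```

With this, a buried lock inside an object can blindly lock/unlock and the outer caller's hold is preserved.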

This is sort of like writing a bit of code to disable and then re-enable interrupts to protect an atomic section of code -- like modifying a counter, etc. If you write:

    function() {
        ...
        disable_interrupts();
        do_something();
        enable_interrupts();
        ...
    }

then you can never invoke function() from within a region where interrupts are disabled -- since it will have the side-effect of enabling them prior to returning.

OTOH, if you implement it (effectively) as:

    function() {
        interrupt_state_t previously;
        ...
        previously = set_interrupt_state(LOCK_LEVEL);
        do_something();
        set_interrupt_state(previously);
        ...
    }

This makes things like transfer() much cleaner to implement in terms of the actions (functions) that it is intuitively performing:

    transfer(amount, savings, checking) {
        interrupt_state_t previously;
        ...
        previously = set_interrupt_state(LOCK_LEVEL);
        withdraw(amount, savings);
        deposit(amount, checking);
        set_interrupt_state(previously);
        ...
    }

where withdraw and deposit each look something like:

    withdraw(amount, account) {
        interrupt_state_t previously;
        ...
        previously = set_interrupt_state(LOCK_LEVEL);
        adjust_balance(-amount, account);
        set_interrupt_state(previously);
        ...
        if (check_for_overdraft()) {
            incur_fees();
            transfer(overdraft, backup_account, account);
        }
        ...
    }

    deposit(amount, account) {
        interrupt_state_t previously;
        ...
        previously = set_interrupt_state(LOCK_LEVEL);
        adjust_balance(+amount, account);
        set_interrupt_state(previously);
        ...
    }

instead of forcing the atomic actions to be "exposed" and promoted to the transfer() level in the function hierarchy:

    transfer(amount, from, to) {
        interrupt_state_t previously;
        ...
        previously = set_interrupt_state(LOCK_LEVEL);
        adjust_balance(-amount, from);
        adjust_balance(+amount, to);
        set_interrupt_state(previously);
        ...
        if (check_for_overdraft()) {
            incur_fees();
            transfer(overdraft, backup_account, from);
        }
        ...
    }

(Not trying to be thorough -- just an example. Replace interrupts with mutexes and you can see how the first implementation is much more intuitive/cleaner.)

I figure the pathological case is to treat each as a single, discrete mutex. So, if a semaphore can have a count of "3", then create 3 mutexes and layer something atop them that allows "n" to be processed at a time. This isn't really equivalent but it is a good starting point to *think* about what the behavior *should* be -- hopefully that will give me some insight as to how to more economically constrain an implementation...
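That pathological model can be sketched directly: a count-3 semaphore as three discrete "slots", with a layer atop that hands out whichever slot is free. All names here are invented for illustration, and (as noted above) this is NOT equivalent to a real counting semaphore -- it's a thinking aid:

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 3

/* Each slot behaves like one mutex; the layer atop picks a free one. */
typedef struct slotsem {
    void *slot_owner[NSLOTS];   /* NULL = that "mutex" is free */
} slotsem_t;

/* Returns the slot index granted, or -1 if the caller must block. */
static int slot_acquire(slotsem_t *s, void *self)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (s->slot_owner[i] == NULL) {
            s->slot_owner[i] = self;
            return i;
        }
    }
    return -1;   /* all slots held: caller would pend here */
}

static void slot_release(slotsem_t *s, int slot)
{
    s->slot_owner[slot] = NULL;
}
```

One immediate payoff of the model: per-slot ownership is explicit, so per-slot priority inheritance "falls out" of the binary-semaphore machinery. The scaling problem (and the atomic-n-request problem) is equally obvious.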

Reply to
D Yuniskis

I also tend to think there is no best solution in the general case. I do, however, see some reasonable approaches, though each has its caveats.

Promote all processes to the current priority level - The caveat here is on release: do we reset all tasks to their previous priorities and let any waiting semaphores handle task promotion on their next iteration, or do we go through them and do the necessary promotion ourselves?

Promote the oldest process - We assume it's likely to finish first.

Promote the highest priority process - We assume it's likely to finish quicker providing we've made our high priority tasks short.

Promote a random process - We assume any choice is as good as any.

When promoting a single process and we have stalled (been waiting too long), we also have the option of promoting other tasks.

While it's not equivalent, I think it's the closest you'll get. We definitely need the layer we add to be atomic. We lose a bit of concurrency acquiring and releasing the semaphore; we just hope to gain it back through the benefits of our additions.

Cheers, Nicholas King

Reply to
Nicholas King

On Fri, 08 Apr 2011 04:37:03 +0200, D Yuniskis wrote:

If the solution isn't obvious or at least trivial, then you are using the wrong type of synchronization primitive.

--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
Reply to
Boudewijn Dijkstra

I haven't joined in on this thread because the one solution that I could think of isn't economical, either. But here it is:

Take a semaphore that's already up to its limit, and some high-priority task comes along and wants one. Then you take the highest-priority task that has that semaphore checked out and raise its priority to that of the high-priority task. Then it runs, and (hopefully) releases the semaphore, and life is wonderful.

When I start thinking about what the OS would have to do under the hood I think maybe this isn't economical -- but there you are.

I'm curious what resource you're putting behind this counting semaphore, though.

--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

The problem with all these approaches (at least the ones I have considered, so far) is that they are hard to "think through" to come up with design criteria that ensures you don't shoot yourself in the foot *applying* them.

I.e., the simple priority inheritance (as well as priority ceiling) protocols are more easily understood because they work on a *single* task-of-interest. So, you can think through how that task's subsequent actions and interactions are handled and ensure (?) that the system always "does The Right Thing".

When there are potential for multiple tasks to be holding a semaphore concurrently, you either come up with a "solution" that treats them all "simultaneously" *or* singles out *one* (using some criteria) in the hope that it makes the system behave well.

[now, think that through from the *designer's* standpoint... how does he know AT DESIGN TIME that a particular task will be "The One"... or, *when* it will be The One? etc.]

All of my services support timeouts and mutexes. So, any service can be locked and/or "waited on". This allows me to move deadline handling into userland instead of in the kernel (a different approach than previous RTOS's I've designed). It also lets a task *actively* change how it responds to encroaching deadlines instead of *reacting* ex post factum. Hopefully, this lets the system be more resilient to overload (at least when tasks can "see it coming"!)

Some resources aren't binary in nature. And, if I can come up with a "generalized" way of handling them, then I can also deal with the mutexes as "special cases" of the more general case.

This makes it cleaner to have the OS "knowingly" move resources to waiting tasks instead of having the task "spin-wait" reissuing that failed service request repeatedly. It also makes it easier for the designer to specify, a priori, which tasks are "more important" and how preference should be shown to them at run time -- without inventing a second "preference mechanism".

Reply to
D Yuniskis

No idea about a textbook solution, but the required behavior is pretty simple: all processes that hold the semaphore should have an effective priority at least as high as that of the highest waiter.

The obvious solution (recalculate all priorities when anything changes) can be optimised in a number of ways, but the most obvious one is to do nothing until a waiter arrives with a higher priority than any already in the queue (keep a separate variable for that), and then push that priority through all holders of this semaphore. This is a transitive operation, as those holders may themselves be waiting on other semaphores. It is, however, acyclic, because a cycle will self-terminate at the same priority level.

Whenever a waiter at the "max priority" level gets access, you need to scan the waiters to see whether to lower the max priority.

In most situations, you could get away with never resetting elevated priorities when a high-priority waiter gains access; just reset when releasing the semaphore. A process may run for a while at a higher priority than it deserves, but that's unlikely to hurt.
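A sketch of that "push only when the cached max rises" step, including one level of the transitive propagation. Structures and field names are invented for illustration; wait queues and blocking are not modeled:

```c
#include <assert.h>
#include <stddef.h>

#define MAXH 4

typedef struct task task_t;
typedef struct csem csem_t;

struct task {
    int     eff_prio;          /* larger number = higher priority    */
    csem_t *waiting_on;        /* non-NULL if blocked on a semaphore */
};

struct csem {
    int     max_waiter_prio;   /* cached: highest priority waiting   */
    task_t *holders[MAXH];
    int     nholders;
};

/* Called when a new waiter of priority `prio` queues on `s`. */
static void push_priority(csem_t *s, int prio)
{
    if (prio <= s->max_waiter_prio)
        return;                    /* cached max unchanged: nothing to do.
                                    * This check is also what terminates
                                    * a cycle of waits.               */
    s->max_waiter_prio = prio;
    for (int i = 0; i < s->nholders; i++) {
        task_t *h = s->holders[i];
        if (h->eff_prio < prio) {
            h->eff_prio = prio;
            if (h->waiting_on)     /* transitive: holder is itself   */
                push_priority(h->waiting_on, prio);   /* blocked     */
        }
    }
}
```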

Clifford Heath.

Reply to
Clifford Heath
[attributions elided]

I think a request for "n" must also be atomic. Treating "n" as a *series* of n discrete requests lets other tasks slip in and grab *some* of their own "m" requests --> deadlock.
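The all-or-nothing grant can be sketched in a few lines (names invented; in a real kernel the test and the decrement would run with the scheduler locked so they form one atomic step):

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int available; } csem_t;

/* Either the FULL request for n units is satisfied, or NOTHING is
 * taken and the caller pends.  Granting a partial count invites the
 * classic deadlock where two tasks each hold part of what the other
 * needs. */
static bool csem_try_acquire_n(csem_t *s, int n)
{
    if (s->available < n)
        return false;       /* take nothing; caller blocks */
    s->available -= n;
    return true;
}

static void csem_release_n(csem_t *s, int n)
{
    s->available += n;
}
```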

The mutex-parallel solution is far from ideal -- it scales poorly. But, it might help to gain some insight in how a "reasonable" algorithm could perform.

Or, it might make it glaringly obvious that a unified approach like this is just not in the cards... :<

Reply to
D Yuniskis

Frankly, the only thing I use semaphores for is as an abstraction for an interrupt -- i.e., I have an ISR that responds to some event in hardware (and that possibly does some necessarily quick processing, like stuffing a word into a queue), and that ISR sets a semaphore.

Off in OS-land, that semaphore is pended upon by one and only one task, which fires off in response to the ISR goading it, does what it does, then pends on that semaphore again.

If more than one task needs to access some bit of hardware, then chances are the hardware design is faulty, or the software architect hasn't been crafty enough in abstracting the interface to the hardware.

Granted, I do much more with itty bitty boards that drive motors, rather than great huge systems that talk to disks and such. But even so, the notion of leaving a system open to having some task that actually matters getting stalled by some piss-ant light blinker or button reader just makes me shudder.

--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

You're using the semaphore as a signal/event.

I structure interrupts similarly (i.e., do as damn LITTLE in the ISR as possible).

If your RTOS is *truly* preemptive (i.e., *any* action that can change the runability of a task causes an immediate reschedule()), then you can use this mechanism to invoke that "one and only one task" [actually, this is extensible but not germane to this discussion] AS SOON AS the ISR terminates! I.e., effectively running the "back half" of the ISR (your "one and only one task") in user-land (in the "background", so to speak).

If yet another "event" is signaled (in an ISR), then the task representing the "back half" of that ISR is similarly "made runnable" and, depending on priorities, can preempt the "other" back half task.

[you have to take care to ensure you don't drop an ISR, though, since this mechanism doesn't *queue* task activations but, rather, treats multiple activations the same as one]
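That caveat is the crux of the pattern, so here is a sketch (all names invented; the "semaphore" is shown as a bare flag to make the collapse of multiple signals into one wakeup obvious):

```c
#include <assert.h>
#include <stdint.h>

#define QSIZE 8

static volatile uint8_t  ev_queue[QSIZE];
static volatile unsigned head, tail;      /* head: ISR, tail: task */
static volatile int      wakeup_pending;  /* the "semaphore"       */

/* ISR: do as damn little as possible -- stash the datum, signal, return. */
void uart_isr(uint8_t byte)
{
    ev_queue[head % QSIZE] = byte;
    head++;
    wakeup_pending = 1;   /* N signals before the task runs collapse to 1 */
}

/* "Back half" task body: drain EVERYTHING, because one wakeup may
 * stand for several interrupts.  Returns how many were processed. */
int backhalf_drain(void)
{
    int n = 0;
    wakeup_pending = 0;
    while (tail != head) {
        /* ... process ev_queue[tail % QSIZE] here ... */
        tail++;
        n++;
    }
    return n;
}
```

The data rides in the queue, not in the activation count -- which is why losing an activation doesn't lose data, as long as the task always drains to empty.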

Sometimes you can't easily split a hardware device (esp one that inherently has multiple "channels" -- most notoriously, devices with "modes"/"subaddressing" registers). E.g., imagine sharing a CLUT between two disjoint tasks...

If that's the case, then there is a flaw in your system design. I.e., a piss-ant light blinker shouldn't be competing for resources needed/used by "more important" tasks. (if it is, then clearly some part of that piss-ant should have been split off as a separate entity)

Reply to
D Yuniskis

Yeah, but that quickly deteriorates into "everything running at the maximum priority" :-/ I suspect this often ends in deadlock.

It seems like this should be an easy extension of the binary semaphore case. Yet, I can't seem to wrap my head around the criterion that makes it different! (i.e., once you identify that, you can consider constraints that do away with this difference)

PI mucks up determinism. I think this makes it a laughable goal. There are just too many "what ifs" in the unconstrained case.

I'd like to come up with a simple rule to make designing in this context more reliable. E.g., like always taking mutexes in a fixed order to avoid deadlock (easy to remember, easy to implement, etc.)
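The fixed-order rule can even be made mechanical: give every mutex a global rank and refuse (or assert on) out-of-order acquisition. A sketch, with all names invented and blocking not modeled:

```c
#include <assert.h>
#include <stdbool.h>

/* If every task acquires locks in strictly increasing rank order,
 * a circular wait can never form.  Here out-of-order requests are
 * simply rejected rather than risking deadlock. */
typedef struct { int rank; bool held; } omutex_t;

static bool lock_in_order(omutex_t **m, int n)
{
    for (int i = 1; i < n; i++)
        if (m[i]->rank <= m[i - 1]->rank)
            return false;          /* design error: wrong order */
    for (int i = 0; i < n; i++)
        m[i]->held = true;         /* ... real blocking acquire here ... */
    return true;
}
```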

Reply to
D Yuniskis

Sometimes you can solve your priority inversion problems by doing a little bit more in the ISR. In other words, do all the high-priority stuff in the ISR, and defer the rest to a lower priority task.

There's no general reason to do as little as possible in an ISR. I've built numerous projects where half the CPU time is spent in some ISR. As long as no other ISR gets into trouble, it works fine, and it can greatly simplify the overall design.

Reply to
Arlet Ottens

Since most embedded systems are event-driven, it is natural to think in terms of events. The human mind is also far more attuned to event-thinking than to semaphore/mutex-thinking.

--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
Reply to
Boudewijn Dijkstra

That assumes one of your tasks *is* the ISR. :> If you simply have two *tasks*, you don't have that luxury (unless you can arbitrarily assign an artificially high priority to one task and ensure it starts before the other and always can acquire everything it needs -- in effect, turning one task into an ISR without the associated hardware interrupt)

The problem with putting "more than essential" in the ISR is that it leads to more brittle implementations. Once you run out of REAL real-time ("foreground"), you're screwed.

Usually ("In general, avoid generalities"), paring the ISR down *does* have a higher cost/complexity than something written entirely in an ISR. This is because you have less that you can rely upon when eventually "doing the work" (that the ISR would have otherwise done) and have to synchronize your actions with those of the (often "still active") ISR.

OTOH, decoupling the work from the ISR can leave your system more resilient/robust -- often because you *had* to consider more "what-ifs" in this approach ("what if the ISR runs, AGAIN, before I have finished processing this stuff?").

I had designed a system with a barcode reader (just "raw video"... like from a wand). Essentially, the "video" was wired to an interrupt pin on the processor. So, a "user" had the potential to tug on the interrupt pin at any time.

Since this is an obvious vulnerability, it saw a fair bit of attention when people tried to "break" (challenge) the software!

It was amusing to watch people rub the "wand" back-and-forth over barcode labels at incredibly impractical rates (I had designed for 100ips). It was as if the software was being presented with an improbably long label scanned at a very high rate.

While scanning, you could see everything stall in the device (as all real-time was consumed catching transitions). But, as soon as they stopped (eventually, your arm gets tired when you do these fast, short, repetitive motions for more than 10-15 seconds), the system resumed *and* displayed the correct barcode symbols. Had I opted to do anything more than "capture the edges" in the ISR, the system wouldn't have reacted as favorably (it takes longer to decode a barcode than the time the label spends in front of the scanner)

Reply to
D Yuniskis

Sure, but usually, tasks are driven by interrupts, so there already is an ISR.

Sure, there can be problems, but running out of (total) time isn't one of them. If there's not enough time for ISRs, there's also not enough time to do the same work in tasks. Maybe you can queue the work for later, but at some point in time, you're going to run out of buffer space, or the data will be too stale to process anyway.

Yes, but running the same work in tasks has very similar problems. You still need to worry about what happens if more stuff comes in while the old stuff isn't done processing yet. In fact, with all the extra overhead, you'll have less time overall for useful work.

Sure, in those cases, you can't do all the work in an ISR. Every problem is different. Sometimes it makes a lot of sense to minimize the ISR, sometimes it's better to have a fat ISR, and do all the work there, and sometimes you have to pick something in between.

I try to be flexible, and think about what's the best solution for a given problem, rather than working from standard rules.

Reply to
Arlet Ottens

Sure! They aren't mutually exclusive (ha!) usages.

It's actually hard to explain to "lay people" how things like mutexes work and the problems they try to solve. If you try to draw real-world analogies ("There is one paint brush. I am currently using it. You *want* it. How do we figure out how I can do what I need to do and you can do what *you* need to do, given this constraint?") then the explanation looks so painfully obvious that folks say, "So, what is the problem?"

Reply to
D Yuniskis

Hi Arlet,

[replied previously, but noticed I misspelled your name and tried to "cancel" the post. Apparently it succeeded.]

That was exactly my point: decoupling them *requires* you to address these issues up front. Stuffing things in an ISR leaves you with just one question: what if another ISR comes in before I finish servicing this one? (and you *miss* it)

E.g., when I move data from ISR to background task (FIFOs, almost always), I consider how I "mark" the point in the data stream where data was lost (because the buffer was about to be overrun). And, whether I "lose" the oldest buffered datum *or* the newest incoming datum (there is a subtle difference!). The task then knows to look for these "markers" and knows exactly what they represent. I can then enlarge buffers to accommodate varying degrees of "real time overload" without having to figure out how to save 3 more clock cycles from an ISR.
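One way to sketch that "mark the gap" idea (names, sizes, and the choice of sentinel are all invented for illustration; this variant loses the NEWEST data and plants a single idempotent marker at the point of loss):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 4
#define LOST_MARK 0xFF      /* reserved value: "data was lost here" */

typedef struct {
    uint8_t  buf[FIFO_SIZE];
    unsigned head, tail;    /* producer (ISR) / consumer (task) */
} fifo_t;

static unsigned fifo_count(const fifo_t *f) { return f->head - f->tail; }

/* ISR side: on overrun, replace the newest buffered datum with the
 * marker and drop the incoming one.  Repeated overruns leave just one
 * marker, so the consumer sees exactly where the gap begins. */
static void fifo_put(fifo_t *f, uint8_t v)
{
    if (fifo_count(f) == FIFO_SIZE) {
        f->buf[(f->head - 1) % FIFO_SIZE] = LOST_MARK;
        return;
    }
    f->buf[f->head % FIFO_SIZE] = v;
    f->head++;
}

/* Task side: drain one datum; the task checks for LOST_MARK itself. */
static bool fifo_get(fifo_t *f, uint8_t *v)
{
    if (fifo_count(f) == 0)
        return false;
    *v = f->buf[f->tail % FIFO_SIZE];
    f->tail++;
    return true;
}
```

The "lose the oldest instead" variant just advances `tail` before storing -- the subtle difference mentioned above is which side of the gap the marker describes.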

Rules are evil. :> Folks who believe in rules think you can solve every problem with a good set of rules and a policeman.

You should always think of rules as Guidelines. They should alert you to potential pitfalls. They should give you pause to reconsider why *your* design does -- or does not -- apply.

Experience has taught me which "rules" are easiest to break and the relative costs of breaking them. Being *forced* into adhering to arbitrary rules may be good "for programmers, on the whole" but in the *specific*, it can be a real nuisance.

E.g., I love pointers. When I have to code in an environment with an obsessive policeman prohibiting their use (i.e., because the language doesn't *support* them!), the quality of my solutions suffers -- as does the enjoyment I get from completing the task ("Sheesh! Remind me never to do *that*, again!")

Reply to
D Yuniskis
