Applications "buying" resources

[attributions elided]

Yes, but they have *code* that is intended to handle the event. I.e., the developer doesn't *ignore* it. It is a key part of the API.

---> If you want to draw on the screen, you deal with the expose event.

---> If you want to request "spare" resources, you deal with the "relinquish resource" event.

That's a problem with the OS not handling overload gracefully (didn't someone complain that an *RTOS* shouldn't be dealing with the sorts of issues I'm raising? :> )

For a *closed* product, you can "examine all possibilities" and "pick wisely" (at design time). If you open that product up to third party applications/enhancements, then you don't have any control over the quality of the "designers" for those aspects of the resulting system. Hence my desire to come up with a framework that *(en)forces* compliance. And, a way that allows users to intuitively identify the (third party!) aspects of the product that "suck".

Yes, years ago I looked into L3.

Hmmm... I think that would be a tougher environment to work in (unless you have backing store). I.e., any reference to a "swappable" page has to be able to deal with a page fault. While the OS can field the fault, the application would then need to decide what to do about it (e.g., if the default pager doesn't swap the page back in). So, if you want to be able to *discard* pages (like me), you would have to surround all references to those pages with code to check for pageouts.

E.g., if you have built 10 pages of (sequential) data and are working on page #11, page #3 might get swapped out while you aren't looking at it (and hence won't fault) just as easily as page 11 could.

(I suppose you could set the handler for those pages to PREVENT pageout -- though that sort of policy would too easily be abused)
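
To make that concrete, here is a minimal C sketch of what "surrounding all references with a check" might look like. Everything here is hypothetical (dpage_t, the discarded flag, regenerate()), not any particular OS's API:

    #include <stddef.h>

    /* Hypothetical descriptor for a discardable region. */
    typedef struct {
        void *base;         /* start of the discardable data      */
        int   discarded;    /* set (somehow) when the OS took it  */
    } dpage_t;

    int regenerate(dpage_t *pg);   /* hypothetical: rebuild contents */

    int read_sample(dpage_t *pg, size_t off, int *out)
    {
        if (pg->discarded) {           /* check before EVERY use  */
            if (regenerate(pg) != 0)   /* e.g., re-decode audio   */
                return -1;             /* couldn't recover        */
            pg->discarded = 0;
        }
        *out = ((int *)pg->base)[off];
        return 0;   /* still racy if a discard can land between
                       the check and the dereference -- part of
                       why this style gets painful */
    }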

I think, conceptually, it is easier to be *told*/asked to relinquish resources and then let *you* decide which to release so you subsequently *know* not to reference them.

Reply to
D Yuniskis

That's just bad specification.

A real-time system *must* work. But what you mean by "work" should be correctly specified.

For example, for a motor control system, the most important functionality may be to fail in a free-wheeling mode, so that the system can be easily overridden manually. Actually running the motor would then be a lower-priority requirement.

Or your video player may have a requirement that it should either show the frame, or drop it, with perhaps a required minimum percentage of shown frames. The ability to drop frames is a feature and part of the specification - it is not a failure.

"Failure is always an option" may be a good motto for Mythbusters, but it is a poor way to view real-time systems.

Reply to
David Brown

It all boils down to how you specify a correctly working system. A system that does not work as specified is broken, and a system that you /know/ will not always work as specified, is badly designed. If you want to design a system that will do a particular job most of the time, then that's how you specify it - "in /these/ circumstances, the system will do this job - in other circumstances, it will do something else".

You /cannot/ just make software that will work some of the time, and call it a working system! If you want to save resources and make a system that can do A, B and C, but no more than two at a time, then that's your specification. You have very definitely /not/ made a system that can do A, B and C.

I don't mean to say that a system should not have parts (or even its whole) working in a "best effort" way. It's unusual to have a system that doesn't have such parts. But they are not real time, and certainly not hard real time, and the rest of the system must be specified and designed with the understanding that those parts may fail.

Reply to
David Brown

According to my definition of real-time, it doesn't "hope", but it "knows" or "guarantees". If it just hopes, that's typical best-effort scheduling in a desktop OS. You set your priority high enough and hope to be able to decode all video frames in time, and if you don't manage to do it, you try to compensate. That's at most soft real-time.

I didn't talk about reaction time or availability. But it has admission control: years ago, you had, say, 100 copper wires in a cable from London to Paris. Which means: 100 people can talk. The 101st gets a busy signal. Now you have one fiber which has bandwidth for 100 people. Which means: 100 people can talk, the 101st gets a busy signal. It doesn't try to squeeze a 101st call into that fiber, by asking the others to step aside a bit and downgrade. At least, I hope so (I only buy telephone service from companies that offer me unfiltered end-to-end G.711, no "NGN" crap).
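
A minimal sketch of that admission-control policy (names invented): the 101st request is rejected outright instead of degrading the 100 calls already admitted.

    /* Hypothetical admission control: hard capacity, no degrading. */
    #define CHANNELS 100
    static int active_calls;

    int admit_call(void)
    {
        if (active_calls >= CHANNELS)
            return -1;       /* "busy signal": reject, don't squeeze */
        active_calls++;      /* admitted at full, guaranteed quality */
        return 0;
    }

    void release_call(void)
    {
        if (active_calls > 0)
            active_calls--;
    }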

"batch" tasks are simply tasks with increadibly low priorities, which are executed in round-robin.

Those do not need timely feedback from the kernel. The kernel just says, "hey, this happened, deal with it."

This one needs feedback.

But how does the kernel know that the task is done? How can the task be sure that the kernel knows it cooperates? Let's say I'm making a task that wants to play nice. It has a SIGFREEMEM handler, which goes through my data structures and frees a lot of memory (which is, BTW, not trivial if it can, like a signal, happen at any time, even within malloc). How can I be sure that I'm being given the time needed to do the cleanup, and not being struck by a kernel timeout just because other tasks ate up the CPU?
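
One conventional dodge for the "even within malloc" part is to make the handler trivial and defer the real work to a known-safe point; a sketch, with SIGFREEMEM, release_caches() and do_work() all hypothetical. Note it solves only the reentrancy problem -- not the question of whether the task is actually given CPU time to reach the cleanup point:

    #include <signal.h>

    /* Handler just sets a flag -- async-signal-safe.  The actual
     * freeing happens at a known-safe point in the main loop, so
     * it can never interrupt malloc()'s internal bookkeeping. */
    static volatile sig_atomic_t free_requested;

    void release_caches(void);   /* hypothetical: free what we can,
                                    then tell the kernel we're done */
    void do_work(void);          /* hypothetical unit of real work  */

    void on_sigfreemem(int sig)  /* installed via sigaction()       */
    {
        (void)sig;
        free_requested = 1;
    }

    void main_loop(void)
    {
        for (;;) {
            if (free_requested) {
                free_requested = 0;
                release_caches();
            }
            do_work();
        }
    }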

My problem is the definition of "while this happens". If it's a timeout provided by task B, it would depend on B whether A works (according to Murphy's law, B of course sets the timeout just one ns too low for A to succeed). If it's a return code "sorry, I have no memory, but I have asked others; please retry later", that'd be polling which is evil[tm]. If it pends B until A calls back, that's priority inversion.

Stefan

Reply to
Stefan Reuther

The task would have to tell the pager that. The usual contract is: if you take away my page, you have to give it back to me with the same contents when I need it again.

Of course, the application could also tell the pager beforehand: if you take this one instead, you do not need to preserve its content. Or: if you take this one, you do not need to preserve its content, because you can load it from that file ---> over there. Or: if you take this one, please set this flag.

That's the point I'm trying to make: it puts the knowledge of which resources are disposable into the same entity which manages the resources. It does not need application code to execute at the time when resources are scarce.

To fit into your "buying" model: of course the resource provider can make discardable pages "cheaper" than non-discardable. Like: you can have ten megs with backing store, and twenty without.
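
(Linux's madvise(2) hints -- e.g. MADV_DONTNEED or MADV_FREE -- are real-world relatives of the "need not preserve its content" case.) A sketch of the pager side of such a contract, all names invented:

    #include <stddef.h>

    /* Per-region policy, declared once by the task up front. */
    enum vm_policy {
        VM_PRESERVE,      /* default: restore same contents later */
        VM_DISCARD_OK,    /* contents need not be preserved       */
        VM_RELOAD_FROM    /* contents can be re-read from a file  */
    };

    struct page_info {
        void          *addr;
        enum vm_policy pol;   /* set via a hypothetical hint call */
    };

    /* Knowledge lives in the resource manager: pick the cheapest
     * victim with NO application code running at reclaim time.  */
    struct page_info *pick_victim(struct page_info *p, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            if (p[i].pol == VM_DISCARD_OK)
                return &p[i];          /* free: just drop it       */
        for (i = 0; i < n; i++)
            if (p[i].pol == VM_RELOAD_FROM)
                return &p[i];          /* cheap: re-read from file */
        return n ? &p[0] : NULL;       /* else pay backing store   */
    }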

Stefan

Reply to
Stefan Reuther

I just find my reasoning for using an operating system in my application spaces more closely aligned with DY's arguments. I'm not arguing that there aren't other application spaces that are quite different, or that you are wrong-minded in your own perspective. I'm just enjoying the discussion.

Jon

Reply to
Jon Kirwan

It is possible to argue that this is all shades of grey, and it's a matter of emphasis. After all, it is true that a real-time system must do its best in the face of problems, such as hardware failures - and you /could/ say that "lack of resources" is just another such failure. But this thread has been about "real-time" systems from the start - you simply don't design a real-time system with such lack-of-resource failures. Such a system is all about guarantees - /if/ there are no hardware failures, then the system /will/ work. You can use operating systems for other purposes too, such as to mediate between different best-effort tasks - but that is not real-time.

Reply to
David Brown

I thought it was about DY's RTOS.

I think you are just arguing, now, when I wasn't wanting to. So, okay:

I use an operating system for software partitioning, simplicity of design, and flexibility for future change. I generally do applications where EVERYTHING COUNTS, though.

DY's comments about those making a customer pay more than they otherwise have to (in terms of excess hardware costs, size, battery/power usage, heat, lower precision or repeatability, etc) are about where I'm at. All corners in my application space count. I don't want to make a customer pay for 256k flash parts when 8k would have done had I avoided linking in STL, nor do I want to "accept" for the customer a 5mA average draw in a battery application when 200uA is sufficient for the same job.

Of course, the application has to work. That's given. Since everything I do is instrumentation, real time is also a given. As is precision of measurement sampling. Where appropriate, I will go to great lengths to ensure that measurement variation is zero clock cycles. Meaning that I don't even allow any variation of interrupt latency and will select a micro that can guarantee that or else arrange the design to avoid it, if needed. The operating system and threads in it will be arranged to support that with absolute precision.

Operating systems that are targeted at wider audiences generally cannot address my needs very well. I may have selected a micro with 128 bytes of RAM... total. And need 100 of it for the application, leaving 28 for the O/S and the three threads needing support for sleep and semaphore queues. Just as an example that isn't too far from what might take place.

Jon

Reply to
Jon Kirwan

Don, I suggest focusing on the real-world cost of shedding resources. If a task yields 100 kB of memory, what is the cost to the user -- an extra 300 ms response time, perhaps? The manager may say "I'm willing to accept 1000 ms additional delay, how much memory can you release for me?"
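
As a rough sketch of that negotiation as an upcall into the task -- everything here, including the one-page-per-300ms figure, is invented:

    #include <stddef.h>

    #define MS_PER_PAGE 300    /* assumption: shedding one decoded
                                  page costs ~300 ms of response   */
    #define PAGE_BYTES  4096

    struct offer  { unsigned extra_delay_ms; };  /* manager's bid  */
    struct answer { size_t releasable_bytes; };  /* task's reply   */

    struct answer on_reclaim_offer(struct offer o)
    {
        struct answer a = { 0 };
        size_t pages = o.extra_delay_ms / MS_PER_PAGE;
        a.releasable_bytes = pages * PAGE_BYTES;
        return a;
    }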

--
Thad
Reply to
Thad Smith

Understood. That's easy to implement: the kernel (or its agent) takes the page, marks it as "missing", then when/if the task tries to access it, faults a (possibly new/different) page in, fills it from backing store and resumes the "interrupted" task.

But, think of *that* scenario: the kernel (or its agent) takes the page, marks it as "missing", then when/if the task tries to access it, page fault allows the access to be completed (with bogus data) and the expectation that the task will "notice" that a flag indicates the data should be ignored (?)

Understood. There is potential for BIG WIN in that approach. I'm just trying to think through real world code and how it would handle this (brain is fuzzy as I have been up all night babysitting the citrus :< ).

Let me back up.

Some portion of a task's memory is wired down. I.e., it *knows* that it has this memory, guaranteed (all or nothing -- if the task is killed, it loses everything).

[I recognize your idea can be extended to cover this case as well]

Beyond that, some varying amount could be "surplus resources" that it has acquired at various times. It may add or subtract from this freely throughout its execution.

In addition to the above "notification" problem (I think I can solve that using the same exception handler mechanism and just have the kernel's page fault handler "complete the memory access" with bogus results -- this works OK for everything but *code* execution from a "forfeitable resource"), you also need a way of letting the kernel know which pages "take first".

Returning to the MP3 example, if pages 3, 9 and 11 contain the decoded output for seconds N, N+1 and N+2 of UPCOMING audio, then the MP3 task wants them to be "taken" (forfeit) in the order 11, 9, 3. Any other order effectively renders the remaining pages useless (or, "less useful").

One idea that comes to mind is to tag each page with the "price paid" (Robert's Dutch auction). No, that's probably not right (because what you paid for it might not reflect what it is currently worth... e.g., page 3 is worth more than page 11 *now*. Yet, when you bought page 3, you might have paid less for it because page *2* -- which has since been discarded -- was more valuable). But, the idea trying to take hold in my sluggish brain makes sense -- tell the kernel what the page is worth so it can make that decision on my behalf without my involvement.
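
A sketch of that "tell the kernel what the page is worth" idea: one user-writable word of value per forfeitable page, updated with a single store (no trap into the kernel), read by the reclaimer. Everything here is invented for illustration:

    #include <stdint.h>
    #include <stddef.h>

    #define NPAGES 16
    /* One word of "worth" per forfeitable page; the task updates
     * these as its valuations change, at the cost of one store.  */
    static volatile uint32_t page_value[NPAGES];

    /* MP3 example: decoded audio further in the future is worth
     * less, so those pages get forfeited first.                 */
    void rank_decoded_audio(size_t first_page, size_t n_ready)
    {
        size_t i;
        for (i = 0; i < n_ready; i++)
            page_value[(first_page + i) % NPAGES]
                = (uint32_t)(n_ready - i);
    }

    /* Kernel/agent side: forfeit the cheapest page first. */
    size_t pick_forfeit(void)
    {
        size_t i, best = 0;
        for (i = 1; i < NPAGES; i++)
            if (page_value[i] < page_value[best])
                best = i;
        return best;
    }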

Note that my idea of letting the kernel NOTIFY the task allows the task to make that value decision AT THAT INSTANT instead of having to keep apprising the kernel of changing values...

And, of course, all of this is predicated on using VM to move the resources around. (it wouldn't extend to a flat, nonpaged memory model).

I think I need some sleep. Wonderful. I get to dream about memory allocations... :-/

Reply to
D Yuniskis

I had to read this a couple of times to make sure I understood your point. So, if I've *missed* it, I guess that means "a couple" wasn't enough! :>

[by "manager" I assume you mean the kernel -- or its agent -- in regards to "renegotiating" resource (re)distribution.]

The appeal of your approach is that it speaks directly to the reason *behind* "using surplus resources" -- most often, to improve response time or execution speed (similar aspects).

I see several problems with committing this to practice, though.

First, the degenerate case -- where the kernel is DEMANDING those resources. How does it frame the "proposition" to the task: "I'm willing to accept INFINITE additional delay, how much memory can you release for me?" :-/

You also have to deal with relating resources to time/latency. For example, the "thinking ahead" chess automaton can probably give you a numeric value: megabytes per millisecond (i.e., looking at how much memory it takes to "think ahead" 1 millisecond). But, this might not be a finely divisible quantum. The automaton might need 10MB chunks to do anything "worthwhile" (note that I have no idea what the resource requirements of such an algorithm would actually be. I am just throwing wild numbers out for illustrative purposes. If I used a "counting application" as an example, it would be hard to talk about megabytes with a straight face! :> )

Furthermore, it might be difficult for that automaton to decide *which* chunk of memory to discard (if, for example, it is currently using only enough to think one move ahead... what *fraction* of that move should it discard?). By contrast, the MP3 player example could easily decide what to discard -- the decoded audio furthest in the future (one can argue that even this isn't a *given* since you can't predict what the user might opt to do a millisecond from now).

The other problem is that it might penalize or reward applications unfairly. I.e., one application could end up frequently forfeiting its resources while others never do. For example, telling the MP3 player that it can be 1000ms late on its deadline(s) would essentially cause it to dump ALL of its resources: "Heck, I don't even have to START decoding because I'll have plenty of time AFTER my deadline to do what needs to be done!" (and, does the 1000ms apply to all periodic deadlines thereafter?)

But, the biggest implementation problem I find is trying to map this into units that you could use to value specific resources. How do tasks decide what they want and whether or not they can "afford" it? How does the user configure "priorities"? I guess he could specify a "tardiness tolerance" for each application and let them figure out what that would cost (in terms of resources). But, what prevents an application from saying, "Ah, I have a tardiness tolerance of 10 minutes! I *need* 100MB of memory!!" (how does the kernel decide what a fair mapping of resources to "tardiness" would be?)

However, putting a number on how responsive (or "fast") an application is has appeal as something a user can easily relate to. I think it would be a good "user interface unit" if I can somehow map it into/onto quantifiable units.

Reply to
D Yuniskis

The details of all this are way above my pay grade. But I can give you a user input. If I just spent two hours entering my tax return into turbotax, I'm gonna be really PISSED if the program gets dumped because the developer failed to answer one of your resource relinquishment requests. Same thing if I'm logging data for a nuclear power plant. Or controlling a welding robot.

There's also a liability issue. If Program A crashes the system and does harm, they MIGHT bear some liability for the outcome. If YOU kill the program and harm ensues, YOU might bear some liability for the outcome.

"The defibrillator pulse was canceled due to excessive use of resources. IF you feel this was an error, have your heirs contact our product improvement department...Sorry for any inconvenience."

Just because you CAN do something is not always a good reason for DOING it.

Reply to
mike

You're missing out on the key distinction between RT systems and non-RT ones. With an RT system (which is anything that includes at least ONE RT aspect), there is value associated with timeliness. What you call "not working" can be an acceptable option for an RT task.

Think in terms of value vs. time wrt a particular "deadline". If a task's completion ("answer") has positive value at some particular time, then getting it "done" at that time contributes positive value to the user. If it has *no* value, then its completion has NO value to the user (i.e., if the value is zero henceforward, you should abandon all work on it). If it has NEGATIVE value, then it represents a *cost* to the user. Only if it has INFINITE negative value does it indicate the system is BROKEN (i.e., there is no way to accumulate value to overcome this cost).

A soft-real-time system has positive (often decreasing) value at the deadline. Getting it done earlier is "worth more" than getting it done later. But, getting it done later is better than not getting it done at all. A hard-real-time system has zero value (see below) at the deadline and thereafter. I.e., get it done before the deadline or don't bother.

Some systems have funky time/value characteristics. E.g., an application that controls a robot arm to pick a bottle off a conveyor belt would have a "bump" shaped value curve (picking the bottle up too EARLY is just as useless as picking it up too LATE -- as the bottle is only in an accessible location in a certain time window). When you think in terms of time/value functions, it becomes a lot easier to decide the type of system you are actually working with.
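
Those curves are easy to write down. A sketch, with shapes and constants invented purely for illustration:

    /* Value of completing a task at time t, deadline D. */

    double hard_value(double t, double D)
    {
        return (t <= D) ? 1.0 : 0.0;     /* all-or-nothing */
    }

    double soft_value(double t, double D, double decay)
    {
        if (t <= D)
            return 1.0;
        return 1.0 - decay * (t - D);    /* diminishing; may go
                                            negative: a *cost*  */
    }

    /* Bottle-picker "bump": worthless too early AND too late. */
    double window_value(double t, double open, double close)
    {
        return (t >= open && t <= close) ? 1.0 : 0.0;
    }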

Far too often, people force SRT problems into HRT frameworks. This is because it is EASIER to pretend a deadline is "all or nothing" -- and then scale UP the hardware to meet that "ultimatum". Effective SRT systems are much harder to design than HRT "solutions". As such, the presence of any SRT aspect in a system can be regarded as making the entire system an SRT one (since SRT mechanisms are a superset of HRT mechanisms).

A casual examination of this makes it clear that HRT systems are inherently more brittle than SRT ones. More resources. And more brittle. (lose-lose?)

The consequences of a missed deadline are a separate issue. If the robot bottle picker misses a deadline, a bottle breaks. Big deal (of course, what's *in* the bottle might significantly affect this assessment). If these missed deadlines are infrequent, you might just opt to put a trash can under the robot arm to save yourself the effort of having to sweep up a few broken bottles each day. Note that the alternative -- shutting down the production line -- could be far more costly than a few broken bottles!

A tablet press compresses "powder" (granulation) in "molds" (punch & die sets) to form "pills" (tablets). These devices operate at rates of up to 1,000,000 tablets per hour. Roughly 5ms per tablet. Most tablets are produced to a specific weight (there are other criteria). You can't weigh individual tablets at this rate. So, you typically use another metric to determine if the weight is correct.

Since the size of the tablet is constrained (by the dimensions of the "mold"), monitoring the forces exerted to compress the granulation to that specific size roughly correlates with the amount of material *in* that mold.

Granulation is metered into the molds by gravity feed. Much like MEASURING CUPS (grin) -- the molds are overfilled and then the excess scraped off (of course, there is nothing to guarantee that there *is* excess on any particular tablet!).

The filled mold is then compressed, twice. And, ultimately, ejected. Some systems also monitor the force exerted to eject the tablet. As the tablet is ejected, it must be routed to the "good" bin or "bad" bin (some systems also have provisions to selectively "sample" tablets necessitating a third routing option) based on criteria from measurements made while the tablet was being formulated.

The whole process happens in ~5ms.

Real fast, yes. Real TIME? Yes -- but not because of the speed.

Now, HRT or SRT?

Well, we have no control over an individual tablet's formulation except by examining how its predecessors were formed. We can't tell if the feeding system is functioning properly until we get a sample of the compression force exerted AFTER the die has been filled (Hmmm... is the feeder still feeding the same way? Could a clump of granulation have caused a temporary anomaly in the feeding that has since cleared itself?).

There is a fixed time at which each compression event occurs wrt the feeding (all of which are dependent on the actual operating rate of the press). Likewise for the ejection and "sorting" events.

Any of which can be treated as a HRT activity -- if you don't capture the compression/ejection forces when they happen, you might as well not try (since the tablet has already moved on in the process). But, is the system BROKEN if it fails to capture all of these events? That depends. :>

The first compression event is often a partial compression... "tamping" in some lexicons. Its role is to help remove air which might otherwise become trapped inside the tablet causing lamination and/or capping (the tablet's top pops off when it is ejected from the press). Can you learn something about *this* tablet (as well as the tabletting process) by monitoring this "tamping"? Sure! Is it critical to EVERY tablet formulation? No. So, how much do we care about missing that deadline?? Enough to consider the system BROKEN??

The second compression event most often finishes the forming of the tablet. It is used to gauge the actual amount of material ("weight") in the tablet -- which correlates with the amount of "active ingredient(s)" in the tablet. Surely missing *this* event would be A BAD THING. But, how important is it when you are making aspirin with the machine running "flat out"? Perhaps it is easier just to ASSUME it was a "bad" tablet and discard it. After all, shutting down the press for even 1 *minute* will cost you 15,000+ tablets of production (since you don't have that extra minute in your 24 hour day that can make up for it)! If it takes you *ten* minutes to get back into production, that's more than 1000 bottles of 100 tablets.

And what about the ejection force profile? How else can you tell if the tablet "caps" or "laminates" -- other than a visual examination? Or, determine premature wear on the dies ("barreling") to head off further tool damage? But, if we miss 1, or 10,000 of these events, does it really matter? Are we afraid one of those missed events will show barreling but none of the others will? Ditto for capping/lamination -- if the formulation is prone to these problems, it will appear more than sporadically. Do we have visual inspection "downstream" that will detect delaminated tablets?

What about the sorting? After all, that's the last chance we have to do anything about this *particular* tablet! Surely, this is "all-or-nothing" (?). Ah, but what if the default "mechanical" routing of tablets causes them to be discarded instead of accepted? I.e., you have to do something to *save* a tablet from disposal. Then, missing that deadline causes you to erroneously discard a good tablet.

I.e., any of these *could* be regarded as an end-of-world scenario. Or *none* of them.

Wanna know what *the* indisputable "can't be missed" deadline is? When the operator (user) opens one of the many "guards" that safeguard access to the mechanisms -- the press MUST be shutdown. Now. 100.0000% of the time.

How do you handle this? You HARDWIRE the interlocks to the main power relay. I.e., you *don't* trust the system when lives are at stake. Even on a $1,000,000 machine! With 10 tons of force, it will gladly remove the tip of your fingers -- a few HUNDRED TIMES before you get a chance to scream!

Note that the CONSEQUENCES of missed deadlines varies based on the deadline, the product being formulated, the tablet press being controlled, etc. How is that solution "broken"?

No, this is just one of many misunderstandings when it comes to the subject of "real-time". The whole issue of timeliness and the approach it necessitates is what differentiates it from typical "best effort" systems: being able to quantify/bound responses (as well as determinism) and determine the probabilities of meeting them, to arrive at an "acceptable" implementation given economic, development, etc. criteria.

Reply to
D Yuniskis

You can complain about my choice of words ("hope") -- but, despite that, my point is that it doesn't HAVE TO "guarantee" results. Hard deadlines have no value once they have passed. I.e., once the deadline comes STOP WORKING ON IT. The *consequences* of missing that deadline are a different issue entirely (and, by extension, if the consequences can vary, then why "guarantee"?)

Soft deadlines still have (typically diminishing) value to their completion after the deadline. As such, designing in that scenario is harder as you have to decide how/if you should continue to try to "meet the GOAL" (now that the deadline has passed). HRT is easy to do -- add up the numbers, make sure they fit on the timeline *before* the deadline and you are done. Often, a completely static analysis can yield a "working" system -- if you assume all deadlines MUST be met (most RTOS's don't include support for missed deadline handlers! I.e., if your HRT system misses a deadline, does it even KNOW that it did??).

You can do a similar HRT design by assigning probabilities to various event distributions, "service times", etc. and then look at the "expected value" of your success rate in meeting those hard deadlines. If that fits with your design criteria, then you have a "successful system" -- even KNOWING it *will* fail to meet a certain portion of its HARD deadlines.
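
A toy version of that expected-value calculation, using a made-up discrete service-time distribution:

    /* Expected fraction of hard deadlines met, given a discrete
     * service-time distribution (all numbers invented).        */
    struct bin { double service_ms, prob; };

    double meet_rate(const struct bin *b, int n, double deadline_ms)
    {
        double p = 0.0;
        int i;
        for (i = 0; i < n; i++)
            if (b[i].service_ms <= deadline_ms)
                p += b[i].prob;
        return p;
    }

    /* struct bin d[] = { {3.0,0.90}, {4.5,0.09}, {7.0,0.01} };
     * meet_rate(d, 3, 5.0) == 0.99 -- 1% of deadlines missed.
     * Whether that is acceptable is a *design criterion*.     */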

Soft real-time is much harder because the answers aren't so cut and dried. *When* do you give up on a particular task/deadline? Time and resources spent on one come at the expense of another. How do you balance the two against each other when neither is "worthless"?

I have several scheduling algorithms in place at a given time. A task identifies which will govern its access to the processor. E.g., high frequency, periodic, HRT tasks tend to gravitate to the EDF scheduler simply because it is easiest to get deterministic results for a given load. Even the round-robin scheduler is priority- and quantum-configured on a per-task basis. I.e., this gives a simpler framework for less "structured" tasks. Still other "longer term" tasks can work off resource "set-asides".
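
As a sketch of what "a task identifies which scheduler governs it" might look like at task-creation time; the API below is invented, shaped loosely by the description above:

    /* Hypothetical per-task scheduling-class selection. */
    enum sched_class { SCHED_EDF, SCHED_RR, SCHED_BACKGROUND };

    struct sched_attr {
        enum sched_class cls;
        unsigned period_us;    /* EDF: period             */
        unsigned deadline_us;  /* EDF: relative deadline  */
        unsigned priority;     /* RR: per-task priority   */
        unsigned quantum_us;   /* RR: per-task time slice */
    };

    int task_create(void (*entry)(void), const struct sched_attr *a);

    /* e.g., a high-frequency HRT sampler under EDF:
     *   struct sched_attr hrt = { SCHED_EDF, 1000, 800, 0, 0 };
     *   task_create(sample_adc, &hrt);                        */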

The exception handler gives me the equivalent of a "virtual interrupt" on a per task basis. E.g., the task might be blocking, currently. Yet, it will respond to an "exception".

IPC's to the exception handler can be synchronous or asynchronous (depends on the actual "call" (exception)). Currently, I handle these resource exceptions synchronously -- because *I* have written all of the tasks that use resources in this cooperative way and am aware of the cost of "handling" the request. I.e., I wrote each task with an eye on being able to shed resources on the spur of the moment *without* having to do lots of processing.

"I am a good citizen" :>

Currently, the kernel "knows" because I (good citizen) cooperate and inform it when I have relinquished what I *can* relinquish. If I change this interface to an asynchronous one, then the kernel would have to impose a timeout on the task and/or look into other ways of restricting what the task could do while (allegedly) "cleaning up".

Uncooperative tasks make life difficult! That's the whole point of this :>

No. Right now, the task in question runs *now*. It's LIKE an interrupt. You (the task's author) don't "dawdle". It looks like the kernel is doing the work on your behalf -- though it isn't.

This is why your L4 idea is so interesting -- it takes the effort required to recover the resources out of the hands of the (potentially uncooperative; or, possibly cooperative but INEFFICIENT) task author. It also offers opportunities for optimization as the kernel *knows* what to do with the resources "listed" (saving traps into the kernel from user-land)

Reply to
D Yuniskis

OK, I've stewed on this "overnight" (overnap?).

The advantages of being able to push all of the mechanism into the kernel (or agent) without involving the task(s) is *huge*. The whole process becomes more deterministic.

The big (huge!) problem seems to be a direct consequence of that, though. Namely, the lack of involvement of the task in selecting which (if any) resources it can most afford to "lose". (I think I can deal with the signaling that would also be required using existing mechanisms).

First, there needs to be a means by which the task can assign relative values to its "holdings". It must be inexpensive to update these values dynamically -- since they typically will change during execution. If the cost of updating them "gets large", then it starts to interfere with the gains this is intended to provide (note there are undoubtedly bookkeeping costs other than time).

Second, there needs to be a way of assigning (updating) a "current value" to those resources. I.e., citing Robert's Dutch auction... how would the task value these current assets? If such a Dutch auction was in place, then this could be an iterative process -- not likely to happen if the kernel makes one pass at some static data...

It seems that the task (or an agent of the task) must be actively involved in this process -- in order to allow the task to determine criteria appropriate to *it*.

I.e., the kernel provides *mechanism*; the applications provide *policy*.

I think I'll have to stew on it for yet another nap (come morning... time to deal with the trees, now :< )

Reply to
D Yuniskis

While cooperative scheduling is unfashionable, you cannot win this argument. The reality -- that there are many 24/7 apps running cooperative schedulers, and many apps written by a single team -- is not important. There are times when I cynically believe that software is more part of the fashion industry than any technical industry.

Stephen

--
Stephen Pelc, stephenXXX@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
Reply to
Stephen Pelc

This is where the L4 guys stay on their convenient island saying "hey, we've provided a mechanism, the rest is userland" :-)

Yes. If you want this fine-grained, you'd better make it *very cheap*. For example, one way could be a data structure in user-land with one word per page / memory range containing the current value. As far as I remember, in L4, requesting a page is not necessarily a system call, but you just access it; the page fault is sent to the pager task. So if the pager can take away pages and return them with different content later, you'd also need a way to tell the task that the content is gone. So you'd probably need a user-land "check-and-lock" routine (but you'd need that as well when page unmap is a user-land routine, so that the SIGFREEMEM handler can synchronize with the interrupted user code).
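
That check-and-lock might look like the following in user-land C -- entirely hypothetical, with a generation counter as one way for the pager to tell the task its contents are gone:

    #include <stdint.h>

    /* Shared with the pager: one control-word pair per range.
     * The pager bumps `gen` whenever it takes the range; it
     * must skip ranges with `locked` set.                    */
    struct range_ctl {
        volatile uint32_t gen;
        volatile uint32_t locked;
    };

    /* Returns 0 with the lock held if contents are still valid;
     * -1 if discarded since `*seen` and they must be rebuilt.  */
    int check_and_lock(struct range_ctl *c, uint32_t *seen)
    {
        c->locked = 1;          /* NB: a real version needs an
                                   atomic here, or a retry loop
                                   against the pager            */
        if (c->gen != *seen) {
            c->locked = 0;
            *seen = c->gen;
            return -1;
        }
        return 0;
    }

    void range_unlock(struct range_ctl *c) { c->locked = 0; }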

Stefan

Reply to
Stefan Reuther

Here, my lectures taught me that (hard) real-time means designing a system so that this doesn't happen. Because if you miss one deadline, how can you be sure that this was just a small hiccup, and you won't miss the next 500 deadlines as well? By that definition, one would be able to fit a real-time H.264 1080p decoder on an 8051 :-)

Handling missed deadlines would be soft real-time to me. With all its associated problems, like having to convince the customer that the file they've found which misses deadlines on every other frame is the absolute exception, because nobody else puts 20 Mbps video on a floppy disc with 99% fragmentation or something like that.

My favourite design principle: never check for an error condition you don't know how to handle :-)

There is a proof that, given a set of periods and deadlines, one can derive an entirely priority-based scheduling scheme, so that's what the RTOSes known to me offer.
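
Stefan is presumably referring to results like Liu & Layland's (1973): for periodic tasks with deadline equal to period, rate-monotonic priority assignment is optimal among fixed-priority schemes, and total utilization U <= n(2^(1/n) - 1) is a sufficient (not necessary) schedulability test. A quick check:

    #include <math.h>
    #include <stdio.h>

    /* Liu & Layland sufficient test for rate-monotonic scheduling:
     * n periodic tasks, cost c[i], period t[i], deadline==period. */
    int rm_schedulable(const double *c, const double *t, int n)
    {
        double u = 0.0;
        int i;
        for (i = 0; i < n; i++)
            u += c[i] / t[i];
        return u <= n * (pow(2.0, 1.0 / n) - 1.0);
    }

    int main(void)
    {
        double c[] = { 1.0, 2.0 }, t[] = { 4.0, 10.0 };  /* U = 0.45 */
        printf("%s\n", rm_schedulable(c, t, 2) ? "schedulable"
                                               : "no guarantee");
        return 0;
    }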

[kernel asks task to free resources]

I'm not sure I understood you correctly (maybe we mean the same thing?), but the problem that immediately comes to my mind is applications that claim to be good citizens, but by intention/bug/sabotage aren't. Something like a heap overwrite error causing it to run into an infinite loop, not finding the page to free.

Stefan

Reply to
Stefan Reuther

I'll concede that working in such an environment requires LOTS more discipline than, for example, a preemptive one -- especially for real-time applications. I know when I work in such environments, I shudder at the inordinate number of "yield()'s" that I do -- until I look at the cost of each and see their non-impact on the overall product.

I think your tongue-in-cheek (?) "fashion" comment may not be far from the mark, though. I am always surprised at how conservative people are in their design practices -- repeating the (partial?) successes of previous projects instead of looking for improvement(s) with each new opportunity (risk aversion?). Sort of like "doing what others seem to be doing" (wearing what others seem to be wearing) for fear of "standing out" (plaid on plaid -- with YELLOW shoes???)

I also think that an early failure with one approach/technique can scare people off from it -- instead of providing an opportunity for a "project post-mortem" to identify what the real "problem(s)" may have been and learning from that for subsequent attempts.

[I heartily recommend making your own ice cream to learn the joys of unexpected discoveries: "Gee, who would have thought that chocolate chips and almond flavoring would taste SO good together!" Ice cream is one of those rare arenas where you can't really go *too* "wrong" that the result isn't salvageable]
Reply to
D Yuniskis

A preemptive environment needs a lot of discipline to handle all the critical sections.

I generally try to solve my problems with cooperative scheduling, but without using yield()-like mechanisms. I agree yield()'s are ugly, and usually a sign that the problem has been solved the wrong way.

Reply to
Arlet Ottens
