Resource revocation

Hi,

What's the current "best practices" regarding asynchronous
notifications (in a multithreaded environment)?

I have a system wherein "tasks" (I'll omit a formal definition)
request resources from a service that meters out their use,
waiting until the resource has been granted to them
"officially" (in some cases, this is all trust based).

When done, they surrender the resource to the service where
it can be reused by other consumers.

But, there are times when the service must revoke a granted
use of a particular resource.  In some cases, it "asks" for
the resource back (giving the current consumer time to
tidy up before releasing it).  In other cases, it just *seizes*
the resource -- and notifies the consumer after-the-fact.

Presently, I use signals to notify the consumer when this
sort of thing is happening.

But, my personal experience is such that folks have problems
writing this sort of code.  *Remembering* that they have to
register a handler for the signal; remembering that said
handler can be invoked at any time (including immediately
after it has been registered); etc.
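E.g., the boilerplate every consumer has to get right looks roughly
like this (SIGREVOKE, request_resource(), etc. are just placeholder
names standing in for whatever the service exposes -- not a real API):

#include <signal.h>

#define SIGREVOKE SIGUSR1      /* placeholder choice of signal */

typedef int handle_t;          /* hypothetical service calls follow */
extern handle_t request_resource(int id);
extern void use_resource(handle_t h);
extern void release_resource(handle_t h);
extern int work_remains(void);

static volatile sig_atomic_t revoked;

static void on_revoke(int signo)
{
    (void)signo;
    revoked = 1;               /* async-signal-safe: just set a flag */
}

void consumer(void)
{
    struct sigaction sa;
    sa.sa_handler = on_revoke;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    /* must be registered BEFORE the request -- the notification can
       arrive the instant the grant (or revocation) happens */
    sigaction(SIGREVOKE, &sa, NULL);

    handle_t h = request_resource(42);   /* may block until granted */

    while (work_remains() && !revoked)
        use_resource(h);                 /* and hope nobody forgets the flag */

    release_resource(h);                 /* normal, voluntary surrender */
}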

Is there a new "safer" way of implementing these types of
notifications?

Thx,
--don

Re: Resource revocation
On 7/25/13 3:23 PM, Don Y wrote:
[quoted text elided]

My first thought about your seizing mechanism is that you are going to
need to either kill the task you granted the permission to, or tolerate
it using the resource after you have seized it (perhaps with some error
occurring on the use).  The problem is that there will always be a
point in time between it testing that it still has the right to use it
and the actual access, unless of course you fill your code with
critical sections for EVERY use of the resource.
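
In other words, something like this (the struct and both functions
are made-up stand-ins, not anything from your system):

#include <pthread.h>

struct metered_resource {
    pthread_mutex_t lock;
    int             owner;          /* id of the current grantee */
};

extern void access_resource(struct metered_resource *r);

/* racy: the service can seize the resource between the test and the use */
void racy_use(struct metered_resource *r, int me)
{
    if (r->owner == me)             /* test...                         */
        access_resource(r);         /* ...use -- maybe no longer yours */
}

/* airtight only if the revoker takes the same lock before reassigning
   r->owner, i.e. a critical section around EVERY use */
void guarded_use(struct metered_resource *r, int me)
{
    pthread_mutex_lock(&r->lock);
    if (r->owner == me)
        access_resource(r);
    pthread_mutex_unlock(&r->lock);
}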

Re: Resource revocation
Hi Richard,

On 7/25/2013 7:42 PM, Richard Damon wrote:
[quoted text elided]


Or, let the task *think* it is still using the resource even though
its actions on/with that resource aren't having the effects the
task *thinks* they are having!

I.e., it depends on what that "resource" is and how it is accessed.

If, for example, it is a (shared) communication channel, the task
can *think* it still has (exclusive?) use of the channel but the
mechanism that actually pushes messages onto/from the channel is
actually silently discarding everything to/from the task when it
no longer *owns* the resources (while the actions of the NEW owner
are now proceeding properly).
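
I.e., figuratively (struct channel and enqueue() are invented just for
the illustration):

struct channel {
    int owner;                  /* task currently granted the channel */
    /* ...queue bookkeeping elided... */
};

extern int enqueue(struct channel *ch, const void *msg, unsigned len);

int channel_send(struct channel *ch, int sender, const void *msg, unsigned len)
{
    if (ch->owner != sender)
        return (int)len;        /* silently discard; keep the illusion of success */

    return enqueue(ch, msg, len);   /* the current owner's traffic proceeds normally */
}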

If the resource is a piece of virtual memory, the OS can allow
accesses to continue (without faulting) and just ignore all the
writes attempted and return garbage for reads.

Or, these "uses while not currently owned" could result in errors
reported to the user -- that may or may not be expressly indicative
of the fact that the resource is no longer owned.  E.g., "read failure"
and the task scratches its head wondering if the medium is faulty
or <whatever>.

There's also the (un-illustrated) example of the task actually
being *allowed* to continue using the resource under the assumption
that it will, in fact, "soon"  honor the *request* that it release
the resource.  ("Gimme a minute...")

Remember, a "resource" is anything that the system *decides* is a
resource.  And, asynchronous notifications can originate for a
variety of *other* reasons in addition to resource revocation.

The "resource" that prompted my question is an abstract resource
with very loose constraints -- and, no real downside to having it
revoked "in use".  I was coding an algorithm for an irrigation
"zone" wherein the process responsible for the zone *requests*
a certain "water flow rate" (water being the scarce resource and
it is infinitely divisible).

The task can't begin monitoring the amount of water dispensed
for its needs until it knows that water *is* being dispensed
for its needs.  ("OK, 1 gallon per minute so I need to wait
14.34 minutes to ensure the required 14.34 gallons are dispensed.
THEN, I can turn off the water and let the system use it for
some other purpose.  Shower, anyone??")

Delays in acquiring the resource have consequences (i.e., the
task doesn't just want to *block* awaiting it) since an indefinite
delay means the zone never gets serviced (things die, etc.).  So,
the task (or a surrogate operating on its behalf) needs to be
able to watch (and worry!) when a request is lingering, unfulfilled.

Similarly, if something "more important" needs that resource, the
task needs to know that it has been "reappropriated" and take
remedial actions ("Hmmm... I was able to dispense 6.2 gallons.
*If* I can reacquire the resource, soon, I can just dispense
another 8.14 gallons and I'm golden!  OTOH, if I have to wait
hours or days to reacquire it, I may have to start over again.
*Or*, signal a failure as the plants relying on that water
have probably died from dehydration!")

In other cases, the resource may be a computational one.  I.e.,
having access to CPU time/memory on another node.  If that resource
is revoked, the workload manager has to find some other node to
satisfy the request *and* figure out what portion of the operation
previously scheduled on that node must be recreated, etc.

[My point here is:  different tasks tend to need different recovery
strategies.]


Exactly!  Hence the need for an asynchronous notification mechanism.
E.g. a "signal".  So, when the signal is sent, a thread processes
that notification *before* the task actually is allowed to execute
another instruction (that handler could kill the task, suspend it
until the handler can reacquire ownership of the resource, or
set a flag that the task can examine at some convenient point in
its process, etc.).

I.e., the remedy tends to be defined by the use (UNuse)?


--don

Re: Resource revocation

This type of program typically doesn't compute very much.  It's either
acting on some message, or sleeping til the next message arrives.

I think overall it's preferable to not confuse the issue by moving stuff
around between processes without the processes knowing.  The resource
should be under control of one process, and relinquished by 1) sending a
message asking for the process to give it back; or 2) killing the
process, preferably with automatic cleanup actions when the process
dies.


OK, the usual sense of signals that I thought was reflected in your code
sample is basically delivering a simulated hardware interrupt to a
running task, so it needs locks, critical sections and all that messy
stuff.  


In the case of your lawn sprinkler application I think that is fine.
IMHO in this day and age, it's only worth dealing with low-level
approaches if you're doing hard-real-time or have to run on 10-cent
processors or something like that.


Yeah, the way I'm imagining, I wouldn't do it that way, as described
above.


In this case I'd say just kill the task, so it can restart in a
completely known state.  Admittedly I am somewhat under the influence of
Erlang right now, and this is a core tenet of Erlang philosophy.


The task would periodically post updates saying how far it has gotten
(how much water has been dispensed, or whatever).  When it's killed and
restarts, it can take up where it left off.


I don't understand this example--what would the "resource" be?  In
general terms I'd say kill the process and let the crash handler park
the tool in a safe position.  But in this machining example, I'm
imagining some kind of low level PID loop that would keep checking a
flag to know if it had to bail out.  In either case, the idea is to get
to a place where you can restart later.


The only reason for such "rope" is to push the limits of the hardware
because more modular approaches are too slow or whatever.  Computers are
ridiculously powerful these days, so unless you're doing something
extremely demanding (basically something that would have been impossible
or economically unfeasible 10 years ago), seeking "rope" is probably a
sign of doing something wrong.


Right, they are messy and it's preferable to avoid them.  E.g. by using
message passing instead of signals.


You should probably look into model-checking tools if you absolutely
have to pursue this approach.  Dawson Engler's papers on using such
tools to find crash bugs in Unix file systems might be of interest.

Actually Tom Hawkins' "ImProve" program might be of some use to check
that you got all your watering stuff right, in terms of turning correct
combinations of valves on and off etc., if you're interested in
experimenting with high-tech approaches.  I haven't used it but have
been interested in it for a while:

   https://github.com/tomahawkins/improve/wiki/ImProve

I did play around with Atom (a hard realtime DSL written by the same
guy) and I think the approach is pretty powerful.


Even someone capable of running with scissors without stabbing himself
every time shouldn't do it outside of some dire emergency.  

In this watering application you have (presumably) rather loose timing
constraints, and roughly unlimited CPU resources.  So I think you can do
fine using safe, simple methods instead of running with scissors.

Re: Resource revocation
Hi Paul,

On 7/25/2013 11:37 PM, Paul Rubin wrote:
[quoted text elided]

Actually, the irrigation program "computes" a fair amount, given
how "slow" it is expected to operate.  E.g., it has to identify
the "plants" ("water consumers") that are serviced by its zone;
identify their individual water needs; identify the "emitters"
that service their respective root systems and the flow rates
associated with each of those emitters; track the water available
to them "recently" (rainfall, water from other nearby irrigation
zones that happen to overlap their root systems, supplemental water
"injected" by the user manually; etc.); the amount of sunshine
falling on them (some might be shaded during some seasons while
others are in "full/reflected sun") as well as the desiccating
effects of the wind (again, noting the individual "exposure" to
wind from particular directions); etc.

And, it has to continue to update these data while waiting for
the "water resource".  Or, waiting for its *return* (if it has
been revoked).  Along the way, it may have to escalate its
request as the *hard* deadline approaches.  ("Hey, if you don't
let me water these things soon, they will die -- in which case,
there is no point in my continuing to execute as a task!")

[I.e., this is a soft realtime problem layered inside a hard realtime
shell.  There *is* a point where a missed deadline results in a failure]


The question becomes one of whether you inform the process *before*
you take action (or, even let the process itself "relinquish" the
resource) or, if you inform the process after-the-fact.  (or, if you
just kill the process and don't even worry about informing it!  :> )


If you "request" the process to relinquish the resource, then the
system (i.e., all other consumers of that resource) are at the
mercy of the developer who coded that application.  If he fails to
relinquish it (perhaps even failing to notice the notification!)
or intentionally delays relinquishing it (like a youngster trying
to postpone bed-time), then other consumers suffer.

I.e., if everyone adopts that sort of attitude, then you've got
a sluggish system.

And, you *still* would need a kill switch so a stubborn consumer
could be forced to relinquish the resource, regardless of his wishes.

[Note this all ignores the timeliness issues involved.  How *quickly*
must a task relinquish a resource when commanded?  What happens if the
task isn't a high enough priority to even be granted a significant
slice of the CPU to process that request?]

I've taken the other approach.  A process owns (permanently?) the
resources.  It then doles them out, on request, to other consumers.
When it wants/needs to give the resource to another consumer, it
does so -- and notifies the previous owner that it has LOST the
resource.  (of course, it can also *request* a current consumer
to release a resource... but, it has to be capable of withdrawing
them in the presence of uncooperative consumers!)

This allows me to ensure "policy" over how a resource is managed
is centralized in one place:  the (permanent) "owner" of the resource.
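
Figuratively (the message names, notify() and released_within() are
placeholders):

enum { MSG_PLEASE_RELEASE, MSG_REVOKED };

struct grant {
    int resource;
    int holder;                 /* current consumer, or -1 if unassigned */
};

extern void notify(int task, int msg, int resource);        /* async notification */
extern int  released_within(struct grant *g, unsigned ms);  /* did it let go in time? */

/* withdraw unconditionally; the old holder finds out after the fact */
void seize(struct grant *g, int new_holder)
{
    int old = g->holder;

    g->holder = new_holder;
    if (old >= 0)
        notify(old, MSG_REVOKED, g->resource);
}

/* the polite variant, which still needs the seizure as a backstop
   against uncooperative consumers */
void request_then_seize(struct grant *g, int new_holder, unsigned grace_ms)
{
    if (g->holder >= 0) {
        notify(g->holder, MSG_PLEASE_RELEASE, g->resource);
        if (released_within(g, grace_ms)) {
            g->holder = new_holder;
            return;
        }
    }
    seize(g, new_holder);
}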

Ponder:

We don't notify a task that we are going to take the *CPU* away from
it (timeslice) and expect the task to respond, "OK, you can have it".
Instead, we just *take* the processor and give it to <whatever>
*we* (scheduler) decide is the most important use for that resource.
There are no guarantees that the interrupted task will ever regain
the CPU.  Nor any notification that it has *lost* the CPU!

Yet, this is something we are comfortable with...


Yes, the signal *is* delivered as a simulated hardware interrupt
(targeted towards that task).  But, it is passed to the "task"
as a message from <whatever> task (the one who raises the signal)
*through* the kernel and to the scheduler as it (eventually) prepares
to resume that signalled task.  (I.e., I need to be able to raise
a signal on one physical processor and have the task that it affects
reside on *another* physical processor).
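
Conceptually, it works something like this (just a sketch -- the names
are invented, not the actual kernel code):

struct task;
struct signal { int code; };

extern int  dequeue_signal(struct task *t, struct signal *s);   /* queued from ANY node */
extern void run_handler(struct task *t, const struct signal *s);
extern void switch_to(struct task *t);

void resume_task(struct task *t)
{
    struct signal s;

    /* drain pending notifications first, so each handler runs before
       the task gets to execute another instruction */
    while (dequeue_signal(t, &s))
        run_handler(t, &s);

    switch_to(t);
}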


I chose the irrigation example because it avoids the issues of
timescale.  So, we're not distracted by the efficiency of the
delivery mechanism, etc.

But, just because it's "slow" and "computationally simple" (compared
to rendering video), that doesn't make it any less of a concern.
E.g., if there is only a few KIPs of spare capacity in the processor
(since processors do more than just control water valves), then
this can be just as constrained as trying to implement a mouse in
200 bytes of FLASH...


I had a friend who coded like that.  Spawn hundreds of processes...
then, kill off the ones he decided weren't important.  :-/


Yes.  All of these approaches are just juggling "responsibilities".
E.g., in my case, a task only checkpoints when it knows the resource
has been revoked *and* the nature of the task requires remembering
state (vs. simply restarting from scratch).  If you require the
task to *periodically* checkpoint itself, then it has to come to
some sort of balance between spending all of its time checkpointing
(so it has very fine-grained resumption capability) vs. very little
checkpointing (so it doesn't waste its time keeping track of where
it was).

[Recall, the checkpointing must be done in a medium that is more
persistent than the task's context -- since the task's execution
environment can be torn down (completely) at any time.  So, now
the system must provide a service for this -- and, one that is
sufficiently lightweight that invoking it OFTEN doesn't affect
performance... or, the performance of other tasks.]

(remember, you don't necessarily have a big disk sitting there
or scads of RAM... how much is a task granted access to?  What
if it requires *more* to preserve its "significant" state?)
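
E.g., a zone task might only bother to checkpoint when it *knows* the
resource has been revoked.  A rough sketch (persist(), load_checkpoint()
and meter_one_tick() are stand-ins for whatever the persistent-store
service and flow meter interfaces end up being):

#include <signal.h>

extern volatile sig_atomic_t revoked;       /* set by the notification handler */
extern double meter_one_tick(void);         /* gallons dispensed during one tick */
extern double load_checkpoint(const char *key);
extern void   persist(const char *key, double value);

void water_zone(const char *key, double gallons_needed)
{
    double dispensed = load_checkpoint(key);    /* pick up where we left off */

    while (dispensed < gallons_needed && !revoked)
        dispensed += meter_one_tick();

    if (revoked)
        persist(key, dispensed);                /* checkpoint ONLY when revoked */
    else
        persist(key, 0.0);                      /* finished; nothing to resume */
}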


Maybe a cutting tool.  Maybe a power source because the next operation
puts significant demands on the available power supply (which is
shared by other machines in the facility).  Maybe a coolant system
(what happens if you withdraw the coolant before it has had a
chance to achieve its intended goal... is the "piece" ruined?)

The point is, you may not be able to "resume" the operation.
You've just made an expensive piece of "scrap".  And, even if
this is unavoidable, you have to *know* that it is scrap and
must be disposed of... not "resumed".


Dealing with consumer markets, *everything* is economically unfeasible!
(unless you are catering to consumers who are not cost conscious).
E.g., I would imagine most irrigation controllers are implemented
with little PICs -- because their approach to the problem is much
more naive:  turn the water on for X minutes, then advance to the
next zone.  They don't look at the *needs* of their "consumers"
(plants/landscape).  Nor do they worry about the availability of the
resource they are using.  I.e., only a single zone is active at a time
(often, though not necessarily) and they assume someone else has
ensured an adequate supply *to* the valve manifold.

Here, for example, if I turned on all of the irrigation valves
simultaneously, several things would happen:
- household water pressure would drop noticeably
- the implied flow rates of each irrigation "emitter" would not
   be correct (because the water pressure in the irrigation
   system would have fallen below nominal)
- some of the irrigation loads probably wouldn't "work" at all
   (i.e., not having enough static head to meet the required rise)

But, no particular "zone" should have to worry about this.  It's
a system constraint.  One that should be enforced by whatever
doles out the "water resource".  If someone decides to "run a bath",
the individual irrigation zones shouldn't need to know that their
water use will interfere with that activity.  (OTOH, something
"higher" in the system design should be able to enforce that
policy on the irrigation system)


But messages only "exist" when they are being examined.  If, for
example, you issue a query to the database, you either assume the
query happens "fast enough" (whatever that means in your application)
*or* spawn a separate thread to process that query so you can
keep watching for messages.  You now have yet another synchronization
issue to contend with, etc.
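
I.e., you end up with something like this (run_query() and
post_message() are placeholders), and the completion/cancellation
of that worker is now one more thing to coordinate:

#include <pthread.h>

#define MSG_QUERY_DONE 1

struct request {
    const char *sql;
    void       *result;
};

extern void *run_query(const char *sql);          /* blocking RPC to the RDBMS  */
extern void  post_message(int what, void *arg);   /* back into the message loop */

static void *query_worker(void *arg)
{
    struct request *req = arg;

    req->result = run_query(req->sql);            /* may take "a while" */
    post_message(MSG_QUERY_DONE, req);
    return NULL;
}

void issue_query_async(struct request *req)
{
    pthread_t tid;

    pthread_create(&tid, NULL, query_worker, req);
    pthread_detach(tid);        /* main loop keeps watching for other messages */
}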


I've avoided the "combinations" issue entirely (at least in the
"zone controller tasks").  An individual zone deals only with the
needs of its "consumers" (plants).  It knows that it may not have
access to the resources that it requires at all times (water,
"information", etc.)  So, it knows how to deal with these deficits
on its own.

Similarly, the "water controller" only has to worry about the needs
of *its* consumers (several of which are the individual irrigation
zone controllers).  And, how it can recover from those cases where
it can not supply the needs of those consumers (loss of water
supply, master valve failure, etc.)


It boils down to what you consider an "emergency".  And, how much
"spare capacity" you have available.  E.g., if you can afford to
*walk* from place to place with those scissors, then there is
no need to take on the risk of running!

OTOH, if you don't have the luxury of being able to take a leisurely
stroll with them, then you either *break* (fail to meet your goals)
or you learn to run, safely!


Again, irrigation is a trivial example.  Imagine, instead, the
resources are physical processors and you are rendering video
in real time for distribution over the network.  If one of those
resources becomes unavailable (crashes or is repurposed to put out
something more important than watching TV), how much spare
capacity do you design into the system so you can *leisurely*
go about recovering?  (Remember, if the next frame isn't there
in a few tens of milliseconds, the user will perceive a visual
artifact in the rendered image!)

I.e., I would like to find *one* way of dealing with this sort
of thing instead of one way for "fast" things and another for
"slow" things, etc. (because that leaves "fast" and "slow" up
to debate and requires developers to appreciate the differences)

--don

Re: Resource revocation
That all seems like a trivial amount of computation.


Hard deadlines in realtime programming usually mean microseconds or so.
This plant watering stuff isn't even soft realtime (where you generally
want responses within milliseconds but are allowed to miss
occasionally).


Sounds like you might want a two-step approach: 1) ask the process to
give the resource back; 2) if that doesn't work within a reasonable
timeout, kill the process.  I'd really stay away from this notion of
reassigning the resource while the process is trying to use it and
doesn't know what has happened.
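
Something like this sketch (all of the names are placeholders):

#define MSG_PLEASE_RELEASE 1

extern void send_msg(int task, int msg, int resource);
extern int  wait_for_release(int resource, unsigned timeout_ms);
extern void kill_task(int task);            /* cleanup runs when the process dies */

void revoke(int task, int resource, unsigned timeout_ms)
{
    send_msg(task, MSG_PLEASE_RELEASE, resource);   /* step 1: ask for it back */
    if (!wait_for_release(resource, timeout_ms))
        kill_task(task);                            /* step 2: don't argue     */
}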


If you have hostile applications in your system, you have a completely
different design problem.  In a normal system, just have a few safety
timeouts and kill processes that miss them.


That is fine.  


I think I'd use an extra protocol step so that the process can give back
the resource gracefully.  If you've already decided you want to do
something different (and in my opinion more error-prone), I don't
understand why you're asking for advice.


That is different.  Tasks that get preempted in normal OS's usually
don't even know that anything has happened.


I don't think they even make processors slow enough for a few KIPs to
matter for something like this.


Processes in Erlang are very cheap, and having 1000's of them is no big
deal.  On other systems they may cost more.


IIRC one of the first questions I asked you was what OS you were using,
what language, etc.  I've run Python on boards with as little as 64 meg
of ram and Erlang's requirements are comparable.  Of course 64MB was a
lot of memory not that long ago, and I tossed out that number casually
just to make you jump ;-).


If you can't, then you can't.  The best you can do is to try to plan the
operation to be resumable, if you think you might have to preempt it.


You were the guy trying to make patch panels on laser printers and buy
solenoid valves from Wal-mart or something?  I don't think you're in a
consumer market making millions of devices.  In your situation for a
one-off thing I'd use something like a Beagleboard.  If you're making
millions of devices and have to squeeze pennies out of the hardware
cost, you do that by spending far more up front on software development.
But, even then, message passing is a reasonable approach.  Traditional
Forth multitaskers use a few hundred bytes and are bloody fast and can
run fine on a Cortex M0 or equivalent.  If you're using some simple C
RTOS then it's probably comparable.


Yes, in a hard realtime system you can't necessarily use that approach
and you may end up having to resort to something much more difficult.
In soft realtime with these relaxed requirements, you can use simpler
methods without much trouble.


Just like in everyday life, dealing with "slow" things is often (at
least up front) much cheaper and less hassle than "fast" things.
Consider mailing an envelope somewhere vs. paying Federal Express for
same day delivery.  Or using an off-the-shelf CPU board instead of
making an ASIC.  One of the consequences of cheap powerful SBC's
(Raspberry Pi etc) is that you can use relatively resource hungry
programming approaches to drastically shorten development effort (and
therefore decrease cost), even for relatively low-budget embedded
projects.  You are freed from a lot of constraints.

It's completely sensible for the most economical programming techniques
to be different than what you'd do in a resource constrained system,
just as those techniques are different again than what you'd do in
hardware (or Verilog).  Obviously you can use constrained methods on big
processors, so that lets you use the same approach everywhere, but it
means you do a lot of unnecessary work.

Re: Resource revocation
Hi Paul,

On 7/26/2013 2:48 AM, Paul Rubin wrote:
[quoted text elided]

In terms of "number crunching", it's trivial.  A four function
calculator would suffice.

But, in terms of processor cycles, there's a lot more than meets
the eye.  E.g., querying the database requires issuing the RPC
for the actual "SELECT", concurrently setting a timer to ensure
the task doesn't "wait too long" for the reply; then, parsing
the reply to examine each emitter, its flow rate, the permeability
of the soil around it so you know what the water *previously*
dispensed there has "done" in the time since it was dispensed;
how much the plant's root systems will have taken up, how stressed
*that* plant has been (wind/sun) in the days (?) since it was last
watered so you understand *its* needs (and how close its effective
HARD deadline is); meanwhile, querying any other emitters (possibly
serviced by other zones) that have added to the moisture content
in that area; then, looking at all of the plants and making a decision
as to how critical the provision of water from *this* zone is at
this time -- along with how large a "dose".

[BTW, this is what industrial commercial systems do to varying degrees]

[Remember, conditions are *always* changing.  You can't just make
a decision and sit on it until you think it is time to act on it
(which is what a naive controller does)]

Just moving messages (RPC's) up and down through the network stack
consumes more resources than a "conventional" irrigation controller
would in a *week*!

By contrast, a PIC-based controller does:

while (FOREVER) {
   sleep(water_interval - water_time);  // typically days!
   valve(X, ON);
   sleep(water_time);                   // typically minutes
   valve(X, OFF);
}


No.  This is a common misconception.

"Hard" and "soft" have ABSOLUTELY NO BEARING ON THE MAGNITUDE OF THE
TIMES INVOLVED!

Rather, they are concerned with the shape of the value function
associated with the deadline.  HRT problems have a value function
that "goes to zero" at the deadline.  I.e., missing a HRT deadline
means you might as well reset the processor and start working on
something else -- there is no value left to continued work (i.e.,
expenditure of resource) towards the goal.

[NON-realtime problems have no "deadlines"]

By contrast, an SRT problem has a value function that decreases
at and after the deadline.  I.e., there is more value to getting
it done BEFORE the deadline -- though there may still be value
to getting it done *after* the deadline has passed!  (Of course,
most SRT problems are encapsulated within a "final", HARD deadline
beyond which their value is inconsequential).

[Note a *system* can contain hard and soft real-time problems.]
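
One way to picture the distinction (just a sketch of the *shapes* of
the value functions, nothing more):

/* value of completing the work at time t, given deadline D and nominal value V */

double hard_value(double t, double D, double V)
{
    return (t <= D) ? V : 0.0;      /* nothing left once the deadline passes */
}

double soft_value(double t, double D, double V, double decay_per_unit)
{
    if (t <= D)
        return V;

    double v = V - decay_per_unit * (t - D);    /* still worth something, for a while */
    return (v > 0.0) ? v : 0.0;
}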

How far in the future a deadline is -- or, how *often* it is -- has
no bearing on the HARD vs. SOFT distinction.  Sending a probe to
Pluto could have a deadline *years* in the future.  Does that make
it "soft"?  Even an *abacus* onboard the spacecraft could process
innumerable "instructions" in that time period!  But, when it comes
time for a maneuvering thruster to be engaged, it had *better* be
engaged (else the spacecraft misses its orbital insertion, etc.)

Similarly, events from a mouse "wheel" can come at tens of Hz.
Yet, if you miss 80% of them *completely*... <shrug>.  Or, if you
handle them *late*, it could still be acceptable.

If you don't dispense water for the plants in a given zone *exactly*
when you would like to, it's not the end of the world.  For whatever
reason (too busy computing pi?  water resource not yet available?),
a SOFT deadline is missed.  But, there is still value to supplying
water -- whether that's a few minutes later or a few hours!  (i.e.,
the shape of the value function depends on the needs of the particular
plants being serviced, their recent watering history, environmental
factors and the value of the plants themselves!  It's relatively
easy to regrow wildflowers; considerably harder to regrow a fruit
tree -- or ensure it doesn't shed its blossoms and, thus, lose
an entire crop of fruit for this growing season!)

OTOH, there comes a point where you've simply waited too long and
any attempt at watering is going to yield no results (or, even
NEGATIVE results!).  This is the HARD deadline that represents
sink or swim -- beyond which it is silly to even compute how
late you are!

If the cacti in the side yard don't get watered at 5PM today, they
won't mind if it happens 5PM a *week* from today!  OTOH, if the rose
bushes aren't watered *twice* a day, they are toast!  (unless, of
course, it is winter time in which case they should be watered
very INfrequently lest the roots rot!)


Again, how a process is coded can vary with the consequences of
that implementation.  I.e., deliver a signal and the process
can be prevented from doing *anything* in the absence of the
revoked resource.  The signal can even cause a "message" to
be delivered to the task saying "please release the resource, NOW!".

On the other hand, if you rely on "cooperation", then you have
to qualify this cooperation as well as quantify it.  I.e., when
requested, you *must* relinquish the resource within time T.
Even if your task has insufficient priority to claim use of
the CPU in that period!


Many systems have to tolerate potentially hostile processes.
Esp if they are designed to be "open".  What's to prevent an
"app" in your smartphone from taking a resource and holding onto it
indefinitely?  Or, an application on your PC?  What do you
do in the absence of a human presence to intervene and "kill"
the offending task?  What do other tasks do *while* this
condition persists?

(What do you do when there is no "human" available to kick your
system back into operation?)


I was actually hoping for a mechanism that more intuitively
allowed the developer to *see* this "event" without the explicit
coding that, e.g., signals require.

E.g., if you are talking of a single resource (within an app),
then:

handle = await_resource(resource_sought)
spawn(use_resource)
result = await_release(handle)
if (result == I_released_it_when_I_was_finished)
    // success
else
    // use_resource was not able to complete as expected

is a more robust coding style.  Something that can easily be
applied boilerplate style.  Get what you need.  Have something
do the work while you wait for it to "finish".  Then, verify
that its "finishing" was truly the result of the task
completing as expected vs. something else causing it to terminate.

However, it falls down (gets ugly) when "use_resource" must
then request some *other* resource, etc.


How's that different from pulling a resource out from under the
eyes of a task?  The task doesn't know anything has happened.  Or,
I could conceivably block the task until I was able to restore
the resource to it!

I.e., we treat "CPU time" as a *different* sort of resource...
And, don't seem to have any problem with that.

Similarly, we treat physical memory as a different resource
(in a VM system) than "logical" memory.  We ignore the fact that
accessing location X might be nearly instantaneous while X+1 may
take milliseconds to access (if the page containing its backing
store has to be swapped in)


You are assuming the processor is *only* working on this task.
Or, are accustomed to applications/systems where the system idle
loop *always* gets a chance to run (i.e., when the processor is
never overloaded)


Yup.  In my friend's case, a task ("process" is an inappropriate term)
was a handful of bytes!  Hence my caution in using terms like
threads, processes, tasks, etc.  I've deployed systems where an
"execution unit" was as small as a couple of bytes and as large
as many megabytes.  So, what's "normal"?

But, "cheap" is a relative term.  1000's of processes on a machine
with limited resources can be impossible.  I.e., C.A.E doesn't
presume you're running on a desktop, etc.  How many tasks are
running inside your mouse?  :-/


Scale that back by an order of magnitude or two!  :>  Think
"SoC" not "SBC".  Think "several dollars" vs. "dozens of dollars".
And, think scores of machines and not singletons.


Why do you assume that because I want to have a way for folks
with shallow pockets to ALSO take advantage of a technology
that these are the *only* people who will take advantage of that
technology?

Why do people build MythTV boxes?  Don't their cable providers
offer DVR's?  Surely, it's got to be cheaper to buy/rent a DVR
THAT YOU KNOW WILL WORK than to tinker with trying to get
some bits of code to work on a machine you've thrown together
from scraps!  Even assuming your time is "free", you presumably
would want the resulting device to *work*, reliably!  ("Dang!
My MythTV box didn't record the programs I wanted.  I guess
my time server was screwed up and it didn't know that today
was Friday...")

Similarly, should video recording technology ONLY be available to
people who want to hack it together from odds and ends?  So, if
you aren't technically literate (and motivated!), you can't
take advantage of that technology?  Regardless of how deep your
pockets are?

Rather, wouldn't you want a solution that folks with money (and
no time, inclination, etc.) could purchase (subsidize!) while
also providing a means by which folks with more *time* (and
technical expertise) than money (or, perhaps, more *desire*)
can also avail themselves of that technology?

[Well, I have no idea what you would want.  But, *I* would
want a solution that can be approached from each of these
perspectives]

E.g., I designed my network speakers so you could implement them
with a bunch of surplus PC's -- one for the server and one for
each speaker/speaker-pair.  (assuming you have the space for
a whole PC where you would want one!)

Or, you could buy a bare board and components and assemble one
for yourself -- possibly housing it in a tin can in lieu of a
"real" enclosure!

Or, you could purchase one commercially -- for considerably more
(as there are people wanting to make a profit in that distribution
chain!).


Exactly.  You don't write your app in Python.  You don't expect it
to have GB of RAM available and GIPs of CPU.  And, you don't create
an environment for others to augment/maintain the design that will
lead to the system, as a whole, being flakey.


Again, hard and soft have nothing to do with how *fast* something
is.  I.e., how many instruction cycles it will take to execute.
So, how many cycles something takes has nothing to do with the
soft/hard-ness of the RT.  Those things only affect the amount
of *resources* available to the application.


This is fine if *all* you are doing is soft (or fast).  And, if
your "solution" doesn't have to change from one domain to the other
as it evolves (since folks are hesitant to reengineer an entire
application -- preferring, instead, to try to tweak it to death).

E.g., when I originally coded the irrigation controller (different
system), it mimicked the "naive" controllers (electromechanical)
that I had experience with at the time (growing up, no one "watered"
the yard; The Rain did that!).  Then, I replaced the sequential
zone N then zone N+1 approach with a system that allowed multiple
zones to be watered concurrently.  This required coordination
to ensure "too many" zones didn't try to operate simultaneously.
Now, I want to give it more smarts *and* reflect the impact it
can have on other "water consumers" here.  And, since I can't
predict what those uses will be [actually, this is a small lie],
I need to be able to abort a watering cycle and resume it at
some later time.

[The same sort of algorithm can also be applied to controlling
access to other shared resources -- electricity, natural gas,
etc.  Don't let the air conditioner compressor kick in when
you are baking -- especially if you are on a ToU tariff!]


I "spend resources" on making products/environments more robust
and/or useable.  E.g., protected memory domains so task A can't
screw with task B's resources (this costs resources which translates
to real money when you have to buy chip A vs chip B).  RTOS's
vs. MTOS's (because MTOS's don't/can't make timeliness guarantees
even if they can be implemented more simply/cheaply).  "Services"
instead of "libraries" (because services can be more universally
applied and controlled).

I.e., the goal is to use "big system" capabilities on "tiny iron".
So you afford the developer the environment most conducive
to him producing a benevolent/harmless application without incurring
all the cost of twiddling individual bits.

To that end, being able to provide a "template" that guides how
you can craft a "robust" application -- and, what you can expect
*from* the system via that template -- can make these goals much
more attainable.

"Invoke this service in this manner; expect these results."

"Request a resource using this mechanism; expect to handle
these situations/exceptions."

etc.  But, at the same time, protecting the *system* from the
developer's greed/folly!

--don

Re: Resource revocation

I didn't realize there was an SQL database in this, but if there
is, the computer is big enough that it's all still trivial.


These same methods work fine in relatively small systems (say a few KB
of ram) if you don't mind using low level languages like C or Forth and
carefully allocating memory by hand, not having memory protection, etc
(see the Mars Rover article).  In the smallest 8-bit cpu's or in
hardware, you may have less flexibility.

Re: Resource revocation
Hi Paul,

On 7/26/2013 9:46 AM, Paul Rubin wrote:
[quoted text elided]

No, the RDBMS is in *another* machine (note I said "RPC" and
not "IPC" ) elsewhere on the network.  This (and others) is
just a *client*.  As long as you have the resources for the
protocol stack, "software" to issue the query and catch/parse
the result, you can implement such a client on things as tiny as
a PIC (and smaller).  You just can't *do* much in terms of
issuing lots of requests per unit time.  Nor *accumulating*
many results!

But, if you are clever/methodical about it, you can handle
a virtually unlimited number of emitters, "plants", etc. and
come up with *a* number that indicates how much water you
should dispense "now".

Again, "SoC" not "SBC" (i.e., think:  fraction of an MB of FLASH
and tens of KB of RAM -- but nothing beyond that!).  A "single chip"
solution (plus I/O's).


You don't have to allocate memory by hand -- nor statically
(another common misconception).  Nor do you have to live without
memory protection (though you probably live without backing
store unless you *add* secondary storage to the device).  There
are lots of SoC's nowadays that can give you a full-featured
environment without the *quantity* of resources you might have
in a desktop, etc.

Even with 8 bitters you can give the developer the "feel" of a
big machine.  E.g., I had a Z180-based system (essentially 8
bits with a 1MB address space) where you would be hard pressed
to know you *weren't* writing UN*X code!  (aside from the dreadful
slowness of a ~6MHz, 25+ year-old, 8 bit machine!)

--don


Re: Resource revocation
On 26/07/13 18:21, Don Y wrote:
[quoted text elided]

I'm probably teaching you to suck eggs, but since this is
a distributed system, has your architecture and design
considered the cases of partial failure?

The classic problems are where:
  - another machine silently stops processing at some level,
    i.e. possibly above the TCP level
  - the network fails, including the subtle failures leading
    to what is, to all intents and purposes, a self-inflicted DoS attack
  - in a high availability system:
   - the network becomes partitioned leading to duplicate services
   - the network becomes re-joined leading to the problem of deleting
     duplicate services


Re: Resource revocation
Hi Tom,

On 7/26/2013 11:30 AM, Tom Gardner wrote:
[quoted text elided]

Yes.  And the consequences vary depending on the nature of
the failure.


Certain services are considered "essential".  Failure of one or
more of them means I'm screwed.  Conceivably, these services
could be replicated for higher availability -- that's for someone
else to worry about!  (have to put design limits *somewhere*!  :> )

E.g., the database service is heavily relied upon by all clients
in the system.  Since everything is diskless, the concept of
persistent store has to be implemented elsewhere.

Rather than "just" implement a "network file service", I opted
to give clients access to *structured* storage.  Why have each
app create its own "file format" and have to implement code to
*parse* (and error check!) that format?  Since most configuration
and control "files" are really "tables" in some form, why not have
a service that can *store* those tables in their logical form?!
And, allow clients to grab individual entities from those tables.

And, allow something with persistent store to keep track of them!

So, a conventional irrigation controller might have a table
like:

ZONE   FREQUENCY    DURATION       COMMENT
  1      3 days     15 minutes    shrubbery
  2      daily      5 minutes     flower beds
  3    twice a day  10 minutes    rose bushes
...

This makes the job of the actual *controller* task pretty simple!
Figuratively:

for (zone = 1; zone < MAX_ZONES; zone++) {
    query("SELECT duration, frequency "
          "FROM irrigation "
          "WHERE zone = %d", zone);
    parse_result(&duration, &frequency);
    valve(zone, ON);
    sleep(duration);
    valve(zone, OFF);
}

(some hand-waving implied since I am assuming the frequency
criterion has been met)

Additionally, the RDBMS can enforce checks/criteria on the
data stored in it.  E.g., limiting the choices for "frequency"
or "duration".  This code can run *in* the RDBMS instead of
burdening the client/app with that run-time checking!  You
can *assume* the data are meaningful when you get the results
of the query -- no (?) further testing required!

Part of the reason behind the "asynchronous notification"
issue/resource revocation is that you have to be prepared
to deal with a resource (i.e., an entire client node)
"going down" -- making the resources that it "published"
inaccessible to the system.  So, anything using those
resources has to be able to deal with the resource being
unceremoniously revoked AT WILL.

E.g., in the event of a power outage, only some of the nodes
are battery backed. What happens to all the tasks that are
expecting to interact with the hardware and/or software (tasks)
residing on those nodes?

"Please stop using this resource."
"OK"
"No!  I mean it is no longer available!  You *can't* use it!"
"Oh..."

So, why implement *two* mechanisms if you will always HAVE TO
have the "kill" option working?

As to some of your later points:

While the system is distributed, control over it (supervision)
is not.  I.e., if a node (or process) goes down, the workload
manager removes it from the system completely.  If the node
suddenly becomes visible, again, it won't be given any work
until the workload manager formally reintroduces it into the
grid.

Technically, there can be periods where some nodes still have
connectivity with the node and it appears to be functioning.
If so, the results/services that it provides remain useful.
But, once the system sees that it is unresponsive (to the
supervisor or any client), its death knell has been struck.

--don

Re: Resource revocation
On 26/07/13 19:56, Don Y wrote:
[quoted text elided]

Fine :)


I may be over-interpreting the words you have
used, but what would happen if:
  - the controller/manager dies; what do the subset of
    nodes that have decided the controller has failed
    actually do? Would there be a conflict with the other
    subset of nodes?
  - ditto intermittent network connectivity (yes, I have
    seen that even in benign environmental conditions)



Re: Resource revocation
Hi Tom,

On 7/26/2013 12:08 PM, Tom Gardner wrote:

[quoted text elided]

By "controller" I assume you mean "supervisor" and not the
"irrigation controller" I've been talking about?

If the "supervisor" dies, the shit has hit the proverbial fan.
What do you do if *the* CPU in your single-CPU product catches
fire?  The supervisor is one of those "critical resources".
Potentially, someone could redesign the system to implement
redundancy with a protocol that allows the clients to elect/select
a new master.  But, that's beyond my level of interest.

When/if the supervisor fails, there is nothing to dole out
work to the nodes.  When they finish doing whatever they have been
tasked with doing, they just "generate heat".  I could potentially
add a daemon that automatically powers down idled nodes but I
suspect that would be a rare enough event that it wouldn't buy
me much ("OK, the nodes are now ineffective but at least I'm not
wasting power keeping them UP!"  :-/ )

If you meant (above) the *irrigation controller*, then it depends
on what part of the irrigation controller has died.  E.g., part
of the controller (a *virtual* part!) must reside on the physical
node that is connected to the solenoid valves -- because it needs
to be able to actuate the individual valves as part of "controlling
irrigation".

But, the code that implements each of the zones can reside on
any processor in the system.  Potentially, the workload
manager can dispatch the controller for zone 3 to processor
node 8; zone 4 to node 14; etc.  If one of these dies, the
workload manager can sense this fact (think: keepalives) and
dispatch the task to another, functioning node -- at the same
time, instructing the *physical* irrigation controller node to
ignore communications from that "failed" node that had previously
hosted this zone.
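
Figuratively, the workload manager's sweep might look like this (every
name here is invented for the illustration):

struct task {
    struct task *next;
};

struct node {
    int          id;
    struct task *tasks;             /* zone controllers (etc.) hosted on this node */
};

extern int  node_responds(struct node *n, unsigned timeout_ms);  /* keepalive probe */
extern void mark_down(struct node *n);
/* e.g., tell the valve node to ignore further traffic from the dead host: */
extern void quarantine_host(struct task *t, struct node *n);
extern void dispatch_elsewhere(struct task *t);   /* restart the task on a live node */

void sweep(struct node *nodes, int nnodes, unsigned timeout_ms)
{
    for (int i = 0; i < nnodes; i++) {
        if (node_responds(&nodes[i], timeout_ms))
            continue;

        mark_down(&nodes[i]);
        for (struct task *t = nodes[i].tasks; t != NULL; t = t->next) {
            quarantine_host(t, &nodes[i]);
            dispatch_elsewhere(t);
        }
    }
}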


First, the network is dedicated -- no "other" traffic on the
wire.  *Anywhere*.  Second, it is physically secure -- you
can't just plug/unplug something.  It's not like a NoW where
you could conceivably unplug a particular workstation while
it is actively part of the grid.  (The same is true of
power to these satellite nodes, to a large extent).  And,
finally, the network/protocols are hardened.  So, even if you
could inject traffic onto the network, the best you could
hope to do is deny service to *one* node -- the node whose
network drop you have infiltrated.

[This is effectively the same as unplugging the node since
it can no longer communicate with the rest of the system.
For some nodes, they can continue to fulfill their roles
in a fail-secure mode.  E.g., the HVAC controller will
continue to keep the house in the "habitable" zone though
it might not be particularly *comfortable* to occupants
if left in this mode indefinitely.  It wouldn't necessarily
know when to alter the temperature based on occupancy if it
can't *sense* occupancy -- reported by some other node in the
system!  But, at least the pipes won't burst in winter and
the pets won't collapse from heat exhaustion in the summer!]

If the network is unreliable, then something is broken.  You
treat it as any other failure.  I.e., if you had some timing
constraint on when you received the results of a query from
the RDBMS and that constraint was violated, you had some
recovery procedure in place.  (E.g., HVAC system asks RDBMS
what the setpoint temperature should be at this time of day
and gets "no reply".  "OK, I'll play it safe and pick XX.
This might not be ideal in terms of comfort or economy but
it's better than a NoOp!")
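
I.e., something like this on the HVAC side (query_with_timeout() and
the constants are placeholders):

extern int query_with_timeout(const char *sql, double *out, unsigned ms);

#define QUERY_TIMEOUT_MS  2000
#define SAFE_SETPOINT     22.0      /* habitable, if not necessarily comfortable */

double setpoint_now(void)
{
    double sp;

    if (query_with_timeout("SELECT setpoint FROM hvac_schedule ...", &sp,
                           QUERY_TIMEOUT_MS))
        return sp;

    return SAFE_SETPOINT;           /* no reply in time: play it safe and log it */
}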

I can log these sorts of failures.  Unfortunately, I have not
been able to come up with a strategy beyond this!  I.e., what
do you report to the homeowner?  (I'm not as worried about
commercial/industrial deployments because they will typically
have someone skilled in these "physical plant" issues)  What
do you suggest as possible remedies?  Or, do you resort to a
(useless) "Check Engine" light??  :<  ("Yeah, I checked the
engine.  It's still there!")

[I also am completely at a loss regarding how to deal with the
privacy issues involved.  Your *phone* leaks information about
you via the countless apps you install.  Imagine when your
*house* starts leaking information!!  "He went to the bathroom
at 3:27AM.  Watched the following TV shows.  Ate supper at 6:22P.
Slept for 4.5 hours.  etc."  How do you give a "normal Joe"
control over all that information and what apps are allowed to
see and consume?  In a meaningful way that doesn't just have him
blindly consenting to each app's desire to poke at specific
data??  :< ]

--don

Re: Resource revocation

This really sounds more and more like you're reinventing Erlang.
That's ok, it happens all the time.  You might benefit from:

http://learnyousomeerlang.com/content

FWIW, Erlang has a replicated, distributed database (non-relational,
more like an object db) built into its runtime.

Re: Resource revocation
Hi Paul,

On 7/26/2013 12:31 PM, Paul Rubin wrote:
[quoted text elided]

When I set out, I looked for a mainstream language that was
reasonably safe and efficient that would act more like a
"scripting" language.  I.e., most of the heavy lifting that
an application would need would be available *to* the application
as system services.  The application would "simply" (ha!) tie
those services together in a meaningful way.

I settled on Limbo as it is sort of a "safer C" -- easy enough
for folks conversant in C to pick up quickly.  But, also including
inherent support for IPC, strong type-checking, support for
concurrency, etc.

And, under Inferno, provides a VM platform that makes things like
pushing a process to another processor relatively painless.  As
well as affording some protection mechanisms for collocated tasks
(and their remote communications!)

There are a few things about the language and environment that
I'm not thrilled with, but, so far, it seems to have been a
good choice.


I opted for a full-fledged RDBMS (currently, PostgreSQL) as it
lets tasks push a great deal of work back into the server.
E.g., let the server implement the joins, triggers, checks,
etc. instead of requiring the client/app to do all that detail.
(remember, clients are reasonably strapped for resources!)

It also helps ensure *all* clients of a particular DB follow
the same constraints applied to the data *in* each table.
So, some client doesn't add an entry that is incorrect and
screw up some *other* client who expects, e.g., "person.age > 0"
to have been enforced *in* the data!  Otherwise, each client
would have to implement more comprehensive checking on the
data -- and, be capable and consistent in reporting any problems
it encounters *to* the user!  (i.e., "Age must be greater than
zero" reported by one client's tests while another client opts
to say "Not old enough", etc.)

(It also lets me avail myself of advances in the development
of the RDBMS just by "upgrading" my implementation to
"-CURRENT" as appropriate)

I'm happy with the software environment.  But, having more
problems than I anticipated with the instrumentation!  :<

--don

Re: Resource revocation

Yeah, Erlang is worth understanding even if you don't actually use it.
Some of its principles can be used in other languages.


Interesting, I didn't realize anyone was using it.  I might look into
it.  One of the authors later developed Go, as you're probably aware.
I've been looking at Go a little bit, and some of my co-workers have
been using it.

Re: Resource revocation
On Sat, 27 Jul 2013 11:08:29 -0700

[quoted text elided]

Huh.  I did a little bit of looking at Go, but it seemed at a casual
glance like it was pretty ill-suited to embedded work; huge runtime
library that you can't strip down, garbage collection at
inopportune times, and a lack of cross-compilation targets.

Have you actually managed to use it?  Did it behave?

--  
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.

Re: Resource revocation

We use Go where I work, though on servers, not embedded targets.  I know
there's a gcc front end for it, so I'd expect it can be cross compiled.
I think it is potentially ok for 32-bit embedded systems, though maybe
not for ultra low latency hardware operations because of the GC and
multitasking.  The runtime is much smaller than Erlang's.  I'm not sure
about the GC situation.  Someone told me it could run in systems with
1-2MB of memory (maybe including program memory), though I haven't
looked into this.  I don't know how those issues compare with Limbo.

I'd certainly consider Go if I were doing something with embedded Linux.
It might or might not be feasible for Cortex M4-sized targets.  I don't
know enough to definitely rule it out.

Re: Resource revocation
Hi Rob,

On 7/29/2013 9:16 AM, Rob Gaddi wrote:
[quoted text elided]

Limbo has some of the same (mis?)givings.  E.g., GC is entirely
under the control of the VM.  But, my hope is to just avoid the
*need* for asynchronous sweeps (e.g., avoid things that use
memory that way).

Since it (the only available release that I know of runs only
under Inferno so it's hard to talk about one without the other)
runs in a VM, anything that can host the VM can run the
"binaries".  E.g., there was an IE plugin that implemented
the VM so you could run Limbo executables in your (IE) browser.
(I don't think that has been maintained)

Most of the "library" is implemented as loadable "modules".
So, if you aren't using the graphic library, it never gets
loaded (i.e., "from secondary storage")

But, the developers think 1MB is a small machine  :>

(I've been wading through the implementation trying to
carefully weed out features that I don't like/want/need
as well as partitioning it into a ROMmable core with
RAMmable data segment -- since having 1MB of RAM into
which the entire thing can be loaded raises the bar
in terms of deployable hardware).


*Limbo* has fared well, so far.  Tiny executables
(i.e., I can dispatch the code for a task/job over
the network in a small fraction of a second and have
it loaded and running "instantly")

But, I've cheated, in a sense, by making so many things
available as *services* so applications don't have to
waste time/space reinventing the wheel everywhere...

(IIRC, the Inferno binaries -- and sources -- are available
for "free" download.  I just don't have a pointer handy...)

Re: Resource revocation
On 26.07.2013 11:48, Paul Rubin wrote:

[quoted text elided]

This is a very common misconception, but still just as wrong.  Realtime  
has nothing to do whatsoever with the length of any particular interval  
of time.  It doesn't matter if your deadline is coming every 50  
nanoseconds or once a year.  If there's a deadline, and it's defined as  
a point fixed in time, then you're doing realtime processing.

Nor is the distinction between soft and hard realtime to be found in the  
timescales involved, but in the gravity of consequences if you miss a  
deadline.

In other words, realtime is about whether there _is_ a "too late", not  
_when_ that might be.

