Forewarning of resource inadequacies

*IN GENERAL*, what dogma would you suggest for addressing
potential ("current") resource inadequacies when starting a task
(or offering a user the capability *to* start that task)?
Keep in mind that resource availability varies before, during and
after a potential task/activity is initiated.  And a particular
activity may spawn other activities (automatically started,
or likely to *want* to be started -- by the user).

E.g., if you were asked to copy a file, you'd probably stat() the
file to ascertain its size (and *hope* that size remains constant
during the following operation) and check to be sure you have that
much free space on the target.  I.e., the copy operation would
tend to be fast enough that the user couldn't remedy a "no space
on device" error before it was signaled.

OTOH, if the transport medium were of sufficiently low bandwidth,
you could allow the operation to start and warn the user that
he/she WILL run out of space at the target *if* nothing changes
(the implication being that the user should be the agent of change).

The 1960s approach, of course, is just to naively start the operation
and then have it abend when it hits that brick wall!

Note that each action that the user takes implicitly consumes resources
and, as such, can hinder other actions that he/she may want to take.

And, that your device can also have autonomous resource needs that
are incurred alongside the user's actions (e.g., daemons).

Do you, for example, let the user consume battery in a futile attempt
to perform some operation -- and end up jeopardizing his ability to
do some more valuable operation later (e.g., back up his device before
power fails)?

Again, these are only examples.  The question is what criteria do you use
for alerting (and/or inhibiting!) the user when you know that it is likely
that he won't be able to perform the desired task WITH THE SYSTEM IN ITS
CURRENT STATE -- and *when* do you impose those notifications?

Re: Forewarning of resource inadequacies
Hi Don,

On Thu, 07 Apr 2016 14:10:04 -0700, Don Y wrote:

[quoted text elided]

I suggest you watch the movie "Dogma", think about it for a while, and
then try asking your question again.
http://www.imdb.com/title/tt0120655/

8-)


[quoted text elided]

That's also the 2016 approach, because it's just too complex to figure
out in most cases.

What you are essentially trying to do is predict successful (or
unsuccessful) completion of resource-limited scheduling over an
open-ended set of processes, priorities and resources.  That's an
impossible situation: resource scheduling is a bin-packing problem
that's solvable only in a closed system.

Given a known set of resources and an enumeration of the needs of each
process, you can predict completion of any given process.  But process
priorities complicate scheduling, and any predictions of success go out
the window the moment a higher priority process enters the mix.

And every process must be able to state the resources it needs to
perform any given operation: e.g., I need xxx KB of RAM, nnn file
buffers, yyy KB on disk Q, etc. ad nauseam.  And if the file system is
remote, implicitly add network connections, etc.
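As a sketch, here's the sort of declaration a process might hand the
scheduler (the struct and its fields are purely illustrative -- nothing
here is a real scheduler interface):

    /* What a process might have to state up front; field names
       mirror the examples above. */
    struct resource_claim {
        unsigned long ram_kb;        /* "xxx KB of RAM"              */
        unsigned int  file_buffers;  /* "nnn file buffers"           */
        unsigned long disk_kb;       /* "yyy KB on disk Q"           */
        unsigned int  net_conns;     /* implied if storage is remote */
    };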

This does _not_ have to be done statically prior to execution [that is
just the simplest case] ... but to be effective the process must
communicate with the scheduler and participate in (re)scheduling
whenever its resource needs change.

It is complicated when the needs are ad hoc, as with your file copy,
where the space needed on the target device is unknown until the source
file is examined.  The best you can do in such situations is to preempt
the running process until scheduling shows that it can complete, and
tell the user so she can halt the process if necessary.


[quoted text elided]

Yes.  Either you must defer a new process until scheduling says it can
complete, or you introduce a "higher priority" process.


[quoted text elided]

Daemon resources should be known and limited.  There's a reason, e.g.,
that only a superuser can take the last process slot.

But if you're trying to do resource scheduling, then every process
must have known limits.


[quoted text elided]

You fix it so only the backup process can invade the last xx% of the
battery.  Then watch it blow up due to WiFi retries because there is
interference or signal strength is poor because the device is too far
from the AP.


[quoted text elided]

If you can implement a reasonably effective resource based scheduler,
then you can warn the user that some program can't run now, but will
run when <some set of> currently executing processes finish, and does
the user want to defer it until then?  But as I said previously,
everything goes out the window when a higher priority process enters
the mix.

Even doing that much is incredibly hard.  In an open system, too many
resource needs are ad hoc, and it may not be possible to even
enumerate every resource that *might* be involved.  I.e. the scheduler
may have to consider not only dynamically appearing resources, but new
resources that it didn't even know of yesterday.

And in a distributed system a centralized scheduler will be a
bottleneck, but without it there's no way to predict effects on a
shared resource of processes executing on different nodes.


I know this didn't really help and that you've thought of all or most
of it already.

YMMV,
George

Re: Forewarning of resource inadequacies
Hi George,

On 4/8/2016 4:08 AM, George Neuner wrote:
[quoted text elided]

<frown>  Doesn't look like the sort of thing in which I'd be interested.
_RED_ and _RED2_ last night.  _Flushed Away_ before that.  Finestkind.

[quoted text elided]

In an RT system, it is explicitly known for each task -- as indicated
by its (numerical) deadline.  In my case, the brick wall occurs only
for HRT tasks: the deadline handler kills the task and frees all held
resources.  (For SRT tasks, the deadline handler cooks the books and
decides if the task should be continued "at lesser value".)

[Obviously, the goal is to convert HRT tasks into SRT tasks wherever
possible]
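
As a sketch of the two policies (the task type and the helpers are
hypothetical stand-ins, not my actual API):

    struct task {
        int hard_rt;          /* nonzero for HRT tasks               */
        int residual_value;   /* SRT: value left if continued "late" */
    };

    void kill_and_release(struct task *t);  /* kill task, free held resources */
    void extend_deadline(struct task *t);   /* SRT: run on "at lesser value"  */

    void on_deadline_missed(struct task *t)
    {
        if (t->hard_rt || t->residual_value <= 0)
            kill_and_release(t);   /* the brick wall */
        else
            extend_deadline(t);    /* cook the books and continue */
    }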

[quoted text elided]

Yes.  But in an RT system, processes (tasks/threads/whatever) have temporal
constraints -- deadlines.  You can exploit these in your scheduling algorithms
to ensure resources are where they should be.

[quoted text elided]

Yes:  reservations (aka "reserves").  I can ensure "resources" are
"ready and waiting" for tasks that place appropriate reservations.
So, you needn't block indefinitely waiting for a piece of memory
that you need to perform your task (memory in use by some other task
can't be forcefully freed without restarting that "other").

OTOH, the "CPU cycles" that you need can be reserved for you WHEN you need
them -- yet freely given to any other (lower priority) tasks that come
along before you do.

Battery power is, of course, related to CPU cycles (if you're burning them,
you're eating battery).

So, reserves can have some impact on physical resources (memory, battery)
required by tasks but minimal impact on other resources.

[quoted text elided]

The "visible" aspect is what I am trying to address.  I want the user
to be able to know (and accept as "reasonable") how ANY such shortage will be
handled.

Returning to the file copy (it's easy to internalize), think of how
different systems address this.  E.g., copy a 1G file to a volume with
< 1G free space and the operation isn't even attempted (Windows).  OTOH,
copy a *set* of files and it is treated as an (unordered, for all
practical purposes, via the GUI) set of individual operations, the first
of which to fail aborts ALL the remaining.  So, an 800M file would copy
OK but the next 300M file wouldn't -- and would prevent the 100M file
that follows it from being copied as well!

I know I've scurried to make space available when doing big transfers
over remote protocols (e.g., FTP) where I can manually delete files
that I've spontaneously decided I could "live without" ($TEMP) in
order to ensure a long transfer completes.  And, I'm sure I could
manually start another process that CONSUMES free space after a file
copy is started -- and upset that previously started operation
("WTF?  There WAS sufficient free space when I started this copy
operation.  But, fwrite() just signaled an error!")

I.e., the user experience isn't consistent.

[quoted text elided]

But the user interacts with each of those.  So, you've an opportunity to
inform the user of the consequences of his actions (e.g., you are now burning
more CPU cycles so the previous operation will take longer -- but still
complete! OR you are now consuming more memory so the previous operation
may abend!)

[quoted text elided]

That's where reserves come into play.  The process *will* run as
expected (or, won't and will be handled as per the criteria encoded in
its deadline handler).  I brought this up, here, as a reminder that
a system is rarely static; resources that appear to exist NOW can
magically disappear, later -- without any deliberate action on the
user's part!

[quoted text elided]

Again, just offered as an example of what can creep into the above
calculus.  (And, handled with reserves, in my case)

[quoted text elided]

If the higher priority process was there all along (and its reserves
thus known to the system), it can be addressed when the user task is
started.

If the *user* starts a higher priority task, then you have another
opportunity to inform the user that his current actions will impact
his previous actions (or, vice versa).

The trick is not confusing the user:  why won't this run but this
other (nearly) identical thing will?

[quoted text elided]

Yes.  As I can bring more resources online on-demand, the workload scheduler
always has a changing mix of resources to evaluate.  But, it doesn't need
to track all of the existing jobs running on the various processors.  Rather,
it just sees a set of processors with varying capabilities ("surplus
resources") as likely candidates for the newest workload to be dispatched.
The schedulers on each node then handle the finer grained scheduling of
the resources *on* that node.

[The only magic involved deals with the decision -- by the workload
scheduler -- to bring another node on-line and possibly re-shuffle
the locations of currently running loads.  And, the inverse operation
of moving load off of underutilized nodes so they can be powered down.]

[quoted text elided]

Clump tasks together based on how much they share (IPC vs RPC, SHM vs DSM,
etc.)  So, any tight coupling is handled within the node and can be
ignored by the workload scheduler (hopefully).  For simple resources
(CPU, battery, memory, etc.) there is no "interactive" sharing but,
rather, just a resource limitation that can't be exceeded.

[quoted text elided]

"If it was easy..."  :>

Thanks!  A colleague sent me some materials on how they design UI's
at their shop.  Hopefully it will contain some mantras pertinent to this...

Re: Forewarning of resource inadequacies
Hi Don,

Sorry for the delay ... busy weekend.


On Fri, 08 Apr 2016 11:58:28 -0700, Don Y wrote:

[quoted text elided]

The deadline of the RT task is just one of its resource limitations.


[quoted text elided]

But a task that requires more resources than are available can't run
regardless of having an execution deadline.  The RT aspect of your
system is in some sense a diversion ... time really is just another
resource constraint on task execution.


[quoted text elided]

The problem is when those lower priority tasks are still running at
the time the higher priority task needs to start.  What if,  given the
needs of the task, there aren't sufficient resources to execute it? Do
you kill some lower priority task?  


[quoted text elided]

Yes, but that's just a failure of imagination on the part of whoever
designed the copy mechanism.  A group of files can be copied in many
different ways: largest->smallest, smallest->largest, alphabetically
by name, forward directory order, reverse directory order, inode
order, randomly, etc.  There are plenty of copy utilities that provide
more choices in handling.

But again, the copy example is a distraction ... whatever you come up
with has to work for any program and situation.  

There are too many potential interactions in an open system.  Now
you'll argue that the system is closed ... but it isn't.  Your system
is distributed, so only inside a given node can there be a closed
system.  As soon as you try to deal with off-node resources ... e.g.,
the shared filesystem, sprinkler valves, etc. 8-) ... the resource
scheduling problem becomes exponentially more difficult.

And again, RT is only one aspect of the problem.


[quoted text elided]

Right. Which is why I said the solution in 2016 is no different from
the solution in 1960.  There are too many variables and too many of
them are hidden.


[quoted text elided]

Right, but there's more to the issue of subsystems than just that.
They should be suitably limited in their allowed resource use.  

On most systems you can keep opening sockets until you run out of
descriptors.  However, if it is known that your system is a server that
can handle a maximum of 1000 connections, there is no reason to allow
more ... doing so invites DoS failures.
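
E.g., on a POSIX box you can cap the process at its engineered limit
instead of the system-wide one (cap_descriptors() is a hypothetical
wrapper; the headroom over 1000 is a guess):

    #include <sys/resource.h>

    /* Server sized for 1000 connections: cap descriptors near that,
       leaving a little room for files, pipes, logging, etc. */
    int cap_descriptors(void)
    {
        struct rlimit rl = { .rlim_cur = 1024, .rlim_max = 1024 };
        return setrlimit(RLIMIT_NOFILE, &rl);
    }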


[quoted text elided]

That doesn't work for tasks which execute in response to outside
events.  Such tasks can't easily (or at all) timeshare their reserved
resources with others - the maximums needed must be kept available for
the duration of the task.


[quoted text elided]

Which is fine for cycles - i.e. "compute" resources - but not
necessarily so great for others.  Does this node have enough memory?
How far (network span) is the node from storage?  Etc.


[quoted text elided]

Unifying to find a local minimum on the task set?  Yeesh!!!!

Still the problem is you need fairly complete enumeration of the
maximum resources needed by every task.  That's effectively impossible
... there are too many dynamic and hidden variables.


[quoted text elided]

The UI is secondary to the guidelines for notifying the user.  I think
meaningful guidelines in your system will necessarily be heuristic and
hard to pin down.

YMMV,
George

Re: Forewarning of resource inadequacies
Hi George,

On 4/10/2016 10:47 PM, George Neuner wrote:
[quoted text elided]

Ditto.  Big time rain, so I took advantage of the softer soil to pull up
the wildflowers.  Tonight, making choc-covered almonds for the chocoholic.
<frown>  She's had a rough couple of weeks, so I figure it's worth a few
hours of my time...

[quoted text elided]

It's the point at which you "hit the brick wall".  I.e., if you haven't
acquired the resources you need by then, you die (if HRT).  So, you
implicitly have a timespan in which to alert the user to any anticipated
difficulties.

[quoted text elided]

If the resources aren't available when the task is "invoked" (by the user),
then you can tell the user that the task can't be started (made ready).
What you have to be concerned with is resources that can "slip away"
after the task has been accepted for execution.  This poses a UI problem,
as it can happen some time after the user's attention has moved away from
the "start task A" activity -- you effectively have to tell him that
something he THOUGHT was going to proceed "OK" some time ago will *not*,
based on new information that you (the system) now have.

[quoted text elided]

That's where the reserves come into play.  Conceptually, the reserved resources
don't appear as "available".  So, when a task tries to be started, you look
and see that there are insufficient resources and don't let it start.

CPU cycles are a special case -- you CAN make them available to other tasks
(other than the task that has placed the reserve on them) as you can quickly
"steal them back" when the reserving task actually starts.

[This can cause the expected completion time of the "low priority" (for want
of a better name) task to slip further into the future, as it will now get
less work done per unit time.  This raises the question:  how do I tell the
user that his task -- started some time earlier under "better conditions" --
is now potentially compromised?  Do I complain NOW?  Or do I see if conditions
will improve in the future, thereby giving it a better chance of meeting
its deadline?]

Memory, OTOH, is a physical resource that can't be "rescinded" when the
task that has reserved it becomes ready.  So, it can't be given to the
"lower priority" task even though it looks like it is PRESENTLY unused.

[An exception is that the memory can be allocated as "anonymous" memory
that can actually benefit the lower priority task(s) -- *if* the system
allocates it (i.e., NOT the task in question) for roles that the system
can effectively rescind.  E.g., the system can allocate it in place of VM
to allow that low priority task to make fewer round trips to the backing
store.  When the task that has reserved the memory comes ready, the
system has to figure out how to free up that memory -- flush to backing
store if not already there *or* just freeing the pages if they are already
available on the backing store.]

[quoted text elided]

Exactly.  The UI didn't see consistency as a design goal.  Had the copies
all been treated as transactional, then the behavior for a single file
copy (pass/fail) could have been identical to the behavior of the "group"
file copy.
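
One familiar way to get that: copy everything to temporary names, then
commit with rename() -- atomic on POSIX within a filesystem.  A sketch
(copy_file() is a hypothetical helper; the commit loop can still fail,
e.g., across devices, so this is illustrative, not bulletproof):

    #include <stdio.h>

    int copy_file(const char *src, const char *dst);  /* assumed elsewhere */

    int copy_set_transactional(const char *src[], const char *dst[],
                               char tmp[][256], int n)
    {
        int i;

        for (i = 0; i < n; i++) {
            snprintf(tmp[i], 256, "%s.part", dst[i]);
            if (copy_file(src[i], tmp[i]) != 0) {
                remove(tmp[i]);              /* drop the partial file     */
                while (i-- > 0)
                    remove(tmp[i]);          /* roll back completed ones  */
                return -1;                   /* nothing half-done         */
            }
        }
        for (i = 0; i < n; i++)
            rename(tmp[i], dst[i]);          /* commit */
        return 0;
    }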

[quoted text elided]

The workload scheduler looks at the system as a whole.  It knows the resources
available at each node (MIPS, RAM, etc.) at any particular instant.  This
reflects the "in use" resources as well as the resources that have been
reserved on that node (possibly for a task that isn't running anywhere in
the system, *now*).

The scheduler on each node handles finer-grained resource scheduling
for that node -- because it knows what the resource requirements are
for the tasks (and objects) *on* that node.

The workload scheduler can opt to allocate all of the reserves on
nodes that are currently not powered up.  (this is the most efficient
in terms of "hardware in use" criteria)  If a task has to be activated
that requires X resources and those resources are not available *now*
on a running node, then the powered down node on which they have been
(conceptually) "allocated" by the workload scheduler is powered up and
those "real" resources are now available for that task.  The surplus
"real" resources on that node are available for immediate use by any
OTHER task that comes online (or that is already running, somewhere)

[E.g., the workload scheduler can opt to move some task to this node to
allow the tasks executing on that other node to run more effectively]

[quoted text elided]

But they aren't!  That's the point of the reservations.

You write a program to run on box X under OS Y.  Is it a *surprise* to
you when you actually run the program and see that it needs more
memory, CPU, etc. than box X can provide?  Are you surprised that the
time required to compute the next Fibonacci number is many days?
Are you surprised that stack penetration exceeds the amount of RAM
available in the box?

Of course not!  You've engineered your solution to require resources A, B
and C.  And, you know how much those will "bend" based on the actual
hardware characteristics (i.e., if the clock is slowed down, the amount
of time required will increase; if the physical memory is decreased,
VM requirements will increase -- along with execution time required to
use that VM).

The workload scheduler effectively creates a machine on each physical node
with the requirements of the task in mind.  When the workload scheduler
has allocated all the resources available in the system, then the system
is running at capacity.  Just like a single "program" using all of the
available resources on a physical machine (X) tailored to its needs.

[quoted text elided]

Yes.  You reserve what you need.  If you don't need it, reserving
it is "unsportsmanlike".  Just like allocating 10MB for a buffer
that will be used to store a single float.

"The Market" eventually penalizes your sh*tty implementation -- by
NOT running it!

The system is accommodating in that it WILL let you effectively use
resources that you don't "need" (e.g., surplus memory for backing
store, surplus CPU cycles, etc.) because it makes no sense to
discard (make unavailable) those resources when they *can* be used --
the only cost of doing so lies in system complexity (i.e., the
task doesn't see this cost).

[quoted text elided]

Yes.  You'd not reserve 1001 sockets if you only need 1000.  And, you'd
KNOW that you *would* have 1000 sockets available to you -- whether you
needed them "up front" or "down the road".

You wouldn't reserve N MIPS-seconds/interval (I'm still struggling to
come up with a unit of measure for "workload") if you only needed half
of that.  (if you need it for half as long, then specify an interval that
is half as long).

With all of these "requirements" laid out, the workload scheduler
can know what "minimum" resources are required at each node -- based
on the tasks that it has dispatched to those nodes.  Anything left
over (on powered UP nodes as well as powered DOWN nodes) is available
for future tasks -- "reserves" plus "surplus"

[quoted text elided]

Yes.  "Reserves".

If you assume all of the nodes are powered up 24/7 and then visualize them
as a single processor, then that processor's resources are managed (by the
workload scheduler in concert with the individual per-node schedulers)
so that the RESERVED resources for any/all tasks are sitting there, ready
to be used.

In my case, there are delays involved as I may have to bring up a node
and/or move existing tasks around based on some "outside event".
But, these are fixed constants that can be factored into the scheduling
decisions -- just like loading a task from disk, etc.

E.g., if someone approaches the front door, the "doorbell" task needs
to be "made ready".  If not already present on a real node at this
time, then it must be loaded from the persistent store and activated
on *some* node.  If the node on which it should be activated is not
currently powered up, then a node must be powered up before the code
can be installed on the node, etc.  But, all of the resources that
it requires are available *somewhere* in the system -- because they
have been RESERVED.

Note that this can cause some "low priority" task to be swapped out
(if it was using "physical" resources that have been reserved for that
doorbell task).  So, the low priority task is now in danger of not
meeting its deadline -- in a manner similar to losing processor
share to a higher priority task (in a single CPU system).

There's no way to address requirements that exceed capabilities.
So, the sum of all of the reservations must be compatible with the
resources available in the system as a whole.  In much the same
way that N tasks executing on a single processor are constrained
to use only the resources available on that single processor.
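
The admission test implied here is just a running sum checked against
total capacity (the types and admit() are illustrative only):

    struct capacity { unsigned long ram_kb; unsigned long mips; };

    /* Book a reservation only if the sum of all reservations stays
       within what the system as a whole can provide. */
    int admit(struct capacity *reserved, const struct capacity *total,
              const struct capacity *req)
    {
        if (reserved->ram_kb + req->ram_kb > total->ram_kb ||
            reserved->mips   + req->mips   > total->mips)
            return 0;                 /* refuse at activation time */
        reserved->ram_kb += req->ram_kb;
        reserved->mips   += req->mips;
        return 1;                     /* "ready and waiting" */
    }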

This is something that you know at design time.  E.g., Windows makes
reservations for itself.  If you try to run more "programs" than
the available hardware can support, *windows* still runs but your
programs (none of which have contractual guarantees!) suffer.

[quoted text elided]

See above.


I disagree.  Don't you know what resources YOUR programs require?

If you design a microwave oven, do you expect it to signal an "out of
memory" error some months after a consumer has purchased it?

The same applies to "open" systems; when you try to run too many
programs on your desktop, you become disappointed with the performance
of the system and learn not to try to do X, Y and Z at the same time.

The difference is, there is no way for you to tell your PC that
X and Z are important and Y should just get table scraps.  *You* have
to explicitly do that by terminating those things that you
*think* are eating more resources than they deserve.

[quoted text elided]

I think some generalizations can probably be made.  E.g., "small efforts"
with long deadlines can probably be accepted and allowed to "struggle"
(if needed) for a long time before complaining to the user.  Other
"large efforts" with short deadlines should probably be turned away
if the resources are not available at the time the task is activated.

I think I can also "take notes" as to how things turn out and use that
to learn which tasks might defy the odds.

E.g., diarization and voice characterization are expensive ("large effort").
OTOH, there's no hard deadline -- the work can be done "offline" and as a low
priority task (even though it can require a boatload of resources!).  It can
tolerate being swapped out -- even for a really low priority task! -- without
affecting performance.  The only real deadline is the next time that
speaker is encountered (which might be immediately -- or never!)

And, it can also be killed!  That decision can be handled by a "user level"
task.  E.g., if we already have a voice characterization for that person
on hand, we might elect to abandon that effort if some *other* voice needs
to be characterized (that we don't have already).

But, in those cases, the "user" is the system itself (an intelligent agent
acting on its behalf)

Likewise, if the doorbell task's deadline handler sees that the bell
isn't handled before the initial deadline (SRT so the deadline handler
can afford to give the task a "second chance"), the system can decide
the task (and its reserves) should be "on-line" at all times (instead
of loading from persistent store into a recently powered up node).

Ditto for incoming phone calls, etc.

I.e., for places that see lots of visitors but few (phone) callers,
you'd allocate (online) resources differently than places that have
few visitors but lots of callers.

<shrug>  Dunno.  I think I'll just have to make a best (educated) guess
at how to convey this information to the user and hope it's "least
surprising".

_P&tB_ marathon tonight.  Time to rot my brain!  :>

--don


Re: Forewarning of resource inadequacies
On Mon, 11 Apr 2016 02:06:03 -0700, Don Y wrote:

[quoted text elided]

To a point ... the problem is that there are multiple deadlines
involved for a single task: the deadline by which it must finish
(obviously) but also the deadline by which it must have acquired its
resources and *started*.

But when there are multiple resources, and time necessary to acquire
them, you have a situation where:

  - if I can't get memory by T1 ...
  - if I can't get a network connection by T2 ...
  - if I can't do ... by T3 ...
     :

etc. ad nauseam.

Where is the "drop dead" point?


[quoted text elided]

A reasonable thing to do whenever possible.


[quoted text elided]

Yes.  The problem though is that you can't necessarily account for all
the resources that may "slip away".  Disk space, network bandwidth,
battery life ... all are things you can't reserve.

Ok, you *can* reserve disk space if you know how much you'll need. But
many programs that produce data don't know up front.

And no, priority token networks do not guarantee that you will get any
specified bandwidth - they only guarantee that you will get access
according to your priority, and then only if the network is still
functional when that time comes.  

And with dynamic nodes, any time the network reconfigures (if it does)
all your access time calculations go out the window.


[quoted text elided]

Worse, the user may have to clean up a partly finished mess.  Unless
you can undo everything and anything [e.g., transactional a la IBM's
Quicksilver OS].



[quoted text elided]

But you have been talking about over-committing resources: e.g., 2
programs each need up to 48KB but there's only 64KB available.  Great
if the 2 programs never need to run simultaneously, but impossible
otherwise.


[quoted text elided]

That depends greatly on the UI.  E.g., a flurry of console error
messages about things the user started 17 minutes ago may not even
make sense depending on the user's current state of consciousness.

Some kind of graphic where you can represent the task as an object in
a colored heat map would draw more attention to problems.  
Everything green?  Good.


[quoted text elided]

Right, but what about the high(er) priority task that needs to run now
but wasn't active and so hasn't reserved memory?   Or do you figure
reservations at install and limit concurrent instances?


[quoted text elided]

Distributed memory.  Need more address space, activate another node.
<grin>


[quoted text elided]

Yeah, but what shortages exist and what can be done about them are
constantly changing.  You need some way to represent the state of the
whole (distributed) system at once.



[quoted text elided]

What if no active node can handle the task now?  E.g., too much CPU
crunching.  And what if there's no inactive node that can be brought
up to run it (wrong CPUs, not enough RAM, etc.)?


[quoted text elided]

This kind of distributed "meta-scheduling" is a mess.  Quite a bit of
research - no really good solutions.  Task migration ability doesn't
actually make meta-scheduling easier - in fact it makes it harder as
there are more potential states to consider.

Fortunately, dynamically powering up/down nodes doesn't really change
the complexity if the goal is to minimize the number of CPUs in use.



[quoted text elided]

How are you reserving sockets (ports, whatever) for priority programs
to use?  How are you reserving bandwidth in your network?  How are you
reserving space on the NAS?  How are you reserving battery charge?

There always are hidden variables in a distributed system.  No matter
how hard you try, there *always* will be something you can't account
for.   You're deluding yourself if you think you can.

Since you're playing with expert systems: a hidden state ANN is able
to take into account unknowns in the system - but it can't identify
the unknowns or tell you which of them is out of whack so that you
could tell a user.

There's no free lunch.


[quoted text elided]

But it's over-committing "machines" on the nodes.  M1 is active on
nodeA, so M2 can't run simultaneously on nodeA.  Both M1 and M2 could
run simultaneously on nodeB, but nodeB is powered down.

So I power up nodeB and start M2.  But do I leave M1 where it is on
nodeA?   It will run faster if I leave it, but running both nodes uses
more power.  It's more efficient if I move M1 to nodeB and power down
nodeA.

Meta-scheduling.


[quoted text elided]

I'm saying that you can't account for all the requirements.  You only
*think* you can.



[quoted text elided]

I understand.

That works only if all the CPUs are homogeneous.  Your virtual machine
architecture smooths some of the differences but does not eliminate
them - native programs require their particular environment.  And VM
programs are highly unlikely to be RT (even low latency SRT) unless
the VM JIT compiles - which is just more memory use you can't account
for.


[quoted text elided]

That isn't a problem.


[quoted text elided]

Which means reservations (at least many of them) have to be made
statically at install and there can't ever be N+1 instances running
when only N reservations have been made.

[quoted text elided]

Yes, but your system is "open" in the sense that you want others to
design for it.  You can't expect the same attention to detail from
someone just dabbling in home automation with your system.

You have to expect that other people won't be able to design a new
component so that it plays nicely with everything else.


[quoted text elided]

No.  Because I am currently working mainly with a JIT compiling VM
implementation of a GC'd language.  It has both kernel and green
threading and in either case the threads are just abstract CPUs with
no way to tell how many resources they are consuming.

Oops ... so your users are programming a VM, too.

I have a loose idea of how much memory the VM needs to run my program
- it is difficult to bound tightly - and similarly I have some idea of
the space used by the process ...

... but I have no idea what's happening inside the VM when I, e.g.,
open a TCP connection - how much memory or how many CPU cycles are
dedicated to the illusion of the connection as a "file stream".  

I could find out ... by searching through source code for the VM and
its libraries ... but that still wouldn't tell me what Windows or
Linux is doing underneath the VM.  Again, for Linux I could find out
by wading through source code.  For Windows I'm out of luck.

Similarly your users will have only a vague notion of what their
programs cost in terms of the system services they use.

Even the designers of native applications (new components, etc.) are
not necessarily going to know what resources are consumed by the system
services their programs may try to use.


George

Re: Forewarning of resource inadequacies
On 4/12/2016 12:22 AM, George Neuner wrote:

[offlist as much of this is implementation specific -- before
someone complains about length (yet can't restrain themselves
from reading it!  :> )]


Re: Forewarning of resource inadequacies
On 12.4.2016 14:22, Don Y wrote:
[quoted text elided]

Hi Don,
If someone complains, why not just ignore him?  I did not read the
messages in their entirety, but I did read most of them and enjoyed the
posts from you both.

Dimiter


Re: Forewarning of resource inadequacies
Hi Dimiter,

On 4/12/2016 5:39 AM, Dimiter_Popoff wrote:
[quoted text elided]

George and I correspond.  There are details that I will explain to him that
I wouldn't post, publicly.  And, having a bit of a "history", I know which
things I need to "explain" and which he'll understand without further
exposition.

Likewise, I can relate ideas to his work (which I'd respectfully not
disclose without his prior public disclosure) or some of my prior.

He also has a healthy understanding of the lengths to which I take
some of my approaches/implementations having previously discussed
some of those with him.

Taking lengthy/detailed discussions offlist just makes it easier to
talk without having to consider everything one says/types and how third
parties are receiving it.

Second day of rain, here.  I may have to move to your neck of the
woods for WARMTH!  Or, is it STILL freezing?  (does it EVER warm up,
there?)  ;-)

Regards to L,
--don

Re: Forewarning of resource inadequacies
On Thu, 07 Apr 2016 23:10:04 +0200, Don Y wrote:
[quoted text elided]

I suggest rejecting all dogma. (I can suggest doctrines, however.)

[quoted text elided]

Depends on who is initiating the task and who is responsible for resource
availability.

1. In an open user-controlled environment, leave it to the user. This
means giving the user the tools to determine resource usage and the option
to shoot himself in the foot.
2. In more closed/rigid environments, make a best effort to inform the
user. If system damage is possible, prevent it.

[quoted text elided]

Or lock the file.

[quoted text elided]

Or reserve the space.

[quoted text elided]

What if you're recursively copying thousands of files from potentially
multiple partitions to potentially multiple partitions potentially
containing hardlinks and/or symlinks?
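
Even *estimating* the total is nontrivial.  A POSIX sketch using nftw():
note it deliberately skips hardlinked files rather than double-count them
(so it undercounts) -- a correct scan would need a (dev,ino) set:

    #define _XOPEN_SOURCE 500
    #include <ftw.h>
    #include <sys/stat.h>

    static unsigned long long total_bytes;

    static int tally(const char *path, const struct stat *st,
                     int type, struct FTW *ftwbuf)
    {
        (void)path; (void)ftwbuf;
        if (type == FTW_F && st->st_nlink == 1)
            total_bytes += (unsigned long long)st->st_size;
        return 0;    /* keep walking */
    }

    /* usage: nftw("/some/tree", tally, 16, FTW_PHYS);
       FTW_PHYS: count symlinks themselves, don't follow them */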

[quoted text elided]

You could, but what if the user's mom calls right when he is about to free
some space?  It should be possible to pause the operation.

[quoted text elided]

As with everything in life. Your questions are as old as human society.

[quoted text elided]

If you already know that the user is likely to perform a backup, then the  
resources for that should already be reserved before "some operation" can  
be initiated. In this case, there is no jeopardy.

[quoted text elided]




--  
(Remove the obvious prefix to reply privately.)
Made with Opera's e-mail program: http://www.opera.com/mail/

Re: Forewarning of resource inadequacies
On 4/13/2016 2:36 AM, Boudewijn Dijkstra wrote:

[quoted text elided]

The issue isn't to prevent the user from shooting himself in the foot;
rather, the problem is avoiding SURPRISING him at some later time with
the news that an action he (apparently) successfully initiated has not
completed as he had expected -- because something changed, possibly as
the result of a subsequent (or EARLIER, still running!) action on his
part or that of another.

We've all been "disappointed" to come back to some long-running,
computer-related activity at a later time -- only to discover that it has
abended, "unexpectedly".  And, often ANGRY if the reason for this could
have been known when we STARTED the activity!  (e.g., not enough space
on a device, file size exceeds maximum file size supported by targeted
filesystem, daily scheduled job at 00:00, etc.)

This suggests locking those resources at the start of the operation.
But, that can needlessly prevent other tasks from sneaking in, using
those resources and RELEASING them before they ACTUALLY are needed
by the first task.

E.g., don't lock up the printer for the output you'll be generating
an hour from now cuz other tasks might want to print in the meantime!

[OTOH, if you leave the printer "available", you risk it running out
of paper from use by one of those other intervening tasks]

Bottom line, I don't see any one-size-fits-all solution.  The fact that
tasks can take prolonged periods of time exacerbates the problem as it
allows the problem and its notification to be decoupled (in time) from
the task initiation -- the original user may no longer be "present"
for that notification!
     "The timer went off."
       "When??"
     "Oh, about 15 minutes ago."
       "Did you take the bread out of the oven?"
     "No, I didn't realize you were baking!"
       "Didn't you wonder why the timer was on and the oven was hot?"
     "Well, no."
       "Then why not tell me about it WHEN it happened?  Isn't a timer
       the sort of thing/event that has an immediacy associated with it?"
etc.

[quoted text elided]



Re: Forewarning of resource inadequacies
On Wed, 13 Apr 2016 15:51:10 +0200, Don Y wrote:
[quoted text elided]

No, it suggests reserving them. For a complex operation, it suggests  
locking until the exact resource needs have been determined.


--  
(Remove the obvious prefix to reply privately.)
Made with Opera's e-mail program: http://www.opera.com/mail/

Re: Forewarning of resource inadequacies
On 4/14/2016 5:43 AM, Boudewijn Dijkstra wrote:
[quoted text elided]

This ties up those resources for the length of the operation -- preventing
other similar (unprivileged) tasks from using those resources even if their
use would be brief and transitory -- not jeopardizing the execution of this
first task.

I.e., that's the nature of the problem:
- if you want to be able to tell the user that his task *will* execute,
   then you have to impose the same sorts of reservations that you would
   for privileged tasks (i.e., at the expense of other tasks that the user may
   want to execute -- you can't risk "gambling" that things will work out
   to the satisfaction of all his tasks)
- if you want to maximize potential utility of resources (for an indeterminate
   set of possible user tasks), then you can't give assurances at task
   activation -- because you can't (don't want to) follow up by imposing
   those restrictions on the resource use

I.e., the potential for maximizing utilization comes with the inherent risk
of a potential (future) shortage; if you go that route, you must be willing
to inform the user of that possibility WHEN (if) it later occurs -- even if
that is impractical.

That's the "no free lunch" aspect.

So, you could add heuristics to implement "partial" gambling (to maximize
utilization) -- i.e., make those guarantees for long (whatever that
means) operations where the user may walk away, lose interest or forget
about the task's activation, but NOT for short-lived operations where
you HOPE the user is still around for any potential notification.

I.e., the user doesn't have a consistent interface/relationship with the
system:  sometimes he KNOWS that a task will complete simply because it
was accepted and started; other times, he might be surprised to be
BELATEDLY (though not *too* late?) informed that a task that HAD started
won't be able to complete.

The user has to "learn" how to differentiate between these two types of
tasks.  Or, the system must be able to tell him (at or prior to
activation).

*Or*, the user must be given the OPTION of "that guarantee":
      "This is going to take a while to complete.  I can guarantee
      completion but only at the expense of other tasks that you may
      elect to activate (or, have previously activated 'conditionally').
      Would you like to take advantage of this capability?"

But, again, this makes some tasks different (in terms of what the user
experiences when activating them).  And, no way for the user to know
which tasks those might be (unless I consistently ask -- even for short-lived
tasks that will PROBABLY be able to complete unimpeded).

It's a sort of:
    "Are you sure you want to do that?"



Re: Forewarning of resource inadequacies
On Thu, 14 Apr 2016 15:37:15 +0200, Don Y wrote:
[quoted text elided]

It depends on how finely these tasks can specify their resource usage.

[quoted text elided]

Exactly, you can't predict the future without knowing what will happen.

[quoted text elided]

I was thinking it would be perfectly fine to impose those restrictions for
cheap resources like, e.g., disk space.  If reporting about resource
inadequacies is important, then the task itself is important and deserves
special treatment.

[quoted text elided]

I think there's one thing at play here that you haven't explicitly  
mentioned: confidence factor. More confidence is less surprise. During a  
task, the system could report its confidence of completion.

[quoted text elided]

So: start job at 100% confidence.

[quoted text elided]

If the tasks are predefined, a rough confidence factor could be assigned  
to them.

[quoted text elided]


--  
(Remove the obvious prefix to reply privately.)
Made with Opera's e-mail program: http://www.opera.com/mail/

Re: Forewarning of resource inadequacies
On 4/15/2016 5:24 AM, Boudewijn Dijkstra wrote:
[quoted text elided]

They can only specify their needs over the run of the entire job.
I.e., can't say "I need this for X minutes, then that for the next
Y minutes, etc."

So, a job that might "think" for 10 minutes and then consume gobs
of disk in the last 30 seconds looks the same as one that eats
the same amount of disk at a steady pace.

For tasks that MUST run, you have to treat their requirements
in a block as you can't (usually) bias their scheduling wrt other
similar tasks to exploit any complementary overlaps.

For a "hard-wired" implementation, you can embed *your* knowledge of
the resource interplay of different "consumers" into the use of those
resources.  E.g., to emulate that sort of capability, I have to wrap
complementary tasks into a single "pseudo-task" and specify the
resource requirements of the MAX(task1, task2, task...) for that
pseudo-task; then, ensure that I only activate one of them at a time.

[There's no way for me to tell the system:  this task requires XYZ
but only if neither task A nor B are active -- otherwise it requires 0.]
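
A sketch of that workaround (all names hypothetical): the pseudo-task
reserves the element-wise maximum once, and a guard ensures only one
member task is active at a time:

    struct needs { unsigned long ram_kb; unsigned long disk_kb; };

    /* The pseudo-task's reservation: MAX(task1, task2) per resource. */
    struct needs max_needs(struct needs a, struct needs b)
    {
        struct needs m;
        m.ram_kb  = (a.ram_kb  > b.ram_kb)  ? a.ram_kb  : b.ram_kb;
        m.disk_kb = (a.disk_kb > b.disk_kb) ? a.disk_kb : b.disk_kb;
        return m;
    }

    /* ...plus a mutex (or a simple "one member active" flag) so that
       only one of the wrapped tasks ever runs inside the pseudo-task. */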

[quoted text elided]

And, neither can the user!

But, the machine won't get emotional if something it wanted to do
COULDN'T complete.  A user, OTOH, can become annoyed if "surprised"
at some later time ("Why didn't you tell me there wasn't enough
disk space BEFORE you started working on it?")

[quoted text elided]

That's specious reasoning.  You're equating "important" and "convenient".

If I try to move a folder onto a thumb drive and, some minutes later, get
informed that there is no space left on the device, it's INCONVENIENT;
I now have to move everything back and find a bigger target drive.
I might be able to defer that activity for days or weeks (I may have to
purchase a larger thumb drive!) -- hardly "important".

Or, if I want it done immediately, I may have to rearrange what's on the
thumb drive (make space) *or* consider cutting the folder into two
logical pieces that "make sense" to move onto two different media.

OTOH, if it's a laptop running on battery and that battery is nearly depleted,
it is "important" that everything be shut down in an orderly fashion if
I don't want to risk losing something.  More significant than "convenient"!

I can deal with "important" by simply reserving resources in the admittance
process.  I'd like NOT to have to treat everything that might become
"inconvenient" as "important"!

[quoted text elided]

You're assuming you have the user's attention for that entire period.

If you activate a "print" job, you typically don't sit there watching
pages come out of the printer, one at a time, until the job is
finished.  You may move on to some other task -- perhaps not even
involving The Computer.  Or, depart (go to bed, etc.).

If the printer runs out of supplies (paper/ink) at some point prior to
completion, you are disappointed/annoyed when you eventually go to pick
up your FINISHED job.

If the printer (and print service) doesn't let you resume an interrupted
(paper out) job, you have to restart the job -- from the point at which
it prematurely stopped.  If the printer *thinks* it did a great job
printing but you notice all sorts of visual artifacts on the pages
(smearing/smudging/dropouts/etc.) then there's a significant difference
of opinion as to whether or not the job was actually *done*!  :>

[quoted text elided]

And what do I do when the user walks away THINKING everything is fine?
Only to return some time later to see "confidence = -10; job aborted"?

The problem is coming to "some understanding" with the user regarding
the quality of service provided and expected.  And, "being fair" in
stating the realities involved.  I.e., to claim 80% confidence but
REPEATEDLY fail to complete the task can only be seen as disingenuous.
As the machine can't predict the future, it shouldn't lower its confidence
estimation:  "past performance is no indication of FUTURE performance".

But, the user (an emotional being) won't see it that way.

[quoted text elided]

Would you use anything on your PC if it announced that it had anything
less than 100% confidence that it would complete?

How do I assign a confidence factor to a user-written script?  Track
its performance statistically and report THAT each time it is started?

[quoted text elided]

I think there are two "solutions":
- give hard and fast guarantees (then, use the mechanisms available to
   ensure these are met)
- give NO guarantees, but allow tasks to run indefinitely (no deadlines)
   so a task has never really "failed"/aborted!  (it just keeps waiting
   in the naive hope that the resource it needs WILL become available!)

