RPC/RMI in heterogeneous environments

I postpone automatic type conversion to the *servant*, rather
than embedding it in the client-side stub.  My reasoning is
that this allows a wider window for the targeted node to change
*after* the RPC/RMI invocation and before its actual servicing.

I.e., it would be silly to convert arguments for a target node
of type X, marshall them and then, before actual transport,
discover that the targeted *service* now resides on node Y
(which is of a different "character" than X).  That would require
unmarshalling the already converted arguments (the originals having
been discarded after the first conversion), converting them from X
to Y representations, remarshalling and facing the same potential
"what if it moves, again?"

However, this defers actual *checking* the data until after
transport.  It also means transport errors take a higher priority
than data type errors (cuz the data aren't checked until they've
actually been transported!)

Discussing this with colleagues, it seems a smarter approach might
be to split the activities between client- and server-side stubs.
E.g., check the data in the client-side stub so any errors in it
can be reported immediately (without waiting or relying on transport
for that to occur).  Then, let the servant sort out the details of
conversion KNOWING that the data are "valid"/correct.

[The downside of this is the servant must bear the cost of the
conversion; so, targeting an action to a resource-starved node is
costlier than it might otherwise have been.]
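
A minimal sketch of that split, in Python purely for illustration.
The two-argument signature, the names client_stub and servant_stub,
and the use of a byte-order character to stand in for a node's
"character" are all assumptions, not the actual implementation:

```python
import struct

# Hypothetical method signature: two signed integers, 16 and 32 bits wide.
ARG_BITS = (16, 32)

def client_stub(args, byte_order):
    """Check arguments locally, then marshal them in the CLIENT's own form."""
    if len(args) != len(ARG_BITS):
        raise TypeError("wrong number of arguments")
    for value, bits in zip(args, ARG_BITS):
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if not isinstance(value, int) or not lo <= value <= hi:
            raise ValueError(f"{value!r} does not fit a {bits}-bit signed integer")
    # No conversion here: the target node may still change before transport.
    return struct.pack(byte_order + "hi", *args)

def servant_stub(payload, sender_byte_order):
    """Convert from the sender's representation; the data are known valid."""
    return struct.unpack(sender_byte_order + "hi", payload)
```

A bad argument is reported by client_stub before anything touches the
wire; the servant only pays for the conversion.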

The process obviously mirrors for return value(s) conversions.

[I haven't, yet, considered if anything undesirable "leaks" in either
of these approaches]

Re: RPC/RMI in heterogeneous environments
On Thu, 21 Jan 2021 18:38:27 -0700, Don Y

Or it means choosing a particular 'server' before marshalling the
data.  Although nowadays heterogeneous hardware is handled with
agnostic protocols, in the past it was not unusual to use a 3-way
handshake when dealing with varying hardware.  The 2 sides could
negotiate the protocol rather than it being fixed.

Although admittedly unusual, I can see RPC allowing, e.g., 'integer'
while not requiring any particular width, or 'float' while accepting
integer values and converting.  It would allow the client to send
whatever value it has - as long as it makes sense in context - even if
the server-side function might be expecting something different.  

But it means either using a self-describing binary protocol that
encodes the type and width of each value (and also endianness if
relevant) so the receiver knows what to do with it, or falling back
on a text-based protocol.  Either way it means parsing on both sides.
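
As a rough illustration of the self-describing option, here is a
sketch in Python.  The one-byte tags, the three-byte header layout and
the function names are invented for the example and do not correspond
to any particular RPC standard:

```python
import struct

# Hypothetical one-byte type tags.
TAGS = {"int": b"i", "float": b"f"}
# struct format characters for each (kind, width in bytes) pair.
FORMATS = {("int", 2): "h", ("int", 4): "i", ("int", 8): "q",
           ("float", 4): "f", ("float", 8): "d"}

def encode(kind, width, value, order="<"):
    """Emit tag, width, byte-order mark, then the value itself."""
    header = TAGS[kind] + bytes([width]) + order.encode("ascii")
    return header + struct.pack(order + FORMATS[kind, width], value)

def decode(buf):
    """Parse one value using only what the message says about itself."""
    kind = {tag: name for name, tag in TAGS.items()}[buf[0:1]]
    width, order = buf[1], chr(buf[2])
    (value,) = struct.unpack(order + FORMATS[kind, width], buf[3:3 + width])
    return kind, width, value
```

The receiver needs no prior agreement about widths or byte order, at
the cost of three extra bytes and a parse step per value.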

In a heterogeneous system, the burden of "converting" - whatever that
entails - usually is placed on the more powerful processor.  Of
course, ordinarily the 'server' would be the more powerful processor -
but in your system that's not necessarily the case.

With self-describing data, each receiver can figure out what to do
with what it receives.  But that's going to put additional burden on
low power processors ...

... unless you use dedicated comm proxies.  Instead of talking
directly, communication with a service hosted on a low power system is
indirected through a proxy that understands and can convert between
the generic RPC format and the native data format of the service's host.

What kind of "leaks" are you concerned with?


Re: RPC/RMI in heterogeneous environments
Hi George,

You guys keeping safe?  I've spent the day documenting the
vaccination process for my neighbors.  Sure would have been
nice if that information was *published*!  (moronic IT
people making process decisions...)

*Snow* (dusting) expected tomorrow.  Yay!  Maybe a break
from the allergies!!  (though sad news for the citrus <shrug>)

On 1/24/2021 11:58 PM, George Neuner wrote:

The client has no control over the server; wherever the object is
"backed" is where the RMI will be destined.  It would be like
having the ability to pick which server handles your bank

And, as that backing can change WHILE the stub is executing (or,
while a packet is waiting to gain access to the wire), the earlier
you make a decision about a binding, the greater the chance of
that decision needing to be "rethought" before the packet
actually *hits* the wire.

[I can handle these conditions but they mean another hop
in the transport as a TEMPORARY zombie proxy is created on
the previous hosting node to catch "late arrivals" and forward
them to the new node.  This just moves the goal-posts a bit.
But, that zombie is intentionally short-lived just to cover
the race condition]

The goal in all this is to minimize the performance hit for the
"local" case; if everything an app needs resides on the local
node, then it shouldn't be encumbered by all this extra mechanism;
just the cost of a streamlined /I/PC!

I insist that a conversion never lose information.  So, float to
int is out.  And, int to float can also be out (depending on the
supported int values and the range of the float).

But, things like Q5.3 to float can work.  Or, handling a 17 bit enum
in a 32b word.  Or, converting a bool to a single bit.  Or, an
ASCIIZ string to a counted string or fixed size char array.  Or,
a char to a wchar.  UTF16 to UTF8.  "binary" to BCD, etc.
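
To make the "never lose information" rule concrete, a small Python
sketch.  It assumes Q5.3 means an unsigned 8-bit value with 3 fraction
bits and that "float" is an IEEE double; both are assumptions, and the
function names are made up:

```python
def q5_3_to_float(raw):
    """Interpret an 8-bit value as unsigned Q5.3 (assumed: 3 fraction bits)."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("not an 8-bit value")
    return raw / 8.0   # exact: every Q5.3 value is a multiple of 1/8

def int_to_float_exact(n):
    """Convert an int to a double only if no information is lost."""
    try:
        f = float(n)
    except OverflowError:
        raise ValueError("integer exceeds the float's range") from None
    if int(f) != n:
        raise ValueError("integer is not exactly representable")
    return f
```

Q5.3 to float always succeeds, while int to float is accepted only
for values that survive the round trip (2**53 + 1 is rejected).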

I have a dictionary of data types.  Instead of just picking from
a (relative) few, I let each API decide what is appropriate.
The thinking being that you wouldn't FORCE an application
talking to "itself" to adopt a particular subset of all available
types; so, why force parts of a DECOMPOSED version of that application
to adopt a similar set of constraints?

In my example, a servant only needs to know how to convert
all compatible incoming formats *to* the formats used by the methods
it has exported.  The workload manager won't migrate an object/process
to a node for which converters are unavailable.

E.g., Big Endian SPARC floating point likely won't have a
converter to Little Endian Intel Big Rational.  So, any ABI that
relies on BESPARC will likely be constrained to reside on a SPARC!
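
The placement constraint can be pictured as a simple set test.  This
Python sketch is only illustrative; the data layout (each node
advertising the (format, client flavor) pairs it can convert *from*)
and the name eligible_nodes are assumptions:

```python
def eligible_nodes(required_formats, client_flavors, nodes):
    """Return the nodes holding a converter for every (format, flavor) pair.

    nodes maps a node name to the set of (format id, client flavor)
    pairs that node can convert into its own representation.
    """
    needed = {(fmt, flavor)
              for fmt in required_formats
              for flavor in client_flavors}
    return [name for name, converters in nodes.items() if needed <= converters]
```

A node missing even one required converter simply never appears as a
migration candidate.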

Unfortunately, none of the language bindings are sufficiently specific
to automate these type descriptions.  int_t, bcd_t, Q5_3_t, etc.
may all map to the same fundamental type and, thus not gain any
protections from the language.

...as well as overtaxed "high power" processors.  If, OTOH, a datum
is just tagged as "format #123" and is already known to be a valid
representation of a #123 datum on the originating client, then
the servant can just blindly invoke the (#123,clientflavor) to
(#123,serverflavor) routine.  The "parsing" is implicit.

Rather than tag each datum IN TRANSIT, I let the message type
reference a dictionary of "payload descriptors".  Those, then,
enumerate the data types and their placement in the payload.
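
Roughly, in Python (a sketch only: message type 7, the treatment of
format #123 as a 32-bit unsigned integer, and the "big"/"little"
flavors are invented for the example):

```python
import struct

# Hypothetical dictionary: message type -> (offset, format id) per field.
PAYLOAD_DESCRIPTORS = {7: [(0, 123), (4, 123)]}
FIELD_SIZE = {123: 4}   # format #123 assumed to occupy 4 bytes

# Hypothetical converters keyed by (format id, client flavor, server flavor).
CONVERTERS = {
    (123, "big", "little"): lambda b: b[::-1],
    (123, "little", "big"): lambda b: b[::-1],
    (123, "big", "big"): lambda b: b,
    (123, "little", "little"): lambda b: b,
}

def convert_payload(msg_type, payload, client_flavor, server_flavor):
    """Convert each field named by the descriptor; no tags travel in transit."""
    out = bytearray(payload)
    for offset, fmt in PAYLOAD_DESCRIPTORS[msg_type]:
        size = FIELD_SIZE[fmt]
        convert = CONVERTERS[fmt, client_flavor, server_flavor]
        out[offset:offset + size] = convert(bytes(out[offset:offset + size]))
    return bytes(out)

# A big-endian client sends two #123 values to a little-endian servant.
sent = struct.pack(">II", 1, 258)
received = convert_payload(7, sent, "big", "little")
```

The servant never inspects the values; it looks up the routine for
(#123, clientflavor, serverflavor) and applies it blindly.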

Yes.  The problem, there, is being able to extricate the proxy
when it is no longer needed.  E.g., for Client invoking on Server, you
can interpose a Proxy (possibly residing on Agent).  But, as the
Object backed by the Server may migrate to another (more powerful,
less heavily loaded, etc.) New node, the Proxy must either stay in
the loop and act as a proxy for that migrated object, UNNECESSARILY;
or, you have to remove the endpoints for that migrated Object
back to the New node (replacing the existing Server endpoint that
had been accessed by the Proxy).

In the degenerate case, you can move all of the objects from
Server to that New node and eliminate the need for the original
Proxy, entirely.  Then, you have to eliminate the zombie proxy!

[You can also fashion an "upcall" from the "dumb server" to some
smart agent residing on a better endowed node.  But, this starts
to strain the programming model.]

As the system is far from static, I envision a fair bit of object
migration (which may include server migration) as more processes come

E.g., I've noticed (in MY coding style) that a process quickly
loses its need for a NameSpace as I tend to resolve any names
early on in the process activation and bind them to more concrete
variables.  Having no need for the NameSpace, I can "free" it and
thereby lower my resource burden (which makes me look like a
friendlier application to the resource scheduler).

For folks who don't adopt a similar strategy (or, who can't),
a NameSpace object may remain actively bound to their process
for considerably longer.  But, if not regularly accessed,
the NameSpace server on that node may opt to move the NameSpace
object to *another* NameSpace server; the object is still *active*
(and accessible), but the server backing it has decided to
shed those resource requirements to better serve the NameSpaces
that are seeing active use!

Or, the NameSpace server may be remoted by the workload scheduler
to trade off resources between nodes.

An infrequently accessed NameSpace could be backed on a low
performance node with relative ease -- you just need enough
DATA space to store the NameSpace object; access will be
slower but still "correct".

[Of course, *that* node could decide to migrate the NameSpace
object to a BETTER processor if it sees that it is spending
an inordinate amount of resources handling its requests!]

A process shouldn't be able to determine much about how/where
it is executing *or* its effect on the rest of the system.
The system decides when/where/if you should execute.  And,
can change those decisions without forewarning.  So, anything
that you think you've sussed out NOW can be invalid moments
later.  Or, you may cease to exist those moments later!

Granted, a process could (would!) know the processor family on
which it is executing (by compiling some machine-specific
datum into its image -- which might even be the command
line used to invoke the compiler).  If it did this frequently,
it *might* be able to deduce that it has been migrated
("Hmmm... I was running on Intel and now I appear to
be running on ARM") -- *if* the migration is to a different
processor family (or build environment).

But, it shouldn't be able to deduce its relative priority,
the demand it may/may not be placing on its hosting node,
whether an object invocation is actually serviced locally
or remotely, infer the load factor of the node backing that
object, etc.

E.g., passing bad data could leak the "localness" of the targeted
node because it would incur a transport delay before being
"validated" (if the checking was done in the servant).  It could
also leak the types of processes currently active on the
targeted node as its request vs those seen on the "local" node.

Due to the openness of the system, I can't count on processes
being non-malevolent.  Nor can I count on the hardware resources
being adequate for the tasks that they are hosting at any given
time.  I don't want a hostile actor to be able to infer anything
that could be (easily) exploited -- before the system can deduce
that it's a hostile actor and kill it (and blackball it so it never
gets loaded, again)!

I'm going to try the client/server split that was suggested in
the original post and throw rocks at it to see how well it fares
in an (artificially) constantly changing configuration.  Then,
try moving the conversions to client and/or servant and see if there
is a measurable difference.

Also, think harder on what each approach might leak and how that might
be exploited.

Re: RPC/RMI in heterogeneous environments
Hi Don,

On Tue, 26 Jan 2021 00:06:44 -0700, Don Y

Arrgh!  They're saying here perhaps by May.  According to the news and
the various national vaccination tracking sites, this state ranks near
the bottom for percentage of existing doses administered (~40%).

Right now, they're giving out only Pfizer's 2-dose vaccine, and there
is major controversy over whether to give out more 1st doses vs
completing the series for full immunization.  And nobody knows how
long the immunity will last anyway ... this may turn into a yearly
thing like the flu.

They are strictly limiting who even can get 1st doses: right now it's
limited to hospital workers, first responders and cowardly politicians
[somehow almost the entire state legislature managed to get itself
vaccinated].  They can't seem to decide what to do about 2nd doses and
so far have left the decision to the individual hospitals that have
vaccine to give.

They've made no provisions for people to register and be notified when
they are eligible - they're counting on doctors to notify their own
patients !?!  When was the last time your doctor called to remind you
about a flu shot?

I suppose I shouldn't be surprised: even in the best of times, it's a
rare day that the "establishment" can find its arse with both hands.
These are hardly the "best of times".


We've been at or below freezing for several days and we're expecting
1-2 inches of snow tonight (Tue -> Wed).  We're way behind on snowfall
this season.

Understood.  My point was about how replicated services  ... ie. the
same service provided by multiple servers ... are handled.  The client
doesn't know which server will respond to its call.

In a modern system, the RPC protocol would be self describing and
architecture agnostic - so it doesn't matter which server answers.
However, in older systems it was not uncommon for clients to negotiate
the protocol with whatever server responded, and for data in the
resulting messages to be sent in the native format of one side or the
other (or both, depending).  Whether the client or the server (or
both) did some translation was one of the points that was negotiated.

Which is not the same as the problem of replicated services, but it
has to be handled in a similar fashion.

Not necessarily: a protocol that uses multicast in both directions can
be completely location agnostic.  But since multicast precludes
directly addressing the machines, the protocol must include its own
"internal" addressing scheme to identify conversations.

It's common for replicated services to be on multicast, but only for
initial contact by the client ... when a server responds, ordinarily
it communicates directly with the client.

Which goes back to identifying the server's host before trying to

Well ... RPC generally needs to be language independent.  If my
language has 2-3-4 tree (or, as below, "big rational") as a native
type, that does not mean the RPC protocol can or should support it.

Unless you design a protocol that distinguishes them.

To a point.  Absolutely, you can streamline by encoding structures and
other collection types rather than individual data items.  But you
still have to ensure the data makes sense: e.g., that floats are not
NaN or IND, that arrays have all their elements initialized, that
references to nodes in serialized graphs/trees are valid, etc.
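
For instance, a sender-side sanity pass might look like this Python
sketch.  The function names and the child-index representation of a
serialized graph are assumptions; note that math.isfinite also rejects
infinities, which is stricter than the NaN/IND check mentioned above:

```python
import math

def check_floats(values):
    """Reject NaN (and infinities) before the data leave the sender."""
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise ValueError(f"non-finite floats at positions {bad}")

def check_graph(nodes):
    """nodes: list of child-index lists; every index must name a real node."""
    for i, children in enumerate(nodes):
        for c in children:
            if not 0 <= c < len(nodes):
                raise ValueError(f"node {i} refers to missing node {c}")
```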

If the scheduling model supports it, you could make a server needing a
proxy into a "peer" group that must be started/stopped together.  

If there's no way to ask ...  

ISTM that much of the user programming was being done in a managed
language.  If so, it should be (relatively) straightforward to prevent
tasks from learning about their environment.

Not if there *always* is a random "transport" delay.

Only if different errors are possible and are directly due to the
difference in host.


Re: RPC/RMI in heterogeneous environments
On Tue, 26 Jan 2021 13:39:30 -0700, Don Y

You're not seeing that the case where a service may change hosts is
quite similar to the case where the service can be provided by
multiple hosts.  In either case, the client does not know which host
will answer until it tries to communicate.

Perhaps it is time to look into such a run-time solution.

But what is the likelihood of that?  You have said that the server can
only be started on a CPU that has sufficient resources ... so apart
from a failure WHY would it be started somewhere and then quickly
moved somewhere else?

If you are anticipating that clients routinely will spend so much time
in their local RPC code that a useful service has time to be rehosted
... well then you've got bigger problems than servers moving.

I *wrote* such an application: the spec required that all network
communications be human readable.  In my case, however, there was a
large amount of processing being done at all nodes, so although the
messaging was complicated by printing and parsing, altogether it
consumed only a small percentage of CPU time.


Re: RPC/RMI in heterogeneous environments
On 1/28/2021 7:08 PM, George Neuner wrote:

It's not applicable in my world.  The client NEVER knows which host
(which may be its own!) *will* service or *did* service a particular request
because the client never knows where the object resides, at any point in time.

The *kernel* does the actual transaction as it knows whether your handle
ACTUALLY references an object of that "type" (a handle is just an int),
which permissions you have for the object's methods, where the object is
"physically" located, etc.  (if those tests were done in the client's
address space, the client could bypass them)
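
In outline, the kernel-side checks amount to something like the
following Python sketch.  The table layout, the class names and the
HandleError exception are hypothetical; only the three tests (type,
permissions, location) come from the description above:

```python
class HandleError(Exception):
    """Raised when a handle fails the kernel's checks."""

class Kernel:
    def __init__(self):
        # (process id, handle) -> (object type, permitted methods, hosting node)
        self.handles = {}

    def grant(self, pid, handle, obj_type, methods, node):
        """Record that a process holds a handle with the given permissions."""
        self.handles[pid, handle] = (obj_type, frozenset(methods), node)

    def invoke(self, pid, handle, obj_type, method):
        """Validate in kernel space, then return the node to send to."""
        entry = self.handles.get((pid, handle))
        if entry is None or entry[0] != obj_type:
            raise HandleError("handle does not reference such an object")
        if method not in entry[1]:
            raise HandleError("method not permitted")
        return entry[2]
```

Because the table lives in the kernel, a client that forges an int
handle gets an error rather than access.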

"You" don't see the RMI as anything other than an expensive function
invocation.  You aren't involved in deciding how -- or where -- the
action is performed; all you get to see is the result(s) of that
invocation (or a data error or transport failure).

That's done at compile time (sorting out how to do the conversions)
and at run time by the workload manager knowing which nodes are
running which servers, on which types of hardware/processors, with
what resource complements, etc.  There's no "negotiation" involved;
the workload manager simply conditions its selection of the "best"
node on which to place a server based on the data types that will
be required (if sited, there) and the conversion routines available
for its clients.

It may have been started 4 hours ago; that doesn't change the fact that
there is a (possibly large) window between when the RMI is invoked and
when the message actually hits the wire, destined for some PARTICULAR
node (which may now be incorrect, given the data type conversions that
were applied to the parameters WHEN INVOKED).

The *tough* failures are "someone unplugged node #37" (or powered it down,
unintentionally).  Suddenly EVERYTHING that was running on that node is *gone*.
And, every reference INTO that node (as well as out of) is now meaningless,
on every node in the system!

Because a node can easily host hundreds/thousands of objects, there is a
flurry of activity (within a node and between nodes) when a node goes down
so unceremoniously.

[It's actually entertaining to watch things move around, new nodes get powered
up, etc.]

So, the system sees a spike in utilization as it tries to recover.
That increases the likelihood of deadlines not being met.  Which, in
turn, brings about more exception handling.  And, services deciding
to simply quit (HRT) -- which propagates to clients who expected those
services to be "up", etc.

[The challenge is ensuring this doesn't become chaotic]

As this isn't "normal" operation, the workload manager can't know what
sort of NEW load factors will exist as they ALL try to recover from their
missed deadline(s), broken RMIs, dropped endpoints, etc.  It can only react
after the fact.  *Then*, start reshuffling objects to eliminate peaks.

If someone unplugged a node "improperly" (the most likely catastrophic
"failure") without telling the system that they WANTED to do so, then it
is also likely that they may soon unplug ANOTHER node (e.g., "Ooops!
Wrong cord!  Maybe it's THIS one?")

[I've put a lot of effort into making it easier for Joe Average User
to Do-The-Right-Thing (*if* it needs to be done, at all) when it comes
to physical system configuration.  But, he'd quickly be intimidated if
a simple error on his part ("Wrong cord") resulted in significant
consequences (e.g., system reboot).]

[[In other applications, users may be prevented from tinkering with
the physical system by policy ("touch this and you're fired!") or
physical barriers (locked equipment closet).  I don't have that luxury
in *my* application.  And, no IT department to attend to these acts.]]

There are also other "normal use cases" that result in a lot of
dynamic reconfiguration.  E.g., if power fails, the system has to
start shedding load (powering down nodes that it deems nonessential).
Doing that means those resources have to be shuffled to other nodes
that will remain "up" -- at least until the next load shedding
threshold (to prolong battery life).  And, competing with users who
may have their own ideas as to how to shift the load to address the
power outage.

And, of course, the normal acquiring and shedding of applications on
the whims of users varies the instantaneous load on the system... which
precipitates load balancing activities.

Bottom line, things are ALWAYS jostling around.

I expect a *lot* of service invocations as EVERYTHING is an object
(thus backed by a service).  Applications just wire together services.
The system sorts out how best to address the *current* demands
being placed on it.

In ages past, you'd just keep throwing a bigger and bigger processor at
the problem as more "responsibilities" got added.  And, in periods of
less-than-peak demand, you'd bear the cost of an overprovisioned system.

Eventually, you'd realize you couldn't continue scaling in this manner
and would opt for additional processor(s).  Then, you'd start sorting
out how to partition the (existing!) system.  And, how to rebuild
applications to handle the introduction of new communication protocols
between components on different processors.

[One of my colleagues is working on an app that sprawls over close
to a square mile!]

Different sort of application.  I've designed several "english"
language interfaces that were unnecessarily verbose to make
them more portable and understandable.  The "unnecessary text"
and encodings acted to enhance confidence in the integrity of
the comms protocol.

But, even manually enforcing all of the "superfluous" text is trivial
in those and represents window dressing instead of an encumbrance to
the passing of the actual data.

     "Sample in cell C2 has identifier 177D6BB.\n"

Imagine, instead:

     sum = fmul("1003.45", "-3348234.0002")

The Chinese Meal!  :>

Re: RPC/RMI in heterogeneous environments
On Sun, 31 Jan 2021 11:44:22 -0700, Don Y

Terminology notwithstanding, *something* knows which host it is
talking to:  technically it may be a proxy/agent rather than an "end
point" (client) per se, but [sans a completely generic encoding]
something has to know in order to properly encode the data.

You're seeing lots of trees but you are missing the forest.  

It DOES NOT MATTER who/what makes the actual call - the point is that
the "other end" is not known until the call is made.  The situation is
EXTREMELY SIMILAR (if not identical) to that of a conventional network
client invoking a replicated remote service.  And it is amenable to
many of the same solutions.  The cases of the service being co-hosted
with the client or being hosted on a compatible CPU (thus needing no data
translation) are simply variations that may be streamlined.

Again terminology. "Negotiation" does not imply lots of back and forth
hashing out nitty gritty details ... in network parlance it means
simply that there is an exchange of meta-data which allows the systems
to communicate more effectively.

At simplest the meta-data could be a single word indicating what kind
of CPU is at the other end.  

At the most complex it could be a vector of codes representing what
data translations could be needed and what each side is capable of
doing.  Messages then are encoded using the "lowest" common format(s)
[for some definition of "lowest"] and each side translates only what
it can and/or must.

In either case, it is a single handshake consisting of one "this is
what I am / what I can do" message in each direction.  It is typical
for such host specific meta-data to be cached so that it is exchanged
only if the remote host has not been seen previously or for some
period of time.
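
A sketch of such a cached handshake, in Python.  The integer format
codes, the five-minute lifetime, the reading of "lowest" as
"numerically smallest shared code", and the exchange callback are all
assumptions made for the example:

```python
import time

CACHE_TTL = 300.0     # seconds; arbitrary illustrative value
_peer_cache = {}      # host -> (capability codes, time learned)

def common_format(mine, theirs):
    """Pick the lowest-numbered format code that both sides support."""
    shared = set(mine) & set(theirs)
    if not shared:
        raise LookupError("no common format")
    return min(shared)

def peer_capabilities(host, exchange, now=time.monotonic):
    """Return cached capabilities, running the handshake only when stale.

    exchange(host) stands in for the one-message-each-way handshake and
    must return the remote side's capability codes.
    """
    entry = _peer_cache.get(host)
    if entry is None or now() - entry[1] > CACHE_TTL:
        entry = (exchange(host), now())
        _peer_cache[host] = entry
    return entry[0]
```

After the first contact with a host, later calls cost a dictionary
lookup rather than a round trip.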

If you anticipate that clients routinely may spend unbounded time
inside local RMI code without any messages being exchanged, then the
system is fundamentally unstable: it is subject to livelock (as well
as deadlock), and the global state can become perpetually inconsistent
- which is a form of Byzantine failure.

Yes ... but in the face of Byzantine failures there is only so much
that can be done automagically.  

FWIW: when the system knows it is in a "recovery" mode, it may make
sense to elide handling (or even throwing) of certain exceptions.

Understood ... the problem is that in order to maintain stability,
some central authority has to direct the changes - the typical
networking techniques of relying on timeouts and retries will not work
well (if they work at all) when the environment is very highly dynamic.

There are ways to evaluate global status in a dynamic system ... not
continuously, but in snapshot ... and the goal then is to maintain that
every snapshot shows the system to be consistent.


Re: RPC/RMI in heterogeneous environments
On 2/1/2021 1:50 PM, George Neuner wrote:

No.  The kernel ***KNOWS*** where the object is located BEFORE THE RMI
IS EVEN INVOKED!  There's no need to inquire as to its current location.
Nor is there a mechanism for doing so as there is no global repository
of object information; if you don't have a "hand" (handle) on an object
you don't know that it exists (let alone WHERE it exists)!

So, in the absence of a last minute notification of the targeted object
having moved "(very-very) recently", the kernel can be assured that it
knows where the object is, presently -- exactly where it was "a while ago"!

By shrinking the time between when the kernel NEEDS to know where the
object resides and the time the kernel can put the message on the wire
(remembering that the wire may not be READY for the message at any given
time), I minimize the window in which an "object-has-moved" notification
can render the location information obsolete.

[You can't eliminate this possibility when you have TRUE parallelism.
A uniprocessor can give the illusion of eliminating it by ensuring that
only one sequence of instructions executes at a time -- locking out the
move (or, alternatively, the reference) until the move is complete.]

As this is done in the kernel, the kernel can (almost literally) fill
in the destination address AS it is putting the message on the wire.

If I have to convert parameters BEFORE transport, then I have to look at
that location information *earlier* -- in order to figure out WHICH
conversions are appropriate for the targeted node.

The fact that the kernel is doing this means it gets done "right";
instead of relying on individual clients to keep track of every
"object-has-moved" notification and do-the-right-thing with that info.

[In an early "process migration" implementation, I exposed all of these
events to the clients/objects -- thinking that there might be some
value that they could add, in certain special cases.  I never found a way
to exploit those notifications (cuz they still don't tell you anything
about the old or new location!)  So, they just added more work for each
process; more stuff that the process developer could screw up!]

It also means the kernel can track the location of the object for ALL
(local) clients that presently have references into it -- instead of
having to notify multiple clients of a single object's movement, even
if those clients would "do nothing" with the information.

Finally, it means ALL clients having references to a particular
object can be notified "in real time" when anything happens to an object.

There is no point in time where this process occurs.  A node's kernel
KNOWS where all of the objects that it references are located.  It
knows nothing (and can find out nothing) about any OTHER objects; if you don't
have a "hand" on the object, it doesn't exist (to you).

The kernel's notion of "where" is simply updated when the object moves.

This is just an efficiency hack; if the kernel didn't know of the move,
then it would deliver the message to the object's previous location -- and
the server that previously handled requests for that particular object
would forward it along to the new location.  It knows this (and has a
"hand" on that new instance) because *it* was tasked with forwarding
the internal server state that represents that object.
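
A toy model of the two paths (updated location vs. forwarding by the
previous host), in Python.  The Node class and its fields are purely
illustrative and ignore transport, concurrency and the eventual
removal of the forwarding entry:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.objects = {}     # object id -> state, for objects backed here
        self.forward = {}     # object id -> Node it moved to (old host only)
        self.location = {}    # this kernel's belief: object id -> Node

    def migrate(self, obj_id, new_node, notify=()):
        """Move an object, leave a forwarding entry, update notified kernels."""
        new_node.objects[obj_id] = self.objects.pop(obj_id)
        self.forward[obj_id] = new_node
        for kernel in notify:
            kernel.location[obj_id] = new_node

    def invoke(self, obj_id):
        """Send toward the believed location; return (host name, hop count)."""
        target, hops = self.location[obj_id], 1
        while obj_id not in target.objects:
            target, hops = target.forward[obj_id], hops + 1
        return target.name, hops
```

A kernel that received the update reaches the object in one hop; one
that did not still gets there, via the old host, in two.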

Without the notification (location update), the kernel can't FIND the

If the object is NEVER "used", the kernel *still* knows where it is!
It's only when the last reference (on that node) to the object is dropped
that the kernel "forgets" -- about the location AND THE OBJECT!

I.e., when the server that was previously handling requests for that
object "goes away", any old handles to that object's existence, there,
become unresolvable; if you haven't been informed of the new object
instance's location, it's lost to you -- forever!

Again, there is no need to "ask" or "tell".  The kernel already *knows*.

And, knows that there *are* necessary conversion operators in place
on <whatever> target node to ensure a message will be "translated"
correctly -- otherwise, the object wouldn't have been allowed to move
to that node!  And, the target node already knows which conversions
to apply to each incoming method invocation FOR that object (and others
of its type).

A node can't magically change its hardware.  So, whatever the characteristics
of the node when it came on-line persist to this point in time.  Why *ask*
it if you already KNOW the answer?

That's always the case when you can't bound the problem space.

The user can always opt to install applications that the hardware can't
support (when considering the range of OTHER applications that could
potentially be CO-executing, at any given time).  The user can also
opt to *reduce* the available hardware.


[A PC user can install a suite of applications on his PC that could NEVER,
practically, co-execute (due to resource limitations).  But, the PC doesn't
explicitly *say* "No, you can't do that!".  Instead, it tries its best to
support everything in the framework that was imposed when it was designed.
It may lead to excessive thrashing, long response times, dropped events,
etc.  If that's not satisfactory, buy a faster PC (or change the application
load).  That's *your* problem, not the PC's!  Or, any of the application

This is very different than *most* embedded systems where the (functionality
of the) code that executes after release-to-manufacturing is very similar
to the code that exists the day the product is retired.

Think about ANY project that you've done.  Imagine the end user coming
along and installing some other bit of software on the hardware that's
hosting your application.  Or, deciding to remove some hardware resources
(downgrade CPU, RAM, disk, etc.)  Can you make any guarantees as to whether
your original application will continue to run?  Or, that the new
complement of applications will peacefully coexist?

Chances are, if confronted with this situation, you (or the product
vendor) will simply disclaim that situation as the user has modified
the product from its original sales condition; warranty void!

All *I* can do is try to minimize the scenarios that lead to this sort of
behavior and hope that specific instances are "sufficiently damped" so
they don't take the whole system down (but may lead to transient failures).

And, make the OS services aware of the sorts of things that *can* happen.
So, when the system starts to go chaotic, those heuristics can inherently
prune the actions that would swamp a less inspired solution.  E.g., if
the process that should be scheduled next simply can't meet its deadline,
then don't even bother giving it a chance -- schedule its deadline handler,
instead (and eliminate that load from the processor).
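
A minimal sketch of that pruning rule, assuming each task carries an absolute deadline and an estimate of its remaining execution time. The `Task` fields and the `dispatch` function are hypothetical names for illustration, not part of any scheduler described in this thread:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    deadline: float                       # absolute time by which the task must finish
    remaining: float                      # estimated execution time still required
    run: Callable[[], str]                # the task's normal work
    on_deadline_miss: Callable[[], str]   # the task's deadline handler


def dispatch(task: Task, now: float) -> str:
    """Run the task only if it can still finish in time.

    If the deadline is already unreachable, skip the task entirely and
    run its deadline handler, which removes that load from the processor.
    """
    if now + task.remaining > task.deadline:
        return task.on_deadline_miss()
    return task.run()
```

The point of the check is that a task which cannot finish in time gets no CPU at all, so its (pointless) execution never competes with tasks that can still succeed.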

Just like I can't prevent the user from unplugging something unilaterally
without giving me a heads-up.  I can crash, unceremoniously, and blame
him for "not following the rules".  But, it's unlikely that he's going to
see it as HIS fault ("What a stupid system!").

[I was fiddling with the GPUs in one of my workstations, yesterday.
As it is hard to "service" them, I apparently didn't seat one of the
GPUs properly.  The 1100W power supply didn't burn traces on the
motherboard.  The GPU didn't get toasted.  Instead, I got an error
code that essentially said "The power supply (which I know to have been
operating properly a few minutes earlier) has a problem."

What did I do (likely *incorrectly*), recently?

Remove card.  Reseat -- CAREFULLY.  Error gone!

Obviously, PC and GPU vendors design with the EXPECTATION that these
sorts of things WILL happen and add provisions to their designs to
avoid costly consequences.

Had I "lost" any piece of kit DUE TO MY ERROR, I would NOT have been
happy with the GPU/PC vendor(s)!]

Correct.  You just try to minimize the impact "expected" (which is not
the same as "desired" or "planned") events have on the system.  E.g., if
the transport media ran at 9600 baud, you'd expect lots more problems
related to transport delays.  If the network had considerably more hosts
you'd expect media access delays.  Etc.

Shrinking the window of vulnerability (to object movement) is just another
attempt to tamp down the worst case consequences.

E.g., you can buy metastable-hardened components, reduce event and/or
clock frequencies, add multiple layers of synchronizers, etc. but that
just minimizes the *probability* of a metastable condition eating your
lunch; it doesn't (completely) eliminate it.

How do you decide what should be notified and what shouldn't?
Do you let application developers (each with their own self-interests)
declare what's important?  Do you implement a clearing house to
review applications and impose its own notion of significance on

I opt for providing as much information as possible to clients/servers
(cuz a server can actually be an *agent*).  And, hoping folks who are
diligent developers will take the effort to sort out suitable
recovery strategies (that only THEY can imagine).

[This was the reasoning behind exposing all of the "object motion"
events to the clients, originally]

If you don't want to handle the exception (or, don't have a robust strategy),
then the default handler is invoked -- which just kills off your process!
Don't want to check the result of malloc()?  Fine!  SIGSEGV works!  :>

Of course, if current events indicate that the system can't perform some
desired action, then it *can't*!

The responsibility is distributed.  Individual nodes can decide they need
to shed some load (and WHICH load to shed).  Likewise, they can advertise
what loads (strictly in terms of the resources they are willing to dole
out to an imported load) they are willing to take on.  There's never *an*
optimal solution; you just try to "make things a little better" given your
current knowledge of the current situation.

And, then try to make THAT *new* situation "a little better".  I.e., expect
object assignments to be in a constant state of flux.  Damp the response
so the system doesn't oscillate, shuffling objects back and forth with
no actual benefit to the system's state.
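
One way to picture the damping is a migration test with a minimum-gain threshold and a cooldown. This is only a sketch under assumed policy: the `Placement` record, the `gain` estimate (taken here as a fraction between 0 and 1) and both default thresholds are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Placement:
    node: str         # node currently hosting the object
    moved_at: float   # time of the object's last migration


def should_migrate(p: Placement, candidate: str, gain: float, now: float,
                   min_gain: float = 0.2, cooldown: float = 30.0) -> bool:
    """Decide whether moving the object to `candidate` is worthwhile.

    Two damping terms keep objects from ping-ponging between nodes:
    a cooldown since the last move, and a minimum estimated improvement.
    """
    if candidate == p.node:
        return False                  # already there
    if now - p.moved_at < cooldown:
        return False                  # moved too recently; let things settle
    return gain >= min_gain           # ignore marginal improvements
```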

E.g., many objects (processes) just idle "most of the time".  A thoughtful
developer doesn't claim all of the resources he will EVENTUALLY need -- when
he is just idling.  You *can* do so -- but, when the system decides to
shed load, it is likely to see YOU as a nice, fat opportunity to bring that
resource under control!

Rather, when/if they need to "do work", they can consume a boatload of
resources -- if they happen to be available (SOMEWHERE in the system).  I can
learn when some of these things are *likely* to happen, if they are driven by
the users.  E.g., you might watch TV/movies in the evening so I can expect to
need to have at least two network speakers and one network display "on-line",
plus the associated CODECs, at that time.

But, others are reactions to events that are "random"/unpredictable.  E.g.,
I can't predict when the phone might ring or someone might come to the
front door (though both are unlikely in the wee-hours of the morning -- so,
that might be a good time to schedule the speech recognizer training based
on the speech samples harvested from today's phone conversations!)

So, I can't *anticipate* a good binding of objects to nodes.  Instead, I have
to be able to react to changes as they occur.  And, try not to break things
AS they are being juggled!

[Of course, I can *notice* configurations/object-node bindings that seem to
work better than others and *tend* towards promoting those configurations;
it's silly NOT to notice good and bad situations and use them in influencing
future object distribution choices.  E.g., I start shuffling objects
around when I detect the first ring of the phone -- *before* those objects
are called on to actually "work" at answering the phone!]

Having the ability to move objects (processes) and bring other nodes
on/off-line is my way of coping with peak loads without resorting to
unilateral overprovisioning (or, arbitrarily limiting the functionality
that can be used at any given time).  Inherent in this is the fact that
there will be transient periods where things are in flux.

[To avoid this, you could design individual DEDICATED appliances that
have all of the resources that they need -- even if they happen to be
idle (much of the time).  The classic over-provisioning approach.  Of
course, that hardware then perpetually limits how the device can evolve!
(how often does your smart TV get an update?)

Or, impose arbitrary limits on what a system (monolithic or distributed)
can be asked to do at any given time.  (under-utilizing?)

Or, hope some "big server" is always available (in a timely manner) to
provide whatever resources you need.  The "it's-someone-else's-problem"

Or, let the user decide that things aren't performing AS HE EXPECTS and
take manual actions to fix the situation (which is how folks deal with their
PCs when they start thrashing, responding slowly, etc.)]

Re: RPC/RMI in heterogeneous environments
On Thu, 21 Jan 2021 18:38:27 -0700, Don Y

You must be quite desperate (extreme data rate, flea-power client,
very-small-memory client) to even consider moving functionality off the
client that naturally belongs there.

Any healthy client can handle byte order, time zones, character sets
and languages.

Re: RPC/RMI in heterogeneous environments
On 1/29/2021 11:33 AM, snipped-for-privacy@downunder.com wrote:

It's "power" (resource) related only in that time = 1/"power"

Convert Q10.6 to double

Re: RPC/RMI in heterogeneous environments
On Sun, 31 Jan 2021 12:37:27 -0700, Don Y

Computing power (and hence power consumption) is often exchangeable
with time. Thus the _energy_ consumption of a complex operation can be
nearly constant. This can be an issue in truly flea-power systems, in
which you may have to find algorithms with less complexity to reduce
total energy consumption.

Q10.6 is already a binary representation that fits into a 16-bit word
(two bytes).  Just some shifting to do the hidden-bit normalization
and some exponent adjustment and you are done. Should be easy with
even an 8-bit processor. Doing it with some primitive 4-bitters (e.g.,
4004/4040) would require more swapping between internal registers.

Of course a decimal number with 10 integer digits and 6 fractional
digits (either in BCD/ASCII/EBCDIC) requires much more work and
scratchpad areas.
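
The binary case can be sketched as follows. It assumes the Q10.6 word is 16-bit two's complement with the sign bit counted among the 10 integer bits, and that the target is an IEEE 754 binary64 double; the thread states neither, so treat both as assumptions. The function names are hypothetical:

```python
import struct

FRAC_BITS = 6   # Q10.6: six fractional bits (assumed layout)


def q10_6_to_double_bits(raw: int) -> int:
    """Build the IEEE 754 binary64 bit pattern for a 16-bit Q10.6 word."""
    raw &= 0xFFFF
    sign = raw >> 15
    mag = (0x10000 - raw) if sign else raw    # magnitude of the fixed-point value
    if mag == 0:
        return 0                              # +0.0
    msb = mag.bit_length() - 1                # position of the leading 1
    exponent = msb - FRAC_BITS + 1023         # unbiased exponent plus the bias
    # Shift the leading 1 up to bit 52, then mask it off: it becomes the hidden bit.
    mantissa = (mag << (52 - msb)) & ((1 << 52) - 1)
    return (sign << 63) | (exponent << 52) | mantissa


def q10_6_to_double(raw: int) -> float:
    """Reinterpret the constructed bit pattern as a Python float."""
    return struct.unpack(">d", struct.pack(">Q", q10_6_to_double_bits(raw)))[0]
```

For example, `q10_6_to_double(0x0060)` is 1.5 and `q10_6_to_double(0xFFC0)` is -1.0. Every 16-bit input is exactly representable, so no rounding step is needed.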

Re: RPC/RMI in heterogeneous environments
On 1/31/2021 11:50 PM, snipped-for-privacy@downunder.com wrote:

By "power", I meant all resources, not just watts.  A device can consume
very little power (watts) but also be crippled or capable in what it can
do in any given amount of time.  E.g., I can perform fast multiplies without
a hardware multiply -- if I can trade memory for time.

[It's also possible to be a power HOG -- and STILL not be able to do much!]

It's not a question of how complex the operation is vs the capabilities
of the processor; an MC14500 can EVENTUALLY pull it off!  But, increasing
complexity (with a given set of "capabilities") adds to "conversion delay".

The issue is putting this conversion task in series with the routing of
the packet.

If you convert parameters on the client-side, then the time spent
converting them increases the time until the packet gets onto the
wire and, ultimately, to the target node.  (Note that you had to make
assumptions as to WHICH node would be targeted in order to know
which conversions would need to be made.)  But, while you are busy
converting, a concurrent process (possibly on another node) can
cause the targeted object to be relocated to a new node.

So, the work done preparing the parameters for the original node is now
wasted; a NEW set of conversions needs to be applied to fit the requirements
of the NEW target node (because you've decided that these conversions need
to be done at the *client*!).  I.e., even though the wire is ready for your
packet, your packet is no longer ready for the wire!

By contrast, if you ship the parameters "as is" and rely on the far end
to perform any necessary conversions, then the time between the
client invoking the RMI and the time the packet is ready to be placed
on the wire is shortened.  This reduces the likelihood of an "object
moved" event rendering your conversions futile (though it still means
your packet has to be routed to a different location than the
one initially assumed).
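
A toy sketch of why "as is" shipping tolerates a move: the packet describes only the *sender's* representation, so relocating the object changes the route but never the payload. The `Packet` and `Kernel` classes are hypothetical stand-ins, not the actual mechanism discussed here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    object_id: str
    method: str
    payload: bytes        # parameters exactly as the client holds them
    sender_format: str    # describes the sender (e.g. "le16"), not the target


class Kernel:
    """Tracks where each object currently lives (illustrative only)."""

    def __init__(self) -> None:
        self.location: dict[str, str] = {}

    def move(self, object_id: str, node: str) -> None:
        self.location[object_id] = node

    def route(self, packet: Packet) -> str:
        # The target node is resolved at transmit time, as late as possible.
        return self.location[packet.object_id]
```

If the object moves between building the packet and calling `route()`, the same packet simply goes to the new node, which applies whatever conversions it needs.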

Imagine shipping a 4K page of audio samples (because that is far more
efficient than shipping individual samples!) in little endian form to
a server that wants them in big endian form.

Or, a TIFF to a server that wants a JPEG.

Or, an array of 1000 floats to a server that wants IEEE 754 doubles.
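
To give a feel for the bulk conversions involved, here is a sketch of two of them as the receiving side might apply them. It assumes 16-bit signed audio samples and IEEE 754 binary32 source floats, neither of which is stated above; the function names are hypothetical:

```python
import struct


def le16_to_be16(samples: bytes) -> bytes:
    """Byte-swap a buffer of 16-bit samples from little to big endian."""
    if len(samples) % 2:
        raise ValueError("buffer must hold whole 16-bit samples")
    n = len(samples) // 2
    return struct.pack(f">{n}h", *struct.unpack(f"<{n}h", samples))


def floats_to_doubles(data: bytes, byteorder: str = "<") -> bytes:
    """Widen an array of binary32 floats to binary64 doubles."""
    if len(data) % 4:
        raise ValueError("buffer must hold whole 32-bit floats")
    n = len(data) // 4
    values = struct.unpack(f"{byteorder}{n}f", data)
    return struct.pack(f"{byteorder}{n}d", *values)
```

A 4K page holds 2048 such samples, so even this trivial swap touches every byte of the page; that per-buffer cost is what ends up in series with routing when the conversion is done on the client.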

I.e., "power" is a relative term.

(Of course, there are practical limits to the sorts of conversions you
can be expected to do in this process.  But, you should be able to design
a client or a service with reasonable implementation flexibility...
and "fixup" for equally reasonable expectations from their counterparts.)

Re: RPC/RMI in heterogeneous environments
On Thu, 21 Jan 2021 18:38:27 -0700, Don Y

Did you look at the MQTT protocol (https://en.wikipedia.org/wiki/MQTT),
which is often used with the IoT drivel?  The MQTT Broker handles the
marshaling between lightweight clients.

Re: RPC/RMI in heterogeneous environments
On 1/31/2021 12:39 PM, snipped-for-privacy@downunder.com wrote:

Different paradigm entirely.

I'm trying to implement a "connection" between a specific instance
of a specific client's invocation of a specific method on a specific
object.  I.e., the mechanism between:
    result = object.method(params)
    type method(params) {... return retval}
In a single processor environment (in a single process container), this
mechanism would be the compiler's building of the stack frame and
"CALL" to the targeted method (referencing the object specified).

I don't want to know about any other methods or objects.  And, no one
else should know about this object or its methods UNLESS THEY HAVE

As the (malevolent) developer can avoid the protections that the compiler
provides (e.g., pass bogus data or a bogus object reference or method
selector), that mechanism has to protect the codebase against such attacks.
Do something you shouldn't and you die -- you've either got a latent bug
(in which case, it's dubious as to whether the next line of code that you
execute will be correct) *or* have malicious intent (in which case, I
don't want to give you another chance to muck with things).

MQTT is more of an "active whiteboard" where agents exchange information.

I can create an app (object) that is responsible for turning on a light
when the ambient light level falls below a particular level.  The
developer coding the app can KNOW that there may be hundreds of
possible lights.  And, many different ways of determining "ambient"
(is it an outdoor determination?  indoor?  windowless basement?  etc.).

He may also know that there is an electric garage door opener that
could be commanded to open and allow (intruders?  accomplices??)
to gain access to the property.

But, he can't use any of that information.  He can only access
the sensor that is provided to him and can only control the
light that he is given control over.

If he is the only application that is concerned with that light,
then he's the only app that even knows that the light exists!
And, the light is "safe" from other malicious/buggy acts.
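
The "you can only touch what you were handed" rule can be sketched as a per-process handle table. Everything here (`HandleTable`, `Light`, `AccessDenied`) is a hypothetical illustration; in the scheme described above the offender would be killed, which the sketch models as an exception that an uncaught default handler would turn into process termination:

```python
class AccessDenied(Exception):
    """Raised for a bogus handle or a method the holder was never granted."""


class Light:
    def __init__(self) -> None:
        self.on = False

    def set(self, state: bool) -> bool:
        self.on = state
        return self.on


class HandleTable:
    """Maps a process's handles to the objects and methods granted to it."""

    def __init__(self) -> None:
        self._handles: dict[int, tuple[object, frozenset[str]]] = {}

    def grant(self, handle: int, obj: object, methods: set[str]) -> None:
        self._handles[handle] = (obj, frozenset(methods))

    def invoke(self, handle: int, method: str, *args):
        entry = self._handles.get(handle)
        if entry is None or method not in entry[1]:
            # Unknown handle or ungranted method: no second chance.
            raise AccessDenied(f"handle {handle!r}, method {method!r}")
        obj, _ = entry
        return getattr(obj, method)(*args)
```

An app holding only handle 1 for the light cannot name the garage door at all: there is no entry in its table through which to reach it.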

Marshalling arguments is easy.  Getting them in the right form
(data type/encoding) and at the right *time* is the issue being addressed
by "late conversion" (to coincide with the late reification).
