Unintentionally Decoupled "process" interactions [long]

Don Y 10 years ago

[Apologies to folks who end up seeing this in multiple places: USENET, forums, mailing lists and personal email. Similarly, apologies to folks only seeing it in *one* such place and, possibly, missing out on comments introduced in other venues. And, apologies if the technology may be unfamiliar to some. I'm in a rush to get this sorted out so I can get back to "real" work... including yet another /p.b./ project! :< ]
[For those of you directly involved in this, you'll recognize much of the text as lifted from our past emails, names elided. I include it here for others who won't be aware of the design approach. Also, I plan on having the Gerbers and BoM's ready by Monday and hope to get a good start on the specifications by then, as well. Expect them to appear a week later. I hit a snag in another project so this one suffered, a bit. :< It would be great if someone could offer to draft a user manual from them! (hint, hint) (HINT, HINT!!!) I'd really like to see this installed by September as I have some other commitments and guests coming online at that time. And, this would be REALLY cool to showcase!]

------------

I have a quick-and-dirty little project that I probably should have begged off -- but, it so closely resembles work that I'm doing on another project that I opted to take it on. /Carte Blanche/ is an excellent motivator! :>

As with other projects in recent past, it's a distributed control system. Unlike other projects, (almost) all of the "application" software resides in a central "controller": a COTS SFF PC running a baremetal MTOS with no physical I/O's beyond disk (for persistent store) and NIC (for RPC's).

The "remote" nodes are relatively simple (dirt cheap) "motes" that do little more than hardware and software "signal conditioning" for the physical I/O's that they happen to have available, locally. This saves *thousands* of dollars of cabling costs: wire and labor (which often has to be done by a union electrician, "inspected", etc.).

A (portion of a) local namespace might appear to be:

/devices
    /audio
        /microphone
        /speaker
    /display
        /screen
        /backlight
        /touchpanel
    /switches
        /1
        /2
        /3
    /lamps
        /1
        /2
        /3
    ...

All of these are exported to the central controller FROM EVERY NODE. I.e., as if each was a "network share", "NFS export", etc. (in some cases, some or all may be exported to other nodes as well! E.g., node1 might act as the user interface for a "headless" node5)

The central controller builds namespaces from this composite set of exports. E.g., one such might be:

/devices
    /audio
        /node1
            /microphone
            /speaker
        /node2
            /microphone
            /speaker
        ...
    /display
        /node1
            /screen
            /backlight
        /node2
            /screen
            /backlight
        ...

while another, equally valid and *concurrently* active might be:

/devices
    /node1
        /audio
            /microphone
            /speaker
        /display
            /screen
            /backlight
    /node2
        /audio
            /microphone
            /speaker
        /display
            /screen
            /backlight
    ...
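A minimal sketch of how two such views could be derived from the same set of exports. The node names, export lists and function names are assumptions made for illustration; nothing here is taken from the actual controller.

```python
# Illustrative only: node names and export lists are assumptions.
exports = {
    "node1": ["/audio/microphone", "/audio/speaker",
              "/display/screen", "/display/backlight"],
    "node2": ["/audio/microphone", "/audio/speaker",
              "/display/screen", "/display/backlight"],
}


def by_category(exports):
    """View 1: /devices/<category>/<node>/<leaf> -> (node, local name)."""
    view = {}
    for node, names in exports.items():
        for local in names:
            # "/audio/microphone".split("/") gives ["", "audio", "microphone"]
            _, category, leaf = local.split("/")
            view[f"/devices/{category}/{node}/{leaf}"] = (node, local)
    return view


def by_node(exports):
    """View 2: /devices/<node>/<category>/<leaf> -> (node, local name)."""
    return {f"/devices/{node}{local}": (node, local)
            for node, names in exports.items()
            for local in names}


category_view = by_category(exports)
node_view = by_node(exports)
```

Both views name exactly the same resources; only the paths differ, which is why they can be active concurrently.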

Still one more might be:

/devices
    /audio
        /talk          ("microphone" from node 1)
        /listen        ("speaker" from node 5)
    /display
        /screen        (from node 3)
        /backlight     (from node 3)
        /touchpanel    (from node 3)
    /switches
        /left-limit    ("3" from node 8)
        /home          ("3" from node 2)
        /right-limit   ("1" from node 7)
    /lamps
        /red           ("1" from node 5)
        /green         ("2" from node 4)
        /blue          ("1" from node 3)
    ...

while another task has a different SET of the SAME NAMES -- but bound to different "node instances". I.e., the same identical piece of code running in that "another task's" namespace would result in "identical" activities -- but taking place on an entirely different set of node I/O's using these different namespace bindings.

[This is really slicker than snot! Tasks need not be concerned with where their I/O's are located nor worry about interfering with the activities of other tasks: if a resource isn't bound in *your* namespace, there is NOTHING you can do to interfere with it nor any concern that *it* might interfere with *your* activities! The protection domains are absolute -- much like operating in a chroot(8) jail!]
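To make the "same code, different bindings" idea concrete, here is a small sketch. The `Resource` and `Namespace` classes, the node numbers and the `signal_red` task are all hypothetical; a real resource would talk to a node rather than hold a string.

```python
class Resource:
    """Stand-in for a remote I/O point; a real one would talk to a node."""
    def __init__(self, node, local_name):
        self.node = node
        self.local_name = local_name
        self.state = "OFF"


class Namespace:
    """Per-task name table: a name that is not bound cannot be resolved."""
    def __init__(self, bindings):
        self._bindings = dict(bindings)

    def resolve(self, name):
        try:
            return self._bindings[name]
        except KeyError:
            raise LookupError(f"{name} is not bound in this namespace") from None


def signal_red(ns):
    """Identical task code; which lamp it touches depends only on the namespace."""
    ns.resolve("/lamps/red").state = "ON"


# Two tasks, same name, different hardware (node numbers are made up).
lamp_n5 = Resource("node5", "/lamps/1")
lamp_n9 = Resource("node9", "/lamps/2")
task_a = Namespace({"/lamps/red": lamp_n5})
task_b = Namespace({"/lamps/red": lamp_n9})
```

Calling `signal_red(task_a)` lights only the node5 lamp, and `task_a.resolve("/lamps/green")` fails because that name was never bound for the task.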

With this, I can build little "virtual machines" (poor choice of term as it has already been appropriated for a DIFFERENT use) out of components from anywhere in The System. E.g., I could bind "/switches/3" on node *1* to "/button"; and "/lamps/1" on node *5* to "/bulb" and then use

if (/button == ON) /bulb = ON

to implement a "remote" light switch. No hassle of sending messages across the network to specific IP addresses, implementing an IDL, marshalling RPC arguments, etc.
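A sketch of what sits behind those two names, assuming each bound name is a handle that forwards reads and writes to the owning node. `RemotePoint`, `make_fake_transport` and the in-memory store are invented for this example; the real system would use its own RPC layer.

```python
class RemotePoint:
    """Hypothetical handle: forwards reads and writes to the node owning the I/O."""
    def __init__(self, node, local_name, transport):
        self.node = node
        self.local_name = local_name
        self._transport = transport

    def read(self):
        return self._transport(self.node, "read", self.local_name, None)

    def write(self, value):
        self._transport(self.node, "write", self.local_name, value)


def make_fake_transport():
    """In-memory stand-in for the RPC layer, so the sketch runs without a network."""
    store = {}

    def call(node, op, name, value):
        if op == "write":
            store[(node, name)] = value
            return None
        return store.get((node, name), "OFF")

    return call, store


def light_switch(ns):
    """Task code, equivalent to: if (/button == ON) /bulb = ON"""
    if ns["/button"].read() == "ON":
        ns["/bulb"].write("ON")


rpc, node_state = make_fake_transport()
namespace = {
    "/button": RemotePoint("node1", "/switches/3", rpc),
    "/bulb": RemotePoint("node5", "/lamps/1", rpc),
}
```

The task body never mentions node1 or node5; rebinding "/button" or "/bulb" to other nodes changes which hardware is involved without touching `light_switch`.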

At the same time, I can have another task that works in a richer namespace -- e.g., the third one, above (introduced as "while another") -- that can do:

for (N in 1..number_of_nodes)
    for (lamp in 1..3)
        /devices/nodeN/lamps/lamp = OFF

to implement a "master reset" (after ensuring that no other task tries to turn any of these ON after having done so)

[This is important as it shows how the multiple displays, etc. can be handled from a central controller without ever burdening any individual task with knowledge of more than a *single* display, etc.]

The takeaways/executive summary:

- resources appear in namespaces
- anything that doesn't appear in a namespace can't be referenced (i.e., *accessed*!)
- any new INDEPENDENT namespace can be constructed by binding names from an existing namespace to NEW names in the new namespace
- a resource may appear in multiple namespaces concurrently and with different names
- resources can exist on different nodes without that being reflected in the construction of ANY of the namespaces in which they are enumerated
- having a name (handle) for a resource allows it to be accessed without regard for physical location (!)
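The third and fourth points can be sketched with plain dictionaries standing in for namespaces. `derive`, `Device` and the paths below are illustrative assumptions, not the actual mechanism.

```python
class Device:
    """Placeholder for a resource that lives on some node."""
    def __init__(self, node, local_name):
        self.node = node
        self.local_name = local_name


def derive(parent, renames):
    """Return a new name table; `renames` maps new_name -> existing name in `parent`.

    Only the names listed are carried over, so everything else in `parent`
    is unreachable from the derived table.
    """
    return {new: parent[old] for new, old in renames.items()}


rich = {
    "/devices/node1/switches/3": Device("node1", "/switches/3"),
    "/devices/node5/lamps/1": Device("node5", "/lamps/1"),
    "/devices/node5/lamps/2": Device("node5", "/lamps/2"),
}

small = derive(rich, {
    "/button": "/devices/node1/switches/3",
    "/bulb": "/devices/node5/lamps/1",
})
```

Here `small["/button"]` and `rich["/devices/node1/switches/3"]` are the same resource under two names, while the node5 "/lamps/2" resource has no name at all in `small`. Removing a name from `small` leaves `rich` untouched, which is the sense in which the derived namespace is independent.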

There's a small amount of "work" involved implementing this to create the "device interfaces" (i.e., the software that makes a particular digital output look like "/lamp/1" and control its state via "ON" vs. "OFF" -- or "DIM", "FLASH", etc.). Much of the application effort involves deciding how to split the namespace into task-specific namespaces (i.e., limiting what any task can do to ONLY the things that it SHOULD be able to manipulate and query; a task that expects to send and receive characters via a serial port doesn't necessarily need to be able to alter the baud rate or other characteristics of the PHYSICAL interface!).

For example, if a task shouldn't be able to access a resource, then its namespace shouldn't include any references BOUND to that resource! If a task shouldn't be able to perform a particular operation on a particular resource (e.g., never turn it "OFF"), then an agency/proxy can be created to process requests from the task and dispatch APPROVED requests to the actual resource; the "bare" resource never appearing in the constrained task's namespace but the proxy appearing, instead!
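A minimal sketch of such a proxy for the "never turn it OFF" case. The `Lamp` and `NeverOffProxy` classes are hypothetical. Note that Python only hides `_target` by convention; in the system described, the isolation would come from the namespace mechanism itself, not from the language.

```python
class Lamp:
    """The "bare" resource; it accepts any state."""
    def __init__(self):
        self.state = "OFF"

    def set(self, value):
        self.state = value


class NeverOffProxy:
    """Forwards approved requests only; this is what the constrained task sees."""
    def __init__(self, target, forbidden=("OFF",)):
        self._target = target
        self._forbidden = frozenset(forbidden)

    def set(self, value):
        if value in self._forbidden:
            raise PermissionError(f"request {value!r} is not permitted for this task")
        self._target.set(value)


lamp = Lamp()
constrained_namespace = {"/lamps/1": NeverOffProxy(lamp)}   # the proxy, not the lamp
supervisor_namespace = {"/lamps/1": lamp}                   # the bare resource
```

The constrained task can turn the lamp "ON" (or "DIM", "FLASH", etc.) through its binding, but a request for "OFF" is rejected before it reaches the hardware.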

The biggest problem lies in the use of a central controller in the implementation. While it makes the nominal application much easier to implement (more resources, richer development environment, etc.), it complicates the sharing of physical resources on the individual nodes!

I.e., each *node* can access its own *local* "namespace" (and, if I so choose, even portions of namespaces on other nodes -- in much the same way that the central controller does!). This allows some failsafes to be locally implemented (e.g., "turn off ALL lamps if they've been left on for more than 12 hours" or "if unable to contact node X for more than 23 minutes, flash /lamp/2 at a rate of 1 Hz")
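The two example failsafes could be evaluated locally along these lines. This is only a sketch: the function name, the action tuples and the per-lamp reading of the 12-hour rule are assumptions; the thresholds and the "/lamp/2" name come from the examples above.

```python
LAMP_LIMIT_S = 12 * 3600   # "left on for more than 12 hours"
LINK_LIMIT_S = 23 * 60     # "unable to contact node X for more than 23 minutes"


def failsafe_actions(now, lamp_on_since, last_contact):
    """Decide what the node should do on its own, without the controller.

    now           -- current time in seconds (any monotonic clock)
    lamp_on_since -- dict: lamp name -> time it was switched on, or None if off
    last_contact  -- time of the last successful exchange with node X
    Returns a list of (action, name) tuples for the caller to carry out.
    """
    actions = []
    for name, since in lamp_on_since.items():
        if since is not None and now - since > LAMP_LIMIT_S:
            actions.append(("off", name))
    if now - last_contact > LINK_LIMIT_S:
        actions.append(("flash_1hz", "/lamp/2"))
    return actions
```

Keeping the decision a pure function of timestamps makes it easy to exercise without real hardware or a real clock.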

THE PROBLEM

The real issue lies with the user interface devices -- display (screen, touchpanel), audio (in/out), etc. (others not mentioned in my above descriptions). They are shared resources that may *not* want to be "uniquely held" by a particular agency/task.

E.g., while the central controller (aka "main application") typically "holds" all of the user interfaces (as *it* is interacting with the various users), an asynchronous "event"/exception may be raised in a local node that needs to be conveyed to the "application" (usually, with some urgency). In a uniprocessor/tightly coupled SMP implementation, the communication is local and reliable. The application layer eventually recognizes the event and decides how the information should be conveyed to the user and/or the event handled.

The application is AWARE of the event! (i.e., if it needs to do something beyond the notification, it *can* do those things)

In the distributed case, that may not be true (e.g., the "event" may be "Communication Failure"). So, it may not be possible for the local node to post the event to the central controller and the central controller to turn around and "announce" the event to the user *at* the local node (because the controller holds the display resource for that node).

The local node may, thus, need to borrow/override some portion of a user interface to convey that information to (and elicit an acknowledgement from) the user WITHOUT *depending* on the central controller to perform that interaction.

The local node has no knowledge of the semantic content of *its* display -- the controller has been painting it based on the needs of the application as interpreted and implemented by the controller. So, the local node has no idea as to which portions of the display might be precious AT THE CURRENT MOMENT. Yet, it has to use *some* portion of the display to communicate this to the user!

[Note "display" can be a visual or aural indication; same issues apply]

I liken this to early Windows (printer) drivers that would throw up a modal dialog to complain about the printer being out of paper/toner. The "driver" unilaterally decided to grab a portion of the display device WITHOUT REGARD for what the user was doing at the time. (i.e., there's no way the driver could KNOW what the user was doing or the importance of individual pieces of display real estate!) And, it would do so in the most "noticeable" place on the display surface -- right where the user was typically looking at the time!

Then, to add insult to injury, it took the focus away from what the user was doing to DEMAND acknowledgement from the user -- as if a paper shortage was the most important issue the user was facing, at that time. (from the driver's perspective, it probably *was*!)

[Note that this has been "fixed" to present those notifications elsewhere -- in a specific location of the display (Tray) and only AFTER the user expresses an interest in knowing the details of the "alert"! Fine -- if you've got the display real-estate to spare!]

This also presents opportunities for undesired behaviors to creep into the UX -- the user may be in the process of "doing something" based on the screen's contents an OhNoSecond prior to the dialog appearing (e.g., pressing a key, clicking on something, etc.) and can't stop himself quickly enough for the action to be processed by the interrupting, focus stealing dialog. He may never even *see* the dialog if his eyes are elsewhere (e.g., engaged in some activity scripted from written notes placed on the worksurface).

There are two aspects of this that are consequential to the distributed nature of the implementation:

- redirecting the interpretation of the user's actions to the proper "context" (local focus vs. normal, remote focus)

- informing the "original" overlaid application that the focus has now shifted (i.e., if the application was expecting an acknowledgement in a particular time period, that may not be forthcoming because the request/indication for that action is not APPARENT to the user) -- there may not be an operable communication path between the two parties!

As it's a control system, neither the local nor remote "activities" can really "pause". If a motor is in motion, it must stop before it causes damage. If a user must interact with a mechanism, that must be done before the mechanism advances to another stage in its operation. Etc. Just because the UI is "interrupted" doesn't mean the process will be!

In the past, for "richer" implementations, I've reserved a portion of the user interface for "express messages" -- to ensure that they can always be seen WITHOUT COMPETING with the normal display content provided by the application. In my HA project, I use a spatialized "display" to present specific indicators/annunciators in different parts of the "space" around the user's head. So, a "chime off to the left" can indicate one thing while a "buzz" off to the right can indicate something else. I rely on the character and location of the sound to reinforce the user's "remembrance" of the event despite his being actively engaged ("focused") on some other activity.

Here, I don't have the physical resources to implement such a static "reservation".

One possible approach is to just overlay and wait for some sort of acknowledgement. The argument being that if the user isn't WATCHING the display (to acknowledge this asynchronous notification), then he's not MISSING anything in the overlaid application, either! And, if he's dilly-dallying in addressing that event, then he "deserves" the consequences of the ignored/overlaid message!

Another approach is to alter the display in some unique way (invert it, flash it, etc.) to draw attention to the fact that a notification is pending. Then, await the user's explicit acknowledgement to present (and eventually dismiss) that. I.e., RESERVE the "blink attribute" for this indication.
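One way to picture this approach is as a small state machine running on the local node. Everything here is a sketch: the state names, the `LocalNotifier` class and the choice that only the SHOWING state captures user input are assumptions, not a description of any existing implementation.

```python
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()      # controller's content shown untouched
    ATTENTION = auto()   # reserved attribute (blink/invert) active: a notice is pending
    SHOWING = auto()     # notice overlaid on the controller's content


class LocalNotifier:
    def __init__(self):
        self.state = State.NORMAL
        self.pending = []

    def post(self, message):
        """Called by the local node when it has something to tell the user."""
        self.pending.append(message)
        if self.state is State.NORMAL:
            self.state = State.ATTENTION

    def acknowledge(self):
        """User action: the first one reveals the notice, the next dismisses it."""
        if self.state is State.ATTENTION:
            self.state = State.SHOWING
            return self.pending[0]
        if self.state is State.SHOWING:
            self.pending.pop(0)
            self.state = State.ATTENTION if self.pending else State.NORMAL
        return None

    def owns_input(self):
        """Assumed policy: input is redirected to the local node only while overlaid."""
        return self.state is State.SHOWING
```

The overlay never appears until the user asks for it, so a keypress already in flight still goes to the application rather than to a dialog that popped up an instant earlier.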

Still another is to use an alternate user interface channel to convey this alert (e.g., something on the audio to indicate the presence of pending video; something in the video to indicate the presence of pending audio!)

An even more Draconian approach might be to add a "special indicator" ("check engine light", buzzer, etc.) that serves this purpose (but, that adds to recurring cost).

[Note that I've not mentioned these sorts of interactions/notifications BETWEEN nodes. E.g., when nodeX is acting as the UI for nodeY!]

Preferences? Anything I've not considered?


werner 10 years ago

-----snipped because the news server found the quote too long

Maybe not quite what you wanted as feedback but as I read your post I thought hmm Hypercard aah Hypercard yes Hypercard

Hypercard is dead, of course and I found the clones unconvincing when I looked at them some years ago. But conceptually there might be something useful.

Regards Werner Dahn


Clifford Heath 10 years ago

Why should the control be centralised? I can think of many situations where you need different things controlled from different places.

In many cases, more than a single display is desirable. For example, most A/V systems have their own display, but can also be controlled from a phone. (note that the phone sends commands, it's not actually a central controller).

Good idea. Not a new idea.

Good ideas. Not new ideas.

So drop it. Allow devices to export their control interfaces (in a discoverable way, see below) and allow other devices (plural) to send commands to those.

No. That's easily solved by use of a "dialog manager", which virtualises the human-computer communication needs, and adapts them to the available display hardware. Again, these are old ideas, at least as old as the 1980's. Apollo even had a product called "Dialog Manager".

That's because the device driver was operating at a level below the dialog manager (in this case, the Windows UI). As you say, the problem was solved by hooking it up differently, making the Windows UI available to the driver.

These are all just "human factors" design questions. They're complicated by the need to manage parallel processes - and to avoid switching the user's train of thought needlessly - but they must be tackled by modeling the communication on the user's mental processes, not on the hardware or the physical implementation. That's what a dialog manager must do.

In my opinion the interesting problem here is how nodes and controllers can discover the capabilities present in the network. Mere enumeration (like USB device enumeration) is not enough - that just shows what devices exist, not what purpose they serve or even how they are connected. Discovery by category (as implied by your namespaces) is not enough. It needs to be richer than this. DNS-SD is an example of a design that tries to solve this problem; it allows sending a query like "where is the closest A3 color printer to me?".
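To illustrate the difference between enumeration and a capability query, here is a toy in-memory registry. It is not DNS-SD and uses no DNS-SD API; the registry layout, attribute names and one-dimensional "position" are all invented for the example.

```python
def closest(registry, here, **required):
    """Return the registered device nearest to `here` whose attributes all match.

    registry -- list of dicts with "name", "attrs" and "position" keys
    here     -- caller's position on the same (hypothetical) 1-D scale
    required -- attribute values the device must have, e.g. paper="A3"
    """
    matches = [d for d in registry
               if all(d["attrs"].get(k) == v for k, v in required.items())]
    return min(matches, key=lambda d: abs(d["position"] - here), default=None)


registry = [
    {"name": "printer-east", "position": 40.0,
     "attrs": {"kind": "printer", "paper": "A3", "color": True}},
    {"name": "printer-west", "position": 5.0,
     "attrs": {"kind": "printer", "paper": "A3", "color": True}},
    {"name": "printer-mono", "position": 1.0,
     "attrs": {"kind": "printer", "paper": "A3", "color": False}},
]

nearest = closest(registry, here=10.0, kind="printer", paper="A3", color=True)
```

Plain enumeration would only return the three entries; the query narrows them by purpose and then by proximity, so the nearer monochrome printer is skipped.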

I'd love to see an A/V system with this kind of auto-discovery. When I hit "Play" I want the system to know which room I'm in, and to turn on the right amplifier and/or screen and route the audio to the right devices. I don't want to juggle half a dozen remote controls just to play a sound. I don't want to figure out which devices need to be turned on, to set up the right input channel selectors, or find the right volume control to adjust. I want to be able to do this from *any* controller that comes to hand, including my phone.

There needs to be an industry-standard protocol for this stuff.


Clifford Heath 10 years ago

I hate it when someone posts a long tome, and you spend a long time responding, and the OP never returns to the thread or engages with your response.


George Neuner 10 years ago

Don mentioned in the "task loop" thread that he'd be busy for several days. Be patient ... I'm sure he'll return to this.

George


Dave Nadler 10 years ago

Hi Don - You might want to have a look at TIB/Rendezvous. For several decades now this provides a namespace-driven many-node publish-subscribe bus used to implement lots of distributed applications (quotation and trading systems, semiconductor fabs, etc, etc, etc). Facilities included address your questions I think...

Hope that helps, Best Regards, Dave


Don Y 9 years ago

I'll admit to only giving a cursory examination of a "TIBCO Rendezvous Concepts" document so it's possible I'm missing A LOT! :-/

But, it seems like all it really does is virtualize connections. I.e., frees "clients" from having to know where "services" are located. That's not an issue in this application -- things are pretty static.

It also seems to operate whiteboard style -- everything (can) see everything (else).

In particular, I don't see how I can hide "Master.Power.Switch" so that only certain "tasks" can *see* its state -- and even fewer can *control* its current state. I.e., if I don't want a particular task to be able to turn power on/off, I (presently) simply don't put that "name" in the namespace that I give to that task (the namespace acts as a filter -- only names that exist in a task's namespace can be referenced IN ANY WAY. So, a task can't "synthesize" names to see if they *might* be legitimate.)

Beyond that, I don't see how I can bind "Power.Switch" for task A to one particular piece of "hardware" while binding it to a completely different piece of "hardware" for task B (and, to NOTHING for task C).
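For comparison, this is roughly what the present scheme gives me, sketched with dictionaries as namespaces. The `Switch` and `ReadOnly` classes and the two supply names are hypothetical, chosen only to show "control", "see but not control" and "nothing at all".

```python
class Switch:
    """A piece of "hardware" with a state that can be read and changed."""
    def __init__(self, label):
        self.label = label
        self.on = False


class ReadOnly:
    """View that exposes the state but offers no way to change it."""
    def __init__(self, switch):
        self._switch = switch

    @property
    def on(self):
        return self._switch.on


main_supply = Switch("main supply")
bench_supply = Switch("bench supply")

namespaces = {
    "A": {"Power.Switch": main_supply},             # can see and control
    "B": {"Power.Switch": ReadOnly(bench_supply)},  # different hardware, see only
    "C": {},                                        # name does not exist for this task
}
```

Task A can set `namespaces["A"]["Power.Switch"].on = True`; the same assignment through task B's binding raises `AttributeError`, and task C has no name to try.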

This system seems to expect everything to "cooperate" in a single space. (?) It doesn't help me partition/isolate that space into smaller units appropriate for the individual "tasks" in it.
