Hi,
I'm revisiting my "metrology server" (yet again).
This is essentially a "device wrapper" that lets multiple clients share a "measurement device". Each client independently declares its "requirements" to the server, and the server makes up-calls to the actual device driver to configure the device so that it satisfies all of these sets of requirements at once.
[I.e., if client 1 wants a measurement range of (5,8) and client 2 wants a measurement range of (20,40), then the server tries to configure the device to provide a range of (5,40) -- or better -- by examining the capabilities of the actual device. Note that each such configuration change may have consequences for other parameters (e.g., absolute error may increase as the operating range increases), which imposes additional criteria on these changes (i.e., the server must continue to satisfy all existing configuration contracts).]
The service is exported from each individual device over the network. It is connection-oriented, so each "session" carries the specific "contractual requirements" established (interactively) for that session. The number of sessions supported for each device is determined by the resources available to that particular server instance. Like most network protocols, it is ASCII based (makes debugging easy via telnet).
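For concreteness, the range-merging step in the bracketed example above can be sketched in Python. This is only an illustration: the `Range` type, `covering_range`, `device_can_satisfy` and the device limits of (0,100) are all made up here, and the real server also has to weigh side effects such as absolute error, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lo: float
    hi: float


def covering_range(requests):
    """Smallest single range that contains every requested range."""
    return Range(min(r.lo for r in requests), max(r.hi for r in requests))


def device_can_satisfy(needed, device_limits):
    """True if the (hypothetical) device limits fully contain `needed`."""
    return device_limits.lo <= needed.lo and needed.hi <= device_limits.hi


client1 = Range(5, 8)
client2 = Range(20, 40)
needed = covering_range([client1, client2])
print(needed)                                      # Range(lo=5, hi=40)
print(device_can_satisfy(needed, Range(0, 100)))   # True
```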
The service exports verbs that let clients examine the range of values available for each "parameter" (setting) so they can intelligently choose settings befitting their needs. [this is important :> ]
This has worked well -- so far. Each server converges quickly on a configuration that addresses the particular needs of each of its clients.
*But*, this behavior secretly relies on a certain amount of "disorder" in the (distributed) system. I.e., if clients come on-line slowly/randomly, then things fall into place reasonably well.
OTOH, if every client comes up simultaneously (this happened after a power failure), then many clients are negotiating their contracts simultaneously. So, client 3 may issue a query about a parameter (so that it can determine what setting it should *request*) just before client 7 *sets* a (possibly different) parameter. This setting affects the range of potential values for other parameters -- including the parameter that client 3 was recently interested in! So, when client 3 tries to *set* that parameter (to a value that it *thinks* is valid given the results of its recent query), the command fails.
A poorly written client would gag at this point ("Gee, I was just told that 27 is a valid setting -- so why is the server refusing to set the parameter to 27?"). But, even a well written client ends up having to reissue the query to determine the *new* range of values, pick one of those and then *try* to set the parameter to this *new* value -- which can potentially suffer from the same sort of race.
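The race, and the retry loop a well written client ends up with, can be shown with a toy model in Python. Everything here is invented for illustration: the parameter names `span` and `gain`, the coupling between them (valid gain is 1 to 300 // span), and the `ToyServer` class. It is not the real protocol, and it skips the check against other clients' existing contracts.

```python
class ToyServer:
    """Toy model: the valid range of 'gain' shrinks as 'span' grows."""

    def __init__(self):
        self.span = 10
        self.gain = None

    def query_gain(self):
        # Made-up coupling between the two parameters.
        return (1, 300 // self.span)

    def set_span(self, value):
        self.span = value

    def set_gain(self, value):
        lo, hi = self.query_gain()
        if not lo <= value <= hi:
            return False          # the SET is refused
        self.gain = value
        return True


server = ToyServer()
lo, hi = server.query_gain()      # client 3 is told (1, 30), so 27 looks valid
server.set_span(20)               # client 7 sets a *different* parameter
ok = server.set_gain(27)          # client 3's SET now fails
print(lo, hi, ok)                 # 1 30 False


def set_with_retry(server, preferred, max_tries=5):
    """Re-query, clamp the preferred value into the new range, try again."""
    for _ in range(max_tries):
        lo, hi = server.query_gain()
        choice = min(max(preferred, lo), hi)
        if server.set_gain(choice):
            return choice
    return None                   # gave up; the range kept moving


print(set_with_retry(server, 27))  # 15
```

In this single-threaded toy the retry succeeds on the first pass; with real concurrent clients each pass through the loop is exposed to the same race again.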
Anyway... this problem obviously arises because there is no support for an atomic "test and set" sort of operation. I.e., if the "query" and subsequent "set" could be treated as an indivisible set of operations, then the possibility of the query's result changing after its issue but before the set is issued is eliminated.
So, the protocol wants new verbs added like "LOCK" and "UNLOCK". Then, a potential conversation might be:
    LOCK
    QUERY parameter1
    QUERY parameter2
    SET parameter2 value2
    QUERY parameter5
    SET parameter1
    UNLOCK
The problem here is that a client can get greedy and LOCK the server indefinitely -- so that it never has to worry about having one of its SETs fail!
Or, a crashed client could result in the server being locked indefinitely:
- client6 LOCKS the server
- then crashes
- possibly reboots
- starts a new session
- discovers the server is locked (by its precrashed self!) (deadlock)
[granted, the server might be able to recognize that the client has come back on-line -- but, this is complicated by the potential for multiple clients on a particular IP address]
Another approach might be to have the lock time out after some interval. This sucks because it is still not immune to a client LOCKing, timing out and quickly reLOCKing, etc. (OK, so the server has to ignore LOCK requests from the most recent LOCK-er for some time after it times out...)
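The timeout-plus-"ignore the most recent LOCK-er" idea could look roughly like the Python sketch below. The `LeaseLock` class, the `lease` and `cooldown` values and the session names are assumptions made for illustration, not part of the existing protocol. The clock is passed in so the behavior can be exercised without actually waiting.

```python
import time


class LeaseLock:
    """Server-wide lock that expires after `lease` seconds and refuses the
    expired holder for `cooldown` seconds after the expiry."""

    def __init__(self, lease=5.0, cooldown=10.0, clock=time.monotonic):
        self.lease = lease
        self.cooldown = cooldown
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0
        self.barred = None          # session barred from re-locking
        self.barred_until = 0.0

    def _expire_if_due(self):
        if self.holder is not None and self.clock() >= self.expires_at:
            # The lease ran out: drop the lock and bar the old holder.
            self.barred = self.holder
            self.barred_until = self.expires_at + self.cooldown
            self.holder = None

    def lock(self, session):
        self._expire_if_due()
        now = self.clock()
        if self.holder is not None:
            return False            # someone holds a live lease
        if session == self.barred and now < self.barred_until:
            return False            # most recent timed-out LOCK-er, still barred
        self.holder = session
        self.expires_at = now + self.lease
        return True

    def unlock(self, session):
        self._expire_if_due()
        if self.holder != session:
            return False
        self.holder = None
        return True


now = [0.0]                         # fake clock for the demonstration
lock = LeaseLock(lease=5.0, cooldown=10.0, clock=lambda: now[0])
print(lock.lock("client6"))         # True
now[0] = 6.0                        # client6 crashed; its lease has run out
print(lock.lock("client6"))         # False (barred after timing out)
print(lock.lock("client3"))         # True
```

Note that this only penalizes a client whose lock *timed out*. A client that UNLOCKs just before the lease expires and immediately LOCKs again is not barred here, so it does not by itself cure the greedy-client problem.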
Yet another approach is to deliberately inject entropy into the system and hope for the best... :-/
Ideally, I would like to find a server-side policy that would be more "authoritarian" than "hoping" for clients to be well behaved. I stewed on this during my evening walk, hoping for "inspiration" and just ended up with *perspiration* :<
I can't think of any existing protocols that I could "borrow" to achieve this.
Is there a trick I am missing?
I am not resistant to changing the protocol but any new revision has to be able to support the same sorts of parameters, interaction, etc.
Thx,
--don