Resource revocation

Don Y · 2013-07-25T19:23:46+00:00

Hi,

What are the current "best practices" regarding asynchronous notifications (in a multithreaded environment)?

I have a system wherein "tasks" (omit a formal definition) request resources from a service that meters out their use, waiting until the resource has been "officially" granted to them (in some cases, this is all trust based). When done, they surrender the resource to the service so it can be reused by other consumers.

But, there are times when the service must revoke a granted use of a particular resource. In some cases, it "asks" for the resource back (giving the current consumer time to tidy up before releasing it). In other cases, it just *seizes* the resource -- and notifies the consumer after-the-fact.

Presently, I use signals to notify the consumer when this sort of thing is happening. But, my personal experience is that folks have problems writing this sort of code: *remembering* that they have to register a handler for the signal; remembering that said handler can be invoked at any time (including immediately after it has been registered); etc.

Is there a new "safer" way of implementing these types of notifications?

Thx, --don
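For readers unfamiliar with the pitfalls mentioned above, here is a minimal sketch of the signal-based scheme, assuming a POSIX system. The choice of SIGUSR1 as the "revoked" signal and the helper names `revocation_watch_begin` and `revocation_pending` are illustrative assumptions, not part of the system described in the post.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set by the handler. sig_atomic_t is the object type the C standard
 * guarantees can be written safely from a signal handler. */
static volatile sig_atomic_t resource_revoked = 0;

/* The handler may run at ANY time after registration, so it only sets a flag. */
static void on_revoke(int signo)
{
    (void)signo;
    resource_revoked = 1;
}

/* Hypothetical helper: clear the flag FIRST, then register the handler,
 * so a signal delivered immediately after registration is not wiped out.
 * SIGUSR1 is an assumed choice of signal. Returns 0 on success. */
static int revocation_watch_begin(void)
{
    struct sigaction sa;

    resource_revoked = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_revoke;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Polled by the consumer between units of work on the resource. */
static int revocation_pending(void)
{
    return resource_revoked != 0;
}
```

The consumer calls `revocation_watch_begin()` before requesting the resource and checks `revocation_pending()` at points where it is able to tidy up. Keeping the handler down to a single flag write sidesteps most of the "handler can run at any time" problems, at the cost of polling latency.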


Tom Gardner 12 years ago

That's true, but missing the point. POTS lines have availability guarantees backed by *statutes*. Cellular systems don't.

That guarantee does, of course, depend on the specific implementation of malloc. Some mallocs have suboptimum behaviour, particularly when they can't locate a suitable free block.

There have recently been adverts for people to supply and maintain PDP-11 systems for nuclear power plants. Until *2050*!

Who'd-a-thought PDP11s would still be a valid career option for youngsters.

Only for well-specified languages, which specifically excludes C/C++ and derivatives. One Big Hint to the scarcely concealed complexity is the number of command line options for the compiler and linker. Scary. Another Big Hint is that C/C++ has only just decided that a memory model is necessary - because synchronisation is specifically defined to be outside the definition of the language (i.e. library specific), so the libraries have to rely on the specific compiler/linker implementation.

Oh, maybe I should note that I don't care if the program executes much faster if it doesn't give the correct answer because the compiler/linker options were inappropriate.


Tom Gardner 12 years ago

Bugs can be removed. The Byzantine generals (and similar) problem provably cannot be removed.

C is a mess even on uniprocessor machines. Multiprocessor shared memory systems with caches make me shudder when correctness is important. C++ is even worse.

I've seen many systems which were "well documented" but the documentation omitted to discuss key attributes, probably because the architects didn't realise there were underlying pitfalls!

There's no improvement with proprietary systems. Every study fails to show a consistent advantage to either proprietary or FOSS products.

Old engineering maxim: "you can't test quality into a product". Hopefully you can design it in.

There are many examples of good and bad proprietary and FOSS systems.

No, it was a disaster.

Even the designers didn't know what they had created. Classic case is that they were amazed when somebody produced a valid C++ program that caused the compiler to emit the sequence of prime numbers *during compilation*.

Agreed.

Have a look at Nick MacLaren's "Objects Diatribe", and weep. MacLaren has been on the sharp end of errant implementations and the standardisation process for decades. He knows where skeletons are buried.

I don't know Limbo, but if it is based on C then I would need to be convinced that it has avoided C's problems - and my remaining life is too short for me to bother to look!


Tom Gardner 12 years ago

UK POTS telephone exchanges had a requirement for a specified number of minutes' downtime in their 40-year lifetime. Not quite the same thing, but close.

One should always distinguish "high availability" from "high reliability". Telco systems are typically HA without necessarily being HR.


Tom Gardner 12 years ago

That sort of thing.

Agreed, unfortunately.

I was deliberately avoiding defining what I meant by "proof" :)


Paul Rubin 12 years ago

That sounds pretty outlandish. I'd like to know what kind of hardware that is.

I've never heard of a serious high-reliability system (COTS or otherwise) that doesn't use redundancy. Joe Armstrong (Erlang inventor) likes to say that a non-redundant system can't be called reliable, since the power cord is a single point of failure.


Paul Rubin 12 years ago

You mean in something like the MISRA C guidelines? "Correctness proofs" and C don't really go together. I expect it's more a matter of bug avoidance, and easier auditability.


Paul Rubin 12 years ago

Not remotely the same as keeping the same piece of hardware running for that long. Phone switches have methods of dealing with failing hardware, and of upgrading the software while the switch is running. Erlang has hot-upgrade features built into the language runtime. In other systems, you load and boot new code on the backup processor, then shut off the main processor, causing a graceful failover. That's a slow moving reboot and in principle you could handle something like memory fragmentation that way if you noticed a degradation (of course you'd fix the problem causing the fragmentation, you wouldn't just keep rebooting).

FWIW, I worked on one of those systems some years back, and the code and coding processes were basically crap. They kept the product reliable by:

1) very thorough testing, and
2) relying (I think without realizing it) on the fact that in a program of that sort, most of the code paths are never exercised, so there were probably 1000's of bugs in the system that nobody ever happened to trigger.


Tom Gardner 12 years ago

What software is that? The systems I was referring to were pre-computer Strowger exchanges based largely on PO type 3000 relays :)

Made for a bloody noisy environment during busy hour!

I'd have liked to use Erlang, but could never justify it.

I doubt it. You can't test quality into a product.

I expect there was a solid architecture, specification and design, and they tested that the implementation conformed to those.


Paul Rubin 12 years ago

I don't know if I'd use the term "quality" to describe the code that came out the other end, but it did work pretty solidly once delivered to customers. I used to wonder why it didn't collapse. I think the main thing protecting it was that it didn't have to deal with unrestricted user input like a desktop application, or do much complex processing like a compiler does. Each unit was configured and tested before delivery and I guess not reconfigured once in the field. So it was in some ways like an appliance: just a few combinations of input were possible once it was in the customer's hands.

Wishful thinking. It was designed by electrical engineers who were smarter than hell but didn't know anything about software. So they were able to write crazy and intricate code and actually get it to work, but it was near-unmaintainable (premature optimization was a recurring theme). It had tons of written specs per ISO 9002 but the specs didn't have much to do with the internals of the code.

Meanwhile I hear that SpaceX is using C++ to program what will eventually be manned spacecraft. Shudder.


Don Y 12 years ago

[much elided]

Yup. Unfortunately, trying to merge documentation into the source "product" just doesn't work. Forget LP and the various nods to this effort (doxygen, etc.)

I've taken a more fundamental approach. I accompany my code with "tutorials", of a sort. I.e., papers that present key issues in a more conversational manner replete with illustrations, etc. (currently trying to put interactive demos into them as well!).

So, I don't have to explain *why* I am doing something in the code. Just state *what* a particular piece of code is doing and leave it to the reader to figure out "why this will work" by reading the supporting tutorials.

A document that I always find "missing" is a roadmap (no, not a description of the file hierarchy!): something that says how the code moves from RESET through RUNTIME.

Finally, showing strong structure *in* the code so folks can mimic this to cover all (most of) the bases when implementing something similar (e.g., design of a multithreaded service).

Of course, people will *still* ignore all the above and complain that "it's a piece of crap". But, they'll only find agreement among other "sloths". :>

I don't claim one camp is better than another. Rather, I am commenting on what I see missing in SO MANY FOSS projects. (i.e., people spend time on new features because they are "more fun"... getting the *old* features to work properly is boring! As will be getting the NEW features to work completely NEXT WEEK!)

You need a formal design and then a formal test plan to verify that the implementation meets the design.

"Wow, Bill! What a great looking boat you built!" "Um, I started out to build a doghouse..."

Pgsql has been one of the few bright spots, for me. It actually *feels* like someone is driving the process instead of letting it wander off into featureland.

You don't think C++ brought the idea of OOP into the mainstream? Previously, folks were all writing procedural based implementations and you had OOP left to languages like Smalltalk to propose.

[I'm not claiming it was a GOOD way for folks to embrace the paradigm. Rather, that it brought it to the attention of a generation of "programmers". E.g., after C++, I started being far more consistent in how I structured my code (regardless of implementation language) moving further from the "A then B then C" approach to one where operations and attributes were tied to "objects" (regardless of how I implemented those objects)]

What I *liked* about C++ was how nicely I could combine different types of numeric objects with infix notation (and let the compiler sort out which casts to apply).

It gets tedious having to implement, e.g., a Rational data type and do all operations as:

    ratC = add(ratA, ratB);
    ratG = mul(ratC, ratD);
    ratF = exp(ratG, ratE);
    double foo = realize(ratF);

E.g., I currently use a Q10.13 format in this project and "expressions" using objects of this type don't lend themselves freely to infix notation (other than addition of like types).
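To make the function-call style concrete, here is a minimal sketch of Q10.13-style fixed-point arithmetic in C. The storage type (`int32_t`), the `q13_` names and the truncating multiply are assumptions for illustration; the post does not say how the actual type is implemented, and this sketch does no range checking or saturation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed representation: a signed value scaled by 2^13, held in an int32_t.
 * Values outside the Q10.13 range (roughly +/-1024) are NOT detected here. */
typedef int32_t q13_t;

#define Q13_FRAC_BITS 13
#define Q13_ONE ((q13_t)1 << Q13_FRAC_BITS)

static q13_t q13_from_int(int n)
{
    return (q13_t)(n * Q13_ONE);
}

/* Addition of like types needs no rescaling. */
static q13_t q13_add(q13_t a, q13_t b)
{
    return a + b;
}

/* The raw product carries 26 fraction bits, so widen to 64 bits and
 * divide by 2^13 (division truncates toward zero). */
static q13_t q13_mul(q13_t a, q13_t b)
{
    return (q13_t)(((int64_t)a * b) / Q13_ONE);
}

static double q13_to_double(q13_t a)
{
    return (double)a / Q13_ONE;
}
```

An expression such as (3.5 * 2) then has to be spelled `q13_mul(q13_add(q13_from_int(3), Q13_ONE / 2), q13_from_int(2))`, which is exactly the tedium that operator overloading in C++ removes.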

Historically, I'd been naive thinking standards were logically reasoned. As I get older, I find it harder NOT to see the "politics" (and economics) involved in many of these processes.

I'll search for a MacLaren reference...

It tries to clamp down on a lot of the freedom C affords developers. With attempts to keep you from running with scissors. (e.g., pointers are gone -- much to my dismay!)

As a general purpose language for general purpose problems, I wouldn't recommend it. OTOH, as a scripting language it seems to be expressive enough to address the sorts of things that I want to be able to code (in this system).

Unfortunately, much of the implementation is devoid of commentary. It's as if the developers feared their fingertips would fall off after some fixed number of keystrokes and tried to conserve them for "important stuff" :-(

And, of course, the documentation hasn't been updated since the initial public offering.

I suspect the principals are busy trying to keep bread on their tables...

--don


Don Y 12 years ago

Hi Tom,

Exactly. So, when a provider says, "Hey, rather than fix all these downed lines (because we didn't think ahead to bury the utilities below grade), we'll give each affected homeowner a little box that they can plug in (and pay for its power!) that gives them the same old RJ11 outlets into which they can plug their phones to connect to the PSTN" there is a fair bit of deception going on.

Sure, this can save a boatload of money/labor costs. But, it also means those "land lines" are now no longer land lines and don't have to follow the same *rules* as real land lines!

*And* the nature of the application! That's the whole point: if you understand the implementation and your memory usage patterns, you can use these mechanisms with a much greater level of confidence.

I recently crafted a "parameterized" dynamic memory manager (trying not to use the name "malloc", since malloc has hysterical significance and connotations). E.g., you specify the heap (arena) that you want to operate on, the size of the request, the criteria that the allocator should use to select the appropriate fragment from the free list, what it should *do* with that fragment after it has selected it (i.e., return it in toto or cut it to fit the request... and *where* to cut it: save the head or save the tail!), and how any "leftover" is reintroduced to the free list.

[A similar set of arguments apply to "free"]
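As an illustration of just one of those parameters, the fragment-selection criterion, here is a minimal sketch. The policy names and the array-of-sizes view of the free list are assumptions made for brevity; the allocator described in the post is not shown anywhere in the thread and may work quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical selection criteria; the names are illustrative only. */
enum fit_policy { FIT_FIRST, FIT_BEST, FIT_WORST };

/* Returns the index of the chosen free fragment, or -1 if none is large
 * enough. `sizes` holds the size of each free fragment, in list order. */
static int select_fragment(const size_t *sizes, int count,
                           size_t request, enum fit_policy policy)
{
    int chosen = -1;

    for (int i = 0; i < count; i++) {
        if (sizes[i] < request)
            continue;                 /* too small for this request */
        if (chosen < 0) {
            chosen = i;               /* first fragment that fits */
            if (policy == FIT_FIRST)
                break;
        } else if (policy == FIT_BEST && sizes[i] < sizes[chosen]) {
            chosen = i;               /* tighter fit found */
        } else if (policy == FIT_WORST && sizes[i] > sizes[chosen]) {
            chosen = i;               /* larger fragment found */
        }
    }
    return chosen;
}
```

With free fragments of 64, 16, 256 and 32 bytes and a 20-byte request, first-fit picks the 64-byte fragment, best-fit the 32-byte one and worst-fit the 256-byte one. Passing the policy as an argument is what lets the caller match the allocator to a known usage pattern.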

So, by careful choice of parameters, I can dice a heap into a bunch of fixed size "buffers" (aka "memory pool"/partition) and then have the allocator subsequently pick among those buffers to suit my future requests.

Or, issue one set of requests/releases to the head of the arena and another set to the *tail* so the requests never interfere with each other (effectively creating two heaps that share the space of one: so, if one is underused, the other can "use more")
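The head/tail idea can be sketched as two bump allocators sharing one arena. This is a deliberately stripped-down assumption of how it might look: the `arena2` names are hypothetical, and the sketch has no free operation, no alignment handling and no free list.

```c
#include <assert.h>
#include <stddef.h>

/* One arena shared by two bump allocators: one grows up from the head,
 * the other grows down from the tail. Whatever lies between the two
 * offsets is available to either side. */
struct arena2 {
    unsigned char *base;
    size_t head;   /* offset of the first free byte */
    size_t tail;   /* offset one past the last free byte */
};

static void arena2_init(struct arena2 *a, void *mem, size_t size)
{
    a->base = mem;
    a->head = 0;
    a->tail = size;
}

/* Allocate n bytes from the head side; NULL if the two sides would collide. */
static void *arena2_alloc_head(struct arena2 *a, size_t n)
{
    if (n > a->tail - a->head)
        return NULL;
    void *p = a->base + a->head;
    a->head += n;
    return p;
}

/* Allocate n bytes from the tail side; NULL if the two sides would collide. */
static void *arena2_alloc_tail(struct arena2 *a, size_t n)
{
    if (n > a->tail - a->head)
        return NULL;
    a->tail -= n;
    return a->base + a->tail;
}
```

Because both sides draw on the same gap, an underused head side automatically leaves more room for the tail side, which is the "two heaps that share the space of one" effect described above.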

It was tedious to code and test (since it is chock full of invariants to ensure memory is used correctly at runtime) but *seems* to have been a worthwhile exercise. At the very least, it has let me easily explore the efficiencies of different implementation options in a real runtime!

If you know how memory is being used, this doesn't have to be a problem. I've designed many applications that aren't intended to be reset. I've never heard of one crashing because of a heap problem.

See my comment, elsewhere, re: 6502's. At least the 11 was a respectable machine! :-/

I find NOT having access to the carry to be a huge impediment in squeezing the last few percent out of many algorithms. But, I work hard at NOT resorting to dropping into ASM *just* to write a tiny fragment that *can* exploit the carry.
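For context, this is the usual portable workaround in C: the carry has to be reconstructed with comparisons, because unsigned addition wraps silently. The function names are illustrative, and the sketch assumes `uint32_t` is not narrower than `int` (true on common targets), so that the additions wrap modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Adds two 32-bit words plus an incoming carry (0 or 1) and reports the
 * outgoing carry. A wrapped unsigned sum is smaller than its operand,
 * which is how the carry is recovered without access to the flag. */
static uint32_t add_with_carry(uint32_t a, uint32_t b,
                               unsigned carry_in, unsigned *carry_out)
{
    uint32_t partial = a + b;
    unsigned c1 = partial < a;          /* carry out of a + b */
    uint32_t sum = partial + carry_in;
    unsigned c2 = sum < partial;        /* carry out of adding carry_in */

    *carry_out = c1 | c2;               /* at most one of them can be set */
    return sum;
}

/* 64-bit addition built from two 32-bit limbs (index 0 = least significant). */
static void add64_limbs(const uint32_t x[2], const uint32_t y[2],
                        uint32_t out[2])
{
    unsigned c0, c1;

    out[0] = add_with_carry(x[0], y[0], 0, &c0);
    out[1] = add_with_carry(x[1], y[1], c0, &c1);
    (void)c1;                           /* overflow past 64 bits is dropped */
}
```

In assembly each limb would be a single add-with-carry instruction; whether a given compiler recognises the comparison idiom and emits that instruction is implementation-specific.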

OTOH, if I have written a portion of some subsystem (e.g., scheduler/dispatcher) in ASM, I take full advantage of that opportunity! :>

Exactly. Compiler/linker switches should just affect size of executable (in time and space). "Whaddya mean 0xFF is now '255.' and not the '-1' that I had intended?"
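The 0xFF complaint can be reproduced with plain `char`, whose signedness is implementation-defined in C. A small sketch follows; the function names are illustrative, and the -1 result for the signed case assumes a two's complement target. GCC, for example, provides `-fsigned-char` and `-funsigned-char` to flip the default.

```c
#include <assert.h>
#include <limits.h>

/* Plain char: the result depends on the compiler's choice of signedness. */
static int byte_as_plain_char(unsigned char raw)
{
    return (char)raw;
}

/* Explicitly signed: 0xFF reads back as -1 on two's complement targets. */
static int byte_as_signed(unsigned char raw)
{
    return (signed char)raw;
}

/* Explicitly unsigned: 0xFF always reads back as 255. */
static int byte_as_unsigned(unsigned char raw)
{
    return (unsigned char)raw;
}
```

Only `byte_as_plain_char(0xFF)` changes with the compiler switch; code that spells out `signed char` or `unsigned char` gives the same answer either way.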

How much of the vagary in these language specs is a consequence of politics? How much a consequence of accommodating variations in hardware (recall, ones' complement and sign/magnitude machines were in use when C had its origins...)?

Kinda unfair to judge based on what has become the norm (in hardware) nowadays.

--don


Don Y 12 years ago

From:

If New York and New Jersey refuse to give permanent permission for the switch from landline to wireless phone service, Verizon could be forced to rebuild the phone network on Fire Island and in Mantoloking. Unlike cable and wireless companies, landline phone companies have regulatory obligations in most states to supply lines at a reasonable cost to anyone who wants one. They also need federal approval to end service.

Note that this conveniently fails to mention the QoS issues! :>


Tom Gardner 12 years ago

In which case I expect that the system architecture and specification would be good, since h/w engineers automatically avoid many traps[*] that softies seem to actively throw themselves into.

The implementation of the specification in s/w might well be a different kettle of (smelly) fish! Or ugly C code might have been auto-generated from a clean higher level specification (e.g. an FSM) that is very similar to the way the system is specified. That technique is common in the telco and network world.

[*] especially those related to distributed interacting systems with long lifetimes

ISO9000 is all to do with process, not quality.

Don't buy any property downrange of the launch site :)


Paul Rubin 12 years ago

I don't have any reason to doubt the hardware design was good, though it's not my area and I didn't see into it very much. The initial software architecture may also have made sense at some level. After a decade or so of patches and upgrades by dozens of programmers, any clarity had long since vanished.

It wasn't anything like that. It was handwritten and too clever by half. They "optimized" all over the place without profiling anything to know where the cycles were actually going. I remember one module had to check the status of a few hundred channels in some sorted order. It had a single C function around 20 pages long, full of clever coding tricks and code duplication to avoid unnecessary work. Of course it amounted to a micro-optimized O(N**2) sorting algorithm smeared through all that code (this particular module may have actually been chewing cycles, but most weren't). Replacing it with a reasonable algorithm made the code 1/4 of the size and orders of magnitude faster, optimizations not needed. Ericsson's codebase was apparently in sort of similar condition when they switched from C to Erlang in the 1980's(?), so I guess this is a familiar story.
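As a rough illustration of the "reasonable algorithm" route, the sketch below sorts a channel table with the standard library's `qsort` instead of hand-rolled ordering logic. The `struct channel` fields and the ordering key are invented for the example; the thread does not describe the real data or the replacement that was actually written.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical channel record; the fields are illustrative only. */
struct channel {
    int id;
    int priority;
};

/* Order by priority, then by id so the result is deterministic.
 * The (a > b) - (a < b) form avoids the overflow that a - b can cause. */
static int by_priority(const void *pa, const void *pb)
{
    const struct channel *a = pa;
    const struct channel *b = pb;

    if (a->priority != b->priority)
        return (a->priority > b->priority) - (a->priority < b->priority);
    return (a->id > b->id) - (a->id < b->id);
}

/* Sort the table once, then walk it in order to check each channel. */
static void sort_channels(struct channel *ch, size_t n)
{
    qsort(ch, n, sizeof ch[0], by_priority);
}
```

The C standard does not promise a complexity for `qsort`, but typical implementations are around O(N log N), which for a few hundred channels is far from the quadratic behaviour described above.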

This wasn't really a distributed system. It had two CPU's (primary and failover) with a communication channel and some shared peripherals. That aspect of the architecture wasn't too bad, but the upgrade/failover code was somewhat horrendous since the stuff that had to be communicated between the CPU's wasn't really localized in the codebase. You had to just hope that if you missed anything, it would show up in QA instead of causing some subtle failure in the field.


upsidedown 12 years ago

I very much doubt that youngsters would be interested.

My guess is that the youngest would have been born in 1980 and done some assembler work (on any platform) in the late 1990's, so they would be 70 years old in 2050.

Trying to extend the NPP lifetime to 60-70 years requires practically a rolling update of everything (except for the pressure vessel) during that period, typically a mid-life update.

NPP control systems are typically updated after 20-30 years, except for instance Fukushima, where the control room looked so 1960/70's :-).

I was not referring to speed, but rather features needed to write an operating system for a strange processor. If the processor has a lot of kernel mode only instructions, say for setting up memory mapping or cache control registers, how are you going to control it with some high level language?

Things are easy with clean orthogonal architectures with all hardware registers memory mapped, like the PDP-11, in which everything can be controlled with assignment statements. Still even PDP-11 had special instructions like Move from/to previous I/D space.


upsidedown 12 years ago

Look at older industrial electronics, which is quite rugged and designed with huge margins and operated at low temperatures, so no wonder that it lasts long.

There seems to be quite a large second hand market for controller boards from the 1970's and 80's, when some companies upgrade their system after 20-30 years and sell their old systems and their spare parts to other companies that have elected to keep their system for some 5-10 more years before upgrading.

One reason that some companies did early updates was the RoHS directive, since manufacturing of some old style boards was terminated.

I guess that there may still be some non-redundant PDP-11's running some pretty big plants.


Paul Rubin 12 years ago

The PDP-11 isn't especially hard to learn or program, so I think the issue of maintaining old ones has more to do with keeping the hardware alive, finding replacement parts, etc. Think of weird Unibus peripherals etc. And of course using anything but original parts would require a messy certification process for the replacements.


Richard Damon 12 years ago

I use a different definition of Hard/Soft Real time. Hard Real Time is the domain of timing specifications where missing one is considered a FAILURE of the system: it hasn't met specifications, and the "customer" gets to blame us for the problem. Of course, if the inputs to the system aren't within specifications, we can push the blame back to the input.

A Soft Real Time specification allows for the missing of some deadlines, but there is some overall level of performance that must be met, which is hurt by missing the deadline.

There may well be a LOT of value to still completing the task after the deadline, but at that point you are in the domain of "damage control" to minimize level of failure. (Maybe you only maim someone instead of kill them).

Looking at the example of the rocket burn, a late burn likely is considered a failure, but a late burn may let you make another correction to get back on course. You likely have now used more fuel than planned, so something in the mission will need to be changed. This is a lot less damaging than saying that we missed the burn so we might as well just scrap the whole mission and let the craft hurtle into space.


Don Y 12 years ago

[attributions elided]

Note that the system can still meet its objectives, late (see below). It's just that it would be "worth more" if they had been met "on time". (i.e., perhaps COST LESS to implement)

If there is *any* value after the deadline, then it is not HRT. HRT is defined as *no* value after the deadline (including possibly "negative value")

Note that there might not be any damage done -- even after the deadline! It may simply reflect the fact that it will COST more to complete the task after the deadline.

E.g., if you are picking items off a moving conveyor, picking it *at* a certain point is preferable -- because the picking arm idles at that location!

But, if you are late getting the item, you might still be able to reposition the arm (via a servo) to grab the item at some point *past* the ideal location. It obviously costs more to do so. (energy to move the mechanism, increased probability that you might not be able to get the *next* item correctly, etc.) But, until the item moves to a point beyond which the actuator can seize it, there is still value (decreasing) to your attempts to grab it!

While I understand the point you are trying to make by this example, I try to avoid them because it tends to conflate "safety" and timeliness. People end up thinking "if someone is going to die" then it's HRT!

E.g., smoking cessation has value regardless (?) of when it happens. But, obviously MORE value "early" than "late" (where "early" and "late" are subjective terms)

The two examples, here, differ in that one is HRT and the other SRT. If missing the deadline means there is no value to continuing to "work on the problem", then it's HRT. The only result is hurtling into space.

OTOH, if there is some value to continuing to work on it (because orbital insertion despite the loss of some *other* mission feature is better -- more valuable -- than scrapping the mission entirely), then it is SRT.

Note that almost always there is some *hard* deadline beyond which a solution to an SRT problem becomes moot.
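The value-after-the-deadline definition used in this post can be written down as a pair of value functions. This is only a sketch: the linear decay and the parameter names are assumptions chosen for simplicity, since the post says only that the value is decreasing and eventually reaches a hard cutoff.

```c
#include <assert.h>

/* HRT under this definition: completing after the deadline is worth nothing. */
static double value_hard(double t, double deadline, double full_value)
{
    return (t <= deadline) ? full_value : 0.0;
}

/* SRT under this definition: full value up to the deadline, then a
 * decreasing value until a hard cutoff (e.g. the item has moved out of
 * the arm's reach). The linear shape is an assumption. */
static double value_soft(double t, double deadline, double cutoff,
                         double full_value)
{
    assert(cutoff > deadline);
    if (t <= deadline)
        return full_value;
    if (t >= cutoff)
        return 0.0;
    return full_value * (cutoff - t) / (cutoff - deadline);
}
```

For the conveyor example, `deadline` would be the time the item passes the arm's idle position and `cutoff` the time it leaves the actuator's reach; between the two, a late grab is still worth something, just less.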


Richard Damon 12 years ago

This is a difference in an event view and a systems view. In your view, a system that is 90% successful at picking off the boxes is worth 90% of full value.

In my view, if the system had a true Hard Real Time spec here, anything less than 100% success is a system failure, and unless the system was spec'd to allow less than 100% reliability, it is a FAILURE, and the customer is within their rights to request a refund or insist that the unit be fixed to work within spec.

I agree that not all safety issues are based on HRT. But the fact that being just slightly late causes a lot less of a "disaster" than being really late doesn't make the event a non-Hard Real Time event. If being 1 ms late costs you $1000, but being 1 s late costs you $1,000,000, there is REAL value in completing after the deadline, but it doesn't mean the deadline was "soft".

I TOTALLY disagree. If you are contracted to provide a system to meet a certain standard, and to meet that standard you must hit a deadline, that deadline is a Hard Real Time deadline. The fact that there are mitigation options that reduce the damage from failure does not convert the problem to a "soft" domain. To do so would mean you might try to apply the wrong tools to the problem.

Hard Real Time systems need to be analyzed by guarantees. Can we PROVE that the deadlines WILL be met? This requires looking at total worst-case paths. Any missing of the deadline is considered a failure. (So by the value function of success/failure, there is no value past the deadline, but for other value functions, there may be).

Soft Real Time systems need to be analyzed on a more general performance basis. Individual steps might not function totally right, but, perhaps due to fault tolerance, the final result is still "good".

Going back to your pusher system, if the system was specified that it only needed to get 90% of the bottles off the belt, and that a few getting by were acceptable, then the problem, even though for a single event has 0 value on a miss, is a Soft Real Time System, as I no longer need to work on 100% guarantees, absolute worst-case timings, etc, but on a more probabilistic model, needing statistical promises, not absolute.
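The two kinds of analysis contrasted in this post can be sketched as two different acceptance checks. The function names, the microsecond units and the simple sum-of-worst-cases model are assumptions for illustration; real worst-case analysis also has to account for blocking, interrupts and the like.

```c
#include <assert.h>
#include <stddef.h>

/* Hard spec: the sum of the worst-case times along the path must fit
 * within the deadline. One possible overrun means the spec is not met. */
static int hard_spec_met(const unsigned *wcet_us, size_t n,
                         unsigned deadline_us)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += wcet_us[i];
    return total <= deadline_us;
}

/* Soft spec: some misses are tolerated, as long as the observed miss
 * ratio stays within the allowance (e.g. 0.10 for "get 90% of them"). */
static int soft_spec_met(const unsigned *observed_us, size_t n,
                         unsigned deadline_us, double max_miss_ratio)
{
    size_t misses = 0;

    for (size_t i = 0; i < n; i++)
        if (observed_us[i] > deadline_us)
            misses++;
    return n > 0 && (double)misses / (double)n <= max_miss_ratio;
}
```

The first check is a proof obligation over worst cases; the second is a statistical statement over measurements, which is why the two kinds of specification call for different tools.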
