How to avoid a task not executed in a real-time OS?

- Tom Gardner

Tue, Jan 29, 2019 6:22 PM

How about "log error and enter a safe state"?

The _bugs_ do disappear :)

- Phil Hobbs

Tue, Jan 29, 2019 6:50 PM


Depends. For instance, in a clusterized EM simulation code from a dozen years ago, I have an assert-like macro called 'carefully'. A use example is in the simulation half-step that calculates the H field from the E field, which goes

carefully(HfromE());

That's typically fairly time-consuming--tens of milliseconds to tens of seconds depending on the size of the simulation and the size of the cluster.

The code has a tree-structured distributed supervisor scheme, a bit like a simple version of ganglia. So if some thread on some box finds a NaN or runs out of memory or something, I have to make sure all threads exit on all boxes, or the thing will hang forever waiting for the dead thread. (That's what CIS.CCom.SetStat() does.)

    #define carefully( x ) Carefully( (x), __FILE__, __LINE__)
    ...
    // We can't just abort the simulation when we run into some problem
    // in a subsidiary thread--we need to tell thread 1 before ending
    // the thread
    // NB: Don't use this function in the main thread!
    //
    // This is called via the Carefully macro, which expands into
    // Carefully(somefunctioncall(),__FILE__,__LINE__);

    void Carefully( int what, const char * fn, int Line)
    {
      if ( EMERROR_OK != what )
      {
        if (quiet < 2)
        {
          char buf[512];
          sprintf(buf,"%s(%d): Chunk error %d--dying....\n", fn, Line, what);
          pemerror(what,buf);
        }
        CIS.CCom.SetStat(dead);
        _endthread();           // fatal error--only destructors do
                                // cleanup
      } /* End if */
    };

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs 
Principal Consultant 
ElectroOptical Innovations LLC / Hobbs ElectroOptics 
Optics, Electro-optics, Photonics, Analog Electronics 
Briarcliff Manor NY 10510 

http://electrooptical.net 
https://hobbs-eo.com

- Clifford Heath

Tue, Jan 29, 2019 10:45 PM

I know they're not. But if you're leading a team of 40 software "engineers" the best you can do is to make it easier to do things right. Poor error management is probably the biggest cause of user dissatisfaction over the entire history of the IT industry, and I did what I could to improve that, in my small corner.

No. It's forcing the programmer to think about the situations where the error might occur, and make decisions about how to notify the user (or the calling code) about the problem, the reason, and the possible solutions. I.e. avoid just saying "Unknown error 0x80000000".

Clifford Heath.

- Tom Gardner

Wed, Jan 30, 2019 12:56 AM

Related, but slightly different...

A good feature of checked exceptions in Java is that they force the programmer to either catch an exception thrown by a "library function" or declare that it could be thrown to whatever calls this code. Thus the possibility of errors has to be explicitly addressed (even if, as I've seen, the exception is caught and ignored).

But that's "too much of a burden", so the modern practice is to only throw unchecked exceptions that aren't declared and checked by the compiler.

- Reinhardt Behm

Wed, Jan 30, 2019 5:32 AM

The hammer might be much too big. Would you like the engines in an aircraft to just stop because an assertion stopped the engine controller? I always have a lot of assertions or assertion-like constructs in my code to make sure during development that any errors get caught. But production code is without any such things. They all get removed and then the _changed_ software gets verified again. How can you otherwise make sure that every code path is tested during software integration and verification? This testing and verification has to be performed with the same software as will be deployed. The assertions should never fire, so you would have untested code in the deployed software.

--
Reinhardt

- A.P.Richelieu

Wed, Jan 30, 2019 6:07 PM

On 2019-01-28 at 21:31, StateMachineCOM wrote:

Not really. Here is an example of an assertion which can be removed. We have an FPGA which contains registers. For various reasons, we may want to change the address or definition of registers. The FPGA tools + scripts will automatically generate a header file with register addresses.

In my code I want to access the FPGA registers as a struct.

I have assertions checking that the offset of each register in the struct matches the #defines in the automatically generated headers.

If all assertions are OK then recompiling without those assertions will not cause a problem in a production system.
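A minimal sketch of such a layout check, assuming a C11 compiler. The names `fpga_regs_t` and `FPGA_REG_*_OFFSET` and the offset values are hypothetical stand-ins for whatever the generated header really contains. The first form uses `_Static_assert`, which is a swapped-in technique rather than the run-time assertions described above: it is evaluated at compile time, so there is nothing to strip for production. The second form is the run-time equivalent, which disappears when `NDEBUG` is defined.

```c
#include <assert.h>   /* assert() for the run-time variant */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Stand-ins for the auto-generated header (hypothetical names and values). */
#define FPGA_REG_CTRL_OFFSET   0x00u
#define FPGA_REG_STATUS_OFFSET 0x04u
#define FPGA_REG_DATA_OFFSET   0x08u

/* Hand-written struct view of the same register block (hypothetical). */
typedef struct {
    volatile uint32_t ctrl;
    volatile uint32_t status;
    volatile uint32_t data;
} fpga_regs_t;

/* Compile-time form: the build fails if struct and header drift apart. */
_Static_assert(offsetof(fpga_regs_t, ctrl)   == FPGA_REG_CTRL_OFFSET,
               "ctrl offset mismatch");
_Static_assert(offsetof(fpga_regs_t, status) == FPGA_REG_STATUS_OFFSET,
               "status offset mismatch");
_Static_assert(offsetof(fpga_regs_t, data)   == FPGA_REG_DATA_OFFSET,
               "data offset mismatch");

/* Run-time form: same checks via assert(), compiled out under NDEBUG. */
void fpga_check_layout(void)
{
    assert(offsetof(fpga_regs_t, ctrl)   == FPGA_REG_CTRL_OFFSET);
    assert(offsetof(fpga_regs_t, status) == FPGA_REG_STATUS_OFFSET);
    assert(offsetof(fpga_regs_t, data)   == FPGA_REG_DATA_OFFSET);
}
```

Either way, the checked condition depends only on the source code and not on any run-time data, which is why removing the check cannot change the behaviour of a build that passed it.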

If I however write the following code.

int getnumber(void) { c = getchar(); assert(c >= '0'); assert(c ...

And also, relevant for the OP, are you really suggesting to leave the watchdog in the production code while disabling other assertions? If so, WHY?

The Watchdog allows you to reset or interrupt the processing, which can be used to invoke recovery functions, including logging the error, allowing you to resume normal operation, and also to pinpoint the problem later. A hanging device will make no one happy.
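One common way to tie a watchdog to the original question (a task that silently stops running) is a check-in scheme. This is a sketch under assumptions and not something proposed in the thread: `NUM_TASKS`, `task_checkin()`, `supervisor_poll()` and the stubbed `hw_watchdog_kick()` are all hypothetical names, and the real refresh call is hardware-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TASKS 3u   /* hypothetical number of monitored tasks */

static volatile uint32_t alive_mask;  /* one bit per task that has checked in */
static unsigned kicks;                /* counts calls to the stubbed refresh  */

/* Stub for the hardware-specific watchdog refresh (hypothetical). */
static void hw_watchdog_kick(void) { kicks++; }

/* Each monitored task calls this once per loop iteration.
   On a real RTOS the read-modify-write below must be made atomic
   or wrapped in a critical section. */
void task_checkin(unsigned task_id)
{
    assert(task_id < NUM_TASKS);
    alive_mask |= (1u << task_id);
}

/* Called periodically by a supervisor. The watchdog is refreshed only if
   every task has checked in since the previous poll; otherwise it is left
   to expire so the hardware can reset or interrupt the system. */
bool supervisor_poll(void)
{
    const uint32_t all = (1u << NUM_TASKS) - 1u;
    bool ok = (alive_mask == all);

    alive_mask = 0;
    if (ok)
        hw_watchdog_kick();
    return ok;
}
```

The point of the design is that the watchdog is never refreshed from a timer interrupt or an idle loop on its own, because those can keep running while an application task is stuck.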

AP

- StateMachineCOM

Wed, Jan 30, 2019 8:31 PM

You guys cannot have it both ways.

If you truly believe that your production code is so good that assertions will NEVER fire in the field, then why are you so afraid of leaving them in the code? Of course, assertions cause some overhead, but they give you that last line of defense to do SOMETHING when the system goes out of control.

But you apparently ARE afraid that your assertions will fire too often in the field. (Otherwise you would not be talking of "the hammer being too big"). In this case, your solution is to disable assertions? And to do WHAT? Pray that the system will somehow miraculously recover and correct itself? What are the chances for this to work?

This strikes me as backwards in the software business. Your code is either in complete control of the machine or it isn't. There is really nothing in between.

Who said that a failing assertion must always stop the system? In the end YOU are designing the "hammer" (assertion handler), and so it is YOUR job to design it correctly for the circumstance: put the system in the fail-safe mode, whatever that means. And yes, I would prefer that an aircraft engine controller would reset and perhaps re-start the engine mid-air as opposed to blow up the engine.

What you do in your development version is your business. If you choose to use "assertion-like" constructs, which aren't checking for conditions that should NOT happen (like array index out of bounds or de-referencing a NULL pointer), then of course you should remove them for production.

But by removing ALL assertions in the final code you are throwing the baby out with the bathwater. You lose your last line of defense to do damage control and to protect your system and all its users.
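As a rough illustration of an application-designed "hammer", here is a sketch of an always-on check whose handler records the failure and switches to a fail-safe mode instead of calling `abort()`. Everything in it (`REQUIRE`, `on_assert_failed()`, `system_mode`, `set_throttle()`) is hypothetical, and what "fail-safe" means is reduced here to rejecting further commands.

```c
#include <assert.h>  /* standard assert(), for comparison only: it calls abort() */
#include <stdio.h>

typedef enum { MODE_NORMAL, MODE_FAIL_SAFE } system_mode_t;

static system_mode_t system_mode = MODE_NORMAL;
static const char *last_fail_file;
static int last_fail_line;

/* Application-defined handler: log where the check failed and enter
   fail-safe mode. A real system might also store the location in
   non-volatile memory or request a controlled reset. */
void on_assert_failed(const char *file, int line)
{
    last_fail_file = file;
    last_fail_line = line;
    fprintf(stderr, "%s(%d): check failed, entering fail-safe mode\n",
            file, line);
    system_mode = MODE_FAIL_SAFE;
}

/* Always compiled in; unaffected by NDEBUG. */
#define REQUIRE(cond) \
    ((cond) ? (void)0 : on_assert_failed(__FILE__, __LINE__))

/* Example use: an out-of-range command trips the check, after which
   the controller refuses commands until it is recovered. */
int set_throttle(int percent)
{
    REQUIRE(percent >= 0 && percent <= 100);
    if (system_mode == MODE_FAIL_SAFE)
        return -1;   /* command rejected */
    return percent;  /* command accepted */
}
```

Note that the handler returns to the caller, so code following a `REQUIRE` still has to cope with the failed condition; a handler that resets the processor would not have that obligation.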

Miro Samek state-machine.com

- Phil Hobbs

Wed, Jan 30, 2019 9:23 PM

< Attempt to generate controversy over an old article noted.> ;)

Afraid? Who's afraid?

assert() is too blunt an instrument to use as a general solution, because calling abort() or doing a hard reset is not necessarily a good defense in all situations. It might fit some situations, but it can make things worse in others.

A more nuanced assert()-like approach, such as the carefully() macro I posted upthread, is another matter. It was specifically designed to be left in the production code.

Because sometimes it is. We've supplied examples.

Straw man alert.

The choices are not limited to "leave all assert()s in production code" and "all the children will die!" There are lots of ways of doing runtime error checking.

For instance, unit tests are loaded with checks for correct program logic. Maguire talks about an MC68000 disassembler whose error checking system included a complete alternative disassembler (logic- vs. table-driven iirc). Once you've exercised all the code paths, you don't need the second disassembler.

There's a lot of daylight between wanting the software to be in control and issuing an edict forbidding #define NDEBUG.

In the event of failure, the C assert() macro calls abort().

'When I make a word do a lot of work like that,' said Humpty Dumpty, 'I always pay it extra.'

If by 'assertion', you merely mean 'appropriate error handling code hidden by a macro', we're in violent agreement. But to me at least, 'assertion' means the C library assert() facility or a close replica--something that applies the biggest available hammer--abort() or a hard reset in the event of failure.

Nobody is advocating for removing all error checking in production code. There are certainly lots of cases where abort() or a hard reset are the right response, but that's far from universal.

I use a whole lot of hard assertions in debug code, as I've said--more than I could stand in production, especially in an embedded system where code space is often at a premium. And of course since assertions have branches, they frequently mess up compiler optimizations.

The least-bad bugs are the ones you find by code reading; next are ones the compiler catches, third are the ones assert() finds; fourth are runtime error checks in production; and the worst are the ones that just make the system crash in some uncontrolled way.

Thus I like careful coding, static analysis, tight compiler flag settings, assert(), and carefully designed runtime error checking, in that order.

Cheers

Phil Hobbs


- A.P.Richelieu

Thu, Jan 31, 2019 5:05 AM

On 2019-01-30 at 21:31, StateMachineCOM wrote:

Because assertions add code size and reduce performance.

If you keep an assertion which fires in the field, then you have a bug in your program. The bug is that you did an assertion, instead of validating data. You basically misused the assertion for something it was not designed to handle.

I do, assertions should only continue execution if it would be dangerous to stop.

Since assertions are supposed to be removed in production code, your design is flawed. You used assertions, instead of run-time checks.

Learn the difference between assertions and run time checks.
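A small sketch of that distinction, with hypothetical function names. Data arriving from outside the program (here, a character) is validated with a run-time check that stays in every build and reports failure to the caller. A condition that can only be violated by a programming error in the calling code is covered by `assert()`, which may be compiled out with `NDEBUG`.

```c
#include <assert.h>
#include <stddef.h>

/* Run-time check: the character comes from outside the program, so a
   non-digit is an expected condition and is reported, not asserted. */
int digit_value(int c)
{
    if (c < '0' || c > '9')
        return -1;           /* invalid input, not a bug */
    return c - '0';
}

/* Assertion: passing NULL here can only be a bug in the calling code,
   so it is checked with assert() and may be removed for production. */
int sum_digits(const char *buf, size_t len, int *sum)
{
    int total = 0;

    assert(buf != NULL && sum != NULL);   /* internal contract */

    for (size_t i = 0; i < len; i++) {
        int d = digit_value((unsigned char)buf[i]);
        if (d < 0)
            return -1;       /* propagate the input error to the caller */
        total += d;
    }
    *sum = total;
    return 0;
}
```

For example, `sum_digits("123", 3, &s)` returns 0 and stores 6 in `s`, while `sum_digits("1a3", 3, &s)` returns -1 in both debug and production builds.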

AP

- Kent Dickey

Sat, Feb 2, 2019 4:44 PM

You need to realize your assertions can have bugs just like your code. In fact, I'm sure the bug rate for assertions is at least an order of magnitude higher than for other code, possibly several orders of magnitude.

Say you have a routine which returns a pointer to a structure, and a size of that structure (so it can get bigger in the future).

    struct Thing *my_thing;
    ret = get_a_thing(&my_thing, &size);
    assert(size == sizeof(Thing));

In all of your testing, this will succeed, since Thing will match the size being returned for now.

But then an upgrade occurs, and the size returned is bigger. Now, the code would work fine (the interface is designed to support this by adding new fields in the future), but now your assert goes off. This is bad--you basically have made your code fail on a system upgrade.

This is a trivial example, but it gets at the issue--asserts can test for finicky details which actually don't matter. And for testing, this is fine, actually probably a good thing, since it makes code assumptions clear.

You even point out that assertions make the code less robust--you're more likely to crash. And you expose yourself to crashing in cases where ignoring the "error" would be innocuous. I get very frustrated at old programs which now fail due to stupid checks which they should not be doing. I mean, I'd prefer a program to keep running and think it was the year 1900 than to hard fail once year 2000 hit.

Plus, the dirty secret no one wants to say out loud--buggy code that keeps running is often more useful than code which "stops". We've got a lot of experiential evidence that pretty massive bugs can often be worked around to keep a system running. One downside, of course, is buggy code is often a giant security hole. I don't think there's a magic wand which easily balances the need to make the code useful and robust, and secure.

So, what people are clearly saying is: you need at least 2 levels of assertions, and the assert I gave as an example above definitely should not be in production code (but is actually fine in your test builds). In fact, I sometimes use assertions for things I don't know are true, just to see if testing hits the case. These should not be in production code.
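A sketch of what two levels might look like, using hypothetical names (`DEV_ASSERT`, `PROD_CHECK`, `accept_thing()`) and a made-up two-field `struct Thing`. The development-level macro is just the standard `assert()` and vanishes under `NDEBUG`; the production-level macro always stays in and logs instead of aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

static int prod_failures;   /* number of production checks that have failed */

static void prod_check_failed(const char *expr, const char *file, int line)
{
    fprintf(stderr, "%s(%d): check failed: %s\n", file, line, expr);
    prod_failures++;
}

/* Level 1: development-only; compiled out when NDEBUG is defined. */
#define DEV_ASSERT(cond) assert(cond)

/* Level 2: stays in production builds regardless of NDEBUG. */
#define PROD_CHECK(cond) \
    ((cond) ? (void)0 : prod_check_failed(#cond, __FILE__, __LINE__))

struct Thing { int id; int flags; };   /* hypothetical layout */

/* Hypothetical consumer of a size reported by a library that is allowed
   to grow the structure in later versions. */
int accept_thing(size_t reported_size)
{
    /* Essential in any build: a smaller struct would be read out of bounds. */
    PROD_CHECK(reported_size >= sizeof(struct Thing));
    if (reported_size < sizeof(struct Thing))
        return -1;

    /* Finicky detail: only documents what today's library returns, so it
       must not survive into a build that may meet an upgraded library. */
    DEV_ASSERT(reported_size == sizeof(struct Thing));
    return 0;
}
```

With this split, an upgraded library that returns a larger structure trips only the development-level check, and a production build compiled with `NDEBUG` keeps working.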

Kent

- Phil Hobbs

Sat, Feb 2, 2019 5:19 PM

I think the religious war over assertions is mostly due to sloppy use of language, specifically whether "assertion" means "the C assert() macro or some near relative that calls abort() or does a hard reset" or "appropriate runtime error checking". All of our positions are closer than they appear.

Cheers

Phil Hobbs
