How to avoid a task not executed in a real-time OS?

Hi,
I was asked the question in the title some time ago. I have some real-time
embedded system experience, but not with an RTOS. A watch-dog can avoid a task not
called in a real-time system. But in an RTOS, what is the right answer?
I know that task priority and timer-triggered events can both influence task
execution, but they didn't look like the correct answer. What do you think the
right answer is?

Best Regards,

Re: How to avoid a task not executed in a real-time OS?

It's a peculiar choice of words in asking these questions.  I'm not sure  
exactly what you mean by "avoid".

In a typically designed single core system, at least one RTOS task is  
always present (blocked or waiting) but not always actively doing work  
(running).  There is always more than one task, otherwise there is no  
justification for using an RTOS.  The programmer can (usually) start  
(create) and stop (kill) tasks.  Task creation usually happens at init  
time but there's no rule that forces that design.  If you want to avoid
the task running state, kill the task or inhibit the input to unblock  
it.  You decide as the programmer.

Task priority is a means to direct a lower priority task to yield CPU  
usage while a higher priority task is running.  When the high priority  
task is done running (that is, it blocks), the lower priority task  
automatically runs if it's runnable (not blocking).  Task priority can  
delay a task running in response to what unblocked it but not inhibit it.
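
For illustration only, the rule described above (the highest priority runnable task gets the CPU, blocked tasks are skipped) could be sketched roughly like this. The `task_t` type, the `pick_next()` name and the "larger number means higher priority" convention are assumptions made up for the example, not the API of any particular RTOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task descriptor; a real RTOS keeps far more state. */
typedef struct {
    int  priority;   /* assumption: larger value = higher priority */
    bool runnable;   /* false while the task is blocked */
} task_t;

/* Return the index of the highest-priority runnable task, or -1 if
 * every task is blocked (an RTOS would then run its idle task). */
int pick_next(const task_t *tasks, size_t count)
{
    assert(tasks != NULL || count == 0);
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!tasks[i].runnable) {
            continue;   /* blocked tasks never get the CPU */
        }
        if (best < 0 || tasks[i].priority > tasks[best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

A low priority task is only ever delayed here, never removed: as soon as everything above it blocks, it is the one returned.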

That's a simplified answer.  Some RTOSes are quite sophisticated and the
answers get complicated when you use their advanced features.

JJS

Re: How to avoid a task not executed in a real-time OS?
On 2019-01-21 Robert Willy wrote in comp.arch.embedded:

A watchdog cannot avoid a task not being called. In most cases the
watchdog just resets the complete system if it is not serviced fast
enough (by the task(s) being watched). There is however no guarantee
that the offending task will run after the reset. There may be some
permanent failure that prevents it from running.

You can use a watchdog in an RTOS as well, but you have to make sure
your task will run within the required watchdog service interval.

What do you think the 'RT' in 'RTOS' stands for? What is the difference
between your 'real time' and using an RTOS?


--  
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)

It is better to have loved a short man than never to have loved a tall.

Re: How to avoid a task not executed in a real-time OS?
On 1/21/19 7:40 AM, Robert Willy wrote:

If your question is how to make sure that every task gets the time it
needs within its required time limits, in general this is impossible:
if you mis-design your system, you may end up with 110 units of work to
do in 100 units of time, which is fundamentally unsolvable.
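
To put a number on the "110 units of work in 100 units of time" case, here is a minimal sketch of a utilization check for periodic tasks on a single core. The `periodic_task_t` type and the function names are invented for the example. Note that staying at or below 100% is only a necessary condition; on its own it does not guarantee that every deadline is met under fixed priorities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical periodic task: needs `wcet` time units every `period`. */
typedef struct {
    double wcet;
    double period;
} periodic_task_t;

/* Sum of wcet/period over all tasks: the fraction of CPU time demanded. */
double total_utilization(const periodic_task_t *tasks, size_t count)
{
    double u = 0.0;
    for (size_t i = 0; i < count; i++) {
        assert(tasks[i].period > 0.0);
        u += tasks[i].wcet / tasks[i].period;
    }
    return u;
}

/* More than 100% demand can never be scheduled on one core. */
bool is_overloaded(const periodic_task_t *tasks, size_t count)
{
    return total_utilization(tasks, count) > 1.0;
}
```

Two tasks needing 60 and 50 units in every 100 add up to the 110% situation, and no scheduler can fix that.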

Different RTOSes have different scheduling methods, mostly drawn from a
few standard techniques.

If you have an RTOS with strict execution priorities, then the only
things that can keep a current task from getting the CPU to try and meet
its deadlines are tasks with higher priorities (or the case where
lower priority tasks hold resources it needs). If you can
characterize and limit the time that those higher priority tasks might
consume, then you can come up with some guarantees for execution of that
task.
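
One well-known way to turn "characterize and limit the higher priority tasks" into a number is classic fixed-priority response-time analysis. This is my choice of technique for the illustration, not something the post prescribes. The sketch below iterates R = C + B + sum over higher priority tasks of ceil(R / T_j) * C_j until it settles. The `rt_task_t` type and its field names are assumptions for the example, and the array is assumed to be sorted with the highest priority first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task model, all times in the same unit (e.g. ticks). */
typedef struct {
    unsigned wcet;      /* C: worst-case execution time */
    unsigned period;    /* T: minimum time between releases */
    unsigned deadline;  /* D: relative deadline */
    unsigned blocking;  /* B: worst-case wait on lower-priority tasks */
} rt_task_t;

/* Worst-case response time of tasks[i], where tasks[0..i-1] have higher
 * priority. Returns 0 if the deadline cannot be guaranteed. */
unsigned response_time(const rt_task_t *tasks, size_t i)
{
    assert(tasks != NULL && tasks[i].wcet > 0u);
    unsigned r = tasks[i].wcet + tasks[i].blocking;
    for (;;) {
        unsigned next = tasks[i].wcet + tasks[i].blocking;
        for (size_t j = 0; j < i; j++) {
            assert(tasks[j].period > 0u);
            /* ceil(r / T_j): releases of task j within the window r */
            unsigned releases = (r + tasks[j].period - 1u) / tasks[j].period;
            next += releases * tasks[j].wcet;
        }
        if (next > tasks[i].deadline) {
            return 0u;      /* interference pushes it past the deadline */
        }
        if (next == r) {
            return r;       /* fixed point reached */
        }
        r = next;
    }
}
```

For three tasks with (C, T) of (1, 4), (2, 6) and (3, 12) and deadlines equal to periods, the lowest priority task comes out at 10 time units, inside its deadline of 12.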

Another method sets priorities based on how close each task is to its
deadline. As long as no other task has already fallen behind its
deadline (assuming overdue tasks are given priority), a task should get
some CPU time as its deadline approaches.
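
A rough sketch of that deadline-driven selection (usually called earliest deadline first) might look like this. The `edf_task_t` type and `edf_pick_next()` are hypothetical names, and tick counter wraparound is deliberately ignored to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task descriptor for deadline-driven scheduling. */
typedef struct {
    uint32_t abs_deadline;  /* absolute deadline in ticks (no wraparound) */
    bool     runnable;
} edf_task_t;

/* Return the index of the runnable task whose deadline is nearest,
 * or -1 if nothing is runnable. A task that is already overdue has the
 * smallest deadline value, so it is picked first. */
int edf_pick_next(const edf_task_t *tasks, size_t count)
{
    assert(tasks != NULL || count == 0);
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!tasks[i].runnable) {
            continue;
        }
        if (best < 0 || tasks[i].abs_deadline < tasks[best].abs_deadline) {
            best = (int)i;
        }
    }
    return best;
}
```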

You can also create ad-hoc solutions where some high priority task
periodically adjusts priorities to try and make sure a given task gets
the time it needs (normally an indication that something wasn't designed
right the first time).

Re: How to avoid a task not executed in a real-time OS?
On Mon, 21 Jan 2019 13:05:15 -0500, Richard Damon

There are some simple rules of thumb that I have used successfully for
decades.

1.) Analyze your tasks and find out which priorities can be _lowered_
without harming the total system performance. In this way, the few
higher priority tasks should have plenty of execution time, without
constantly fighting with low priority tasks for CPU time.

2.) The higher the priority, the shorter the execution time should be.
Treat the highest priority tasks like "pseudo interrupts". Move any
non-hard-RT functions into the null task, which can consume all CPU
time after all high priority tasks have been served.

3.) If some task takes too long to execute for its priority, consider
splitting the functionality into two tasks: the less time critical things
go to a lower priority task (or even the null task) and the essential
things into a small high priority task.

4.) Avoid uncontrolled resource locking. If some resource needs to be
locked, use a dedicated high priority transaction handler task with
well defined execution time to handle the transaction from beginning
to end.
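
As a sketch of rules 2 and 3 above, the urgent part can do nothing more than capture the data and return, while the slow processing drains a queue from a low priority (or null) task. All names here are made up for the example. The code is single-threaded so it can be tried on a host; on a real target the queue indices would need atomic access or an RTOS queue instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u   /* assumption: a power of two */

typedef struct {
    uint16_t samples[QUEUE_SLOTS];
    unsigned head;   /* next slot to write (urgent side) */
    unsigned tail;   /* next slot to read (background side) */
} work_queue_t;

/* Urgent part: store the sample and return at once.
 * Returns false if the background side has fallen too far behind. */
bool urgent_capture(work_queue_t *q, uint16_t sample)
{
    assert(q != NULL);
    if (q->head - q->tail == QUEUE_SLOTS) {
        return false;                       /* queue full */
    }
    q->samples[q->head % QUEUE_SLOTS] = sample;
    q->head++;
    return true;
}

/* Background part: process everything pending. The sum stands in for
 * whatever slow, non-critical work the application really does. */
uint32_t background_process(work_queue_t *q)
{
    assert(q != NULL);
    uint32_t sum = 0;
    while (q->tail != q->head) {
        sum += q->samples[q->tail % QUEUE_SLOTS];
        q->tail++;
    }
    return sum;
}
```

The point of the split is that the worst-case execution time of `urgent_capture()` is tiny and constant, whatever the background work costs.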

Re: How to avoid a task not executed in a real-time OS?
Am 21.01.2019 um 13:40 schrieb Robert Willy:

No, it really can't.  It can only _react_ to a supervised task not
having been called in time.  But it cannot keep that from happening in
the first place, i.e. it cannot "avoid" it.  Nor can it usually
guarantee that a bite of the dog will actually resolve the issue.

Quite probably the same, because there is not really a difference
between "real-time system" and "RTOS".

Any sufficiently badly designed application can fail to perform some
task before its allotted deadline.  If that renders the performance of
said application unacceptable, that means the buzz-word "real-time" has
been applied to it correctly --- quite often it isn't.

Nor is there such a thing as "the" right answer to the question "what
should I do if this happened to me?"  Like for all somewhat interesting
questions, the only universally applicable answer is "It depends."

In a perfect world the answer might be "Just make sure your system has
sufficient resources in all areas that this simply cannot happen, and
then some" --- but in most corners of this here reality bean-counters
will tell you, in excruciating detail if you insist, why that answer is
not just incorrect, but actually blasphemous.

Re: How to avoid a task not executed in a real-time OS?
On 21/01/2019 13:40, Robert Willy wrote:

I have heard the use of watchdogs described as being like hitting a dead
man on the head with a hammer in the hope that it will wake him.

Watchdogs with reset can be useful if you have hardware issues - dodgy
power supplies, radiation, or something that has a risk of giving an
unexpected one-off glitch that stops the system working properly.  If
the problem is in software, however, it will just lead to the same
situation again and again.

For software issues, watchdogs are more about helping you identify and
debug problems - they don't fix anything, they just let you know you
have a problem.

But in no sense does a watchdog /avoid/ a task not doing its job.  At
best, it can help you see that the task has not succeeded.  There are
usually better ways (RTOS or not) to do this, if you think it is necessary.

The only way to avoid a task not being called is to be sure that you
design the system properly.


Re: How to avoid a task not executed in a real-time OS?
Am 22.01.2019 um 09:19 schrieb David Brown:

That's overstating it a teeny little bit. ;-)

It's more like hitting a newly dead heart with a hefty jolt of
electricity to possibly make it restart --- a procedure that is quite
definitely not recommended to be used on a non-dead one.

And just like the bite of a watchdog, that sometimes actually does work.

That's by no means certain.  It all depends on how it happened that the
software got itself stuck in a situation that didn't occur during
testing (or the software would never have been released into the wild,
right?)  But somehow, right now it did.

If, e.g., a once-in-a-blue-moon "forbidden" violation of design
limitations on some input was the reason, a watch-dog reset cures the
problem until the next event of that kind --- i.e. possibly forever.

Re: How to avoid a task not executed in a real-time OS?

I just returned from a presentation by Don Eyles about the Apollo missions.
He discussed how the watchdog kept resetting the LEM computer during the
Apollo 11 descent. Even more scary was the Astronaut-hand-applied patch
during Apollo 14 to circumvent an intermittent short in the LEM "Abort"
button. Don't get too concerned, just press on...

Looking forward to reading his memoir (had to buy a copy)!
https://www.amazon.com/Sunburst-Luminary-Apollo-Don-Eyles/dp/0986385905

See ya, Dave

PS: Now, how's THAT for thread drift?

Re: How to avoid a task not executed in a real-time OS?

Perhaps.  But like a defibrillator, the watchdog does nothing to deal
with the actual cause of the problem.

We can say without doubt that the watchdog does not cure the problem.
If software causes a hang that triggers the watchdog, there is a bug in
the software.  That applies regardless of how it happened, how good or
bad your testing was, what the input values were, etc.  (Note that if
something exceeded /specified/ design limitations, then that is outside
the realm of the software.)  Since the watchdog does not magically fix
the software, the problem remains.

Clearly, not all systems need the same level of quality, reliability,
and robustness.  You don't design and test your "amusing" singing
birthday card to the same levels as you do for your submarine control
system.  And so sometimes, a watchdog reset on software hang is a good
enough way to handle the symptoms of some kinds of software bugs.  You
balance the cost of the unreliability against the cost of fixing it -
engineering is about making things good enough, not perfect.

But you do need to be aware of what the watchdog actually does, and does not
do.  Some developers use it as a crutch to avoid the effort of writing
correct code, or testing appropriately.  "If there is an error on the
communication line, it will lead to a timeout - the watchdog will reset
the system, so that's fine."  "The tasks will only have a conflict and a
deadlock if the user presses the button at the same time as the screen
is updating - that is unlikely to happen, and the watchdog will fix it
if it does".  Some use it as a crutch to avoid debugging and fixing
problems.  "The software hung during testing, but the watchdog restarted
it fine.  We don't think it will happen at the customer's site."

Or "A watch-dog can avoid a task not called in a real-time system", as
the OP claimed.


So that does /not/ mean I don't recommend a watchdog (though frequently
I do not enable them - I'd rather the customer reported the problem so
we can fix it properly).  You just have to know /why/ you have a
watchdog, and use it appropriately.

Re: How to avoid a task not executed in a real-time OS?

Obligatory:

https://www.washingtonpost.com/news/morning-mix/wp/2017/09/25/the-navys-adding-a-new-piece-of-a-equipment-to-nuclear-submarines-xbox-controllers

Re: How to avoid a task not executed in a real-time OS?
On 23/01/2019 22:13, Paul Rubin wrote:

We used a Playstation controller for a whole submarine, not just a
periscope.  (It was an ROV - remote operated vehicle.  So no people in it.)


Re: How to avoid a task not executed in a real-time OS?
On Mon, 21 Jan 2019 04:40:38 -0800 (PST), Robert Willy

The "RT" in RTOS stands for "real time".


A watchdog generally tells you that some task either was not run, or
was not run on time.  It knows this because the task resets the
watchdog's timer as part of its operation.  If the watchdog expires,
you know that the task, for whatever reason, failed to reset it.

So the task running properly avoids the watchdog expiring.
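
The mechanism described above can be modelled in a few lines of C. This is a host-side model with invented names (`watchdog_t`, `wdt_kick()`, `wdt_tick()`), not a driver for any real watchdog peripheral; real hardware would reset the system where this model just latches a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a countdown watchdog. */
typedef struct {
    uint32_t timeout_ticks;  /* reload value */
    uint32_t remaining;      /* ticks left before expiry */
    bool     expired;        /* latched once the count reaches zero */
} watchdog_t;

void wdt_init(watchdog_t *w, uint32_t timeout_ticks)
{
    assert(w != NULL && timeout_ticks > 0u);
    w->timeout_ticks = timeout_ticks;
    w->remaining = timeout_ticks;
    w->expired = false;
}

/* Called by the supervised task as part of its normal operation. */
void wdt_kick(watchdog_t *w)
{
    assert(w != NULL);
    if (!w->expired) {
        w->remaining = w->timeout_ticks;
    }
}

/* Called once per timer tick, independently of the task. */
void wdt_tick(watchdog_t *w)
{
    assert(w != NULL);
    if (w->expired) {
        return;
    }
    w->remaining--;
    if (w->remaining == 0u) {
        w->expired = true;   /* the task failed to kick in time */
    }
}
```

As long as the task kicks at least once every `timeout_ticks` ticks, `expired` never becomes true. The watchdog itself does nothing to make the task run.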


From another point of view: a watchdog could be designed to start a
particular task when it expires.  In this case execution of that task
could be avoided by something else continually resetting the watchdog.


If neither of these answers your question, then I don't understand what
you're asking.

George

Re: How to avoid a task not executed in a real-time OS?
A watchdog timer is really a hardware-assisted, time-based assertion in the
code. As such, it is just a part of the larger software development
strategy known as Design by Contract (DbC).

The value of identifying the watchdog timer as an *assertion* is that it
informs you what to expect from it. For example, you can't expect an
assertion to "avoid" or "fix" a problem (like in the OP "avoid a task not
executed"). This is because assertions neither handle nor prevent errors,
in the same way as fuses in electrical circuits don't prevent accidents or
abuse. In fact, a fuse is an intentionally introduced weak spot in the
circuit that is designed to fail sooner than anything else, so actually the
whole circuit with a fuse is less robust than without it.

Now, regarding using watchdog timers in the context of an RTOS: you should
service the watchdog from the context of the task. A common mistake is to
service a watchdog from a periodic timer service. RTOS timers typically run
in the ISR context or in a separate timer task, so they might keep running
and servicing the watchdog while your task is starving. Another mistake
along these lines is to service a watchdog from various RTOS callbacks, also
known as "hooks", which might also run in a different context than your
task.
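
One common way to apply this when several tasks must be supervised is a check-in scheme: each task reports from its own loop, and the hardware watchdog is refreshed only when all of them have reported. The sketch below is an illustration under assumptions, not the scheme of any particular RTOS. The task names, `task_check_in()`, `supervisor_poll()` and the `hw_watchdog_kick()` stub are all hypothetical, and on a real target the update of `checked_in` would have to be made atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per supervised task (hypothetical task set). */
#define TASK_SENSOR  (1u << 0)
#define TASK_CONTROL (1u << 1)
#define TASK_COMMS   (1u << 2)
#define ALL_TASKS    (TASK_SENSOR | TASK_CONTROL | TASK_COMMS)

static uint32_t checked_in;   /* bits of tasks that have reported */
static unsigned hw_kicks;     /* counts refreshes so a host can test this */

/* Stand-in for the hardware-specific watchdog refresh. */
static void hw_watchdog_kick(void)
{
    hw_kicks++;
}

/* Called by each supervised task from its own loop, in task context. */
void task_check_in(uint32_t task_bit)
{
    assert((task_bit & ~ALL_TASKS) == 0u);
    checked_in |= task_bit;
}

/* Called periodically by a low-priority supervisor task.
 * Refreshes the watchdog only if every task has reported. */
bool supervisor_poll(void)
{
    if (checked_in != ALL_TASKS) {
        return false;   /* some task is starving: let the dog bite */
    }
    checked_in = 0u;
    hw_watchdog_kick();
    return true;
}
```

Because the refresh depends on every task having actually run, a timer or hook that keeps firing while a task starves cannot keep the watchdog quiet.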

Once you use a watchdog timer, you need to carefully design (and test!) the
behavior of the system when the watchdog expires. Here again, identifying
the watchdog as an assertion helps, because you can use your general
strategy of handling failed assertions. I've written more about this in the
blog:
["A nail for a fuse"](https://embeddedgurus.com/state-space/2009/11/a-nail-for-a-fuse/).

I am always amazed by embedded designs, where developers go to great
lengths to apply memory protection (MPU or MMU) or watchdogs, while at the
same time they don't sprinkle their code with basic code assertions that
perform rudimentary sanity checks.

Even more bizarre to me is when developers use assertions, but *disable*
them in the production release (while keeping the MPU and the watchdogs.)
I'm sure the readers of this forum never do such an illogical thing, and
always ship the products with carefully designed assertions, right?

Re: How to avoid a task not executed in a real-time OS?
Den 2019-01-28 kl. 16:17, skrev StateMachineCOM:

Assertions are there to check that your code is sane.
They are designed to be removed in production code.

Assertions are not the same thing as checking your input.
You definitely need to check your input, but once validated,
they do not need revalidation.
If the input is not valid, an intelligent handling/recovery of the
erroneous output is preferred over some rough action generated by an
assertion failure.


Re: How to avoid a task not executed in a real-time OS?

Absolutely. You need to very carefully distinguish between erroneous
behavior (a.k.a. a bug) and an exceptional condition, which is rare but can
arise legitimately. Assertions are for errors. I've written specifically
about it in the Dr.Dobb's article "An Exception or a Bug?"
[http://www.drdobbs.com/an-exception-or-a-bug/184401686]

I'm exactly challenging this beaten-path point of view, because it suggests
to stop checking the sanity of the production code. This would work if
*all* errors are completely removed during debugging. Are they really
removed in YOUR code?

And also, relevant for the OP, are you really suggesting to leave the
watchdog in the production code while disabling other assertions? If so,
WHY?

I'm looking forward to interesting discussion...

Re: How to avoid a task not executed in a real-time OS?
On 1/28/19 3:31 PM, StateMachineCOM wrote:

A generally very sensible article.

I'm all for having error checking in production code, but I don't call  
those 'assertions'.   I don't like the idea of leaving _assertions_ in,  
though, because (a) abort() or a hard reset is a mighty big hammer to  
apply that broadly, and (b) it deprives me of a very useful facility for  
debugging, because I can't use as many of them as I want if they all  
have to be left in the production builds.

I have a few macros like yours that supply a finer-grained set of options.

Cheers

Phil Hobbs


--  
Dr Philip C D Hobbs
Principal Consultant

Re: How to avoid a task not executed in a real-time OS?
On 29/1/19 8:22 am, Phil Hobbs wrote:

We had a set of assert macros that would abort in the test environment,  
but return an error code when run in production so the caller needed to  
explicitly ignore or handle the error condition. That gives you proper  
feedback during testing but proper error handling in prod.
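
A guess at what such a macro set could look like, since the original macros are not shown in the post: the same check aborts in a test build and turns into an error return in production. `TEST_BUILD`, `CHECK_OR_FAIL`, `err_t` and `set_duty_cycle()` are all names invented for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { ERR_OK = 0, ERR_BAD_ARG = 1 } err_t;

#ifdef TEST_BUILD   /* hypothetical flag set only for test builds */
/* Test environment: fail loudly at the point of the violation. */
#define CHECK_OR_FAIL(cond, err)                                \
    do {                                                        \
        if (!(cond)) {                                          \
            fprintf(stderr, "check failed: %s\n", #cond);       \
            abort();                                            \
        }                                                       \
    } while (0)
#else
/* Production: hand the error to the caller, who must deal with it. */
#define CHECK_OR_FAIL(cond, err)                                \
    do {                                                        \
        if (!(cond)) {                                          \
            return (err);                                       \
        }                                                       \
    } while (0)
#endif

/* Example function using the macro; it must return err_t. */
err_t set_duty_cycle(int percent, int *out)
{
    CHECK_OR_FAIL(out != NULL, ERR_BAD_ARG);
    CHECK_OR_FAIL(percent >= 0 && percent <= 100, ERR_BAD_ARG);
    *out = percent;
    return ERR_OK;
}
```

The scheme only works if callers really do check the returned code, so the macro is usable only inside functions that return `err_t`.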

Clifford Heath.

Re: How to avoid a task not executed in a real-time OS?
On 1/28/19 6:17 PM, Clifford Heath wrote:

I'm talking mostly about things like enforcing class invariants and so  
on.  Putting those in inline functions, for instance, can be a big  
performance and code size hit, and once testing is done, you can be  
pretty sure they won't fire in production.

Memory corruption, null pointers, deadlocks, etc. definitely have to  
have run time checks.  So it's nice to leave assert() for debug and roll  
your own macro set for runtime.  That way you can have the fault  
tolerance of defensive programming without hiding bugs.  (Maguire is  
still a good read.)
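
A minimal sketch of that split, with standard `assert()` for debug-only invariants and a home-grown check that stays in every build. The `RUNTIME_CHECK` macro, the `on_runtime_fault()` hook and `buffer_get()` are hypothetical names for illustration, not the macros actually used by anyone in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static unsigned fault_count;   /* stand-in for a persistent fault log */

/* Hypothetical fault hook: a real system might log the text, enter a
 * safe state, or request a controlled reset. */
static void on_runtime_fault(const char *what)
{
    (void)what;
    fault_count++;
}

/* Always compiled in: evaluates to true if the condition holds,
 * otherwise reports the fault and evaluates to false. */
#define RUNTIME_CHECK(cond) \
    ((cond) ? true : (on_runtime_fault(#cond), false))

int buffer_get(const int *buf, size_t len, size_t idx, int fallback)
{
    /* Internal invariant: debug only, compiled out with -DNDEBUG. */
    assert(len <= 1024u);

    /* Conditions worth catching in the field: checked in every build. */
    if (!RUNTIME_CHECK(buf != NULL) || !RUNTIME_CHECK(idx < len)) {
        return fallback;
    }
    return buf[idx];
}
```

Cheap invariant checks can then be used freely during development, while the checks that ship are chosen deliberately and have a defined reaction.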

Most of my code is embedded or else console-mode simulations, so I don't  
really do a lot of error recovery.

Cheers

Phil Hobbs

--  
Dr Philip C D Hobbs
Principal Consultant

Re: How to avoid a task not executed in a real-time OS?

Seriously? Do you really believe that the error codes are checked and proper actions taken in *all* cases? Isn't this just kicking the can down the road and into some other code, which is ill-prepared to "handle" your bugs?

I'm not sure what you are proposing by "rolling your own" for production code. What are those "other versions" of assert macros in production code supposed to do?

For the OP, what is your advice specific to watchdog timers? Would you switch the watchdog off for production code? In that case, is it worth implementing a watchdog only for debugging?

On the other hand, if you recommend keeping the watchdog in production code, why do you choose the watchdog and suppress other assertions? What's so special about the watchdog, and what should be done when the watchdog expires in production code?

The main point remains: Bugs don't miraculously go away just because you stop checking for them. Do they?


Miro Samek
state-machine.com
