"Am I still working okay?" asked the micro controller...

Say we have a microcontroller with limited memory.
Say it will perform some realtime control of something.

How do you write software for a microcontroller that, in addition to its
normal operation (control of something), also checks from time to time
whether it is still working okay?  How can a program test itself?  Can
someone suggest an intelligent method (other than a watchdog)?



Re: "Am I still working okay?" asked the micro controller...

"SelfTest" <SelfTEst> wrote in message

That's called a 'watchdog' timer and is standard in most microcontrollers.
It's basically a countdown timer which the program running on the
microcontroller needs to reload x times per second to prevent it from
reaching zero. When it reaches zero the microcontroller is reset. So when a
program 'hangs', it stops reloading the watchdog countdown timer and the
microcontroller is reset.
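
The countdown-and-reload behaviour described above can be modelled in a few lines of C. This is only a software sketch for illustration: on a real part the counter is a hardware peripheral and the reload is a write to a vendor-specific register, and the names `wdt_t`, `wdt_kick`, `wdt_tick` and the timeout `WDT_RELOAD` are all assumptions made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed timeout in timer ticks (hypothetical value). */
#define WDT_RELOAD 100u

/* Software model of a watchdog countdown timer. */
typedef struct {
    uint32_t counter;     /* ticks left before the reset fires */
    bool     reset_fired; /* set once the counter has reached zero */
} wdt_t;

static void wdt_init(wdt_t *w)
{
    w->counter = WDT_RELOAD;
    w->reset_fired = false;
}

/* Called by the application while it is healthy: reloads the counter. */
static void wdt_kick(wdt_t *w)
{
    w->counter = WDT_RELOAD;
}

/* Called once per timer tick: models the hardware countdown. */
static void wdt_tick(wdt_t *w)
{
    if (w->reset_fired) {
        return;               /* a real MCU would already have reset */
    }
    if (w->counter > 0u) {
        w->counter--;
    }
    if (w->counter == 0u) {
        w->reset_fired = true;
    }
}
```

As long as `wdt_kick()` is called more often than once every `WDT_RELOAD` ticks, `reset_fired` stays false; a hung main loop stops kicking and the flag is set after the timeout.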





Re: "Am I still working okay?" asked the micro controller...
[OP forgot to limit F'up2; fixed. Removed non-existent c.a.e.piclist
from Newsgroups:]

In comp.arch.embedded SelfTest <SelfTEst> wrote:


Ultimately, you can't.  A CPU can no more meaningfully ask itself "Am
I still working OK?" than you can ask yourself meaningfully "Have I
fallen asleep yet?"

You can use watchdogs or internal consistency checking to some extent
to determine general health of the software.  Assertions can be
inserted into the code, i.e. conditions that you know must come out
true at all times, because otherwise something's fatally wrong.
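
As a rough illustration of such an assertion, here is a minimal sketch in C. The macro `RT_ASSERT`, the hook `on_assert_fail`, the function `set_duty` and the limit `PWM_MAX` are hypothetical names invented for this example; a real system would decide for itself what the failure hook does (log, go to a safe state, reset).

```c
#include <stdint.h>

/* Hypothetical failure hook: here it only counts failures. A real system
 * might log the location, enter its safe state, or force a reset. */
static volatile uint32_t g_assert_failures = 0;

static void on_assert_fail(const char *file, int line)
{
    (void)file;
    (void)line;
    g_assert_failures++;
}

/* Run-time assertion that reports instead of aborting. */
#define RT_ASSERT(cond) \
    do { if (!(cond)) { on_assert_fail(__FILE__, __LINE__); } } while (0)

#define PWM_MAX 1000   /* assumed full-scale duty value */

/* The caller must pass a duty in [0, PWM_MAX]; anything else means
 * something upstream is wrong. The value is still clamped afterwards so
 * the output stays in range even when the assertion fires. */
static int32_t set_duty(int32_t requested)
{
    RT_ASSERT(requested >= 0 && requested <= PWM_MAX);
    if (requested < 0) {
        requested = 0;
    }
    if (requested > PWM_MAX) {
        requested = PWM_MAX;
    }
    return requested;
}
```

Calling `set_duty(500)` leaves `g_assert_failures` untouched, while `set_duty(2000)` returns the clamped value and bumps the failure counter.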

But there's often little or no point trying to detect hardware faults
--- if the hardware does break you're quite probably toast anyway.
You can't usually fix such a problem from the software side, and by
The Usual Kind of Luck, the faults that do occur will be exactly those
you can't, or at least didn't test for.  And that's before you
consider that such tests mean more code in total, and thus more
opportunities for bugs.

Moral: if you don't know what to do with the answer, don't ask the
question.

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: "Am I still working okay?" asked the micro controller...


Hans,

In this case, the very next moral should be:
Moral: if you don't know how the answer [i.e. the sensor/hardware] could
fool you, don't ask the question.




Re: "Am I still working okay?" asked the micro controller...

There are plenty of simple things you can do if something is failing.
1) Turn yourself off; no need to draw power if you are battery operated.
2) Turn off any external device which should not operate when the program
    is not active.
3) Reset yourself.
    If the fault is due to a temporary problem, this is quite good.



--
Best Regards,
Ulf Samuelsson   ulf@a-t-m-e-l.com
Re: "Am I still working okay?" asked the micro controller...

"SelfTest" <SelfTEst> wrote in message

One way to check hardware is to run another identical processor and check
that the two behave the same. If you have three or more then you can perform
voting so that the most popular answer is the one that gets used.
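
For three processors, the voting step might look like the sketch below. It assumes each processor reports a 16-bit result and that "most popular" means at least two identical values; the function name `vote_2oo3` is made up for this example, and how the three results reach the voter is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two-out-of-three vote on three independently computed results.
 * Returns true and stores the majority value in *out when at least two
 * inputs agree; returns false (and leaves *out untouched) when all three
 * differ, in which case the caller should fall back to its safe state. */
static bool vote_2oo3(uint16_t a, uint16_t b, uint16_t c, uint16_t *out)
{
    if (a == b || a == c) {
        *out = a;
        return true;
    }
    if (b == c) {
        *out = b;
        return true;
    }
    return false;
}
```

With inputs 5, 5, 7 the vote yields 5; with 1, 2, 3 there is no majority and the function returns false.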

Peter


Re: "Am I still working okay?" asked the micro controller...


That is a cool idea!



Re: "Am I still working okay?" asked the micro controller...

<jiang> wrote in message
It's not my idea; NASA uses a set of five computers for the Space Shuttle
flight software.

Peter


Re: "Am I still working okay?" asked the micro controller...

And not so simple.  What takes the vote?  What if it fails?

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
Re: "Am I still working okay?" asked the micro controller...


Use mechanical or pneumatic voting, not electric.

For instance, if you want to control a bidirectional relay, use a core
with three separate coils, each controlled by a separate processor. If
the currents in two coils flow in opposite directions, their magnetic
fields cancel, and the third coil alone determines the resultant force.

Paul
 

Re: "Am I still working okay?" asked the micro controller...
On Wed, 19 May 2004 21:12:52 +0300, the renowned Paul Keinanen


On one machine I'm very familiar with there are three safety
interlocks (one electrical (not electronic), one hydraulic, and one
mechanical). Only when all 3 agree it is safe are the electronics
allowed to do what they want.

Best regards,
Spehro Pefhany
--
"it's the network..."                          "The Journey is the reward"
snipped-for-privacy@interlog.com             Info for manufacturers: http://www.trexon.com
Re: "Am I still working okay?" asked the micro controller...


I worked on an aerospace actuator that did it like this:

Three hydraulic actuators have three electronic control systems.

Each actuator monitors the other two and has two outputs that
are at +5V if it thinks that actuator is good, -25V if it
thinks that actuator is bad.  The actual monitoring consists
of challenges/responses through six dual-redundant actuator-
to-actuator digital communication links and looking at extra
pressure transducers on the monitored actuator that are read
by the monitoring actuator.  This identifies wrong behavior.

Each actuator has an input that connects to the outputs of
the other actuators through two resistors that form a summing
junction.  If the sum is > -5V, it operates normally.  If the sum
is < -5V, it goes into "freewheeling mode", where it exerts no
force and is easy to move.  If one or both of the other actuators
asserts -25V, it freewheels.

Each of the two resistors mentioned above is actually a pair of
resistors in series.  The summing junction also has a pair of
high-value resistors in series to local common to hold the input
at 0V in the case of two open input signals.

One actuator can drag along two freewheeling actuators and
control the aircraft.

Two actuators working together can drag along a third actuator
that is trying as hard as it can to go the other way and control
the aircraft.

Result: no single point of failure in the actuator electronics
or voting system can result in loss of control of the aircraft.


--
Guy Macon, Electronics Engineer & Project Manager for hire.
Re: "Am I still working okay?" asked the micro controller...

Let me guess, it was too heavy to fly?  ;-)

--
Ben Jackson
Re: "Am I still working okay?" asked the micro controller...


Judge for yourself:

http://www.fas.org/man/dod-101/sys/ac/c-17.htm

:)

--
Guy Macon, Electronics Engineer & Project Manager for hire.
Re: "Am I still working okay?" asked the micro controller...

No system can be 100% reliable. All that matters is to reach the level of
reliability required by the application.
Generally, the voter mechanism is designed to be far more reliable than the
rest of the system, so that the voter itself does not limit the reliability
gained by voting.

Marc


Re: "Am I still working okay?" asked the micro controller...

"SelfTest" <SelfTEst> wrote in message
You are correct in identifying watchdog timers as one form of COP (computer
operating properly) test.  Other things I've often used are:
1)    Background checksum on code and constant/initializer areas of memory.
2)    Flags and timers which indicate that critical routines and interrupts
are running at about the right rate, usually checked in the watchdog timer
interrupt.
3)    Guardwords between stacks and other memory, and regular checks that
these have not been compromised (again, often in the watchdog timer
interrupt).
4)    Feedback of critical output signals to ensure the hardware is working
correctly (the hardware is much more likely to suffer random failures than
the software).
5)    A decent watchdog timer with an algorithmic stimulus and response
(e.g. watchdog processor supplies a pseudorandom number and main processor
replies with the next pseudorandom number in a sequence).  Much better than
the primitive kick-within-a-certain-time style of watchdog, which can fail
to detect runaway software that happens to include a kick.
6)    One I haven't used but have seen used on a critical PLC-style system
is an odd number of redundant processors (3 in this case) which vote on the
state of an output (output follows the state of two agreeing inputs).
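
The stimulus/response watchdog in item 5 could be sketched roughly as follows. The post does not say which pseudorandom sequence was used, so this example assumes a 16-bit Galois LFSR (feedback mask 0xB400) shared by both sides; `lfsr_next` and `wd_response_ok` are hypothetical names. Note that the all-zero state maps to itself, so zero should never be issued as a challenge.

```c
#include <stdbool.h>
#include <stdint.h>

/* One step of a 16-bit Galois LFSR with feedback mask 0xB400.
 * Both the watchdog processor and the main processor run this same
 * function, so each can compute the value that follows a challenge. */
static uint16_t lfsr_next(uint16_t state)
{
    uint16_t lsb = (uint16_t)(state & 1u);

    state = (uint16_t)(state >> 1);
    if (lsb != 0u) {
        state = (uint16_t)(state ^ 0xB400u);
    }
    return state;
}

/* Watchdog side: the main processor must answer a challenge with the
 * next number in the sequence. Any other reply (or, in a real system,
 * a reply outside the time window) counts as a failure. */
static bool wd_response_ok(uint16_t challenge, uint16_t response)
{
    return response == lfsr_next(challenge);
}
```

A stray "kick" left in runaway code cannot satisfy this check unless it also computes the right reply, which is the advantage over a plain timeout watchdog.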

Of course, the next question you should ask is "What do I do when I detect a
failure?".  If it is a safety-critical system (e.g. the something you're
controlling is a train, nuclear reactor or gas furnace rather than a Lego
windmill) there's a whole other set of questions you should ask even before
asking the first one.

hth,
Alf




Re: "Am I still working okay?" asked the micro controller...



...and adding to that list: the external pulse-maintained relay. This device
has to be fed a change of polarity of its input signal at a regular rate in
order for it to maintain a relay in its energised state. If any single
component fails, the power supply goes off or the input does not change,
then the relay just de-energises and opens its contacts. The pulse drive
for such a circuit should be driven from the internal sanity checks that
your software is performing (all checks OK, so change the state of the
output). This device can elevate a single processor from SIL0 to
SIL1 with very little effort.
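
The software side of this can be as small as "toggle the pin only when every check passed". The sketch below pairs that with a crude model of the external relay so the behaviour can be followed; the hold time `RELAY_HOLD_PERIODS` and all names are assumptions for illustration, and the real device is analogue hardware, not code.

```c
#include <stdbool.h>

/* Assumed number of sample periods the relay holds without an input
 * change before it drops out (hypothetical value). */
#define RELAY_HOLD_PERIODS 3u

/* Processor side: the output pin driving the relay circuit. */
typedef struct {
    bool pin_level;
} pulse_out_t;

/* Call once per control cycle. The pin changes state only when all
 * internal sanity checks passed in this cycle. */
static void pulse_service(pulse_out_t *p, bool all_checks_ok)
{
    if (all_checks_ok) {
        p->pin_level = !p->pin_level;
    }
}

/* Crude model of the external pulse-maintained relay. */
typedef struct {
    bool     last_level;
    unsigned idle_periods;
    bool     energised;
} relay_model_t;

/* Call once per sample period with the current pin level. */
static void relay_sample(relay_model_t *r, bool level)
{
    if (level != r->last_level) {
        r->idle_periods = 0u;      /* input changed: relay stays in */
        r->energised = true;
    } else if (++r->idle_periods >= RELAY_HOLD_PERIODS) {
        r->energised = false;      /* input stuck: relay drops out */
    }
    r->last_level = level;
}
```

A pin stuck high, stuck low, or no longer serviced all look the same to the relay: no change, so the contacts open.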


Further, your microcontroller may be communicating with other systems in
order to perform its control. Doing sanity checks on the communication link
and checking its integrity in operation will yield a good idea of
sub-system health. You will need checksums and/or CRCs on all messages
between systems.
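
As one possible shape for the message check, here is a bitwise CRC-16 sketch using the CCITT polynomial 0x1021 with initial value 0xFFFF. The post does not name a particular CRC, so that choice, the frame layout (CRC appended high byte first) and the names `crc16_ccitt` and `frame_ok` are assumptions for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with polynomial 0x1021, initial value 0xFFFF, no bit
 * reflection and no final XOR. Slow but small; a table-driven version
 * trades ROM for speed. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc = (uint16_t)(crc ^ ((uint16_t)data[i] << 8));
        for (int bit = 0; bit < 8; bit++) {
            if ((crc & 0x8000u) != 0u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Assumed frame layout: payload bytes followed by the CRC of the
 * payload, high byte first. Returns true when the CRC matches. */
static bool frame_ok(const uint8_t *frame, size_t len)
{
    if (len < 2u) {
        return false;
    }
    uint16_t received = (uint16_t)(((uint16_t)frame[len - 2u] << 8) |
                                   frame[len - 1u]);
    return crc16_ccitt(frame, len - 2u) == received;
}
```

The receiver calls `frame_ok()` on every message and treats repeated failures as a sign that the link or the peer is unhealthy.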

Another option is an integral step-wise walking memory test, along with
other walking sanity checks that run a little at a time during normal
operation. This can detect potential failure points quite early on.
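
One reading of a step-wise walking memory test is sketched below: each call tests a single RAM word with a walking-ones pattern and its complement, restores the original contents, and moves on to the next word. The names `ram_test_word`, `ram_test_step` and `ram_walker_t` are made up for this example, and on real hardware the word test would need interrupts disabled (or the word otherwise protected) while its contents are overwritten.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Non-destructive test of one 32-bit word: walk a single 1 bit and its
 * complement through the word, then restore the saved contents.
 * Assumption: nothing else touches *p during the test. */
static bool ram_test_word(volatile uint32_t *p)
{
    uint32_t saved = *p;
    bool ok = true;

    for (uint32_t bit = 1u; bit != 0u; bit <<= 1) {
        uint32_t inverse = (uint32_t)~bit;

        *p = bit;
        if (*p != bit) {
            ok = false;
        }
        *p = inverse;
        if (*p != inverse) {
            ok = false;
        }
    }
    *p = saved;
    return ok;
}

/* State for walking through a RAM region one word per call. */
typedef struct {
    volatile uint32_t *base;  /* start of the region under test */
    size_t             words; /* region size in words, must be > 0 */
    size_t             next;  /* index of the next word to test */
} ram_walker_t;

/* Test the next word and advance, wrapping at the end of the region.
 * Intended to be called from the background loop or a slow timer. */
static bool ram_test_step(ram_walker_t *w)
{
    bool ok = ram_test_word(&w->base[w->next]);

    w->next = (w->next + 1u) % w->words;
    return ok;
}
```

Spreading the test over many calls keeps the per-cycle cost small, which matters for a realtime control loop.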

There are a number of others.

 


You should evaluate what the system's safe state is going to be
(off, bypassed or gracefully degrading). Then your design should
always push the system toward that safe state whenever it is not
demonstrably continuing to work properly.


Re: "Am I still working okay?" asked the micro controller...

Have a look at
http://www.embedded.com/story/OEG20030115S0042
There seems to be a lot to getting even a little old watchdog bulletproof.





martin

Three things are certain:
  Death, taxes and lost data.
    Guess which has occurred.

Re: "Am I still working okay?" asked the micro controller...

Checking yourself from software alone: without special hardware support,
you can't.

A program testing itself: it can't.

A method other than a watchdog: redundant hardware running independently
developed software, with majority voting of the outputs.

--
Grant Edwards
Re: "Am I still working okay?" asked the micro controller...

You also need to consider the likelihood of a problem occurring in the first
place. Time spent designing the hardware to be reliable (e.g. EM/ESD
immunity) is much better spent than time trying to second-guess what might
go wrong and then hoping you can do something useful about it.

For example, in the old days when systems typically comprised separate
MCU/RAM/ROM chips, it made sense to test SRAM and checksum ROM, as these
involved many interconnections and sockets which could fail. It makes much
less sense to do it on a single-chip MCU, where the sort of failures that
are plausible on a separate-chip system just don't happen.

