"Am I still working okay?" asked the micro controller... - Page 3

Re: "Am I still working okay?" asked the micro controller...

What a pile of bullshit.
There are more reasons for an embedded system to fail than you
can even begin to imagine. Not using watchdogs (in a sensible
way, of course) is totally irresponsible in my opinion.
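
One common reading of "in a sensible way" is that the watchdog gets kicked only when every part of the firmware has recently proven it is alive, rather than from a timer interrupt that keeps running while the main code is stuck. A minimal sketch of that idea follows. The task names, `wdt_task_checkin`, `wdt_service` and `hw_watchdog_kick` are all made up for illustration; the real kick is a device-specific register write, and on a preemptive system the shared flag word would need atomic access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task IDs: one bit per task that must prove it is alive. */
enum { TASK_SENSOR = 1u << 0, TASK_CONTROL = 1u << 1, TASK_COMMS = 1u << 2 };
#define ALL_TASKS (TASK_SENSOR | TASK_CONTROL | TASK_COMMS)

static uint32_t checked_in;   /* bits set by tasks since the last kick */
static unsigned kick_count;   /* stands in for the real hardware kick */

/* Placeholder for the device-specific write that restarts the WDT. */
static void hw_watchdog_kick(void) { kick_count++; }

/* Each task calls this once per healthy pass through its loop. */
void wdt_task_checkin(uint32_t task_bit) { checked_in |= task_bit; }

/* Called periodically from exactly one place in the firmware.
   Returns true if the watchdog was kicked on this call. */
bool wdt_service(void)
{
    if ((checked_in & ALL_TASKS) != ALL_TASKS)
        return false;         /* some task is stuck: let the WDT expire */
    checked_in = 0;           /* every task must check in again */
    hw_watchdog_kick();
    return true;
}
```

If any one task stops calling `wdt_task_checkin`, `wdt_service` stops kicking and the hardware timer is left to expire.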

Re: "Am I still working okay?" asked the micro controller...

Jack Ganssle wrote a GREAT article on why you should use watchdogs, and why they
are so tricky to use properly.

http://www.ganssle.com/watchdogs.htm




--
- Alan Kilian <alank(at)timelogic.com>
Director of Bioinformatics, TimeLogic Corporation 763-449-7622

Re: "Am I still working okay?" asked the micro controller...
I had already read most points he talks about in other articles,
but this is great nevertheless.

Anyone with a concern for safety and reliability should read this -
and then some.

Re: "Am I still working okay?" asked the micro controller...

There is a lot of interesting detail about spacecraft software, and
the claim that a WDT could have saved the mission is no more or less
true than saying that fixing the original floating-point exception
that caused the failure would have saved it.

The article then gives an example of crashing cooker-hood-fan firmware
and assumes the WDT had *not* been used. He cannot know this. If the
firmware is poor, then the WDT was likely poorly implemented too.

Here is a quote from the article:-

<start of quote>
"Well-designed watchdog timers fire off a lot, daily and quietly
saving systems and lives without the esteem offered to other, human,
heroes. Perhaps the developers producing such reliable WDTs deserve a
parade. Poorly-designed WDTs fire off a lot, too, sometimes saving
things, sometimes making them worse."
<end of quote>

I disagree. When the WDT fires, it is a disaster that needs fixing and
if it goes off "a lot" and especially "quietly" it is a cover-up where
the developers *should* be paraded.


Cheers
Robin

Re: "Am I still working okay?" asked the micro controller...
On 27 May 2004 07:04:12 -0700, the renowned snipped-for-privacy@tesco.net


You don't understand.

Best regards,
Spehro Pefhany
--
"it's the network..."                          "The Journey is the reward"
snipped-for-privacy@interlog.com             Info for manufacturers: http://www.trexon.com
Re: "Am I still working okay?" asked the micro controller...

Here is a counter-example.  The hardware is operating in a noisy
environment.  This induces dropped bits, etc.  The software can
handle most of the data errors, but has a few problems when the IC
is altered and it is driven off to executing random data.  Time
for the three fingered salute, administered by the faithful hound.

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
Re: "Am I still working okay?" asked the micro controller...


   Let me "requote" some of that, so I can respond to it here:


   Putting the discussion of WDT's aside for a moment, I find it
inexcusable (engineering-wise) that such a simple application as the
cooker-hood-fan would crash or fail (maybe in development, but
certainly not in production), whether it's from (a) firmware bug(s) or
susceptibility to static discharge.
   OTOH, I can see where a marketing person might play with it for two
minutes (before adequate testing is done), declare to management in
the heat of time-to-market pressures "It works, let's ship it" and a
bad/untested design goes out the door, perhaps even over the
protestations of the person(s) who designed it.


   WDT's ARE valuable, but certainly not for the reasoning given
above.
   What it SHOULD have said (IMHO) is:

   Well-designed watchdog timers in well-designed systems RARELY if
EVER fire off, but like an airbag and seat belts in a car accident,
when they do fire off they save systems that would otherwise, perhaps
literally as well as figuratively, be "lost in space."


   I certainly agree that WDT's should RARELY if ever fire. It helps
to have it turned off for general development, but there should be a
testing time where it's on (and the timer reset point should of course
be carefully thought out as part of the design), and any reset
generated should be investigated for its cause (this is where an
emulator and logic analyzer are really worth their rental fees) and a
correction put into place.
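
   Investigating every reset only works if the firmware notices that a watchdog reset happened at all. Here is a rough sketch of recording the cause at boot so that a WDT firing does not pass quietly. The flag names and `reset_log_t` are hypothetical; real parts report the cause in a device-specific status register, and the log would have to live in non-volatile or no-init memory to survive the reset.

```c
#include <stdint.h>

/* Hypothetical reset-cause flags; real parts expose these in a
   device-specific status register with their own names and layout. */
#define RST_POWER_ON  (1u << 0)
#define RST_BROWN_OUT (1u << 1)
#define RST_WATCHDOG  (1u << 2)

/* On a real target this would sit in memory that survives a reset. */
typedef struct {
    uint32_t watchdog_resets;  /* how many boots followed a WDT reset */
    uint32_t last_cause;       /* raw cause flags from the latest boot */
} reset_log_t;

/* Call first thing at boot with the raw cause flags.
   Returns 1 if this boot followed a watchdog reset, else 0. */
int reset_log_update(reset_log_t *log, uint32_t cause_flags)
{
    log->last_cause = cause_flags;
    if (cause_flags & RST_WATCHDOG) {
        log->watchdog_resets++;
        return 1;
    }
    return 0;
}
```

   A nonzero `watchdog_resets` count in a returned unit or a test rig is then the trigger for getting out the emulator and logic analyzer.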
   I've read and enjoyed some of Jack Ganssle's articles before, but
Robin points out very well that Jack misses the mark on this one. Has
anyone emailed him about this thread yet?


   This is an example where the hardware isn't shielded well enough
from the environment, or isn't robust enough or rad-hard enough to
operate reliably in the environment. Fix that, then go for long-term
testing to see if the WDT ever fires.

   Having a WDT reset the hardware doesn't make a system reliable. It
is only a protection against rare, worst-case conditions. And I mean
TRULY rare conditions, not "rare" as the word is (ab)used on eBay.

   Here, I'll frame it for you. Print it, cut it out and paste it on
your monitor:

  _________________________________________________________________
 /                                                                 \
|  Having a WDT reset the hardware doesn't make a system reliable.  |
 \_________________________________________________________________/

-----
http://mindspring.com/~benbradley

Re: "Am I still working okay?" asked the micro controller...

I am glad you have unlimited funds to spend on your productions.
A few pounds of lead around the system is always welcome, and
encourages sales.  Some of us believe in engineering the product
to fit the desired use.

--
fix (vb.): 1. to paper over, obscure, hide from public view; 2.
to work around, in a way that produces unintended consequences
Re: "Am I still working okay?" asked the micro controller...


It appears that you think the proper way to design a product is to
make a complete product and only then start to wonder how to get it
through the EMC and other tests, hoping that a ferrite bead here and a
bypass capacitor there will solve the problems. Then you spend a lot
of time trying, usually over several iterations, to get the device to
just pass the test, and you still wonder about random lockups and
justify the use of the WDT.

EMC design should be part of the whole design cycle. You should design
the RF filter return paths and static electricity discharge paths so
that they do not go through any sensitive areas, since the tracks will
have a significant inductance and thus a high reactance (or even
resonate) at high frequencies, or generate quite a high voltage when a
high current from a static discharge passes through them. This does not
necessarily cost very much as a whole, since it is done in the design
phase.
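
To put rough numbers on the track inductance argument, here is a small sketch of the two textbook relations involved, V = L * di/dt and X_L = 2 * pi * f * L. The function names are made up, and the figures in the note below are assumptions picked purely for illustration, not measurements of any real board or discharge.

```c
/* V = L * di/dt: voltage developed across an inductance by a current edge. */
double induced_voltage(double inductance_h, double delta_i_a, double rise_time_s)
{
    return inductance_h * delta_i_a / rise_time_s;
}

/* X_L = 2 * pi * f * L: inductive reactance at frequency f. */
double inductive_reactance(double inductance_h, double freq_hz)
{
    const double pi = 3.14159265358979323846;
    return 2.0 * pi * freq_hz * inductance_h;
}
```

With an assumed 50 nH of track inductance and an assumed discharge edge of 30 A in 1 ns, `induced_voltage(50e-9, 30.0, 1e-9)` comes to about 1500 V, and `inductive_reactance(50e-9, 100e6)` to about 31 ohms at 100 MHz. That is why the discharge path should be kept away from sensitive areas.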

A metallic (or at least conductive) box may also be required, or
extra ground planes on the PCB. This of course may cost some extra,
but it reduces support cost in the field.

A system designed for good EMC performance should also be quite immune
to "unexplained" crashes or lockups and thus reduce the need for WDT.

"Desired use" seems to be to get the product sold, without caring
whether the customer has to throw it away as useless. I just wonder
whether the customer is going to buy anything else with the same brand
name in the future. I am glad that the CE requirements removed at least
some of the worst trash from the European market.

Paul


Re: "Am I still working okay?" asked the micro controller...


Protecting the hardware is not really a costly exercise. Most of the
time it involves little more than appropriate filtering of the inputs,
maybe a thin metal can over sensitive circuitry, using metal boxes
instead of plastic ones. Look at it as developing boxes within boxes
and using appropriate barrier techniques at the barrier boundaries.
The total cost can often be less than not doing these simple things.

--
********************************************************************
Re: "Am I still working okay?" asked the micro controller...

Lead? You're afraid of cosmic rays? Is not magnetic induction more of a risk?

Robin

Re: "Am I still working okay?" asked the micro controller...
On 28 May 2004 08:45:14 -0700, snipped-for-privacy@tesco.net


   Whatever the cause of the problem, a WDT won't fix it, though it
may cover it up for a while.
   I suspect CB was angered that I pointed out a flaw in his
counter-example, so he came back with something mean-spirited. I
didn't mean my response as a personal attack, but this is Usenet and I
can't take responsibility for how others read my posts.


-----
http://mindspring.com/~benbradley

Re: "Am I still working okay?" asked the micro controller...

Hardly.  The particulars do not matter.  The point is that,
whatever the product, there is a limit to the practical production
cost.  You need the best bang for the buck.  Random external
events may require prodigious efforts to block.  You, not I,
brought up radiation shielding, and I only mentioned a means of
blocking such.  (To robin: cosmics are only one of a wide range of
radiation extant.  They are extremely hard to block.)

You need to face reality, in that something is going to fail.
When it does, you need a means of avoiding further damage and/or
effecting recovery.  If you think you can build anything that is
failure, damage, and idiot proof you have delusions of grandeur.

--
fix (vb.): 1. to paper over, obscure, hide from public view; 2.
to work around, in a way that produces unintended consequences
Re: "Am I still working okay?" asked the micro controller...



...or you are in management. "Our company policy states that
all of our products are failure, damage, and idiot proof."

--
Guy Macon, Electronics Engineer & Project Manager for hire.
Remember Doc Brown from the _Back to the Future_ movies? Do you
Re: "Am I still working okay?" asked the micro controller...

   As colleague DW said, " ... idiot proof.  It proves we're
idiots."  He was kidding, of course.

        Regards.        Mel.

Re: "Am I still working okay?" asked the micro controller...

"Guy Macon" <http://www.guymacon.com wrote in message

I remember reading a warranty clause like this:
"The only guarantee you'll get from us is that eventually all our equipment
will fail."

--
Best Regards
Ulf at atmel dot com
Re: "Am I still working okay?" asked the micro controller...

Well, you have me there, I can only think of four (ignoring <hardware failure>):-

<firmware bug>
<spontaneous alpha particle emission>
<brown-out>
<lightning strike>


Cheers
Robin

Re: "Am I still working okay?" asked the micro controller...
On 27 May 2004 01:59:00 -0700, snipped-for-privacy@tesco.net

The causes could be numerous - static discharge (not just the effects
of lightning strikes), radio interference, other forms of radiation,
electrical shorts due to fluid spillage, inappropriate scope of
device usage (I don't consider it a software bug here) --- all these
faults could leave the device in a state where the software can't run.

The reason that it is used in the medical field is that it provides a
cost-effective mitigation for many ailments. Designing equipment to
operate in a room full of X-Ray, MRI, etc equipment - some dating back
a few decades, can be a very daunting exercise. Of course there is a
minimum standard EMC requirement that medical equipment conform to.

Also I disagree with the notion that using a watchdog "advertises"
some deficiency of the device (paraphrasing here). For me its use
does suggest that the developers have applied due diligence and have
used it as a mitigation against faults which they've arrived at
through some analysis.

Ken.



+====================================+
I hate junk email. Please direct any
genuine email to: kenlee at hotpop.com

Re: "Am I still working okay?" asked the micro controller...

: Well, you have me there, I can only think of four (ignoring <hardware failure>):-

I would think hardware failure is a good enough reason in and of itself, and
in fact that is the usual reason I thought watchdogs were for.

If your code PROM/EPROM/EEPROM/flash fails and the mcu starts executing
random memory as code, you want to make sure your motors, pumps, X-ray tube,
etc. shut down.


--
==========================================================
Chris Candreva  -- snipped-for-privacy@westnet.com -- (914) 967-7816
Re: "Am I still working okay?" asked the micro controller...
On Fri, 28 May 2004 15:45:20 GMT, "Christopher X. Candreva"


If it appears that the hardware is falling apart, how could you trust
that it makes any sensible decisions? Of course, if each output
individually falls into a fail-safe state when not refreshed by the
processor, then it makes sense to halt the processor immediately if
something suspicious happens. Trying to do something after a watchdog
reset will usually just worsen the situation if the hardware is
suspect.
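
The "falls into a fail-safe state if not refreshed" behaviour can be modelled roughly like this. It is only a software model with made-up names (`failsafe_output_t` and friends). In a real design the timeout would belong to the output stage itself, for example a retriggerable one-shot in hardware, precisely so that it does not depend on the suspect processor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one output stage with its own refresh timeout. */
typedef struct {
    uint32_t timeout_ticks;  /* how long one refresh stays valid */
    uint32_t ticks_left;     /* counts down to zero between refreshes */
    bool     requested_on;   /* what the processor last asked for */
} failsafe_output_t;

void output_init(failsafe_output_t *o, uint32_t timeout_ticks)
{
    o->timeout_ticks = timeout_ticks;
    o->ticks_left = 0;       /* starts in the safe (off) state */
    o->requested_on = false;
}

/* Processor side: must be called repeatedly to keep the output alive. */
void output_refresh(failsafe_output_t *o, bool on)
{
    o->requested_on = on;
    o->ticks_left = o->timeout_ticks;
}

/* Output side: advance time by one tick. */
void output_tick(failsafe_output_t *o)
{
    if (o->ticks_left > 0)
        o->ticks_left--;
}

/* The output is driven only while the last refresh is still valid. */
bool output_is_on(const failsafe_output_t *o)
{
    return o->requested_on && o->ticks_left > 0;
}
```

If the processor halts or runs off into the weeds, the refreshes stop and each output drops to off by itself after `timeout_ticks`, with no decision required from the failed processor.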
 

In any really safety-critical system, you should use a double or triple
(voting) redundant system, not watchdogs.
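
For the triple (voting) case, the core of the voter is tiny. A minimal sketch with made-up names, assuming three channels that each produce a 32-bit output word:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitwise 2-of-3 majority: each result bit takes the value that at
   least two of the three channels agree on. */
uint32_t vote3(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}

/* True if all three channels agree. A disagreement is worth logging
   even though the vote masks it. */
bool vote3_unanimous(uint32_t a, uint32_t b, uint32_t c)
{
    return a == b && b == c;
}
```

A single faulty channel is outvoted by the other two. The voter itself then becomes the part that must not fail, which is why real systems often do the voting in simple dedicated hardware.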

Paul

