"Am I still working okay?" asked the micro controller...

- Guillaume

Sat, May 29, 2004 12:15 AM

Besides: redundancy still isn't a good reason not to use watchdogs.

You may have 4 redundant devices, but what if they all fail at the same time (which could happen under extreme, unplanned conditions)? What if only one of them fails, but another unexpected failure prevents the redundancy from functioning as expected (that is, you have 3 working devices, but the whole system fails to notice there is something wrong with the 4th)? Well, you get the idea.

If fighter planes were perfect, pilots were perfect and conditions were perfect, guaranteed 100% of the time, we wouldn't need to design ejection seats. But we still design them, and once in a while they are actually useful and save a life. That's exactly the same thing. Who cares whose fault it is when an unexpected event occurs? It's useful to be able to retrieve detailed information about failures afterwards, but at the moment a failure happens nobody cares: the system has to recover in the quickest way possible. Period.

As a basic rule of thumb, I'd just say that watchdogs are good for dealing with transient, temporary, unexpected failures. Redundancy is aimed more at the long-term (or complete) failure of one or several devices. Of course, if designed in a sensible manner, they can complement one another and even interact with one another. That's when things get interesting.

- Paul Keinanen

Sat, May 29, 2004 9:21 AM

The main purpose of redundant systems is to let the system operate normally even if some controllers fail, not safety. I fully agree that the last-ditch safety system should not rely on computer logic and preferably not even on electricity.

Paul

- Paul Keinanen

Sat, May 29, 2004 9:21 AM

But how does the WDT tell the difference between a transient failure and the hardware falling apart?

The self-test routines after reset may or may not detect a permanent failure. The self-test routine itself could go haywire due to permanent hardware problems, and then the WDT kicks in again.

Now we have another interesting situation, which has not been discussed so far. If there is a permanent hardware/software error and the WDT triggers over and over again, this can also cause a lot of damage (e.g. due to repeated large startup currents in some big loads). Thus, the WDT should be allowed to kick in only a predefined number of times and should then disable the whole system until manual intervention.
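
A minimal sketch of that reset-limiting idea, in C. All names here (`wdt_guard_t`, `wdt_guard_on_boot`, `wdt_guard_manual_clear`) and the limit of 3 are hypothetical. The sketch assumes the counter lives in storage that survives a reset (EEPROM or no-init RAM) and that the startup code can tell whether the watchdog caused the reset. Clearing the count on a non-watchdog reset is also an assumption; a real design might clear it only after a period of healthy operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit: consecutive watchdog resets tolerated before lockout. */
#define WDT_MAX_CONSECUTIVE_RESETS 3u

/* State that must survive a reset. On real hardware this would sit in
 * EEPROM or no-init RAM (assumption); here the caller owns the struct. */
typedef struct {
    uint8_t wdt_resets;  /* consecutive watchdog resets seen so far */
    bool    locked_out;  /* set once the limit has been exceeded */
} wdt_guard_t;

typedef enum { BOOT_RUN, BOOT_LOCKOUT } boot_action_t;

/* Call once early in startup. reset_by_wdt says whether the watchdog
 * caused this reset (assumed to come from a reset-cause register). */
boot_action_t wdt_guard_on_boot(wdt_guard_t *g, bool reset_by_wdt)
{
    assert(g != NULL);

    if (g->locked_out) {
        return BOOT_LOCKOUT;          /* stays locked until cleared by hand */
    }
    if (!reset_by_wdt) {
        g->wdt_resets = 0;            /* ordinary reset: start counting afresh */
        return BOOT_RUN;
    }
    if (g->wdt_resets < UINT8_MAX) {
        g->wdt_resets++;              /* saturating increment */
    }
    if (g->wdt_resets > WDT_MAX_CONSECUTIVE_RESETS) {
        g->locked_out = true;         /* too many in a row: stop restarting */
        return BOOT_LOCKOUT;
    }
    return BOOT_RUN;
}

/* The manual intervention: an operator action re-arms the system. */
void wdt_guard_manual_clear(wdt_guard_t *g)
{
    assert(g != NULL);
    g->wdt_resets = 0;
    g->locked_out = false;
}
```

With the limit set to 3, the first three consecutive watchdog resets restart the application; the fourth returns `BOOT_LOCKOUT`, and every later boot does the same until `wdt_guard_manual_clear` is called. In the lockout state the startup code would leave the big loads switched off and only signal the fault.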

Paul

- Jim Granville

Sat, May 29, 2004 10:27 AM

I have also noticed a trend for some newer WDOG devices to have quite long timeout options (minutes to even hours). This can have merit, as examples given in another thread show the problems with designing too close to a WDOG's poorly defined timebase. Other WDOGs I've seen have a longer FIRST trigger window, to allow more elasticity in POST/boot modes, until the operational SW proper starts working.

It would be a good idea to check for annoyance or damage modes in the case where a failure makes the WDOG fire continually.
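
To make the "longer FIRST trigger window" behaviour concrete, here is a small software model in C. It is only a simulation under assumed semantics, not the register interface of any real WDOG part: the names (`sim_wdog_t`, `sim_wdog_tick`, and so on) and the tick counts are hypothetical, and the model assumes the long window applies until the first kick and the short window after that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a watchdog with a longer first window (hypothetical). */
typedef struct {
    uint32_t first_window_ticks;   /* allowed time before the first kick */
    uint32_t normal_window_ticks;  /* allowed time between later kicks */
    uint32_t elapsed;              /* ticks since reset or since last kick */
    bool     first_kick_seen;      /* false while POST/boot is still running */
} sim_wdog_t;

void sim_wdog_init(sim_wdog_t *w, uint32_t first_ticks, uint32_t normal_ticks)
{
    assert(w != NULL);
    w->first_window_ticks  = first_ticks;
    w->normal_window_ticks = normal_ticks;
    w->elapsed             = 0;
    w->first_kick_seen     = false;
}

/* Kick (service) the watchdog: restart the count and switch to the
 * normal window from now on. */
void sim_wdog_kick(sim_wdog_t *w)
{
    assert(w != NULL);
    w->elapsed = 0;
    w->first_kick_seen = true;
}

/* Advance the model by one tick. Returns true when the active window has
 * expired, i.e. when a real device would reset the system. */
bool sim_wdog_tick(sim_wdog_t *w)
{
    assert(w != NULL);
    uint32_t limit = w->first_kick_seen ? w->normal_window_ticks
                                        : w->first_window_ticks;
    w->elapsed++;
    return w->elapsed >= limit;
}
```

With a first window of 100 ticks and a normal window of 10, boot code can take 50 ticks before its first kick without tripping the model, but once the application has kicked it, a gap of 10 ticks trips it.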

-jg

- Mel Wilson

Sat, May 29, 2004 1:19 PM

As colleague DW said, " ... idiot proof. It proves we're idiots." He was kidding, of course.

Regards. Mel.

- Ulf Samuelsson

Tue, Jun 1, 2004 11:50 AM

"Guy Macon" wrote in message news:- snipped-for-privacy@speakeasy.net...

I remember reading a warranty clause like this: "The only guarantee you'll get from us is that eventually all our equipment will fail."

--
Best Regards
Ulf at atmel dot com
These comments are intended to be my own opinion and they
may, or may not be shared by my employer, Atmel Sweden.

- Eric Bohlman

Sat, Jun 5, 2004 6:17 AM

The Artist Formerly Known as Kap'n Salty wrote in news: snipped-for-privacy@corp.supernews.com:

Amen to #4. I remember reading a story about a company that, when hiring salesmen, would always ask the prospective salesman about the major accounts that he had *lost*. If he had never lost a customer, he didn't get hired, because that meant that he had never "played in the major leagues."

Part of being a geek is having a tendency to grossly overestimate the role that personal ability plays in the success of one's work. The reality is that the highest levels of intelligence (or its correlates) that have been observed in human beings are *far, far* away from the levels that would guarantee perfection. Any business process that relies on humans being omniscient is, by definition, a failure. There is *no* way to guarantee that Mr. Murphy will never pay you a visit. There are practices that will make him feel distinctly unwelcome (and there are practices that amount to buying him a first-class plane ticket and putting him up in the penthouse suite of the most expensive hotel in town), but none of them will offer you absolute certainty.

- Marc Le Roy

Sun, Jun 13, 2004 4:49 PM

No system can be 100% reliable. All that matters is to reach the level of reliability the application requires. Generally, the voter mechanism is designed to be far more reliable than the rest of the system (its own reliability is even better than the reliability that results from the voting algorithm).
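
For readers who have not met a voter before, a minimal sketch in C follows. The post does not say which voting scheme is meant, so this assumes a simple 2-out-of-3 majority vote over three redundant channels; `vote_result_t` and `vote_2oo3` are hypothetical names. The logic is kept deliberately tiny, which is in the spirit of making the voter much simpler, and so easier to make reliable, than the channels it arbitrates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of one vote over three redundant channel outputs. */
typedef struct {
    uint32_t value;        /* majority value; 0 when there is no majority */
    bool     agreed;       /* true if at least two channels matched */
    int      odd_channel;  /* index (0..2) of the dissenting channel, or -1 */
} vote_result_t;

/* 2-out-of-3 majority voter (assumed scheme, not taken from the thread).
 * a, b and c are the outputs of channels 0, 1 and 2. */
vote_result_t vote_2oo3(uint32_t a, uint32_t b, uint32_t c)
{
    vote_result_t r = { 0u, false, -1 };

    if (a == b) {
        r.value = a;
        r.agreed = true;
        r.odd_channel = (c == a) ? -1 : 2;  /* channel 2 dissents, if any */
    } else if (a == c) {
        r.value = a;
        r.agreed = true;
        r.odd_channel = 1;                  /* channel 1 dissents */
    } else if (b == c) {
        r.value = b;
        r.agreed = true;
        r.odd_channel = 0;                  /* channel 0 dissents */
    }
    /* All three differ: no majority, the caller must go to a safe state. */
    return r;
}
```

Reporting `odd_channel` lets the system flag the dissenting device for maintenance while it keeps running on the other two.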

Marc