Watchdog for an embedded Linux-based system

I have a Linux-based SOM that integrates the complex parts (CPU, memories, Ethernet PHY and so on), and I'm going to design a carrier board for it.

I'd like to add an external watchdog on the carrier board for two purposes:

- automatically restart the system when the applications running on Linux crash for some reason (mainly bugs)

- automatically restart the system when it is not able to start up

For me the big issue is the second point. I have sometimes seen the system hang during startup (in the bootloader, during kernel initialization and so on). I don't know the causes, and the SOM manufacturer isn't much help. They say this can happen on complex systems based on a desktop OS such as Linux, and that pulsing the main CPU RESET line may not be sufficient in certain odd situations.

So I'm thinking of adding an external watchdog that:

- monitors the Linux system's activity (perhaps a pin that an application toggles high and low at a certain frequency);

- opens and then closes again a small relay that switches the main power supply rail to the Linux system.
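The heartbeat side of this could look roughly like the sketch below. It is only an illustration under assumptions: `set_pin` and `is_healthy` are hypothetical callables (how the GPIO is actually driven on the SOM, and what "healthy" means for the application, are not specified in this thread). The point it shows is that the pin should only toggle while the application reports itself healthy, so a hung or crashed application stops the heartbeat and the external watchdog times out.

```python
import time


def run_heartbeat(set_pin, is_healthy, period_s=0.5, max_cycles=None,
                  sleep=time.sleep):
    """Toggle the watchdog pin once per period while is_healthy() is true.

    set_pin(level)  -- hypothetical callable that drives the GPIO high/low
    is_healthy()    -- hypothetical application self-check
    Returns the number of edges produced before stopping.
    """
    level = False
    edges = 0
    while max_cycles is None or edges < max_cycles:
        if not is_healthy():
            break  # stop feeding; the external watchdog will time out
        level = not level
        set_pin(level)
        edges += 1
        sleep(period_s)
    return edges
```

Running the toggle from a timer or thread that is independent of the real application work would defeat the purpose, since the pin would keep moving after the application has locked up.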

Do you have better suggestions?

pozz

Sounds good to me. If you implement the external watchdog as a separate MCU (with super simple firmware and an external HW watchdog of its own), you can also ask the WD to cycle power on purpose.

Make sure you also take down all the incoming IO signals; otherwise they may keep some part of the SOM powered and locked up. I've seen 0.4V on an IO pin keep SRAM registers in a state that locked the system.
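As a rough sketch of that ordering on a supervisor MCU: take the IO signals down first, then remove power, wait, restore power, and only then re-enable the IO. The function names `set_io_enabled` and `set_power` and the 2 second off time are assumptions for illustration, not values from this thread; the real off time depends on how long the rails take to discharge.

```python
import time


def power_cycle(set_io_enabled, set_power, off_time_s=2.0, sleep=time.sleep):
    """Power-cycle the Linux system without back-powering it through IO pins.

    set_io_enabled(bool) -- hypothetical: enable/disable (or tri-state) the
                            buffers that drive signals into the SOM
    set_power(bool)      -- hypothetical: relay or supply-enable control
    off_time_s           -- assumed off time; tune for the real hardware
    """
    set_io_enabled(False)   # stop driving signals into the SOM first
    set_power(False)        # then remove the main supply
    sleep(off_time_s)       # let the rails discharge fully
    set_power(True)         # restore power
    set_io_enabled(True)    # re-enable IO only once the SOM is powered
```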

--
mikko
Mikko OH2HVJ

Can you suggest a particular external HW watchdog?

Hmmm..., this could be a little more complex. Anyway, thank you for this suggestion.

pozz

Sounds good. I'd look for a power supply with an enable input (like most PC power supplies these days) and use that instead of a relay.

AFAIK, when the system gets bodged to the point of needing a power cycle, it's because some peripheral or another gets bodged in a way that won't get un-bodged merely as a result of sweet-talking from the CPU. But, that's just a guess.

--
www.wescottdesign.com
Tim Wescott

Depends on what you have for free GPIO pins to reset a WDT. A 555 and an SSR can be rigged to do what you want. If you have a serial port that supports full HW flow control or the like, you can use CTS or another control pin. A little dirt cheap uC can do the same thing and can actually allow more control, as you might want a longer WDT time during system boot than when the app is up and running. The uC can listen on a serial port, and when the app starts its heartbeat the uC can adjust the timeout interval.
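The two-timeout idea can be modelled in a few lines. This is a simulation sketch, not uC firmware: the class name, the 120 s boot timeout and the 5 s run timeout are all assumptions chosen for illustration, and `now` is whatever monotonic time source the uC has.

```python
class TwoStageWatchdog:
    """Long timeout while booting, short timeout once heartbeats start."""

    def __init__(self, boot_timeout_s=120.0, run_timeout_s=5.0):
        self.boot_timeout_s = boot_timeout_s  # assumed value
        self.run_timeout_s = run_timeout_s    # assumed value
        self.power_on(0.0)

    def power_on(self, now):
        """Call whenever power is (re)applied to the Linux system."""
        self.booted = False
        self.deadline = now + self.boot_timeout_s

    def heartbeat(self, now):
        """Call on every heartbeat received from the application."""
        self.booted = True
        self.deadline = now + self.run_timeout_s

    def expired(self, now):
        """True when the system should be power-cycled."""
        return now >= self.deadline
```

The first heartbeat is what switches the watchdog from the boot interval to the run interval, so a system that never finishes booting is still caught by the longer timeout.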

You can also do this with a simple shift register and an appropriate clock. You clock in a high or a low, depending on what you need to pop out the other end for the power reset. The heartbeat resets the shift register.
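To make the shift-register scheme concrete, here is a small behavioural model. It assumes a serial-in register with an asynchronous clear, a constant 1 on the data input, and the last stage as the "trip" output; the class name and the stage count are illustrative only. The timeout is then roughly the number of stages times the clock period.

```python
class ShiftRegisterWatchdog:
    """Behavioural model of a shift-register watchdog."""

    def __init__(self, stages=8):
        self.bits = [0] * stages

    def clock(self):
        """One clock tick: shift a 1 in, return the last stage (1 = trip)."""
        self.bits = [1] + self.bits[:-1]
        return self.bits[-1]

    def heartbeat(self):
        """Heartbeat pulse on the clear input: all stages back to 0."""
        self.bits = [0] * len(self.bits)
```

With four stages, for example, the output stays low as long as a heartbeat arrives at least once every three clock ticks, and goes high on the fourth tick without one.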

The tricky part is the initial power-on sequence: first the WDT itself, then the WDT powering on the main system. Also, will you allow a normal reset of the system, and how does the WDT react to it?

--
Chisolm 
Republic of Texas
Joe Chisolm

Don't use a watchdog to fix a DESIGN PROBLEM. Find a new vendor. (How can you have confidence in your product if one of the main components can't even meet its minimal performance requirements?)

Don Y

Hi Don,

this is how they do things nowadays, not much we can do about it. Spread a mess over the currently exposed mess to cover it. Like a friend of mine once said, our civilization won't be the first one to fall. Even in mass products - I had a phone from one of the leading manufacturers which sometimes would not come out of reset no matter what and for how long you press - took opening it and removing the battery for a while to get it to work. The messiness in our trade is past the point of no return, has been for a while.

Dimiter

Dimiter_Popoff

Just don't end up in a Windows BSOD cycle, as I just saw at a local shop with their portable card reader.

Power up, get fatal error. Restart, get fatal error. Restart, get fatal error. Restart, get fatal error... repeat ad nauseam.

Removing battery for 30 secs made no difference.

The standard Windows configuration on most versions is to restart the system on getting a BSOD, only to get another BSOD, and repeat.

If you do use a watchdog reset, make sure the reset from the watchdog:

- cannot get STUCK in its reset output state

- only reboots 'n' times before HALTING
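The "only 'n' times" rule might be sketched as below. This covers the retry limit only, not the stuck-output case, which is a matter of keeping the reset or power-off pulse to a fixed, bounded length. The class name, the default of 3 attempts, and the `on_stable` hook (clearing the count after the system has run healthily for a while) are assumptions added for illustration.

```python
class ResetLimiter:
    """Allow at most max_attempts consecutive power cycles, then halt."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts  # the 'n' above; assumed default
        self.attempts = 0
        self.halted = False

    def on_timeout(self):
        """Watchdog expired. Return True to power-cycle, False to stay halted."""
        if self.halted:
            return False
        if self.attempts >= self.max_attempts:
            self.halted = True  # give up; needs manual intervention
            return False
        self.attempts += 1
        return True

    def on_stable(self):
        """System ran healthily for long enough; forget earlier failures."""
        if not self.halted:
            self.attempts = 0
```

Without something like `on_stable`, isolated failures weeks apart would eventually add up to a halt, which is probably not what is wanted.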

--
Paul Carpenter | paul@pcserviceselectronics.co.uk
PC Services
Logic Gates
Education
Raspberry Pi Add-ons
Timing Diagram Font
For those web sites you hate
Paul
