Watchdog Timers for FPGA Designs

A recent thread in comp.arch.embedded concerns the use of watchdog timers. I made the point that they are used with software designs because of the many possible ways software can screw up, while hardware designs tend to be less prone to the kinds of failures that would require a watchdog timer to restore operation. This of course does not include designs that are subject to single event upsets (SEU), such as those for space flight.

Opinions? Anyone here use watchdogs and care to share examples?

Has anyone seen an ASIC that used a watchdog to get it out of a stuck state? That would include a CPU monitoring the ASIC's behavior and giving it a swift kick in the reset.

One point I was challenged on was that every FSM has the potential to lock up, and if it can't be designed to preclude that, an internal watchdog should be used to reset the FSM. I don't agree that this is "always" needed. And if the protocol specifies a timeout, then that timeout is part of the protocol, not a "watchdog" in the true sense of something looking for aberrant behavior.
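To make the distinction concrete, here is a minimal sketch of such an internal timeout, modelled cycle by cycle in Python rather than HDL. It assumes a hypothetical request/acknowledge FSM. The state names, the HandshakeFsm class and the timeout value are all illustrative and not taken from any real design. The FSM gives up and returns to IDLE if no ack arrives within a set number of clocks. Whether logic like this counts as a "watchdog" or simply as part of the protocol depends on whether the protocol itself specifies the timeout.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    WAIT_ACK = 1
    DONE = 2


class HandshakeFsm:
    """Cycle-level model of a hypothetical request/ack FSM with a timeout."""

    def __init__(self, timeout=8):
        self.timeout = timeout  # hypothetical limit, in clock cycles
        self.state = State.IDLE
        self.count = 0          # clocks waited in WAIT_ACK without an ack
        self.timeouts = 0       # number of forced returns to IDLE

    def clock(self, start, ack):
        """Advance one clock edge using the inputs sampled at that edge."""
        if self.state is State.IDLE:
            if start:
                self.state = State.WAIT_ACK
                self.count = 0
        elif self.state is State.WAIT_ACK:
            if ack:
                self.state = State.DONE
            elif self.count >= self.timeout - 1:
                # No ack in time: force the FSM back to a known state.
                self.state = State.IDLE
                self.timeouts += 1
            else:
                self.count += 1
        else:  # State.DONE
            self.state = State.IDLE
        return self.state
```

For example, with timeout=4 the FSM enters WAIT_ACK on a start request. If ack never arrives, it is back in IDLE on the fourth clock after that, and timeouts has incremented to 1.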

--

Rick C
rickman

I make a line of motion control interfaces. All of them share a problem: if the CPU stops talking to them, they just keep commanding motion at the same speed. One uses analog velocity outputs, so even if the local crystal oscillator stopped, it would keep that analog voltage going to the servo amp. And if the oscillator stopped, any digital watchdog clocked from it would never trip. So I used an external one-shot and "non-clocked" logic to set the E-stop FF and put the board into E-stop. That clears the DAC registers to zero volts and shuts down the digital outputs that may enable servo amps, spindle motors, etc.

When I say "non-clocked" above, I am referring to SR FFs and similar latch-like constructs, which are expected to work without a system clock.
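Jon's scheme is hardware (an external one-shot plus SR latches), but its logic can be sketched as a behavioural model. The Python below is a hypothetical illustration, not his circuit. Times are in seconds, every CPU access retriggers the one-shot, and if the one-shot times out, the E-stop latch is set and the DAC is forced to zero volts. Several details are assumptions: the latch powering up in E-stop, the set-dominant latch, the explicit reset() call, and all names and periods. Nothing in the model depends on the FPGA clock. The update() call stands in for latch logic that, in hardware, is evaluated continuously.

```python
class EStopWatchdog:
    """Behavioural model of a retriggerable one-shot driving the set input of
    a set-dominant SR E-stop latch. All names and values are hypothetical."""

    def __init__(self, period_s):
        self.period_s = period_s  # one-shot pulse width in seconds (assumed)
        self.last_kick = None     # time of the most recent CPU access
        self.estop = True         # assumption: latch powers up in E-stop
        self.dac_volts = 0.0

    def one_shot_active(self, t):
        """True while the one-shot output is still high at time t."""
        return self.last_kick is not None and t - self.last_kick < self.period_s

    def kick(self, t):
        """A CPU access retriggers the one-shot; it does NOT clear E-stop."""
        self.last_kick = t

    def reset(self, t):
        """Deliberate reset; ignored while the set input is asserted."""
        if self.one_shot_active(t):
            self.estop = False

    def update(self, t):
        """Evaluate the latch: one-shot expired -> set E-stop, and it stays set."""
        if not self.one_shot_active(t):
            self.estop = True
        if self.estop:
            self.dac_volts = 0.0  # E-stop forces the DAC output to zero volts
        return self.estop

    def set_dac(self, t, volts):
        """CPU write to the DAC; only takes effect when not in E-stop."""
        if not self.update(t):
            self.dac_volts = volts
```

In this model a later kick() does not clear the latch. Once motion has been halted, it stays halted until something deliberately resets it. That suits a system where a reliable halt matters more than automatic recovery.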

Jon

Jon Elson

If I understand correctly, your watchdog is not specific to the FPGA but is a system watchdog covering a failure anywhere in the system, right? Have you actually seen an oscillator stop on a board? I don't have much experience with clocks failing in the field, but obviously it is important to protect against any failure.

The other conversation I had in c.a.e makes me wonder whether anyone mistrusts their FPGA enough to add a watchdog in case a design fault causes a problem. SEU is very uncommon unless you are in a high-radiation environment. There are always power supply glitches, which can upset an FPGA, especially the RAM-based types. But do designers worry about design flaws in their HDL? Any examples of design mistakes where a watchdog has proved useful protection?

--

Rick C
rickman

I was once called in to fault-find a system that turned 18" pipes into very interesting and pretty shapes. Every now and then the controller would become catatonic and a /large/ lump of metal would zoom off at ~1 m/s, until the Big Red Switch was kicked.

The source turned out to be an infrequent hardware static-1 hazard glitch. My recommendations were to fix the specific fault and also to implement a hardware watchdog timer.

Yes, that was a design fault, and could have been trapped during a design review - but it wasn't. Where all available people are operating in new areas, there is a significant chance that design faults will slip through despite everybody's best intentions.

Hence, in practice, it is unduly optimistic to say watchdog timers aren't necessary because design rules are /sufficient/ to prevent design errors.

And we should, of course, always consider that equipment may not be installed correctly, and/or an installation can degrade over time.

Watchdog timers can be a useful last line of defence against such events.

Tom Gardner

Well, to protect against as many failures as is practical. A guy I knew a long time ago was given the task of finding ALL single points of failure in a large central office telephone switch. They had two lockstep parallel CPUs and a hang detector. There was a complicated network to provide a glitch-free changeover if the master clock source failed. No matter how hard the design engineer tried, my friend always found there was STILL a single point of failure that would leave the system with no clock. There were a bunch of other single points of failure in the system, too.

So, I'm just trying to cover as many cases as I can with simple logic. The watchdog was much more aimed at computer or communication failure than a clock stoppage, but this logic should handle both cases.

In my case, all I want is a reliable halt to all motion, and don't care about automatic recovery.

Yes, I did a little motion control thing that was supposed to shuttle a rack of samples back and forth. It was going to be in an area where access by people would be restricted, so we wanted it to try to muddle through even when something went wrong. I added a few lines of VHDL here and there to trap abnormal cases that "should never happen", go to the main reset condition, and keep running. These are standard practices, such as coding a binary-encoded state machine so that all unused states explicitly go to a defined state.
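A sketch of the "unused states go to a defined state" practice, written as a Python next-state function for a hypothetical 2-bit binary-encoded shuttle FSM. The state names, encodings and inputs are illustrative assumptions. Three codes are legal. The fourth, 0b11, can only be reached through a fault such as a glitch or upset, and the fall-through branch sends it back to RESET. In VHDL, a "when others =>" branch in the state case statement expresses the same thing.

```python
# Hypothetical 2-bit binary-encoded shuttle FSM: three legal states and
# one unused code (0b11). Names and transitions are illustrative only.
RESET, MOVE_FWD, MOVE_BACK = 0b00, 0b01, 0b10
LEGAL_STATES = {RESET, MOVE_FWD, MOVE_BACK}


def next_state(state, at_front, at_back):
    """Next-state function; any code outside the legal set goes to RESET."""
    if state == RESET:
        return MOVE_FWD
    if state == MOVE_FWD:
        return MOVE_BACK if at_front else MOVE_FWD
    if state == MOVE_BACK:
        return MOVE_FWD if at_back else MOVE_BACK
    return RESET  # unused/illegal code, e.g. after a glitch or upset


# Every possible 2-bit code, with any inputs, lands in a legal state.
assert all(next_state(s, f, b) in LEGAL_STATES
           for s in range(4) for f in (False, True) for b in (False, True))
```

One caveat worth checking: many synthesis tools treat unreachable states as don't-cares. Unless a "safe" FSM option or attribute is enabled, they can optimize this kind of recovery logic away, so the tool's documentation should be consulted.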

Jon

Jon Elson

Interesting. I learned that synchronous logic is used so that outputs don't change state while the inputs are changing; the inputs only need to be stable for the setup time before the clock edge. It can be hard to produce combinatorial logic without static hazards, so such designs should not be attempted without formal analysis of the final circuit.
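A static-1 hazard can be shown with the textbook function F = A·B + A'·C, which should stay at 1 whenever B = C = 1, regardless of A. The sketch below is a simple unit-delay model in Python. It assumes that only the inverter has a delay, of one time step, and that the other gates are ideal. When A falls from 1 to 0, both product terms are briefly 0 and F glitches low. Adding the redundant consensus term B·C holds F at 1.

```python
def simulate(a_seq, b=1, c=1, consensus=False):
    """Unit-delay model of F = A.B + A'.C.

    Assumption: only the inverter producing A' has a delay (one time step);
    the AND/OR gates are treated as ideal. Returns F at each step."""
    not_a = 1 - a_seq[0]  # inverter output starts settled
    out = []
    for a in a_seq:
        f = (a & b) | (not_a & c)
        if consensus:
            f |= b & c  # redundant consensus term B.C covers the transition
        out.append(f)
        not_a = 1 - a  # inverter catches up one step later
    return out


print(simulate([1, 1, 0, 0, 0]))                  # [1, 1, 0, 1, 1]
print(simulate([1, 1, 0, 0, 0], consensus=True))  # [1, 1, 1, 1, 1]
```

In a fully synchronous design, the glitch at step 2 is harmless as long as it settles before the next clock edge. It causes trouble when F drives something asynchronous, such as a latch, an asynchronous reset, or an external output.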

So how did the watchdog work?

--

Rick C
rickman

The company didn't want to pay me to design it for them. I have no idea what they did/didn't do after my three days were up!

Tom Gardner
