I have been working on a problem with an installed product for approximately a year now. After much investigation, including a careful review of the code and a repeat of the ANSI hardware tests, we have been unable to recreate the problem in-house. However, from analysis of the symptoms, we have come to believe that something is causing the instruction pointer in this embedded application to end up pointing at the wrong code address.
My question is: what external events can affect a microprocessor in such a way that it essentially gets "lost" in execution? We are reasonably certain that an external event is the cause, rather than a stack problem, because the majority of the installed units are working fine and have been for over a year.
The microprocessor we are using does not have an illegal instruction trap or a watchdog timer, so fixing the problem would likely require hardware modifications. I would appreciate any information you have gleaned from past experience with similar issues, so that we can focus our testing on the most likely causes.