Hi, I was hoping to get some suggestions on debugging a multi-threaded Linux program that crashes about every 10-12 hours. The program coordinates the behaviour of several (about 4) attached devices (serial and Ethernet), generally with one thread per device. Unfortunately, when it crashes, the threads stop responding one by one, and no seg fault or other obvious error occurs, which makes it very hard to pin down. What I suspect is that one thread is gradually overwriting memory, and the program crashes as soon as the overwritten memory is in use by another thread. There is currently a 4k guard between threads.
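To make the guard concrete: assuming the threads are created with pthreads and the 4k guard is the per-thread stack guard (on Linux with glibc the default is one page, typically 4 KiB), a minimal sketch of setting it explicitly could look like the following. The names `device_worker` and `run_worker_with_guard` are made up for illustration and are not from the real program.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-device worker thread. */
static void *device_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Start one worker with an explicit stack guard size and wait for it.
 * Returns 0 on success or a pthread error number. On success,
 * *guard_out holds the guard size recorded in the thread attributes. */
int run_worker_with_guard(size_t guard_bytes, size_t *guard_out)
{
    pthread_attr_t attr;
    pthread_t tid;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for a guard region of guard_bytes below the thread's stack. */
    rc = pthread_attr_setguardsize(&attr, guard_bytes);
    if (rc == 0)
        rc = pthread_attr_getguardsize(&attr, guard_out);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, device_worker, NULL);

    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(tid, NULL);
}
```

Built with `-pthread`, a call such as `run_worker_with_guard(64 * 1024, &g)` would request a 64 KiB guard. A guard like this only faults when a thread runs off the end of its own stack into the guard region, so a stray write into the heap, or one that lands beyond the guard, would still go unnoticed.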
Does anyone have any suggestions for how to figure out which code is the source of the problem? I've inspected the most likely areas but haven't been successful in fixing it. Are there any techniques using gdb/ddd or other tools that would help? If it generated a seg fault it would be easy...
Thanks in advance for any help. This is really driving me nuts!