Debugging 'hanging' userspace app

Hello all,

I hope somebody can give me some tips on how to attack the following
problem: I have developed a userspace application running on a 2.4.31
MIPS embedded system. The app runs normally for a few days, but then
consistently 'stops' after 5 days, 17 hours and a few minutes. By
'stops' I mean that it is still running, but no longer does its job
(it does not answer on its sockets, for example), and strace shows no
system call activity:

  # strace -p 81
  Process 81 attached - interrupt to quit
  (nothing is printed after this line)

/proc/<pid>/stat, however, reports that the process is in the
sleeping (S) state:

  # cat /proc/81/stat
  81 (main) S 1 10 10 0 -1 256 541 (...)

which is what I would expect, since the CPU load is near 0.00.

I enabled SysRq support in the kernel, and SysRq-T gives the following
output for this process (I added the symbol names myself):

  Call Trace:  
    [<80004764 _sys_rt_sigsuspend>]
    [<80017f1c sys_gettimeofday>]
    [<800095c0 stack_done>]
    [<80006668 handle_cpu_int>]

This call trace confuses me: is the process stuck while performing a
gettimeofday?

So my question is: how can I find out what this process is doing at
this moment? The obvious answer would be to attach gdb, but for some
reason beyond my understanding, gdb or some other part of the toolchain
seems to be broken: gdb only reports a corrupted stack and is not able
to provide a call trace. If nothing else works I will try to upgrade my
toolchain, but I'd like to investigate simpler options first.

Please let me know if I need to provide any more information.

Thank you very much for any tips.

  mips-linux-gcc (GCC) 3.3.6
  GNU ld version 20040114
  Linux (none) 2.4.31-INCAIP-4.3 #2 Mon Mar 19 11:13:58 CET 2007 mips unknown
