how to figure out where a process hung

I've got a problem...

I have a process run from cron that hangs in an uninterruptible sleep.

It reads data from a webcam, and writes to a tmpfs partition.

This process runs every 15 minutes; and most of the time it will run just
fine, but once in a while, it hangs.  Then cron runs it again, and it
hangs again.  Pretty soon I have a bunch of hung processes that consume
all resources, and for all practical purposes my little system is dead.

The frustrating thing is that it happens rarely; the process runs every 15
minutes and sometimes it will run for days just fine, and then it will
start hanging.

Is there some way to find out where the process is hung after it is hung
up?

The program is spcacat, a very simple snapshot util for webcams using the
spca driver: <http://mxhaard.free.fr/>.  Anyone have any suggestions?  I
have 3 weeks to get this up and running, and that doesn't give me much
time....

--Yan

Re: how to figure out where a process hung

What does this mean? A sleep() call has to specify a time, so it
can't "hang".

Moreover, AFAIK, a userland process can only do an uninterruptible sleep
(a very short nanosleep()) if it is assigned very special attributes.

Is it possible that the process is waiting for some hardware event that
never occurs because of defective hardware?

-Michael

Re: how to figure out where a process hung

From 'man ps':

PROCESS STATE CODES
       Here are the different values that the s, stat and state output
       specifiers (header "STAT" or "S") will display to describe the state of
       a process.
       D    Uninterruptible sleep (usually IO)

These processes show up as 'D', which means they do not respond to
signals, so they cannot be killed until the blocking call returns.

I am guessing that these processes are waiting for some camera event that
never occurs, but I have not figured out why it happens only sometimes....
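One way to see what a 'D' process is waiting on, without attaching
anything to it, is to read /proc. The sketch below is only an
illustration, not something from spcacat or the spca driver: it scans
/proc/<pid>/stat for state D and prints the contents of /proc/<pid>/wchan,
which on many kernels names the kernel function the process is sleeping
in. The function names and the proc_root parameter are made up for this
example, and it assumes a Linux kernel that exposes wchan (some kernels
only show "0" there).

```python
import os


def parse_stat(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Return the stripped contents of a file, or '' if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def find_d_state(proc_root="/proc"):
    """Return (pid, comm, wchan) for every process in state D.

    proc_root is a hypothetical parameter so the scan can be pointed at a
    test directory; on a real system leave it at /proc.
    """
    hung = []
    if not os.path.isdir(proc_root):
        return hung
    for name in sorted(os.listdir(proc_root)):
        if not name.isdigit():
            continue
        line = read_text(os.path.join(proc_root, name, "stat"))
        if not line:
            continue  # process exited while we were scanning
        try:
            pid, comm, state = parse_stat(line)
        except (ValueError, IndexError):
            continue  # unexpected format, skip it
        if state == "D":
            wchan = read_text(os.path.join(proc_root, name, "wchan"))
            hung.append((pid, comm, wchan))
    return hung
```

Calling find_d_state() from a small script and printing each tuple shows
which processes are stuck and in which kernel function. If the kernel
supports it, /proc/<pid>/stack (readable by root) gives the full kernel
stack instead of just one function name.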

The camera shares the USB bus with a GPS, which is being polled almost
continuously.  I suspect there is some bus contention which triggers this,
but I have no idea where to start looking; all of the code I've looked at
looks OK so far.

--Yan

Re: how to figure out where a process hung

Try to use "strace". When it hangs connect to it with "strace -p <pid>" and
you will see where it hangs (if it hangs in a system call). If this does
not help, try with "ltrace" instead.

Hope it helps
Juergen

Re: how to figure out where a process hung
Hello,

Your cronjob could kill all running instances before starting a new one.
That way the resources would stay free and the system wouldn't run into
problems. It is not a clean solution, but at least it can keep your
resources free. You could log whenever an instance is killed (instead of
exiting cleanly) and maybe find some event which causes the program to hang.
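A rough sketch of such a cleanup step, to be run from the cron job before
the snapshot is taken. It is only an illustration: the function names,
the proc_root and kill parameters, and the log wording are invented for
this example, and it assumes a Linux kernel that provides
/proc/<pid>/comm. Keep in mind that a process in state D will not
actually go away until its kernel call returns, so the log line only says
that the signal was sent.

```python
import logging
import os
import signal


def find_pids_by_name(name, proc_root="/proc"):
    """Return the sorted pids whose /proc/<pid>/comm equals name."""
    pids = []
    if not os.path.isdir(proc_root):
        return pids
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited while we were scanning
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


def kill_stale(name, proc_root="/proc", kill=os.kill):
    """Send SIGKILL to leftover instances of name and log each one.

    Returns the list of pids that were signalled. The kill parameter is a
    hypothetical hook so the function can be tried without killing anything.
    """
    signalled = []
    for pid in find_pids_by_name(name, proc_root):
        if pid == os.getpid():
            continue  # never signal ourselves
        try:
            kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            continue  # already gone
        except PermissionError:
            logging.warning("not allowed to signal %s, pid %d", name, pid)
            continue
        logging.warning("sent SIGKILL to leftover %s, pid %d", name, pid)
        signalled.append(pid)
    return signalled
```

A wrapper script would call kill_stale("spcacat") first and only then
start the new snapshot. The timestamps of the warnings in the log then
show when the hangs happen, which can be compared against other activity
on the system.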

That's an idea, but it will only help against the resource leak from hung
processes, not against the underlying problem.

I'd guess the program waits for something (a camera's event?), but never
gets it.

Regards,
Sebastian


