how to figure out where a process hung

I've got a problem...

I have a process run from cron that hangs in an uninterruptible sleep.

It reads data from a webcam, and writes to a tmpfs partition.

This process runs every 15 minutes; and most of the time it will run just fine, but once in a while, it hangs. Then cron runs it again, and it hangs again. Pretty soon I have a bunch of hung processes that consume all resources, and for all practical purposes my little system is dead.

The frustrating thing is that it happens rarely; the process runs every 15 minutes and sometimes it will run for days just fine, and then it will start hanging.

Is there some way to find out where the process is hung after it is hung up?

The program is spcacat, a very simple snapshot util for webcams using the spca driver: . Anyone have any suggestions? I have 3 weeks to get this up and running, and that doesn't give me much time....

--Yan

--
  o__
  ,>/'_          o__
  (_)\(_)        ,>/'_        o__
Yan Seiner, PE  (_)\(_)       ,>/'_     o__
Certified Personal Trainer   (_)\(_)    ,>/'_        o__
Licensed Professional Engineer         (_)\(_)       ,>/'_
Who says engineers have to be pencil necked geeks?  (_)\(_)
Captain Dondo

What does this mean? A sleep() call has to specify a time, so it can't "hang".

Moreover, AFAIK, a userland process can only do an uninterruptible sleep (a very short nanosleep()) if it has been given very special attributes.

Is it possible that the process is waiting for some hardware event that never occurs because of defective hardware?

-Michael

Michael Schnell

Try "strace". When it hangs, attach to it with "strace -p <pid>" and you will see where it hangs (if it hangs in a system call). If that does not help, try "ltrace" instead.

Hope it helps,
Juergen

Juergen Beisert

Hello,

Your cronjob could kill all running instances before starting a new one. That way the resources would stay free and the system won't run into problems. It is not a clean solution, but at least it keeps your resources free. You could log whenever an instance had to be killed (instead of exiting cleanly) and maybe find some event that causes the program to hang.

That's an idea, but it only helps against the resource leak from hung processes, not against the underlying problem.
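A variant of this idea, as a minimal sketch and not something anyone in the thread has tested: instead of killing leftovers at the next cron run, a small wrapper can start the capture program itself, give it a time limit, and log whether it exited cleanly or had to be killed. The name run_with_timeout, the 60-second limit, the 5-second grace period, and taking the command from the wrapper's own arguments are all assumptions for illustration. SIGKILL has no effect on a process stuck in uninterruptible sleep, so the sketch logs that case and gives up instead of waiting forever.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout, grace=5.0, log=sys.stderr):
    """Run cmd and return its exit status, or None if it could not be reaped.

    A negative status (for example -9) means the child was ended by that signal.
    """
    proc = subprocess.Popen(cmd)
    try:
        status = proc.wait(timeout=timeout)
        print(f"{time.ctime()}: {cmd[0]} exited with status {status}", file=log)
        return status
    except subprocess.TimeoutExpired:
        print(f"{time.ctime()}: {cmd[0]} (pid {proc.pid}) still running "
              f"after {timeout}s, sending SIGKILL", file=log)
        proc.kill()

    try:
        # An ordinary process dies at once; one in state D will not.
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        print(f"{time.ctime()}: pid {proc.pid} ignored SIGKILL, "
              f"probably in uninterruptible sleep", file=log)
        return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical cron usage: wrapper.py <capture command and its arguments>
    result = run_with_timeout(sys.argv[1:], timeout=60)
    sys.exit(0 if result == 0 else 1)
```

The log lines give a timestamp for every run that did not finish on its own, which can then be compared with whatever else was happening on the system at that time.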

I'd guess the program waits for something (a camera's event?), but never gets it.

Regards, Sebastian

Sebastian

From 'man ps':

PROCESS STATE CODES
    Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process.

    D    Uninterruptible sleep (usually IO)

These processes show up as 'D', which means they cannot be killed, not even with kill -9, until whatever they are waiting on completes.
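To see what a 'D' process is actually waiting on, the kernel exposes some of this under /proc. What follows is a minimal sketch, assuming a Linux kernel that provides /proc/<pid>/stat and /proc/<pid>/wchan; the latter names the kernel function the task is sleeping in, and may read as 0 or be unreadable without sufficient privileges. The function names are made up for illustration.

```python
import os


def parse_stat(text):
    """Split the start of a /proc/<pid>/stat line into (pid, comm, state).

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the first '(' and the last ')'.
    """
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Return the stripped file contents, or '' if the file cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_processes(proc_root="/proc"):
    """Return sorted (pid, comm, wchan) tuples for processes in state D."""
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat:
            continue  # process went away while we were scanning
        pid, comm, state = parse_stat(stat)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan")) or "?"
            found.append((pid, comm, wchan))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(f"{pid:>7}  {comm:<16} {wchan}")
```

A process can pass through state D briefly during normal I/O, so an entry only matters if the same PID shows up on repeated runs. If the wchan value points into the USB or camera driver, that would support the guess that the hang is in the driver and not in spcacat itself.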

I am guessing that these processes are waiting for some camera event that never occurs, but I have not figured out why it only happens sometimes....

The camera shares the USB bus with a GPS, which is being polled almost continuously. I suspect there is some bus contention which triggers this, but I have no idea where to start looking; all of the code I've looked at looks OK so far.

--Yan

Captain Dondo
