On Tue, 18 Mar 2014 01:42:42 -0500 Mike South <msouth@gmail.com> wrote:
I recently had a problem come up with my D1 app. I have the app writing the PID to the database at start and stop of a particular route, and I have two runs that never completed.
I did an strace on the worker and it looked like this:
[root@host ~]# strace -p 26508
Process 26508 attached - interrupt to quit
flock(10, LOCK_EX
lsof on that process gave me
starman 26508 apps 10wW REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)
(not sure, but I think the 10 in the flock call and the 10 in the 10wW mean that we're talking about the same file?).
Yeah, that'll be the same file descriptor. And yeah, it looks like it's waiting to get an exclusive lock on it, but presumably another process already holds that lock and hasn't released it. Possibly that other process is in turn waiting on a lock on something the process you looked at has locked, which would give you a deadlock. Do you use File::Temp within your process to acquire temporary files?
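To illustrate what an unfinished "flock(10, LOCK_EX" in strace means, here is a minimal sketch in Python (not Perl, and not taken from your app). It assumes a Linux-like system with flock(2); the helper name try_exclusive_lock and the variable names are made up for the example. It opens the same unlinked temp file twice and shows that the second exclusive lock is refused while the first is held. LOCK_NB is used only so the sketch reports the conflict instead of hanging the way your worker does.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(fd):
    """Try to take an exclusive flock without blocking; return True on success."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        # The lock is held through another open file description.
        return False


# Create a temp file, open it twice independently, then unlink it.
# After the unlink, lsof would list the file as "(deleted)".
tmp_fd, tmp_path = tempfile.mkstemp()
holder_fd = os.open(tmp_path, os.O_WRONLY)
waiter_fd = os.open(tmp_path, os.O_WRONLY)
os.unlink(tmp_path)

holder_got_lock = try_exclusive_lock(holder_fd)
# Without LOCK_NB this call would block until the holder unlocks.
waiter_got_lock = try_exclusive_lock(waiter_fd)

# Once the holder releases the lock, the waiter can take it.
fcntl.flock(holder_fd, fcntl.LOCK_UN)
waiter_got_lock_after_release = try_exclusive_lock(waiter_fd)

for fd in (tmp_fd, holder_fd, waiter_fd):
    os.close(fd)
```

Note that the sketch uses two separate open() calls. Descriptors inherited across fork() or copied with dup() refer to the same flock lock and would not block each other this way.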
After a little more digging I found that every starman worker had that file open (with 10w instead of 10wW for all but the one above), and stracing a few of the other workers showed the same output ("flock(10, LOCK_EX") as above.
That same exact filename? If so, odd.