[dancer-users] All starman workers have one (deleted) file open in lsof, seem hung on an flock call

David Precious davidp at preshweb.co.uk
Tue Mar 18 10:20:59 GMT 2014


On Tue, 18 Mar 2014 01:42:42 -0500
Mike South <msouth at gmail.com> wrote:
> I recently had a problem come up with my D1 app.  I have the app
> writing the PID to the database at start and stop of a particular
> route, and I have two runs that never completed.
> 
> I did an strace on the worker and it looked like this:
> 
> [root@host ~]# strace -p 26508
> Process 26508 attached - interrupt to quit
> flock(10, LOCK_EX
> 
> lsof on that process gave me
> 
> starman   26508      apps   10wW     REG               8,22         0       24 /tmp/3j6mXZXwWM (deleted)
> 
> (not sure, but I think the 10 in the flock call and the 10 in the
> 10wW mean that we're talking about the same file?).

Yeah, that'll be the same file descriptor. And yeah, it looks like it's
waiting to get an exclusive lock on it, but presumably another process
already has that lock and hasn't released it, possibly because that
process is itself waiting on a lock held by the process you looked at,
which would be a deadlock?
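To make the "someone else already holds the lock" case concrete, here is a minimal sketch in Python's fcntl module rather than Perl, assuming Linux (or another system with BSD-style flock(2) semantics). The function name demo_flock_contention is hypothetical. It opens the same scratch file twice, takes LOCK_EX on the first descriptor, and shows a second LOCK_EX request being refused until the first is released. LOCK_NB is used so the sketch fails fast instead of hanging the way the stuck worker does.

```python
import errno
import fcntl
import os
import tempfile


def demo_flock_contention():
    """Return (blocked, acquired_after_release) for two competing LOCK_EX requests."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    # "holder" stands in for the process that already owns the exclusive lock.
    holder = os.open(path, os.O_WRONLY)
    # "waiter" is a separate open() of the same file: its own open file
    # description, so flock() treats it as a competing lock owner.
    waiter = os.open(path, os.O_WRONLY)
    try:
        fcntl.flock(holder, fcntl.LOCK_EX)
        # Without LOCK_NB this call would sit in flock(..., LOCK_EX) indefinitely,
        # like the strace output; with LOCK_NB it raises instead.
        try:
            fcntl.flock(waiter, fcntl.LOCK_EX | fcntl.LOCK_NB)
            blocked = False
        except OSError as exc:
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EACCES):
                raise
            blocked = True
        # Once the holder releases its lock, the waiter can take it.
        fcntl.flock(holder, fcntl.LOCK_UN)
        fcntl.flock(waiter, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return blocked, True
    finally:
        os.close(holder)
        os.close(waiter)
        os.unlink(path)


print(demo_flock_contention())
```

If the holder never reaches its LOCK_UN (for example because it is itself waiting on something), a blocking LOCK_EX in the other process never returns, which is the shape of the hang seen here.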

Do you use File::Temp within your process to acquire temporary files?
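For context on the "(deleted)" tag: a temp file that has been unlinked while a process still holds it open keeps existing until the last descriptor is closed, and Linux reports its path with a " (deleted)" suffix. The sketch below is a Python analogue (tempfile.mkstemp, not File::Temp) and assumes Linux's /proc filesystem; the function name deleted_but_open_target is hypothetical.

```python
import os
import tempfile


def deleted_but_open_target():
    """Return the /proc view of a temp file that was unlinked while still open."""
    # Create a named temp file and keep its descriptor open.
    fd, path = tempfile.mkstemp()
    try:
        # Unlinking removes the directory entry, but the inode survives
        # while any process still holds an open descriptor to it.
        os.unlink(path)
        # /proc/self/fd/<n> links to the open file; the kernel appends
        # " (deleted)" once the name is gone (Linux-specific).
        return os.readlink(f"/proc/self/fd/{fd}")
    finally:
        os.close(fd)


print(deleted_but_open_target())
```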


> After a little more digging I found that every starman worker had
> that file open (with 10w instead of 10wW for all but the one
> above), and stracing a few of the other workers showed the same output
> ("flock(10, LOCK_EX") as above.

That same exact filename?  If so, odd.
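One generic way several processes end up with the same file on the same descriptor number is inheriting it across fork(): if a file is opened before the workers are forked, every child gets a copy. Whether that applies to these starman workers is only an assumption. The Python sketch below (hypothetical name children_see_same_file, Linux /proc assumed) opens and unlinks a temp file in the parent, forks a few children, and checks that each child sees the same deleted path on the same fd.

```python
import os
import tempfile


def children_see_same_file(n_children=3):
    """Fork children after opening a file and confirm they inherit the same fd target."""
    fd, path = tempfile.mkstemp()
    os.unlink(path)  # leave it open but deleted, like the lsof entry
    parent_view = os.readlink(f"/proc/self/fd/{fd}")

    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: report via exit status whether the inherited descriptor
            # points at the same (deleted) file; always exit, never return.
            code = 1
            try:
                code = 0 if os.readlink(f"/proc/self/fd/{fd}") == parent_view else 1
            finally:
                os._exit(code)
        pids.append(pid)

    ok = True
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        ok = ok and os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    os.close(fd)
    return ok


print(children_see_same_file())
```

Note that flock(2) locks belong to the open file description, so descriptors inherited this way share one lock rather than competing for it; this sketch only illustrates the "same filename everywhere" observation, not the hang itself.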
