[dancer-users] All starman workers have one (deleted) file open in lsof, seem hung on an flock call

Mike South msouth at gmail.com
Tue Mar 18 14:15:49 GMT 2014


On Tue, Mar 18, 2014 at 5:20 AM, David Precious <davidp at preshweb.co.uk>wrote:

> On Tue, 18 Mar 2014 01:42:42 -0500
> Mike South <msouth at gmail.com> wrote:
> > I recently had a problem come up with my D1 app.  I have the app
> > writing the PID to the database at start and stop of a particular
> > route, and I have two runs that never completed.
> >
> > I did an strace on the worker and it looked like this:
> >
> > [root at host ~]# strace -p 26508
> > Process 26508 attached - interrupt to quit
> > flock(10, LOCK_EX
> >
> > lsof on that process gave me
> >
> > starman   26508      apps   10wW     REG               8,22         0
> >   24 /tmp/3j6mXZXwWM (deleted)
> >
> > (not sure, but I think the 10 in the flock call and the 10 in the
> > 10wW mean that we're talking about the same file?).
>
> yeah, that'll be the same file descriptor - and yeah, looks like it's
> waiting to get an exclusive lock on it, but presumably another process
> already has that lock and hasn't released it - possibly as it's waiting
> on a lock on something the process you looked at has locked, leading to
> a deadlock situation?
>
> Do you use File::Temp within your process to acquire temporary files?
>

Not directly.  I don't know of it specifically being used by anything, but
it is installed in my perlbrew so it definitely could be in use.  I could
put some debugging in there and see if it produces any clues.


>
>
> > After a little more digging I found that every starman worker had
> > that file open (except with 10w instead of 10wW for all but the one
> > above and stracing a few of the other workers showed the same output
> > ( "flock(10, LOCK_EX" ) as above.
>
> That same exact filename?  If so, odd.
>

yes:

[root at host ~]# lsof |grep /tmp/3j6mXZXwWM
starman   26502      apps   10w      REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
starman   26503      apps   10w      REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
starman   26504      apps   10w      REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
starman   26505      apps   10w      REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
starman   26506      apps   10w      REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
starman   26507      apps   10w      REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
starman   26508      apps   10wW     REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
starman   26509      apps   10w      REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
starman   26510      apps   10w      REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
starman   26511      apps   10w      REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
starman   26512      apps   10w      REG               8,22         0
  24 /tmp/3j6mXZXwWM (deleted)
...

According to the docs, this output means all the processes have it open for
writing and 26508 has a write lock on the whole file.

I'll see if I can get anything out of File::Temp--maybe adding debugging
output there will give me an idea of what's so interesting to everybody.
 Thanks for looking at this with me.

mike
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.preshweb.co.uk/pipermail/dancer-users/attachments/20140318/8c1224c8/attachment.html>


More information about the dancer-users mailing list