<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Mar 18, 2014 at 5:20 AM, David Precious <span dir="ltr"><<a href="mailto:davidp@preshweb.co.uk" target="_blank">davidp@preshweb.co.uk</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="">On Tue, 18 Mar 2014 01:42:42 -0500<br>
Mike South <<a href="mailto:msouth@gmail.com">msouth@gmail.com</a>> wrote:<br>
> I recently had a problem come up with my D1 app. I have the app<br>
> writing the PID to the database at start and stop of a particular<br>
> route, and I have two runs that never completed.<br>
><br>
> I did an strace on the worker and it looked like this:<br>
><br>
> [root@host ~]# strace -p 26508<br>
> Process 26508 attached - interrupt to quit<br>
> flock(10, LOCK_EX<br>
><br>
> lsof on that process gave me<br>
><br>
> starman 26508 apps 10wW REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)<br>
><br>
> (not sure, but I think the 10 in the flock call and the 10 in the<br>
> 10wW mean that we're talking about the same file?).<br>
<br>
</div>Yeah, that'll be the same file descriptor. It looks like the process is<br>
waiting to get an exclusive lock on it, but presumably another process<br>
already holds that lock and hasn't released it, possibly because it is in turn<br>
waiting on a lock on something the process you looked at has locked, leading to<br>
a deadlock?<br>
<br>
Do you use File::Temp within your process to acquire temporary files?<br></blockquote><div><br></div><div>Not directly. I don't know of anything that uses it specifically, but it is installed in my perlbrew, so it could well be in use. I could add some debugging there and see whether it produces any clues.</div>
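<div><br></div><div>To illustrate the kind of blocking described above, here is a minimal sketch in Python (not code from the app, and not how starman or File::Temp do it). It assumes Linux-style flock semantics, where each separate open() of a file gets its own lock state. The scratch file and the function name are hypothetical. The file is unlinked while still open, which is what lsof reports as "(deleted)", and LOCK_NB is used so that the second lock attempt fails immediately instead of hanging the way the strace output shows.</div>

```python
import fcntl
import os
import tempfile


def second_lock_is_blocked():
    """Return True if a second, separately opened descriptor cannot take LOCK_EX."""
    # Hypothetical scratch file; the real /tmp/3j6mXZXwWM is not reproduced here.
    fd_a, path = tempfile.mkstemp()
    # A separate open() creates a separate open file description.
    fd_b = os.open(path, os.O_WRONLY)
    # Remove the name; both descriptors stay valid (lsof would show "(deleted)").
    os.unlink(path)
    try:
        # The first descriptor takes the exclusive lock.
        fcntl.flock(fd_a, fcntl.LOCK_EX)
        try:
            # Without LOCK_NB this call would block, like the flock(10, LOCK_EX
            # seen in strace. With LOCK_NB it raises instead.
            fcntl.flock(fd_b, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        return False
    finally:
        os.close(fd_a)
        os.close(fd_b)


if __name__ == "__main__":
    print(second_lock_is_blocked())
```

<div>The same contention would apply between separate processes that each opened the file themselves. Whether that is what the workers here are doing is not established by the output above.</div>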
<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div class=""><br>
<br>
> After a little more digging I found that every starman worker had<br>
> that file open (except with 10w instead of 10wW for all but the one<br>
> above), and stracing a few of the other workers showed the same output<br>
> ("flock(10, LOCK_EX") as above.<br>
<br>
</div>That same exact filename? If so, odd.<br></blockquote><div><br></div><div>yes:</div><div><br></div><div>[root@host ~]# lsof |grep /tmp/3j6mXZXwWM</div><div>starman 26502 apps 10w REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)</div>
<div>starman 26503 apps 10w REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)</div><div>starman 26504 apps 10w REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)</div>
<div>starman 26505 apps 10w REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)</div><div>starman 26506 apps 10w REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)</div>
<div>starman 26507 apps 10w REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)</div><div>starman 26508 apps 10wW REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)</div>
<div>starman 26509 apps 10w REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)</div><div>starman 26510 apps 10w REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)</div>
<div>starman 26511 apps 10w REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted)</div><div>starman 26512 apps 10w REG 8,22 0 24 /tmp/3j6mXZXwWM (deleted) </div>
<div>...</div><div><br></div><div>According to the lsof docs, the lowercase w in the FD column means the file is open for writing and the uppercase W means a write lock on the entire file. So all the processes have it open for writing, and 26508 holds a write lock on the whole file.</div><div><br></div><div>I'll see if I can get anything out of File::Temp--maybe adding debugging output there will give me an idea of what's so interesting to everybody. Thanks for looking at this with me.</div>
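<div><br></div><div>As a rough aid for reading output like the listing above, here is a small Python sketch that splits an lsof FD field such as 10wW into the descriptor number, the access mode, and the lock character. The function name and the returned dictionary are made up for illustration. The character meanings follow the lsof man page, and only numeric descriptors are handled (entries like cwd or txt return None).</div>

```python
import re

# Numeric descriptor, optional access mode, optional lock character.
FD_PATTERN = re.compile(r"^(\d+)([rwu-]?)([NrRwWuUxX]?)$")

# Lock characters as described in the lsof man page (subset).
LOCK_MEANINGS = {
    "r": "read lock on part of the file",
    "R": "read lock on the entire file",
    "w": "write lock on part of the file",
    "W": "write lock on the entire file",
}


def parse_fd(field):
    """Split an lsof FD field like '10wW' into its parts, or return None."""
    match = FD_PATTERN.match(field)
    if match is None:
        return None
    number, mode, lock = match.groups()
    return {
        "fd": int(number),
        "mode": mode,  # 'w' means open for write access
        "lock": lock,  # empty string when no lock character is present
        "lock_meaning": LOCK_MEANINGS.get(lock, ""),
    }


if __name__ == "__main__":
    for field in ("10w", "10wW"):
        print(field, parse_fd(field))
```

<div>Applied to the listing, 10w gives descriptor 10 open for writing with no lock, and 10wW gives the same descriptor with a write lock on the entire file.</div>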
<div><br></div><div>mike</div><div><br></div></div></div></div>