I am trying to build mono on FreeBSD-current (it is a dependency of some GNOME programs). Unfortunately, this does not work correctly.
What I see is that the build hangs. If I stop the build when it hangs and restart it, it continues and gets a little further through the build steps, but then it hangs again.
If I ktrace the hanging process, I see a call to wait [1] that fails with the error that the child does not exist. Then there is a call to nanosleep [2].
It looks to me as if this process missed a SIGCHLD (or is waiting for something that never existed at all) and a loop is waiting for a child to exit. This loop probably has no proper exit condition for the case that there is no such child (anymore), so it stays in this loop forever.
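The hang pattern from the trace (a wait that fails because the child is gone, a nanosleep, then the same wait again) can be reproduced in a few lines of C. This is a hypothetical model, not mono's code: naive_poll() is an invented name, and it only treats a successful waitpid() as the end of the wait. Once the child has been collected elsewhere (simulated here with a waitpid(-1, ...) call; which code, if any, collects the child in mono is not known), every iteration fails with ECHILD. The max_iters bound exists only so the sketch terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical model of a poll loop that only treats a successful
 * waitpid() as "done". Every other result, including -1/ECHILD, is
 * handled as "not ready yet", followed by a short nanosleep().
 * max_iters bounds the loop so this sketch terminates; the return
 * value is the number of iterations that did NOT reap the child. */
static int naive_poll(pid_t pid, int max_iters)
{
    int status;
    int wasted = 0;
    struct timespec pause = { 0, 1000000 }; /* 1 ms */

    for (int i = 0; i < max_iters; i++) {
        if (waitpid(pid, &status, WNOHANG) > 0)
            return wasted;          /* child reaped: loop ends */
        wasted++;                   /* 0 or -1 (e.g. ECHILD): retry */
        nanosleep(&pause, NULL);
    }
    return wasted;                  /* never saw the child finish */
}
```

With a child that is still ours to reap, naive_poll() returns early; with a child that was already collected, it uses up all max_iters iterations, which in an unbounded loop means a hang.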
So I grepped around a little in mono and found the following code in <mono-src-dir>/mcs/class/Mono.Posix/Mono.Unix/UnixProcess.cs:
public void WaitForExit ()
{
    int status;
    int r;
    do {
        r = Native.Syscall.waitpid (pid, out status, (Native.WaitOptions) 0);
    } while (UnixMarshal.ShouldRetrySyscall (r));
    UnixMarshal.ThrowExceptionForLastErrorIf (r);
}
This looks as if it could be related to the problem I see, but ShouldRetrySyscall only returns true if errno is EINTR, so this looks correct. 🙁
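For comparison, the WaitForExit() logic expressed in C looks roughly like this. wait_for_exit() is a hypothetical name, and modelling ShouldRetrySyscall as "r == -1 and errno == EINTR" is an assumption based on the description above. Only EINTR leads to another iteration; ECHILD ends the loop and is reported to the caller, which is why this method is unlikely to be the source of an endless loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* C rendering of the WaitForExit() pattern: block in waitpid(),
 * retry only when the call was interrupted by a signal (EINTR),
 * and hand any other failure (such as ECHILD) back to the caller
 * instead of looping. Returns 0 on success, -1 with errno set. */
static int wait_for_exit(pid_t pid, int *status)
{
    pid_t r;

    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);

    return r == -1 ? -1 : 0;
}
```

Calling wait_for_exit() a second time on the same pid fails immediately with ECHILD rather than blocking or spinning.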
I looked a little more at this file, and either I do not understand the semantics of this language, or GetProcessStatus returns the return value of the waitpid call instead of the status (which, as I understand it, is not what it should return). If I am correct, it cannot really determine the status of a process. It would be very bad if such a fundamental thing had gone unnoticed in mono… and it would not reflect well on the unit tests (if any) or on the general testing of mono. For this reason I hope I am wrong.
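To make the GetProcessStatus concern concrete: waitpid() returns the pid of the child it collected (or -1), while what happened to the child is encoded in the status word and has to be decoded with the W* macros. This is a C illustration of that distinction, not the C# code in question; reap_and_decode() and enum child_fate are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
enum child_fate { FATE_EXITED, FATE_SIGNALED, FATE_OTHER, FATE_ERROR };

/* Collect a child and report what happened to it, decoded from the
 * status word. On FATE_EXITED *code is the exit code, on
 * FATE_SIGNALED it is the number of the terminating signal. */
static enum child_fate reap_and_decode(pid_t pid, int *code)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r == -1)
        return FATE_ERROR;          /* status was not filled in */

    /* Here r == pid: it says WHICH child finished, not HOW. */
    if (WIFEXITED(status)) {
        *code = WEXITSTATUS(status);
        return FATE_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *code = WTERMSIG(status);
        return FATE_SIGNALED;
    }
    return FATE_OTHER;              /* not expected without WUNTRACED */
}
```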
I did not stop there, as this part does not look like the cause of the hang. I found the following in mono/io-layer/processes.c:
static gboolean waitfor_pid (gpointer test, gpointer user_data)
{
    ...
    do {
        ret = waitpid (process->id, &status, WNOHANG);
    } while (errno == EINTR);

    if (ret <= 0) {
        /* Process not ready for wait */
#ifdef DEBUG
        g_message ("%s: Process %d not ready for waiting for: %s",
                   __func__, process->id, g_strerror (errno));
#endif

        return (FALSE);
    }

#ifdef DEBUG
    g_message ("%s: Process %d finished", __func__, ret);
#endif

    process->waited = TRUE;
    ...
}
Here is the problem, I think. I changed (ret <= 0) to (ret == 0 || (ret < 0 && errno != ECHILD)). This will not give the correct status, but at least it should no longer block, and I should be able to see a difference during the build.
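The patched condition distinguishes more outcomes than the original ret <= 0 test. A minimal sketch of the same non-blocking check with all four cases spelled out follows; try_reap() and enum reap_result are hypothetical names, not mono API. REAP_NOT_READY and REAP_ERROR correspond to the cases where the patched waitfor_pid() still returns FALSE, while REAP_GONE now falls through like REAP_DONE. The sketch also tests ret before errno. The quoted loop checks only errno == EINTR, and a successful waitpid() (including a 0 return with WNOHANG) does not reset errno, so a stale EINTR left over from an earlier call could make that loop call waitpid() again right after it reaped the child. That second call fails with ECHILD, and the status is lost. Whether this actually happens during the mono build has not been verified.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of mono. */
enum reap_result { REAP_DONE, REAP_NOT_READY, REAP_GONE, REAP_ERROR };

/* Non-blocking reap attempt, modelled on the waitfor_pid() check. */
static enum reap_result try_reap(pid_t pid, int *status)
{
    pid_t ret;

    /* Retry only when waitpid() itself failed with EINTR; errno is
     * not cleared by a successful call, so test ret first. */
    do {
        ret = waitpid(pid, status, WNOHANG);
    } while (ret == -1 && errno == EINTR);

    if (ret > 0)
        return REAP_DONE;           /* child reaped, *status is valid */
    if (ret == 0)
        return REAP_NOT_READY;      /* child exists, still running */
    if (errno == ECHILD)
        return REAP_GONE;           /* no such child (anymore) */
    return REAP_ERROR;              /* any other failure */
}
```

A caller that keeps polling only on REAP_NOT_READY cannot get stuck on a child that no longer exists, even if errno still holds a stale EINTR when it starts.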
After testing, I do see a difference, but the problem is still there. The wait returning ECHILD is gone from the loop, but there is still a loop with semaphore operations:
62960 mono CALL clock_gettime(0xd,0xbf9feef8)
62960 mono RET clock_gettime 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL nanosleep(0xbf9fef84,0)
62960 mono RET nanosleep 0
62960 mono CALL clock_gettime(0xd,0xbf9feef8)
62960 mono RET clock_gettime 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL nanosleep(0xbf9fef84,0)
OK, there is more going on. I think someone with more knowledge of mono should have a look at this (not only at the semop loop, but also at why it loses a child).
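The loop in the trace has the same poll-and-sleep shape as before: clock_gettime(), four semop() calls, nanosleep(), and around again. The trace alone does not show which condition mono is waiting for there. As a generic sketch (not mono's code and not a fix for whatever the semaphore protects; poll_until() and the example predicates are hypothetical), this is such a loop bounded by a deadline, so a condition that never becomes true ends in a reportable timeout instead of a silent hang:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited
 * condition holds (e.g. "lock acquired" or "child reaped"). */
typedef bool (*poll_fn)(void *arg);

/* Whole milliseconds elapsed between two CLOCK_MONOTONIC readings. */
static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    long long ns = (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
                 + (b->tv_nsec - a->tv_nsec);
    return (long)(ns / 1000000LL);
}

/* Check the condition, sleep briefly, repeat; give up after
 * timeout_ms. Returns true if the condition became true in time. */
static bool poll_until(poll_fn done, void *arg, long timeout_ms)
{
    struct timespec start, now;
    struct timespec pause = { 0, 1000000 }; /* 1 ms between checks */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (done(arg))
            return true;
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (elapsed_ms(&start, &now) >= timeout_ms)
            return false;           /* deadline hit: report, don't spin */
        nanosleep(&pause, NULL);
    }
}

/* Example predicates: one that succeeds on its third call ... */
static bool after_three_calls(void *arg)
{
    int *calls = arg;
    return ++*calls >= 3;
}

/* ... and one that never succeeds. */
static bool never(void *arg)
{
    (void)arg;
    return false;
}
```

poll_until(after_three_calls, &calls, 1000) returns true after the third check, while poll_until(never, NULL, 20) gives up after roughly 20 ms.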
3 Comments To "Mono build problems on FreeBSD-current"
#1 Pingback By Tweets that mention Mono build problems on FreeBSD-current | Alexander Leidinger – Topsy.com On January 30, 2010 @ 04:00
[…] This post was mentioned on Twitter by FreeBSD Project and airjump, freebsdblogs. freebsdblogs said: Mono build problems on FreeBSD-current: I try to build mono on FreeBSD – current (it is a dependency of some GNOME p… [49] […]
#2 Comment By Romain Tartière On February 5, 2010 @ 21:29
Hum
I am afraid you are right and the code in UnixProcess.cs is wrong… May I suggest that you open a bug for this in Novell’s bug tracker?
[50]
Regarding the problem globally:
I also made local edits in my svn checkout to compile mono with debugging support, but unfortunately the race conditions occur much less often then, which makes debugging harder… So I have not pushed any of these patches to the FreeBSD port. I had almost the same problem, triggered by running mono-2.6 (not yet in the ports), and your patch to processes.c seems to solve it too. You can have a look at the bug report at Novell here:
[51]
More testing is needed, but I think you put your finger on the right spot. I was fooled by the way I discovered the problem and thought it was a regression… Maybe it actually isn’t.
Which version of mono are you running?
Maybe chatting about all this on mono@ is the best place?
Thanks,
Romain
#3 Pingback By Debugging lang/mono — 2nd round « The Daily BSD On February 6, 2010 @ 01:01
[…] Mono build problems on FreeBSD-current […]