Mono build problems on FreeBSD-current

I am trying to build mono on FreeBSD-current (it is a dependency of some GNOME program). Unfortunately, the build does not work correctly.

The build hangs. If I stop it while it hangs and restart it, it continues and gets a little further through the build steps, but then it hangs again.

If I ktrace the hanging process, I see a call to wait that returns with the error that the child does not exist (ECHILD), followed by a call to nanosleep.

It looks to me like this process missed a SIGCHLD (or is waiting for something that never existed), and a loop is waiting for a child to exit. This loop probably has no proper exit condition for the case that there is no such child (anymore), so it stays in the loop forever.
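The error seen in the trace is easy to reproduce in isolation. The following is a minimal sketch in plain C, not code from mono (the function name is made up for illustration): once a child has been reaped, any further waitpid on its PID fails with ECHILD, so a loop that treats every non-positive return as "not finished yet" would never terminate.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that exits immediately, reap it once,
 * then wait for it a second time.
 * Returns the errno of the second waitpid(), 0 if that call did not
 * fail, or -1 if fork() failed. */
int errno_of_second_wait(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                       /* child: exit right away */

    /* First wait: blocks until the child has exited, then reaps it.
     * (EINTR handling is left out to keep the sketch short.) */
    (void)waitpid(pid, &status, 0);

    /* Second wait: the child no longer exists, so this call fails. */
    pid_t r = waitpid(pid, &status, WNOHANG);
    return (r == -1) ? errno : 0;
}
```

Calling errno_of_second_wait() is expected to yield ECHILD, which matches the "child does not exist" error in the ktrace output.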

So I grepped around a little bit in mono and found the following code in <mono-src-dir>/mcs/class/Mono.Posix/Mono.Unix/UnixProcess.cs:

public void WaitForExit ()
{
    int status;
    int r;
    do {
        r = Native.Syscall.waitpid (pid, out status, (Native.WaitOptions) 0);
    } while (UnixMarshal.ShouldRetrySyscall (r));
    UnixMarshal.ThrowExceptionForLastErrorIf (r);
}

This looks a little bit like it could be related to the problem I see, but ShouldRetrySyscall only returns true if errno is EINTR. So this looks correct. 🙁
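For comparison, the same retry pattern at the C level might look like the sketch below. It is an illustration only: should_retry and wait_for_exit are hypothetical names that mirror the idea described above, not mono's actual implementation. The point is that only EINTR repeats the call; any other error, including ECHILD, ends the loop and is reported to the caller.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper mirroring the idea of ShouldRetrySyscall:
 * retry only when the call failed and errno is EINTR. */
static int should_retry(pid_t r)
{
    return r == -1 && errno == EINTR;
}

/* Blocking wait that retries on EINTR.
 * Returns 0 and stores the wait status on success, or -1 with errno
 * set (for example to ECHILD) on any other error. */
int wait_for_exit(pid_t pid, int *status)
{
    pid_t r;

    do {
        r = waitpid(pid, status, 0);
    } while (should_retry(r));

    return (r == -1) ? -1 : 0;
}
```

A second call for the same PID returns -1 with errno set to ECHILD instead of looping, which is why this loop cannot be the one that hangs.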

I looked a little bit more at this file, and either I do not understand the semantics of this language, or GetProcessStatus returns the return value of the waitpid call instead of the status (which, to my understanding, is what it should return). If I am correct, it cannot really detect the status of a process. It would be very bad if such a fundamental thing went unnoticed in mono... which would not cast a good light on the unit tests (if any) or the general testing of mono. For this reason I hope I am wrong.
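To make the distinction concrete: at the C level, waitpid returns a PID (or 0, or -1), while the information about how the child ended is written to the status argument and has to be decoded with the W* macros. A minimal sketch, with a hypothetical function name and no claim about how mono's C# wrapper is meant to be used:

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: wait for pid and return its exit code.
 * Returns -1 if the wait failed or the child did not exit normally.
 * (EINTR handling is left out to keep the sketch short.) */
int exit_code_of(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, 0);

    /* r is the PID of the reaped child (or -1), never the exit status. */
    if (r != pid)
        return -1;

    /* The exit code lives in status and must be extracted explicitly. */
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Code that returns r where the decoded status is expected hands the caller a process ID, which says nothing about how the process ended.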

I did not stop there, as this part does not look like it is the problem. I found the following in mono/io-layer/processes.c:

static gboolean waitfor_pid (gpointer test, gpointer user_data)
{
...
    do {
        ret = waitpid (process->id, &status, WNOHANG);
    } while (errno == EINTR);

    if (ret <= 0) {
        /* Process not ready for wait */
#ifdef DEBUG
        g_message ("%s: Process %d not ready for waiting for: %s", __func__, process->id, g_strerror (errno));
#endif

        return (FALSE);
    }

#ifdef DEBUG
    g_message ("%s: Process %d finished", __func__, ret);
#endif

    process->waited = TRUE;
...
}

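And here we have the problem, I think: a waitpid that fails with ECHILD is treated like "not ready yet", so the caller keeps polling for a child that will never show up. I changed the (ret <= 0) to (ret == 0 || (ret < 0 && errno != ECHILD)). This will not really give the correct status, but at least it should not block anymore, and I should be able to see a difference during the build.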

And now, after testing, I see a difference, but the problem is still there. The wait with ECHILD is gone from the loop, but there is still a loop with a semaphore operation:

62960 mono     CALL  clock_gettime(0xd,0xbf9feef8)
62960 mono     RET   clock_gettime 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  nanosleep(0xbf9fef84,0)
62960 mono     RET   nanosleep 0
62960 mono     CALL  clock_gettime(0xd,0xbf9feef8)
62960 mono     RET   clock_gettime 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  nanosleep(0xbf9fef84,0)

OK, there is more going on. I think someone with more knowledge of mono should have a look at this (not only at this semop thing, but also at why it loses a child).

3 thoughts on “Mono build problems on FreeBSD-current”

  1. Pingback: Tweets that mention Mono build problems on FreeBSD-current | Alexander Leidinger -- Topsy.com
  2. Hum

    I am afraid you are right and the code in UnixProcess.cs is wrong… May I suggest that you open a bug for this in Novell’s bug tracker?
    https://bugzilla.novell.com

    Regarding the problem as a whole:

    I also made local edits in my svn checkout to compile mono with debugging support, but unfortunately the race conditions then occur much less often, which just makes debugging harder… So I have not pushed any of these patches to the FreeBSD port. I had almost the same problem, triggered by running mono-2.6 (not yet in the ports), and your patch to processes.c seems to solve it too. You can have a look at the bug report at Novell here:
    https://bugzilla.novell.com/show_bug.cgi?id=528830

    More testing is needed, but I think you put your finger on the right spot. I was fooled by the way I discovered the problem and thought it was a regression… Maybe it actually is not.

    Which version of mono are you running?

    Maybe chatting about all this on mono@ is the best place?

    Thanks,
    Romain

  3. Pingback: Debugging lang/mono — 2nd round « The Daily BSD
