Alexander Leidinger

Just another weblog

Jan 28

Mono build problems on FreeBSD-current

I am trying to build mono on FreeBSD-current (it is a dependency of some GNOME program). Unfortunately, this does not work correctly.

What I see is that the build hangs. If I stop the build when it hangs and restart it, it continues and gets a little bit further through the build steps, but then it hangs again.

If I ktrace the hanging process, I see a call to wait that returns with the error that the child does not exist (ECHILD). Then there is a call to nanosleep.

It looks to me like this process missed a SIGCHLD (or is waiting for something which never existed), and a loop is waiting for a child to exit. This loop probably has no proper exit condition for the case that there is no such child (anymore), so it stays in the loop forever.
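To make the suspected failure mode concrete, here is a minimal C sketch (my own illustration, not code from mono). It assumes a POSIX system; the function name errno_after_second_wait is hypothetical. It reaps a child once and then asks for it again, which is the situation a polling loop is in when the child is already gone:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once, reap it, then
 * poll for it a second time.  Returns the errno of the failing
 * waitpid() call, or 0 if the second call unexpectedly did not fail. */
int errno_after_second_wait(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return errno;           /* fork itself failed */
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    /* First wait: blocks until the child has exited and reaps it. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return errno;
    }

    /* Second wait: the child is gone, so there is nothing to wait for. */
    errno = 0;
    pid_t ret = waitpid(pid, &status, WNOHANG);
    return (ret < 0) ? errno : 0;
}
```

On a POSIX system the second call should fail with ECHILD. A loop that treats every failing waitpid as "child not finished yet" and sleeps before trying again would therefore never terminate.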

So I grepped around a little bit in mono and found the following code in <mono-src-dir>/mcs/class/Mono.Posix/Mono.Unix/UnixProcess.cs:

public void WaitForExit ()
{
    int status;
    int r;
    do {
        r = Native.Syscall.waitpid (pid, out status, (Native.WaitOptions) 0);
    } while (UnixMarshal.ShouldRetrySyscall (r));
    UnixMarshal.ThrowExceptionForLastErrorIf (r);
}

At first glance this looks like it could be related to the problem I see, but ShouldRetrySyscall only returns true if errno is EINTR. So this looks correct. :-(
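For comparison, the same retry-only-on-EINTR pattern can be sketched in plain C. This is my own illustration under the assumption that the C# wrapper maps directly onto waitpid(2); wait_for_exit is a hypothetical name, not a mono function:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical C counterpart of WaitForExit: block until the child
 * exits, retrying only when the call was interrupted by a signal.
 * Returns 0 on success and -1 on any other error (errno is left set,
 * for example to ECHILD when there is no such child). */
int wait_for_exit(pid_t pid, int *status)
{
    pid_t r;

    do {
        r = waitpid(pid, status, 0);
    } while (r < 0 && errno == EINTR);   /* check errno only after a failure */

    return (r < 0) ? -1 : 0;
}
```

Because every error other than EINTR ends the loop, a missing child is reported to the caller instead of being retried forever, which is why this code path does not look like the cause of the hang.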

I looked a little bit more at this file, and either I do not understand the semantics of this language, or GetProcessStatus returns the return value of the waitpid call instead of the status (which, to my understanding, is not what it should return). If I am correct, it cannot really detect the status of a process. It would be very bad if such a fundamental thing went unnoticed in mono, and it would not reflect well on the unit tests (if any) or on the general testing of mono. For this reason I hope I am wrong.

I did not stop there, as this part does not look like it is the problem. I found the following in mono/io-layer/processes.c:

static gboolean waitfor_pid (gpointer test, gpointer user_data)
{
...
    do {
        ret = waitpid (process->id, &status, WNOHANG);
    } while (errno == EINTR);

    if (ret <= 0) {
        /* Process not ready for wait */
#ifdef DEBUG
        g_message ("%s: Process %d not ready for waiting for: %s", __func__, process->id, g_strerror (errno));
#endif

        return (FALSE);
    }

#ifdef DEBUG
    g_message ("%s: Process %d finished", __func__, ret);
#endif

    process->waited = TRUE;
...
}

And here we have the problem, I think. When waitpid fails with ECHILD, ret is -1, so the function reports "not ready" every time it is called. I changed the (ret <= 0) to (ret == 0 || (ret < 0 && errno != ECHILD)). This will not really give the correct status, but at least it should not block anymore, and I should be able to see the difference during the build.

And now, after testing, I see a difference, but the problem is still there. The wait returning ECHILD is gone from the loop, but there is still a loop with a semaphore operation:

62960 mono     CALL  clock_gettime(0xd,0xbf9feef8)
62960 mono     RET   clock_gettime 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  nanosleep(0xbf9fef84,0)
62960 mono     RET   nanosleep 0
62960 mono     CALL  clock_gettime(0xd,0xbf9feef8)
62960 mono     RET   clock_gettime 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  semop(0x20c0000,0xbf9feef6,0x1)
62960 mono     RET   semop 0
62960 mono     CALL  nanosleep(0xbf9fef84,0)

OK, there is more going on. I think someone with more knowledge of mono should have a look at this (and not only at this semop thing, but also at why it loses a child).


3 Responses to “Mono build problems on FreeBSD-current”

  1. Tweets that mention Mono build problems on FreeBSD-current | Alexander Leidinger -- Topsy.com Says:

    […] This post was mentioned on Twitter by FreeBSD Project and airjump, freebsdblogs. freebsdblogs said: Mono build problems on FreeBSD-current: I try to build mono on FreeBSD – current (it is a dependency of some GNOME p… http://bit.ly/bUGBAZ […]

  2. Romain Tartière Says:

    Hum

    I am afraid you are right and the code in UnixProcess.cs is wrong… May I suggest that you open a bug for this in Novell’s bug tracker?
    https://bugzilla.novell.com

    Regarding the problem in general:

    I also made local edits in my svn checkout to compile mono with debugging support, but unfortunately the race conditions then occur much less often, which just makes debugging harder… So I have not pushed any of these patches to the FreeBSD port. I had almost the same problem, triggered by running mono-2.6 (not yet in the ports), and your patch to processes.c seems to solve it too. You can have a look at the bug report at Novell here:
    https://bugzilla.novell.com/show_bug.cgi?id=528830

    More testing is needed, but I think you put your finger on the right spot. I was fooled by the way I discovered the problem and thought it was a regression… Maybe it actually is not.

    Which version of mono are you running?

    Maybe mono@ is the best place to chat about all this?

    Thanks,
    Romain

  3. Debugging lang/mono — 2nd round « The Daily BSD Says:

    […] Mono build problems on FreeBSD-current […]
