Today I had again some energy to look at why mono fails to build on FreeBSD-current.
I decided to do a debug-build of mono. This did not work initially, I had to produce some patches.
Does this mean nobody is doing debug builds of mono on FreeBSD?
I have to say, this experience with lang/mono is completely unsatisfying.
Ok, bottom line, either the debug build seems to prevent a race condition in most cases (I had a lot less lockups for each of the two builds I did).
Whatever it is, I do not care ATM (if the configure stuff is looking at the architecture of the system, it may be the case that the i386-portbld-freebsdX does not enable some important stuff which would be enabled when run with i486-portbld-freebsdX or better). Here are the patches I used in case someone is interested (warning, copy&paste converted tabs to spaces, you also have to apply the map.c (a generated file… maybe a touch of the right file would allow to apply this patch in the normal patch stage) related stuff when the build fails, else there is some parser error in mono):
--- mcs/class/Mono.Posix/Mono.Unix/UnixProcess.cs.orig 2010-01-29 11:34:00.592323482 +0100
+++ mcs/class/Mono.Posix/Mono.Unix/UnixProcess.cs 2010-01-29 11:34:18.540607357 +0100
@@ -57,7 +57,7 @@ namespace Mono.Unix {
int r = Native.Syscall.waitpid (pid, out status,
Native.WaitOptions.WNOHANG | Native.WaitOptions.WUNTRACED);
UnixMarshal.ThrowExceptionForLastErrorIf (r);
- return r;
+ return status;
}
public int ExitCode {
--- mono/io-layer/processes.c.orig 2010-01-29 11:36:08.904331535 +0100
+++ mono/io-layer/processes.c 2010-01-29 11:42:21.819159544 +0100
@@ -160,7 +160,7 @@ static gboolean waitfor_pid (gpointer te
ret = waitpid (process->id, &status, WNOHANG);
} while (errno == EINTR);
- if (ret <= 0) {
+ if (ret == 0 || (ret < 0 && errno != ECHILD)) {
/* Process not ready for wait */
#ifdef DEBUG
g_message ("%s: Process %d not ready for waiting for: %s",
@@ -169,6 +169,17 @@ static gboolean waitfor_pid (gpointer te
return (FALSE);
}
+
+ if (ret < 0 && errno == ECHILD) {
+#ifdef DEBUG
+ g_message ("%s: Process %d does not exist (anymore)", __func__,
+ process->id);
+#endif
+ /* Faking the return status. I do not know if it is correct
+ * to assume a successful exit.
+ */
+ status = 0;
+ }
#ifdef DEBUG
g_message ("%s: Process %d finished", __func__, ret);
--- mono/metadata/mempool.c.orig 2010-01-29 11:58:16.871052861 +0100
+++ mono/metadata/mempool.c 2010-01-29 12:30:45.143367454 +0100
@@ -212,12 +212,14 @@ mono_backtrace (int size)
EnterCriticalSection (&mempool_tracing_lock);
g_print ("Allocating %d bytesn", size);
+#if defined(HAVE_BACKTRACE_SYMBOLS)
symbols = backtrace (array, BACKTRACE_DEPTH);
names = backtrace_symbols (array, symbols);
for (i = 1; i < symbols; ++i) {
g_print ("t%sn", names [i]);
}
free (names);
+#endif
LeaveCriticalSection (&mempool_tracing_lock);
}
--- mono/metadata/metadata.c.orig 2010-01-29 11:59:38.552316989 +0100
+++ mono/metadata/metadata.c 2010-01-29 12:00:43.957337476 +0100
@@ -3673,12 +3673,16 @@ mono_backtrace (int limit)
void *array[limit];
char **names;
int i;
+#if defined(HAVE_BACKTRACE_SYMBOLS)
backtrace (array, limit);
names = backtrace_symbols (array, limit);
for (i =0; i < limit; ++i) {
g_print ("t%sn", names [i]);
}
g_free (names);
+#else
+ g_print ("No backtrace available.n");
+#endif
}
#endif
--- support/map.c.orig 2010-01-29 12:05:22.374653708 +0100
+++ support/map.c 2010-01-29 12:10:29.024412452 +0100
@@ -216,7 +216,7 @@
#define _cnm_dump(to_t, from) do {} while (0)
#endif /* def _CNM_DUMP */
-#ifdef DEBUG
+#if defined(DEBUG) && !defined(__FreeBSD__)
#define _cnm_return_val_if_overflow(to_t,from,val) G_STMT_START {
int uns = _cnm_integral_type_is_unsigned (to_t);
gint64 min = (gint64) _cnm_integral_type_min (to_t);
GD Star Rating
loading…
GD Star Rating
loading…
Tags: amp,
atm,
bottom line,
debugging,
lt,
parser error,
patches,
pid,
posix,
public int —
I try to build mono on FreeBSD-current (it is a dependency of some GNOME program). Unfortunately this does not work correctly.
What I see are hangs of the build. If I stop the build when it hangs and restart it, it will continue and succeed to process the build steps a little bit further, but then it hangs again.
If I ktrace the hanging process, I see that there is a call to wait returning with the error message that the child does not exist. Then there is a call to nanosleep.
It looks to me like this process missed some SIGCLD (or is waiting for something which did not exist at all), and a loop is waiting for a child to exit. This loop probably has no proper condition for the fact that there is no such child (anymore). As such it will stay forever in this loop.
So I grepped a litte bit around in mono and found the following code in <mono-src-dir>/mcs/class/Mono.Posix/Mono.Unix/UnixProcess.cs:
public void WaitForExit ()
{
int status;
int r;
do {
r = Native.Syscall.waitpid (pid, out status, (Native.WaitOptions) 0);
} while (UnixMarshal.ShouldRetrySyscall (r));
UnixMarshal.ThrowExceptionForLastErrorIf (r);
}
This does look a little bit as it could be related to the problem I see, but ShouldRetrySyscall only returns true if the errno is EINTR. So this looks correct.
I looked a little bit more at this file and it looks like either I do not understand the semantic of this language, or GetProcessStatus does return the returnvalue of the waitpid call instead of the status (which is not what it shall return to my understanding). If I am correct, it can not really detect the status of a process. It would be very bad if such a fundamental thing went unnoticed in mono… which does not put a good light on the unit-tests (if any) or the general testing of mono. For this reason I hope I am wrong.
I did not stop there, as this part does not look like it is the problem. I found the following in mono/io-layer/processes.c:
static gboolean waitfor_pid (gpointer test, gpointer user_data)
{
...
do {
ret = waitpid (process->id, &status, WNOHANG);
} while (errno == EINTR);
if (ret <= 0) {
/* Process not ready for wait */
#ifdef DEBUG
g_message ("%s: Process %d not ready for waiting for: %s",
__func__, process->id, g_strerror (errno));
#endif
return (FALSE);
}
#ifdef DEBUG
g_message ("%s: Process %d finished", __func__, ret);
#endif
process->waited = TRUE;
...
}
And here we have the problem, I think. I changed the (ret <= 0) to (ret == 0 || (ret < 0 && errno != ECHILD)). This will not really give the correct status, but at least it should not block anymore and I should be able to see the difference during the build.
And now after testing, I see a difference, but the problem is still there. The wait with ECHILD is gone in the loop, but there is still some loop with a semaphore operation:
62960 mono CALL clock_gettime(0xd,0xbf9feef8)
62960 mono RET clock_gettime 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL nanosleep(0xbf9fef84,0)
62960 mono RET nanosleep 0
62960 mono CALL clock_gettime(0xd,0xbf9feef8)
62960 mono RET clock_gettime 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL semop(0x20c0000,0xbf9feef6,0x1)
62960 mono RET semop 0
62960 mono CALL nanosleep(0xbf9fef84,0)
OK, there is more going on. I think someone with more knowledge about mono should have a look at this (do not only look at this semop thing, but also look why it loses a child).
GD Star Rating
loading…
GD Star Rating
loading…
Tags: error message,
fundamental thing,
gnome program,
little bit,
pid,
posix,
public void,
returnvalue,
syscall,
unit tests —
At work we have to use a proxy which requires authorization. With previous versions (firefox 3.0.x and 3.5.y for each valid x and y) I had the problem that each tab requested to enter the master password when starting firefox, to be able to fill in the proxy-auth data (shortcut: fill in only the first request, and for all others just hit return/OK). So for each tab I had to do something for the master-password, and after that for each tab I also had to confirm the proxy-auth stuff.
Very annoying! Oh, I should maybe mention that as of this writing I have 31 tabs open. Sometimes there are more, sometimes there are less.
Now with firefox 3.6 this is not the case anymore. Yeah! Great! Finally only one time the master password stuff, and then one time the proxy-auth stuff, and then all tabs proceed.
It took a long time since my first report about this, but now it is finally there. This is the best improvement in 3.6 for me.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: firefox,
long time,
master password,
previous versions,
proxy,
stuff,
tabs —
I experimented a little bit with the order of directories to backup in tarsnap.
Currently I use the following sorting algorithm:
- least frequently changed directory-trees first
Every change — even in meta-data — will affect the following data, as tarsnap is doing the de-duplication in fixed-width blocks (AFAIR 64k).
- for those directory-trees which change with the same frequency: list the bigger ones first
Implicitly I assume that the smaller ones are much smaller than the bigger ones so that the smaller part which will be backed up will not be noticed because of the bigger change. For my use cases of tarsnap this is true.
- if changes in a directory-tree are much much bigger than anything else, but the directory-tree has a medium change-frequency, put it even before less-frequently changing stuff
I do not want that a small change triggers a big backup, but a big backup can contain the remaining small part.
- if you backup home directories (even root’s one) and they do not contain much data, put them before directory-trees which change a lot daily
I do not want that a login triggers the transfer of data in other directory-trees which have not changed.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: change frequency,
directory tree,
directory trees,
frequency list,
home directories,
little bit,
meta data,
sorting algorithm —
After putting the disks of the 7–stable system which exhibited stability problems into a completely different system (it is a rented root-server, not our own hardware), the system now survived more than a day (and still no trace of problems) with the UFS setup. Previously it would crash after some minutes.
The ZFS setup with the changed hardware had a problem during the night before (like always after all my ZFS related changes on this machine), but on this machine I changed all locks in ZFS from shared locks to exclusive locks (this extended the uptime from 4 – 6 hours to “until I rebooted the morning after because of hanging processes”), so this may be because of this. I do not know yet if we will test the ZFS setup with the pure 7-stable source we use now or not (the goal was to get back a stable system, instead of playing around with unrelated stuff).
It looks like some kind of hardware problem was uncovered by updating from 7.1 to 7.2 (and 7-stable subsequently). This new machine has a completely different chipset, a new CPU and RAM and PSU and … so I do not really know what caused this (but the fact that the previous system did not recognize the CPU after replacing it with a bigger one and the observation that only shared locks with a specific usage pattern where affected lets me point towards missing microcode updates…).
GD Star Rating
loading…
GD Star Rating
loading…
Tags: hardware problem,
locks,
microcode updates,
observation,
psu,
root server,
stability problems,
stable source,
stable system,
uptime —