Stability problems solved (hardware problem)

After moving the disks of the 7-stable system which exhibited stability problems into a completely different system (it is a rented root-server, not our own hardware), the system has now survived more than a day with the UFS setup (and there is still no trace of problems). Previously it would crash after some minutes.

The ZFS setup with the changed hardware had a problem during the previous night (like always after all my ZFS-related changes on this machine), but on this machine I changed all locks in ZFS from shared locks to exclusive locks (this extended the uptime from 4-6 hours to “until I rebooted the morning after because of hanging processes”), so the problem may be caused by that change. I do not know yet if we will test the ZFS setup with the pure 7-stable source we use now or not (the goal was to get back a stable system, instead of playing around with unrelated stuff).

It looks like some kind of hardware problem was uncovered by updating from 7.1 to 7.2 (and subsequently to 7-stable). This new machine has a completely different chipset, a new CPU and RAM and PSU and … so I do not really know what caused this (but the fact that the previous system did not recognize the CPU after replacing it with a bigger one, and the observation that only shared locks with a specific usage pattern were affected, make me suspect missing microcode updates…).

I merged a lot of ZFS patches to 7-stable

During the last weeks I identified 64 patches for ZFS which are in 8-stable but not in 7-stable. I had a deeper look at 56 of them, and most of those are now committed to 7-stable. The ones of those 56 which I did not commit are not applicable to 7-stable (infrastructure differences between 8 and 7).

Unfortunately this did not solve the stability problems I have on a 7-stable system.

I also committed a diff reduction patch (between 8-stable and 7-stable) which also fixed some not so harmless mismerges (a memory leak and the initialization of the same mutex twice at different places). No idea yet if it helps in my case.

I also want to merge the new ARC reclaim logic from head to 8-stable and 7-stable. Maybe I can do this tomorrow.

Currently I am running a test with a kernel where the shared locks for ZFS are switched to exclusive locks.

Stabilizing 7-stable…

The 7-stable system on which I have stability problems after an update from 7.1 to 7.2/7-stable is now semi-stable.

The watchdog reboots the system after one minute without a reaction (currently the system is able to run 3-4 hours), and the jails come up without problems now.

The problem with the jails was that, for example, the mysql-server startup went into the STOP state because TTY input was “requested”. I solved the problem by using /dev/null as input on jail startup. On -current I do not see this behavior (I have a 9-current system with a lot of jails which reboots every X days, and there mysql does not go into the STOP state).

I also start the jails in the background, so that one blocking jail does not block everything (done like in -current).

To say this with code:

--- /usr/src/etc/rc.d/jail      2009-02-07 15:04:35.000000000 +0100
+++ /etc/rc.d/jail      2009-12-16 17:03:12.000000000 +0100
@@ -556,7 +556,8 @@
 fi
 _tmp_jail=${_tmp_dir}/jail.$$
 eval ${_setfib} jail ${_flags} -i ${_rootdir} ${_hostname} \
-                       \"${_addrl}\" ${_exec_start} > ${_tmp_jail} 2>&1
+                       \"${_addrl}\" ${_exec_start} > ${_tmp_jail} 2>&1 \
+                       </dev/null

 if [ "$?" -eq 0 ] ; then
 _jail_id=$(head -1 ${_tmp_jail})
@@ -623,4 +624,4 @@
 if [ -n "$*" ]; then
 jail_list="$*"
 fi
-run_rc_command "${cmd}"
+run_rc_command "${cmd}" &
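
The effect of the two changes can be tried outside of rc.d/jail with a small stand-alone sketch. This is not the code from the jail script: `start_service` is a hypothetical stand-in for a start command that tries to read from its standard input. The assumption behind it is that a start command which reads from the terminal while not in the foreground gets stopped by job control, whereas with input taken from /dev/null the read returns end-of-file immediately.

```shell
#!/bin/sh
# Hypothetical stand-in for a jail start command that asks for input.
start_service() {
    if read -r answer; then
        echo "got input: ${answer}"
    else
        # read returns non-zero on end-of-file
        echo "no input available, continuing"
    fi
}

# With stdin taken from /dev/null the read sees EOF at once,
# so the command never waits for (or gets stopped on) terminal input.
result=$(start_service </dev/null)
echo "${result}"

# Started in the background, one slow or blocking service
# does not hold up whatever comes after it.
start_service </dev/null >/dev/null &
wait
```

The same two ingredients appear in the diff above: `</dev/null` on the jail command, and `&` after `run_rc_command`.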

I also identified 57 patches for ZFS which are in 8-stable, but not in 7-stable (I do not think they could solve the deadlock, but I do not really know, and now that there is one FS on ZFS, I would like to get as much fixed as possible). Some of them should be merged, some would be nice to merge, and some I do not care much about (but if they are easy to merge, why not…). I already have all revisions and the corresponding commit logs available in an email draft.

Now I just need to write a little bit of text and find some people willing to help (some of the changes need a review to determine whether they are applicable to 7-stable, and everything should be tested on a scratch-box).