Sta­bil­i­ty prob­lems with 7‑stable

On the machine where I host this blog, I have/had some sta­bil­i­ty problems.

Last week I updat­ed the machine from FreeB­SD 7.1‑pX to 7.2‑p5 (GENERIC ker­nel in both cas­es). 5 – 10 Min­utes after the reboot into the new ver­sion the machine had a dead­lock. After some road­blocks (order­ing a KVM-switch from the hoster, the KVM-switch not work­ing with a proxy (dur­ing lunchtime at work), a bro­ken video-capture of the KVM-switch and a replace­ment on Mon­day morn­ing to not pay the WE-fees), I spend a big part of the night to get it sta­ble. I tried dis­abling SMP, enabling INVARIANTS and WITNESS, chang­ing the sched­uler, cut­ting the soft­ware mir­ror (to rule out a mis­match between the con­tent of the disks after all the hard reboots) and updat­ing to 7‑stable.

Unfor­tu­nate­ly noth­ing helped. 🙁

Googling a lit­tle bit around (it is a AMD Dual-Core sys­tem with NVidia MCP61 chipset) was lead­ing me to a post on the mail­inglists from 2008 which talks about an issue with the buffer cache. I do not know if this is still an issue (I have send a email to kib@ to ask about it), and my sce­nario is not the same as the one which is described in the mail, but because of this I decid­ed to switch one of the two UFS mir­rors to ZFS.

The first boot into the ZFS caused again a reboot after some min­utes (I do not know if it was because of a mem­o­ry exhaust­ed pan­ic, or because of a dead­lock), but as I did not tune the ker­nel for ZFS I am tempt­ed to believe that I should not count that. Now, after tun­ing the ker­nel (increas­ing the kmem_size to 700M, no prefetch­ing, lim­it­ing the ARC to 40M) it is up since near­ly 2h (as of this writ­ting… cross­ing fin­gers). Before it was not able to sur­vive more than some min­utes with just the jail for the mails up. Now I not only have the mail-jail up, but also the jail for the blog (one jail still dis­abled, but I will take care about that after this post).

I do not know if only increas­ing the kmem_size would have helped with the prob­lem, but as I was test­ing a GENERIC ker­nel + gmir­ror mod­ule in the begin­ning, I expect­ed that the auto-tuning of this val­ue should have been enough for such a sim­ple set­up (2GB RAM, 2 disks with 3 par­ti­tions each, one par­ti­tion pair for root, one for swap, one for the jails).

I hope that I sta­bi­lized the sys­tem now. It may be the case that I will test some patch­es in case some­one comes up with some­thing, so do not be sur­prised if the blog and email to me is a lit­tle bit flaky.