Stability problems with 7-stable

On the machine where I host this blog, I have/had some stability problems.

Last week I updated the machine from FreeBSD 7.1-pX to 7.2-p5 (GENERIC kernel in both cases). 5 to 10 minutes after the reboot into the new version, the machine deadlocked. After some roadblocks (ordering a KVM switch from the hoster, the KVM switch not working through a proxy (during lunchtime at work), a broken video capture on the KVM switch, and a replacement on Monday morning to avoid paying the weekend fees), I spent a big part of the night trying to get it stable. I tried disabling SMP, enabling INVARIANTS and WITNESS, changing the scheduler, breaking the software mirror (to rule out a mismatch between the contents of the disks after all the hard reboots) and updating to 7-stable.

Unfortunately nothing helped. 🙁

Googling around a little (it is an AMD dual-core system with an NVidia MCP61 chipset) led me to a post on the mailing lists from 2008 which talks about an issue with the buffer cache. I do not know if this is still an issue (I have sent an email to kib@ to ask about it), and my scenario is not the same as the one described in that mail, but because of it I decided to switch one of the two UFS mirrors to ZFS.

The first boot into ZFS again ended in a reboot after a few minutes (I do not know if it was because of a memory-exhaustion panic or because of a deadlock), but as I had not tuned the kernel for ZFS, I am tempted to believe that I should not count that. Now, after tuning the kernel (increasing kmem_size to 700M, disabling prefetching, limiting the ARC to 40M), it has been up for nearly 2 hours (as of this writing... crossing fingers). Before, it was not able to survive more than a few minutes with just the jail for the mail up. Now I not only have the mail jail up, but also the jail for the blog (one jail is still disabled, but I will take care of that after this post).
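For anyone wanting to try the same tuning, here is a rough sketch of how those three settings would typically look in /boot/loader.conf. The values are the ones mentioned above; the tunable names are the usual FreeBSD 7.x ones, and the vm.kmem_size_max line is an assumption (it is commonly raised together with vm.kmem_size, but it is not mentioned above).

```
# /boot/loader.conf (sketch, not a copy of the real file)
vm.kmem_size="700M"            # kernel memory size, raised from the auto-tuned default
vm.kmem_size_max="700M"        # assumption: upper bound raised to match
vfs.zfs.arc_max="40M"          # limit the ZFS ARC
vfs.zfs.prefetch_disable="1"   # turn off ZFS prefetching
```

These are boot-time tunables, so a reboot is needed before they take effect.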

I do not know if only increasing kmem_size would have helped with the problem, but as I was testing a GENERIC kernel + gmirror module in the beginning, I expected the auto-tuning of this value to be enough for such a simple setup (2GB RAM, 2 disks with 3 partitions each: one partition pair for root, one for swap, one for the jails).

I hope that I have stabilized the system now. I may test some patches if someone comes up with something, so do not be surprised if the blog and email to me are a little bit flaky.

