After moving the disks of the 7-stable system that exhibited stability problems into a completely different machine (it is a rented root server, not our own hardware), the system has now survived more than a day with the UFS setup, still without any trace of problems. Previously it would crash after a few minutes.
The ZFS setup on the changed hardware had a problem the night before (as always after all my ZFS-related changes on this machine). However, on this machine I had changed all locks in ZFS from shared locks to exclusive locks (this extended the uptime from 4 to 6 hours to “until I rebooted the morning after because of hanging processes”), so the problem may be caused by that change. I do not know yet whether we will test the ZFS setup with the pure 7-stable source we use now (the goal was to get back a stable system, not to play around with unrelated stuff).
It looks like some kind of hardware problem was uncovered by updating from 7.1 to 7.2 (and subsequently to 7-stable). The new machine has a completely different chipset, a new CPU, RAM, PSU and …, so I do not really know what caused this. However, the fact that the previous system did not recognize the CPU after it was replaced with a bigger one, together with the observation that only shared locks with a specific usage pattern were affected, makes me suspect missing microcode updates…
Tags: hardware problem, locks, microcode updates, observation, psu, root server, stability problems, stable source, stable system, uptime