ZFS & power-failure: stable

At the weekend there was a power failure at our disaster-recovery site. As everything should be connected to the UPS, this should not have had an impact… unfortunately the guys responsible for the cabling seem not to have provided enough power connections from the UPS. Result: one of our storage systems (all volumes in several RAID5 virtual disks) for the test systems lost power, and 10 harddisks switched into the failed state once the power was stable again (I was told there were several small power failures that day). After telling the software to have a look at the drives again, all physical disks were accepted.

All volumes on one of the virtual disks were damaged beyond repair (actually, the virtual disk itself was damaged) and we had to recover from backup.

None of the ZFS-based mountpoints on the good virtual disks showed bad behavior (we ran a zpool clear + zpool scrub on the pools which showed checksum errors, just to make us feel better). For the UFS-based ones… some caused a panic after reboot and we had to run fsck on them before trying a second boot.
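
For reference, the recovery on the ZFS side boiled down to a couple of commands per affected pool. A minimal sketch, assuming a pool named testpool and a UFS slice on c1t0d0s6 (both are placeholder names, not the actual ones from our setup):

    # show pool status and any checksum errors after the power loss
    zpool status -v testpool

    # reset the error counters, then scrub to verify all data against its checksums
    zpool clear testpool
    zpool scrub testpool

    # check scrub progress / result (the pool stays online and usable throughout)
    zpool status testpool

    # for the UFS filesystems that caused a panic: check and repair before the next boot
    fsck -F ufs /dev/rdsk/c1t0d0s6

The big difference in effort: the scrub runs in the background while the pool stays mounted, whereas fsck had to finish before the UFS filesystems could be used again.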

We spent a lot more time getting UFS back online than getting ZFS back online. After this experience it looks like our future Solaris 10u8 installs will be with root on ZFS (our workstations are already like this, but our servers are still at Solaris 10u6).
