The power supply of my server at home failed at the end of last month. As I was busy with renovation at home, it took me a while to check whether it really was the PSU or something else. Once I was sure which part had failed, I sent the PSU to the RMA address the Enermax support gave me (the PSU has a 5 year warranty, and I have had it for one year). Due to holidays it took a while to get the repaired unit back, but I want to say thank you to the Enermax support:
- Thank you for hand-written responses. I did not get obviously automatic or canned responses (well, maybe they did some copy&paste for the RMA address and such, but each mail had at least a part which did not come from copy&paste).
- Thank you for getting back to me within a reasonable time.
- Thank you for politely answering all my support requests.
- Thank you for being honest in your communication (slow handling of the repair due to people being on holiday, not because of missing pieces from suppliers or other excuses outside Enermax).
This is how support should be. Unfortunately this is not always the case, but at least here it was. Thank you!
After putting the disks of the 7-stable system which exhibited stability problems into a completely different system (it is a rented root-server, not our own hardware), the system has now survived more than a day with the UFS setup (and still shows no trace of problems). Previously it would crash after some minutes.
The ZFS setup with the changed hardware had a problem the night before (as always after any of my ZFS-related changes on this machine), but on this machine I changed all locks in ZFS from shared locks to exclusive locks (this extended the uptime from 4 – 6 hours to “until I rebooted the morning after because of hanging processes”), so the problem may be caused by that change. I do not know yet whether we will test the ZFS setup with the pure 7‑stable source we use now (the goal was to get back a stable system, instead of playing around with unrelated stuff).
It looks like some kind of hardware problem was uncovered by updating from 7.1 to 7.2 (and subsequently to 7‑stable). This new machine has a completely different chipset, a new CPU and RAM and PSU and … so I do not really know what caused this (but the fact that the previous system did not recognize the CPU after replacing it with a bigger one, and the observation that only shared locks with a specific usage pattern were affected, make me suspect missing microcode updates…).