Cheap process monitoring (no additional software required)

I have an old system (only the hardware is old; it runs -current) which reboots itself from time to time, mostly during the daily periodic(8) run, but also during heavy compiling (portupgrade). There is no obvious reason (no panic) for this. It could be a hardware defect, or something else. The machine is not important enough for me to spend much effort analyzing the problem. The annoying part is that sometimes apache does not start after a reboot. When this happens, the fix is to log in and start the webserver. If the webserver started every time, almost nobody would notice the reboot (root gets an email on each reboot via an @reboot crontab entry).

My pragmatic solution (for services started via a good rc.d script which has a working status command) is a crontab entry which periodically checks whether the service is running and restarts it if not. As an example, for apache and an interval of 10 minutes:

*/10 * * * *    /usr/local/etc/rc.d/apache22 status >/dev/null 2>&1 || /usr/local/etc/rc.d/apache22 restart

For the use case of this service/machine, this is enough. If there is a persistent problem with the service, a mail with the restart output arrives each time the cron job runs; otherwise a mail arrives only after a reboot following which the service did not start.
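If more than one service needs this treatment, the same check can be wrapped in a small script. The following is only a sketch under assumptions: the function name ensure_running and the script location used further down are made up for illustration, and it relies on the rc.d script returning exit code 0 from "status" only while the service is running, just like the crontab entry above does.

```shell
#!/bin/sh
# Sketch of a hypothetical helper (not part of the base system):
# restart an rc.d-managed service if its "status" command fails.

ensure_running() {
    rc_script=$1

    # Same test as in the crontab entry: a working rc.d script exits 0
    # from "status" while the service is running.
    if "$rc_script" status >/dev/null 2>&1; then
        return 0
    fi

    # The restart output is deliberately not redirected, so cron mails
    # it to the owner of the crontab.
    "$rc_script" restart
}

# Check every rc.d script given on the command line.
for rc_script in "$@"; do
    ensure_running "$rc_script"
done
```

Saved as, say, /usr/local/sbin/ensure_running.sh (a hypothetical path), the crontab entry would then become:

*/10 * * * *    /usr/local/sbin/ensure_running.sh /usr/local/etc/rc.d/apache22

The mail behaviour stays the same as with the one-liner: nothing is sent while the service is running, and the restart output is sent whenever a restart was needed.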

2 thoughts on “Cheap process monitoring (no additional software required)”

  1. I used to do this as well, but have since moved to daemontools. It ensures processes are *always* up, without a possible 10-minute window of downtime. Especially useful if your app is crashing as well.

    1. Daemontools is not part of the base system, so it would have to be installed additionally. My post is for cases where uptime does not matter that much, and where you do not want to spend the time to install and configure additional software. The 10 minutes was just an example; you could let cron check every minute if you want. But as I wrote, in my case it does not matter much. Places where I would use daemontools are places where a reboot for an unknown reason, like the one I described, would not be acceptable (there, the number one priority would be to find the reason for the reboot and fix the problem).
