Cheap process monitoring (no additional software required)

I have an old system (only the hardware is old, it runs -current) which reboots itself from time to time, mostly during the daily periodic(8) run, but also during heavy compiling (portupgrade). There is no obvious reason (no panic) why it does this. It could be a hardware defect, or something else. The machine is not important enough for me to spend much effort on analyzing the problem. The annoying part is that sometimes apache does not start after a reboot. When this happens, the solution is to log in and start the webserver by hand. If the webserver started every time, hardly anybody would notice the reboots (root gets an email on each reboot via an @reboot crontab entry).

My pragmatic solution (for services started via a good rc.d script which has a working status command) is a crontab entry which periodically checks whether the service is running and restarts it if not. As an example, for apache and an interval of 10 minutes:

*/10 * * * *    /usr/local/etc/rc.d/apache22 status >/dev/null 2>&1 || /usr/local/etc/rc.d/apache22 restart

For the use case of this service/machine, this is enough. If there is a problem with the service, a mail with the restart output arrives each time the cron job runs; otherwise a mail only arrives after a reboot during which the service did not start.
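If more than one service should be watched, the same check can be moved into a small script. The following is only a minimal sketch, not something from my setup: the script name check_services.sh and the function name check_service are made up for illustration, and it assumes that every rc.d script passed to it implements working status and restart commands.

```shell
#!/bin/sh
# Hypothetical helper script (the name check_services.sh is an assumption).
# Possible crontab entry:
#   */10 * * * *    /root/bin/check_services.sh /usr/local/etc/rc.d/apache22

# Restart one service if its rc.d "status" command reports failure.
# Prints nothing when the service is running, so cron sends no mail then.
check_service() {
    rc_script=$1
    if "$rc_script" status >/dev/null 2>&1; then
        return 0
    fi
    echo "$(basename "$rc_script") is not running, restarting"
    "$rc_script" restart
}

# Check every rc.d script given on the command line.
for rc in "$@"; do
    check_service "$rc"
done
```

As with the one-liner, the only output is produced when a restart is attempted, so the mail behaviour stays the same.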

2 thoughts on “Cheap process monitoring (no additional software required)”

  1. I used to do this as well, but have since moved to daemontools. It ensures processes are *always* up, without a possible 10 minute window of downtime. Especially useful if your app is crashing as well.

    1. Daemontools is not part of the base system, so it would have to be installed additionally. The post I made is for cases where the uptime does not matter that much, and where you do not want to spend the time to install and configure additional software. And the 10 minutes was just an example, you could let cron check every minute if you want. But as I wrote, in my case it does not matter much. Places where I would use daemontools are places where a reboot for an unknown reason, like the one I described, would not be acceptable (there the number one priority would be to find the reason for the reboot and fix the problem).
