Is Monit the best program to use to restart/monitor processes?

Every once in a while my system hangs. I'm pretty sure it's caused by MySQL or Apache.

I think Monit is used to restart processes like these when they become too resource-heavy and bog down the server. Am I correct on this?

Does anyone have example MySQL/Apache configs that are pretty basic and usable? Or perhaps an alternative to Monit?

Thanks.

2 Replies

Monit monitors systems and restarts services when they fail.

You should find the cause of your problems instead of using software to restart services (not saying you shouldn't use monit too).
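One way to do both, as a rough sketch: let Monit restart a service only when it is actually down (its default when a monitored process is not running), and merely alert on heavy resource use so the underlying cause stays visible. The pidfile and init script paths assume a Debian-style layout, and the thresholds are placeholders, not recommendations.

```
# Sketch only: paths and thresholds are assumptions, adjust for your system.
# Alerts are mailed only if a "set alert <address>" line exists in monitrc.
check process mysql with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program  = "/etc/init.d/mysql stop"
    # Report sustained load instead of restarting, so the cause can be found
    if cpu > 80% for 5 cycles then alert
    if totalmem > 200.0 MB for 5 cycles then alert
```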

A few things that will help:

1) install munin

2) show your apache config file

3) show your mysql config file

4) show the output of top

IMO, monit's greatest advantages are that it is relatively simple and light. I'm not sure I would choose it if I needed super-duper uptime or detailed stats, but for my purpose it is good enough.

Here are selections from my monitrc. Config-specific variables have been redacted and are marked with %%%. Also, note that Postfix is installed in a send-only configuration, so I only care if it is up and accessible from localhost. The built-in HTTP server is set to only bind to localhost - on the rare occasions I need to use it, I do so via an SSH tunnel with the command ssh -L 2812:localhost:2812 mylogin@mylinodeipaddress.

###############################################################################
## Global section
###############################################################################
##
## Start monit in background (run as daemon) and check the services at 2-minute
## intervals.
#
set daemon 120
#
## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use the 'user' facility by default. You can specify the
## path to the file for monit native logging.
#
# set logfile syslog facility log_daemon
set logfile /var/log/monit.log
#
## You can set the alert recipients here, which will receive the alert for
## each service. The event alerts may be restricted using the list.
#
# set alert sysadm@foo.bar                       # receive all alerts
# set alert manager@foo.bar only on { timeout }  # receive just service-
#                                                # timeout alert
set alert %%%YOUR ADMIN E-MAIL ADDRESS%%%

## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
#
set httpd port 2812 and
    use address localhost  # only accept connection from localhost
    allow localhost        # allow localhost to connect to the server and
    allow %%%LOGIN%%%:%%%PASS%%%  # require user LOGIN with password PASS

###############################################################################
## Services
###############################################################################
##
## Check the general system resources such as load average, cpu and memory
## usage. Each rule specifies the tested resource, the limit and the action
## which will be performed in the case that the test failed.
#
check system localhost
    if loadavg (1min) > 10 then alert
    if loadavg (5min) > 8 then alert
    if memory usage > 80% then alert
    if cpu usage (user) > 70% for 2 cycles then alert
    if cpu usage (system) > 50% for 2 cycles then alert
    if cpu usage (wait) > 50% for 2 cycles then alert
    if loadavg (1min) > 20 for 3 cycles then exec "/sbin/shutdown -r now"
    if loadavg (5min) > 15 for 5 cycles then exec "/sbin/shutdown -r now"
    if memory usage > 97% for 3 cycles then exec "/sbin/shutdown -r now"

## Check that a process is running and responding to HTTP requests, and
## check its resource usage such as cpu, memory and number of children.
## If the process is not running, monit will restart it by default. If the
## service is restarted very often and the problem remains, monitoring can
## be disabled using the TIMEOUT statement. The service depends on another
## service (mysql) which is defined in the monit control file as well.

check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start"
    stop program  = "/etc/init.d/apache2 stop"
    if cpu > 80% for 5 cycles then restart
    if children > 50 then alert
    if children > 60 then restart
# Apache MaxClients = 60
    if failed host %%%PUBLIC IP ADDR%%% port 80 protocol http
       and request "/index.html"
# Some smallish page that should be available when server is up
       with timeout 10 seconds
       for 2 cycles
# Sometimes Apache doesn't respond right away, so give it two chances before
# forcing a restart.
       then restart
    depends on mysql
    if 3 restarts within 8 cycles then timeout

check process mysql with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program  = "/etc/init.d/mysql stop"
    if cpu > 80% for 5 cycles then restart
    if totalmem > 200.0 MB for 5 cycles then restart
# Base above value on your experience
    if failed unixsocket /var/run/mysqld/mysqld.sock protocol mysql
# If you use the network instead of a UNIX socket, adjust settings
       with timeout 15 seconds
       then restart
    if 3 restarts within 5 cycles then timeout

check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program  = "/etc/init.d/ssh stop"
    if cpu > 80% for 5 cycles then restart
    if totalmem > 200.0 MB for 5 cycles then restart
    if failed host %%%PUBLIC IP ADDR%%% port 22 protocol ssh 2 times within 2 cycles
       then restart
    if 3 restarts within 8 cycles then timeout

check process postfix with pidfile /var/spool/postfix/pid/master.pid
    start program = "/etc/init.d/postfix start"
    stop program  = "/etc/init.d/postfix stop"
    if cpu > 30% for 5 cycles then restart
    if totalmem > 60.0 MB for 3 cycles then restart
    if failed host localhost port 25 protocol smtp
       with timeout 60 seconds
       then restart
    if 3 restarts within 8 cycles then timeout

## Check the device permissions, uid, gid, space and inode usage. Other
## services such as databases may depend on this resource, and an automatic
## graceful stop may be cascaded to them before the filesystem becomes
## full and data is lost.

check device filesystem with path /dev/xvda
    if space usage > 80% for 5 times within 15 cycles then alert
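# Monit runs exec commands directly, not through a shell, so chained commands
# need an explicit "sh -c" wrapper.
    if space usage > 95% then exec "/bin/sh -c '/etc/init.d/apache2 stop; /etc/init.d/mysql stop'"
    if inode usage > 70% then alert
    if inode usage > 95% then exec "/bin/sh -c '/etc/init.d/apache2 stop; /etc/init.d/mysql stop'"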

## Check a file's checksum: when it changes, the file's contents have been
## modified and an alert is sent.
#
# Monitor denyhosts activity, but not as often
check file hosts.deny path /etc/hosts.deny
    every 3 cycles
    if changed checksum then alert

There are probably many optimizations I could make to the above, but it works well enough to avoid downtime of more than a few minutes. Configuration is easy enough to figure out, which was a major plus in my book. As obs points out, it's not a substitute for proper configuration, but is a useful fallback when things go unexpectedly wrong.
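For a smaller starting point, the selections above can be cut down to just Apache and MySQL. This is a sketch, not a tested config: the pidfile, socket and init script paths are copied from the config above and are assumptions for any other system, and %%%PUBLIC IP ADDR%%% is the same placeholder used there.

```
# Minimal sketch: check every 2 minutes, log to a file
set daemon 120
set logfile /var/log/monit.log

check process mysql with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program  = "/etc/init.d/mysql stop"
    # Restart if MySQL stops answering on its UNIX socket
    if failed unixsocket /var/run/mysqld/mysqld.sock protocol mysql
       with timeout 15 seconds
       then restart
    # Stop trying if restarts do not help
    if 3 restarts within 5 cycles then timeout

check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start"
    stop program  = "/etc/init.d/apache2 stop"
    # Restart if Apache fails to answer HTTP on two consecutive checks
    if failed host %%%PUBLIC IP ADDR%%% port 80 protocol http
       with timeout 10 seconds
       for 2 cycles
       then restart
    depends on mysql
    if 3 restarts within 8 cycles then timeout
```

Running monit -t checks the control file syntax, which is worth doing before starting or reloading Monit with a trimmed config like this.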
