12.4.2. Setting Up Nagios
Unlike Munin, Nagios does not necessarily require installing anything on the monitored hosts; most of the time, Nagios is used to check the availability of network services. For instance, Nagios can connect to a web server and check that a given web page can be obtained within a given time.
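The simplest form of such an external check only verifies that a network service accepts connections, without any software on the monitored host. The sketch below illustrates the idea. It is a hypothetical function, not one of the real plugins shipped in nagios-plugins, which do much more. It does use the exit codes that Nagios plugins conventionally return: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN.

```python
from __future__ import annotations

import socket
import time

# Exit codes conventionally returned by Nagios plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp_service(host: str, port: int, timeout: float = 10.0) -> tuple[int, str]:
    """Hypothetical agentless check: does host:port accept a TCP connection?

    Runs entirely on the monitoring server; nothing is installed on the
    monitored host.
    """
    start = time.monotonic()
    try:
        # create_connection() raises OSError (timeouts included) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable: {exc}"
    elapsed = time.monotonic() - start
    return OK, f"OK - {host}:{port} accepted a connection in {elapsed:.3f}s"
```

A wrapper script would print the message and exit with the returned status. Nagios learns the result of a check from the plugin's exit code and its output.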
The first step in setting up Nagios is to install the nagios3, nagios-plugins and nagios3-doc packages. Installing the packages configures the web interface and creates a first nagiosadmin user (for which it asks for a password). Adding other users is a simple matter of inserting them in the /etc/nagios3/htpasswd.users file with Apache's htpasswd command. If no Debconf question was displayed during installation, dpkg-reconfigure nagios3-cgi can be used to define the nagiosadmin password.
Pointing a browser at http://server/nagios3/ displays the web interface; in particular, note that Nagios already monitors some parameters of the machine where it runs. However, some interactive features, such as adding comments to a host, do not work. These features are disabled in the default configuration for Nagios, which is very restrictive for security reasons.
As documented in /usr/share/doc/nagios3/README.Debian, enabling some features involves editing /etc/nagios3/nagios.cfg and setting its check_external_commands parameter to “1”. We also need to set up write permissions for the directory used by Nagios, with commands such as the following:
# /etc/init.d/nagios3 stop
[...]
# dpkg-statoverride --update --add nagios www-data 2710 /var/lib/nagios3/rw
# dpkg-statoverride --update --add nagios nagios 751 /var/lib/nagios3
# /etc/init.d/nagios3 start
[...]
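Both changes can be illustrated with Python's standard library. The first helper sets check_external_commands=1 in the text of nagios.cfg, which holds one key=value setting per line. The second uses stat.filemode() to spell out what the octal modes passed to dpkg-statoverride mean. The function names are hypothetical, and editing the file by hand works just as well.

```python
import re
import stat


def enable_external_commands(cfg_text: str) -> str:
    """Return a copy of nagios.cfg content with check_external_commands=1."""
    pattern = re.compile(r"^check_external_commands=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        return pattern.sub("check_external_commands=1", cfg_text)
    # Setting absent: append it on its own line.
    if cfg_text and not cfg_text.endswith("\n"):
        cfg_text += "\n"
    return cfg_text + "check_external_commands=1\n"


def dir_mode(mode: int) -> str:
    """Render a directory mode the way `ls -ld` would."""
    return stat.filemode(stat.S_IFDIR | mode)


# /var/lib/nagios3/rw, mode 2710, owner nagios, group www-data:
# the owner has full access, the group (the web server) may only traverse
# the directory, and the setgid bit makes new entries inherit group www-data.
print(dir_mode(0o2710))  # drwx--s---
# /var/lib/nagios3, mode 751: other users may traverse it but not list it.
print(dir_mode(0o751))   # drwxr-x--x
```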
The Nagios web interface is rather nice, but it does not allow configuration, nor can it be used to add monitored hosts and services. The whole configuration is managed via files referenced in the central configuration file, /etc/nagios3/nagios.cfg.
These files should not be edited without some understanding of Nagios concepts. The configuration lists objects of the following types:
a host is a machine to be monitored;
a hostgroup is a set of hosts that should be grouped together for display, or to factor some common configuration elements;
a service is a testable element related to a host or a host group. It will most often be a check for a network service, but it can also involve checking that some parameters are within an acceptable range (for instance, free disk space or processor load);
a servicegroup is a set of services that should be grouped together for display;
a contact is a person who can receive alerts;
a contactgroup is a set of such contacts;
a timeperiod is a range of time during which some services have to be checked;
a command is the command line invoked to check a given service.
Depending on its type, each object has a number of properties that can be customized. A full list would be too long to include, but the most important properties are the relations between the objects.
A service uses a command to check the state of a feature on a host (or a hostgroup) within a timeperiod. In case of a problem, Nagios sends an alert to all members of the contactgroup linked to the service. Each member is sent the alert according to the channel described in the matching contact object.
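These relations can be modeled in a few lines. The sketch below is a hypothetical, simplified model: the classes and the notifications() function are not part of Nagios, and timeperiods and notification options are left out. It shows how one failing service produces one notification per member of its contact group, each sent through the notification command of the matching contact object.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Contact:
    contact_name: str
    email: str
    # Channel used to reach this person, e.g. a command that sends an email.
    service_notification_commands: str


@dataclass
class ContactGroup:
    contactgroup_name: str
    members: list[Contact]


@dataclass
class Service:
    host_name: str
    service_description: str
    check_command: str
    contact_group: ContactGroup


def notifications(service: Service, state: str) -> list[str]:
    """Build one notification per contact-group member, using that contact's channel."""
    return [
        f"{contact.service_notification_commands} -> {contact.email}: "
        f"{service.host_name}/{service.service_description} is {state}"
        for contact in service.contact_group.members
    ]
```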
An inheritance system allows easy sharing of a set of properties across many objects without duplicating information. Moreover, the initial configuration includes a number of standard objects; in many cases, defining new hosts, services and contacts is a simple matter of deriving from the provided generic objects. The files in /etc/nagios3/conf.d/ are a good source of information on how they work.
The Falcot Corp administrators use the following configuration:
Example 12.3. /etc/nagios3/conf.d/falcot.cfg file
define contact{
    name                            generic-contact
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
    register                        0 ; Template only
}
define contact{
    use             generic-contact
    contact_name    rhertzog
    alias           Raphael Hertzog
    email           hertzog@debian.org
}
define contact{
    use             generic-contact
    contact_name    rmas
    alias           Roland Mas
    email           lolando@debian.org
}

define contactgroup{
    contactgroup_name   falcot-admins
    alias               Falcot Administrators
    members             rhertzog,rmas
}

define host{
    use                 generic-host ; Name of host template to use
    host_name           www-host
    alias               www.falcot.com
    address             192.168.0.5
    contact_groups      falcot-admins
    hostgroups          debian-servers,ssh-servers
}
define host{
    use                 generic-host ; Name of host template to use
    host_name           ftp-host
    alias               ftp.falcot.com
    address             192.168.0.6
    contact_groups      falcot-admins
    hostgroups          debian-servers,ssh-servers
}

# 'check_ftp' command with custom parameters
define command{
    command_name    check_ftp2
    command_line    /usr/lib/nagios/plugins/check_ftp -H $HOSTADDRESS$ -w 20 -c 30 -t 35
}

# Generic Falcot service
define service{
    name                falcot-service
    use                 generic-service
    contact_groups      falcot-admins
    register            0
}

# Services to check on www-host
define service{
    use                     falcot-service
    host_name               www-host
    service_description     HTTP
    check_command           check_http
}
define service{
    use                     falcot-service
    host_name               www-host
    service_description     HTTPS
    check_command           check_https
}
define service{
    use                     falcot-service
    host_name               www-host
    service_description     SMTP
    check_command           check_smtp
}

# Services to check on ftp-host
define service{
    use                     falcot-service
    host_name               ftp-host
    service_description     FTP
    check_command           check_ftp2
}
This configuration file describes two monitored hosts. The first one is the web server, and the checks are made on the HTTP (80) and secure-HTTP (443) ports. Nagios also checks that an SMTP server runs on port 25. The second host is the FTP server, and the check includes making sure that a reply comes within 20 seconds. Beyond this delay, a warning is emitted; beyond 30 seconds, the alert is deemed critical. The Nagios web interface also shows that the SSH service is monitored: this comes from the hosts belonging to the ssh-servers hostgroup. The matching standard service is defined in /etc/nagios3/conf.d/services_nagios2.cfg.
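The -w 20 -c 30 -t 35 options of the check_ftp2 command can be read as the small decision function below. This is a simplified sketch, not the plugin's own code. It assumes that a reply arriving after the 35-second timeout, or no reply at all, is reported as CRITICAL. The function name is hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit codes


def ftp_response_state(response_time, warning=20.0, critical=30.0, timeout=35.0):
    """Map a response time in seconds (None = no reply) to a plugin state.

    Defaults mirror check_ftp2: -w 20 -c 30 -t 35.
    Assumption: no reply within the timeout is treated as CRITICAL.
    """
    if response_time is None or response_time >= timeout:
        return CRITICAL
    if response_time > critical:
        return CRITICAL
    if response_time > warning:
        return WARNING
    return OK
```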
Note the use of inheritance: an object is made to inherit from another object with the “use parent-name” property. The parent object must be identifiable, which requires giving it a “name identifier” property. If the parent object is not meant to be a real object, but only to serve as a parent, giving it a “register 0” property tells Nagios not to consider it as an actual object, and therefore to ignore the lack of some parameters that would otherwise be required.