Terminology
* Commands
* Time periods
* Contacts and Contact Groups
* Hosts
* Services
* Host and service escalations
Soft and Hard States
* The configured number of check retries determines when a soft state escalates to a hard state
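The soft-to-hard escalation can be sketched as a small state machine. This is a simplified model, not Nagios's own implementation: the class and method names are hypothetical, and it assumes the retry limit corresponds to the `max_check_attempts` directive of a host or service definition.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ServiceState:
    """Simplified soft/hard state tracker (illustrative, not Nagios code)."""
    max_check_attempts: int = 3  # assumed retry limit; configured per host/service
    status: str = "OK"
    state_type: str = "HARD"
    attempt: int = 1

    def record(self, result: str) -> Tuple[str, str]:
        """Feed one check result and return (status, state_type)."""
        if result == "OK":
            # Any OK result ends the retry cycle.
            self.status, self.state_type, self.attempt = "OK", "HARD", 1
            return self.status, self.state_type
        if self.status == "OK":
            self.attempt = 1           # first failing check
        elif self.state_type == "SOFT":
            self.attempt += 1          # a retry while still soft
        self.status = result
        # The problem becomes hard once the retry limit is reached.
        if self.attempt >= self.max_check_attempts:
            self.state_type = "HARD"
        else:
            self.state_type = "SOFT"
        return self.status, self.state_type
```

With `max_check_attempts=3`, the first two failing checks are reported as soft and the third as hard; an OK result at any point resets the counter.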
Configuration
/etc/nagios/nagios.cfg
Web Interfaces
* Tactical Overview
* Status Map
* Host information
Plugins
check_ping
    # Usage
    # -H: host address
    # -w: warning threshold; wrta: warning round-trip average (ms), wpl: warning packet loss (%)
    # -c: critical threshold; crta: critical round-trip average (ms), cpl: critical packet loss (%)
    # -p: number of packets, -t: timeout
    # -4|-6: use IPv4 or IPv6
    check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>% [-p packets] [-t timeout] [-4|-6]

    # For example, ping localhost with 5 packets;
    # warn if the round-trip average exceeds 3000 ms or packet loss reaches 80%,
    # critical if the round-trip average exceeds 5000 ms or packet loss reaches 100%:
    $ check_ping -H localhost -w 3000.0,80% -c 5000.0,100% -p 5
    PING OK - Packet loss = 0%, RTA = 0.06 ms|rta=0.058000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0
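The part of the check_ping output after the `|` is performance data in the form `label=value[unit];warn;crit;min;max`. A minimal sketch of how it could be parsed follows; the function name `parse_perfdata` is hypothetical, and the sketch does not handle quoted labels that contain spaces.

```python
import re


def parse_perfdata(output):
    """Parse the performance data after '|' into a dict keyed by label.

    Thresholds are kept as raw strings because they may be ranges.
    """
    _, _, perf = output.partition("|")
    metrics = {}
    for item in perf.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        # Split the first field into a number and an optional unit.
        match = re.match(r"^(-?\d+(?:\.\d+)?)(.*)$", fields[0])
        if not match:
            continue  # skip anything that does not look like a metric
        entry = {"value": float(match.group(1)), "unit": match.group(2)}
        for i, name in enumerate(("warn", "crit", "min", "max"), start=1):
            entry[name] = fields[i] if len(fields) > i and fields[i] else None
        metrics[label] = entry
    return metrics
```

Applied to the sample output above, this yields an `rta` entry in `ms` and a `pl` entry in `%`, each with its warning and critical thresholds.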
check_tcp
    check_tcp|check_udp -H host -p port [-w <warning>] [-c <critical>]
        [-s <send string>] [-e <expect string>] [-q <quit string>] [-A]
        [-m <maximum bytes>] [-d <delay>] [-t <timeout>] [-r <refuse state>]
        [-M <mismatch state>] [-v] [-4|-6] [-j] [-D <days to cert expiry>] [-S] [-E]

    # For example, check localhost on port 80
    $ check_tcp -H localhost -p 80
    TCP OK - 0.000 second response time on port 80|time=0.000220s;;;0.000000;10.000000
check_pop, check_spop, check_imap, check_simap
* Similar to check_tcp
check_smtp
* Similar to check_tcp
* Port defaults to 25
    check_smtp -H host [-p port] [-C command] [-R response] [-f from addr] [-F hostname]
        [-A authtype -U authuser -P authpass] [-w <warning time>] [-c <critical time>]
        [-t timeout] [-S] [-D days] [-n] [-4|-6]

    # Example
    check_smtp -H smtp.my.com -p 25
check_ftp
* Similar to check_tcp
* Port defaults to 21, or 990 for SSL
* Expects the standard FTP welcome message
check_ftp -H ftp.my.com
check_dhcp
check_nagios
check_http
    check_http -H <vhost> | -I <IP-address> [-u <uri>] [-p <port>]
        [-w <warning time>] [-c <critical time>] [-t <timeout>] [-L] [-a auth]
        [-f <ok | warn | critical | follow>] [-e <expect>] [-s string] [-l]
        [-r <regex> | -R <regex>] [-P string] [-m <min_pg_size>:<max_pg_size>]
        [-4|-6] [-N] [-M <age>] [-A string] [-k string] [-S] [-C <age>]
        [-T <content-type>]

    # Examples
    check_http -H www.yahoo.com -p 80
    check_http -H coe-soa-1 -p 9900 -u /ms/index.do
check_mysql
    check_mysql [-H host] [-d database] [-P port] [-u user] [-p password] [-S]

    check_mysql_query -q SQL_query [-w <warn>] [-c <crit>] [-d database]
        [-H host] [-P port] [-u user] [-p password]
check_pgsql
check_pgsql [-H <host>] [-P <port>] [-w <warn>] [-c <crit>] [-t <timeout>] [-d <database>] [-l <logname>] [-p <password>]
check_oracle
* Needs an Oracle client installation (tnsping)
    check_oracle --tns <ORACLE_SID>
    check_oracle --db <ORACLE_SID>
    check_oracle --oranames <Hostname>
    check_oracle --login <ORACLE_SID>
    check_oracle --cache <ORACLE_SID> <USER> <PASS> <CRITICAL> <WARNING>
    check_oracle --tablespace <ORACLE_SID> <USER> <PASS> <TABLESPACE> <CRITICAL> <WARNING>
check_swap
Check virtual memory.
    check_swap [-a] [-v] -w limit -c limit
    # -a: all
    # -w limit: warn if free swap falls below limit
    # -c limit: critical if free swap falls below limit
check_ide_smart
check_ide_smart [-d <device>] [-i] [-q] [-1] [-O] [-n]
check_disk
Check disk space.
    check_disk
        -w limit        # warn if free space is below limit
        -c limit        # critical if free space is below limit
        [-W limit]      # warn if free inodes are below limit
        [-K limit]      # critical if free inodes are below limit
        {-p path        # path or partition, can be repeated
        | -x device}    # exclude device
        [-C]            # clear thresholds
        [-E]            # only check the exact path given by -p
        [-e]            # display errors only
        [-g group]
        [-k]            # kB
        [-l]            # check local file systems only
        [-M]            # display mount point instead of path
        [-m]            # MB
        [-r path]       # regex for path/partition, can be repeated
        [-R path]       # as -r but case insensitive
        [-t timeout]    # in seconds, defaults to 10
        [-u unit]       # bytes, kB, MB, GB, TB; defaults to MB
        [-v]            # verbose
        [-X type]       # exclude file system type, can be repeated

    # Example
    $ check_disk -w 500 -c 10 -p /tmp
    DISK OK - free space: /tmp 4449 MB (96% inode=99%);| /tmp=140MB;4340;4830;0;4840
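The "below limit" thresholds can be illustrated with a short sketch. It is not how check_disk is implemented; the function names are hypothetical, limits are assumed to be in MB (the default unit noted above), and the return codes follow the standard plugin convention of 0 = OK, 1 = WARNING, 2 = CRITICAL.

```python
import shutil


def disk_status(free_mb, warn_mb, crit_mb):
    """Return 0, 1, or 2 depending on how free space compares to the limits."""
    if free_mb < crit_mb:
        return 2  # CRITICAL: free space below the critical limit
    if free_mb < warn_mb:
        return 1  # WARNING: free space below the warning limit
    return 0      # OK


def check_path(path, warn_mb, crit_mb):
    """Measure free space on the file system holding `path` and classify it."""
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    code = disk_status(free_mb, warn_mb, crit_mb)
    label = ("OK", "WARNING", "CRITICAL")[code]
    return code, f"DISK {label} - free space: {path} {free_mb} MB"
```

For the example above, 4449 MB free with `-w 500 -c 10` gives `disk_status(4449, 500, 10) == 0`, matching the `DISK OK` result.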
check_disk_smb
Check disk space on remote shares.
check_disk_smb -H <host> -s <share> -u <user> -p <password> -w <warn> -c <crit> [-W <workgroup>] [-P <port>]
check_load
Check system load.
    check_load
        [-r]                        # divide load averages by the number of CPUs
        -w WLOAD1,WLOAD5,WLOAD15    # warn if the 1, 5, or 15 min load average exceeds its limit
        -c CLOAD1,CLOAD5,CLOAD15    # critical if the 1, 5, or 15 min load average exceeds its limit

    # Example: warn if the 1 min load average exceeds 10, 5 min 8, 15 min 5;
    # critical if the 1 min load average exceeds 15, 5 min 10, 15 min 8
    check_load -w 10.0,8.0,5.0 -c 15.0,10.0,8.0
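The comparison of the three load averages against their limits can be sketched as follows. The function names are hypothetical, and the sketch assumes a strict "greater than" comparison and that any single average over its limit is enough to trigger the state; the plugin itself may differ in detail.

```python
import os


def load_status(loads, warn, crit, per_cpu=False, cpus=1):
    """Classify (load1, load5, load15) against warning and critical triples.

    Returns 0 (OK), 1 (WARNING), or 2 (CRITICAL).
    """
    if per_cpu:
        # Mirrors the idea behind -r: scale by the number of CPUs.
        loads = [value / cpus for value in loads]
    if any(value > limit for value, limit in zip(loads, crit)):
        return 2
    if any(value > limit for value, limit in zip(loads, warn)):
        return 1
    return 0


def current_load_status(warn, crit, per_cpu=False):
    """Classify the current system load (Unix only, uses os.getloadavg)."""
    return load_status(os.getloadavg(), warn, crit,
                       per_cpu=per_cpu, cpus=os.cpu_count() or 1)
```

With the limits from the example, a load of `(12.0, 7.0, 4.0)` yields a warning because only the 1 min average exceeds its limit of 10.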
check_procs
    check_procs
        -w <range>              # warn if outside range
        -c <range>              # critical if outside range
        [-m metric]             # metric type: PROCS, VSZ, RSS, CPU, ELAPSED
        [-s state]              # only scan for processes with one or more status flags from the ps command
        [-p ppid]               # only scan for child processes of parent ppid
        [-u user]               # only scan for user name or user id
        [-r rss]                # only scan for processes with rss higher than indicated
        [-z vsz]                # only scan for processes with vsz higher than indicated
        [-P %cpu]               # only scan for processes with pcpu higher than indicated
        [-a argument-array]     # only scan for processes with args that contain string
        [-C command]            # only scan for exact matches of command
        [-t timeout]            # timeout
        [-v]                    # verbose

    # Example: warn if the CPU usage of any process exceeds 10%, critical if it exceeds 20%
    check_procs -w 10 -c 20 --metric=CPU
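The `<range>` arguments are assumed here to follow the threshold range syntax from the Nagios plugin development guidelines: `10` means 0 to 10, `10:` means 10 and above, `~:10` means anything up to 10, and a leading `@` inverts the test so that values inside the range alert. A minimal sketch of that rule, with a hypothetical function name:

```python
import math


def range_alerts(value, spec):
    """Return True if `value` should raise an alert for range `spec`."""
    inside = spec.startswith("@")   # '@' means: alert when INSIDE the range
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        start_s, end_s = "0", spec  # a bare number N means the range 0..N
    start = -math.inf if start_s == "~" else float(start_s or 0)
    end = math.inf if end_s == "" else float(end_s)
    in_range = start <= value <= end
    return in_range if inside else not in_range
```

Under this reading, `-w 10 -c 20` alerts on values above 10 and above 20 respectively, which matches the CPU example.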
check_users
Monitor logged-in users.
    check_users -w limit -c limit

    # Example: warn if more than 0 users are logged in, critical if more than 1
    $ check_users -w 0 -c 1
    USERS WARNING - 1 users currently logged in |users=1;0;1;0
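The reason one logged-in user produces a warning with `-w 0 -c 1` is that the limits are exceeded only when the count is strictly greater. A small sketch of that logic, which also reproduces the status line format shown above; the function name is hypothetical and this is not the plugin's own code:

```python
def users_status(count, warn, crit):
    """Return (exit_code, status_line) for a logged-in user count.

    Exit codes follow the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL.
    """
    if count > crit:
        code, label = 2, "CRITICAL"
    elif count > warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    line = (f"USERS {label} - {count} users currently logged in "
            f"|users={count};{warn};{crit};0")
    return code, line
```

Calling `users_status(1, 0, 1)` returns exit code 1 with the same `USERS WARNING` line as the example, while a second user would push the result to critical.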
References
http://www.debianhelp.co.uk/nagiosinstall.htm
Learning Nagios 3.0: A Detailed Tutorial to Setting Up, Configuring, and Managing This Easy and Effective System Monitoring Software by Wojciech Kocjan
http://nagios.sourceforge.net/docs/3_0/toc.html