Writing a new plugin from scratch
Even with the very useful standard plugins in the Nagios Plugins set and the large number of custom plugins available on Nagios Exchange, we may occasionally find, as our monitoring setup grows more refined, that there is some service or property of a host that we would like to check but for which no suitable plugin seems to be available. Every network is different, and sometimes the plugins that others have generously donated their time to make for the community don't quite cover all our bases. Generally, the more specific our monitoring requirements get, the less likely it is that there's a plugin available that does exactly what we need.
In this example, we'll deal with a very particular problem that we'll assume can't be dealt with effectively by any known Nagios Core plugins, and we'll write one ourselves using Perl. Here's the example problem:
Our Linux security team wants to be able to check automatically whether any of our servers are running kernels with known exploits. However, they're not worried about every vulnerable kernel, only about specific versions. They have given us the version numbers of three kernels with minor vulnerabilities that they're not particularly worried about but that do need patching, and of one that they're extremely worried about.
Let's say the minor vulnerabilities are in the kernels with version numbers 2.6.19, 2.6.24, and 3.0.1. The serious vulnerability is in the kernel with version number 2.6.39. Note that the version numbers in this case are arbitrary and don't necessarily reflect any real kernel vulnerabilities!
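Because the dots in a version number are regular expression metacharacters, it's worth building the patterns carefully. The following Perl sketch shows one way to do it, assuming each vulnerable version is kept in a plain list: it escapes every version with quotemeta and adds a (?!\d) lookahead, so that 2.6.19 matches 2.6.19-1-amd64 but not a hypothetical 2.6.190. The helper name version_pattern is our own invention, not part of Nagios Core.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build one compiled regex that matches a kernel
# release string beginning with any of the given version numbers.
sub version_pattern {
    my @versions = @_;
    # quotemeta escapes each dot, so "2.6.19" cannot match "2x6x19".
    my $alternatives = join '|', map { quotemeta($_) } @versions;
    # (?!\d) rejects a longer version such as "2.6.190", while still
    # accepting "2.6.19" on its own or followed by a suffix like "-1-amd64".
    return qr/^(?:$alternatives)(?!\d)/;
}

my $critical = version_pattern('2.6.39');
my $warning  = version_pattern('2.6.19', '2.6.24', '3.0.1');

for my $release ('2.6.24-1-686', '3.0.1', '3.0.10', '2.6.39-rc2') {
    my $state = $release =~ $critical ? 'CRITICAL'
              : $release =~ $warning  ? 'WARNING'
              :                         'OK';
    print "$release => $state\n";
}
```

Run as-is, it reports WARNING for 2.6.24-1-686 and 3.0.1, OK for 3.0.10, and CRITICAL for 2.6.39-rc2. Checking for a newly announced vulnerable version later is then just a matter of appending it to one of the lists.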
The team could log in to each of the servers individually to check them, but the servers are of varying ages and access methods, and are managed by different people. They would also have to repeat the manual check regularly, because a naive administrator might later install a kernel version that is already known to be vulnerable, and the team might want to add other vulnerable kernel versions to the list later on.
So, the team have asked us to solve the problem with Nagios Core monitoring, and we've decided the best way to do it is to write our own plugin, check_vuln_kernel, which checks the output of uname for a kernel version string, and then does the following:
- If it's one of the slightly vulnerable kernels, then it will return a WARNING state, so that we can let the security team know that they should address it when they're next able to.
- If it's the highly vulnerable kernel version, then it will return a CRITICAL state, so that the security team knows a patched kernel needs to be installed immediately.
- If uname gives an error or output we don't understand, then it will return an UNKNOWN state, alerting the team to a bug in the plugin or possibly more serious problems with the server.
- Otherwise, it returns an OK state, confirming that the kernel is not known to be a vulnerable one.
- Finally, it reports the kernel version alongside the state, because the team wants to see at a glance in Nagios Core what each kernel version is and whether it's vulnerable.
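Nagios Core takes the service state from the plugin's exit code: 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN, the same values that utils.pm provides in %ERRORS. As a rough sketch of how the requirements above translate into code, the following Perl example assumes the same arbitrary version numbers and a hypothetical kernel_state function. It checks for the critical version before the warning versions so that the more serious state wins, and it treats anything that doesn't start with digits, a dot, and digits as output we don't understand; that format check is our own assumption.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Standard Nagios plugin exit codes.
my %EXIT_CODE = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Hypothetical helper: map a kernel release string to a Nagios state name.
sub kernel_state {
    my ($release) = @_;
    # Output we don't understand (empty, or not starting with something
    # like "2.6") is reported as UNKNOWN. This format check is an assumption.
    return 'UNKNOWN' unless defined $release && $release =~ /^\d+\.\d+/;
    # Check the serious vulnerability first so that it takes precedence.
    return 'CRITICAL' if $release =~ /^2\.6\.39(?!\d)/;
    return 'WARNING'  if $release =~ /^(?:2\.6\.19|2\.6\.24|3\.0\.1)(?!\d)/;
    return 'OK';
}

for my $release ('2.6.32-5-686', '2.6.39', '3.0.1-1', 'uname: not found') {
    my $state = kernel_state($release);
    printf "%s: %s (exit code %d)\n", $state, $release, $EXIT_CODE{$state};
}
```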
For the purposes of this example, we'll only monitor the Nagios Core server itself, but via NRPE we could install this plugin on the other servers that require this monitoring, where it would work just as well. See the Monitoring local services on a remote machine with NRPE recipe in Chapter 6, Enabling Remote Execution, to learn how to do this.
While this problem is very specific, we'll approach it in a very general way, which you'll be able to adapt to any solution where it's required for a Nagios plugin to:
- Run a command and pull its output into a variable.
- Check the output for the presence or absence of certain patterns.
- Return an appropriate status based on those tests.
In other words, if you can do these three things, you can monitor almost anything on a server from Nagios Core!
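The first of those three steps is the one most likely to fail quietly. A minimal Perl sketch of a hypothetical run_command helper is shown below: it runs a command given as a list, so no shell is involved, captures its output, and returns the exit status separately so that the caller can decide when to report UNKNOWN. It is an alternative to the qx operator that the plugin in this recipe uses, not something the recipe requires.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list, return its output
# (trailing newline removed) and its exit status. The output is undef and
# the status -1 if the command could not be started.
sub run_command {
    my @command = @_;
    open(my $fh, '-|', @command) or return (undef, -1);
    my $output = do { local $/; <$fh> };   # slurp all output at once
    close($fh);                  # waits for the child and sets $?
    my $wait = $?;
    $output = '' unless defined $output;
    chomp $output;               # drop the trailing newline only
    # Report -1 if the child was killed by a signal, else its exit code.
    return ($output, ($wait & 127) ? -1 : $wait >> 8);
}

my ($release, $status) = run_command('uname', '-r');
if (!defined $release || $status != 0) {
    print "UNKNOWN: could not run uname -r\n";
}
else {
    print "Kernel release: $release\n";
}
```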
Getting ready
You should have a Nagios Core 3.0 or newer server running with a few hosts and services configured already. You should also already be familiar with the relationship between services, commands, and plugins. You should also have Perl installed.
This will be a rather long recipe that ties in a lot of Nagios Core concepts. You should be familiar with all the following concepts:
- Defining new hosts and services, and how they relate to one another
- Defining new commands, and how they relate to the plugins they call
- Installing, testing, and using Nagios Core plugins
Some familiarity with Perl would also be helpful, but is not required. We'll include comments to explain what each block of code is doing in the plugin.
How to do it...
We can write, test, and implement our example plugin as follows:
- Change to the directory containing the plugin binaries for Nagios Core. The default location is /usr/local/nagios/libexec:
# cd /usr/local/nagios/libexec
- Start editing a new file called check_vuln_kernel:
# vi check_vuln_kernel
- Include the following code in it; take note of the comments, which explain what each block of code is doing:
#!/usr/bin/env perl
#
# Use strict Perl style and report potential problems to help us write this
# securely and portably.
#
use strict;
use warnings;

#
# Include the Nagios utils.pm file, which includes definitions for the return
# statuses that are appropriate for each level: OK, WARNING, CRITICAL, and
# UNKNOWN. These will become available in the %ERRORS hash.
#
use lib "/usr/local/nagios/libexec";
use utils "%ERRORS";

#
# Define a pattern that matches any kernel vulnerable enough so that if we find
# it we should return a CRITICAL status. The dots are escaped so that they
# only match a literal dot, and (?!\d) allows the version to be followed by
# a suffix such as "-1-686", or by nothing at all, but not by another digit.
#
my $critical_pattern = qr/^(2\.6\.39)(?!\d)/;

#
# Same again, but for kernels that only need a WARNING status.
#
my $warning_pattern = qr/^(2\.6\.19|2\.6\.24|3\.0\.1)(?!\d)/;

#
# Run the command uname with option -r to get the kernel release version, put
# the output into a scalar $release, and remove the trailing newline.
#
chomp(my $release = qx|/bin/uname -r|);

#
# If uname -r exited with an error status, that is, anything other than 0,
# then there was a problem and we need to report that as the UNKNOWN status
# defined by Nagios Core's utils.pm.
#
if ($? != 0) {
    print "UNKNOWN: unable to run /bin/uname -r\n";
    exit $ERRORS{UNKNOWN};
}

#
# Check to see if any of the CRITICAL patterns are matched by the release
# number. If so, print the version number and exit, returning the appropriate
# status.
#
if ($release =~ $critical_pattern) {
    printf "CRITICAL: %s\n", $release;
    exit $ERRORS{CRITICAL};
}

#
# Same again, but for WARNING patterns.
#
if ($release =~ $warning_pattern) {
    printf "WARNING: %s\n", $release;
    exit $ERRORS{WARNING};
}

#
# If we got this far, then uname -r worked and didn't match any of the
# vulnerable patterns, so we'll print the kernel release and return an OK
# status.
#
printf "OK: %s\n", $release;
exit $ERRORS{OK};
- Make the plugin owned by the nagios user and executable with chmod:
# chown nagios:nagios check_vuln_kernel
# chmod 0770 check_vuln_kernel
- Run the plugin directly as the nagios user to test it; the output shows the kernel release of the machine it runs on, for example:
# sudo -s -u nagios
$ ./check_vuln_kernel
OK: 2.6.32-5-686
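Running the plugin by hand only shows its output, while the exit code, which is what Nagios Core actually uses to set the service state, is easy to miss. From the shell, running echo $? immediately after the plugin shows that exit code. As a slightly fuller check, the following Perl sketch runs a plugin, keeps its first line of output, and translates its exit code into a state name. The helper name check_plugin is hypothetical, and treating exit codes above 3 or signal deaths as UNKNOWN is this sketch's own choice.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Nagios Core state names, indexed by plugin exit code (0 to 3).
my @STATE = qw(OK WARNING CRITICAL UNKNOWN);

# Hypothetical helper: run a plugin (command given as a list, no shell) and
# return the state its exit code stands for, plus its first line of output.
sub check_plugin {
    my @command = @_;
    open(my $fh, '-|', @command)
        or return ('UNKNOWN', "could not run $command[0]: $!");
    my @lines = <$fh>;
    close($fh);                  # waits for the plugin and sets $?
    my $wait = $?;
    # Killed by a signal, or an exit code above 3: this sketch says UNKNOWN.
    my $code  = ($wait & 127) ? 3 : $wait >> 8;
    my $state = $code <= 3 ? $STATE[$code] : 'UNKNOWN';
    my $text  = defined $lines[0] ? $lines[0] : '(no output)';
    chomp $text;
    return ($state, $text);
}

# The default path matches this recipe; pass another plugin path to override it.
my $plugin = @ARGV ? $ARGV[0] : '/usr/local/nagios/libexec/check_vuln_kernel';
my ($state, $text) = check_plugin($plugin);
print "State: $state\nOutput: $text\n";
```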
We should now be able to use the plugin in a command, and hence in a service check, just like any other plugin. Note that the code for this plugin is included in the code bundle of this book for your convenience.
How it works...
The code we added in the new plugin file check_vuln_kernel is actually quite simple:
- It runs uname -r to get the version number of the kernel.
- If that didn't work, it exits with a status of UNKNOWN.
- If the version number matches anything in a pattern containing critical version numbers, it exits with a status of CRITICAL.
- If the version number matches anything in a pattern containing warning version numbers, it exits with a status of WARNING.
- Otherwise, it exits with a status of OK.
It also prints the status as a string, along with the kernel version number, if it was able to retrieve one.
We might set up a command definition for this plugin as follows:
define command {
command_name check_vuln_kernel
command_line $USER1$/check_vuln_kernel
}
In turn, we might set up a service definition for that command as follows:
define service {
use local-service
host_name localhost
service_description VULN_KERNEL
check_command check_vuln_kernel
}
If the kernel was not vulnerable, the service's appearance in the web interface might look similar to the following screenshot:
However, if the monitoring server itself happened to be running a vulnerable kernel, then it might look more like the following screenshot (and send notifications accordingly, if configured to do so):
There's more...
This may be a simple plugin, but its structure can be generalized to all sorts of monitoring tasks. If we can figure out the correct logic to return the status we want in an appropriate programming language, then we can write a plugin to do basically anything.
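One way to make that structure reusable, sketched here with placeholder patterns, is to keep the checks in an ordered table of state and pattern pairs and return the state of the first rule that matches. The rule table, the evaluate function, and the sample strings below are all hypothetical; in a real plugin the output would come from running a command, as check_vuln_kernel does with uname -r.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Standard Nagios plugin exit codes.
my %EXIT_CODE = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Hypothetical rule table: checked top to bottom and the first match wins,
# so the most severe rules go first. These patterns are placeholders.
my @RULES = (
    [ CRITICAL => qr/\bread-only\b/i ],
    [ WARNING  => qr/\bdegraded\b/i ],
);

# Return the state of the first rule whose pattern matches the output,
# UNKNOWN if there is no output to test, or OK if nothing matches.
sub evaluate {
    my ($output, @rules) = @_;
    return 'UNKNOWN' unless defined $output && length $output;
    for my $rule (@rules) {
        my ($state, $pattern) = @$rule;
        return $state if $output =~ $pattern;
    }
    return 'OK';
}

# Sample strings stand in for real command output so the sketch runs anywhere.
for my $output ('array md0: clean', 'array md0: degraded', '/data mounted read-only', '') {
    my $state = evaluate($output, @RULES);
    printf "%-8s (exit %d) for '%s'\n", $state, $EXIT_CODE{$state}, $output;
}
```

Keeping the patterns in data rather than in separate if blocks means a new condition can be added by appending one rule, at the cost of having to keep the table ordered by severity.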
A plugin like this could just as effectively be written in C for improved performance, but we'll assume for simplicity's sake that high performance is not required for this plugin. Instead, we can use a language that's better suited to quick ad hoc scripts like this one; in this case, Perl. The file utils.sh, also in /usr/local/nagios/libexec, provides similar status definitions if we'd prefer to write plugins as shell scripts.
If you write a plugin that you think could be generally useful for the Nagios community at large, then please consider putting it under a free software license and submitting it to the Nagios Exchange, so that others can benefit from your work. Community contribution and support is what has made Nagios Core such a great monitoring platform in such wide use.
Any plugin you publish in this way should conform to the Nagios Plugin Development Guidelines. At the time of writing, these are available at http://nagiosplug.sourceforge.net/developer-guidelines.html.
Finally, note that the method of including utils.pm used in this example may be deprecated in future versions of Nagios Core; it is used here for simplicity's sake. The newer approach in Perl is to use the CPAN module Nagios::Plugin instead.