Deprecation Announcement
This feature is deprecated and has been replaced by the JobScheduler Monitoring Interface (see JobScheduler Monitoring Interface - Overview). The JobScheduler Monitoring Interface provides better integration with Nagios without requiring modifications to your jobs and job chains, for example by providing recovery messages, performance checks and individual routing of job error messages to specific Nagios services. For the use of active checks, see How to perform active checks with a System Monitor such as Nagios/op5.
FEATURE AVAILABILITY ENDING WITH RELEASE 1.10
Nagios is an Open Source network monitor that is available at http://www.nagios.org.
Installation
The Nagios integration has two parts:
- The Log Analyser. This is a job that must run periodically in JobScheduler. It examines the JobScheduler main log and stores any error messages or warnings it finds in the JobScheduler database (table SCHEDULER_MESSAGES).
- The Nagios plugin. This is a Perl script that queries the JobScheduler database for error messages and warnings.
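The division of labour between the two parts can be illustrated with a small Python sketch that uses an in-memory SQLite database. It is not the actual implementation: the column names, the table layout and the helper names `store` and `read` are assumptions made for illustration, and the real SCHEDULER_MESSAGES table may look different. The repeat counter follows the description of the counter in the parameter table further down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a simplified stand-in for SCHEDULER_MESSAGES; the real
# table's columns are not described in this article.
conn.execute(
    "CREATE TABLE scheduler_messages ("
    " job_name TEXT, severity TEXT, message TEXT, cnt INTEGER,"
    " PRIMARY KEY (job_name, severity, message))"
)


def store(job_name, severity, message):
    """Log analyser side of the sketch: insert a message or count it up."""
    cur = conn.execute(
        "UPDATE scheduler_messages SET cnt = cnt + 1"
        " WHERE job_name = ? AND severity = ? AND message = ?",
        (job_name, severity, message),
    )
    if cur.rowcount == 0:
        # First occurrence of this message: start the counter at 1.
        conn.execute(
            "INSERT INTO scheduler_messages VALUES (?, ?, ?, 1)",
            (job_name, severity, message),
        )


def read(job_name):
    """Plugin side of the sketch: look up the stored messages for a job."""
    return conn.execute(
        "SELECT severity, message, cnt FROM scheduler_messages"
        " WHERE job_name = ? ORDER BY severity, message",
        (job_name,),
    ).fetchall()


store("test/job1", "error", "job ended with exit code 1")
store("test/job1", "error", "job ended with exit code 1")
store("test/job1", "warning", "step took longer than expected")
print(read("test/job1"))
```

The second call with the same message does not create a new row but raises the counter to 2, which is the value a counter threshold would later be compared against.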
Installation of the nagios plugin
You need a Perl version later than 5.8 and the Perl packages Net::HTTP and DBI. You can install these packages from http://www.cpan.
- Unzip JobScheduler_nagios.tar.gz to any folder.
- gzip -d nagios.tar.gz
- tar -xvf nagios.tar
- Copy the files ./nagios/bin/plugin sos.check_scheduler.pl and ./nagios/bin/SOSScheduler.pm to the plugin directory of your Nagios installation.
- Copy the config folder to the plugin directory of your Nagios installation.
- Create a file config/sos_settings.ini. You can use the example files in the config folder.
Configure Nagios to use this plugin. To do so, add a service for each group of job chains or jobs that you want to monitor, and add the command definition for the plugin. You can use the file jobscheduler.cfg, which contains an example configuration. To include this file, add the line cfg_file=/usr/local/nagios/etc/jobscheduler.cfg to your nagios.cfg configuration file.
define service {
    use                     generic-service
    host_name               localhost
    service_description     SchedulerLog
    is_volatile             0
    check_period            24x7
    max_check_attempts      1
    normal_check_interval   1
    retry_check_interval    1
    contact_groups          admins
    notification_options    w,u,c,r
    notification_interval   960
    notification_period     24x7
    check_command           sos_check_scheduler!prodscheduler!4444!0!blacklist,test/job3!!
    active_checks_enabled   1
    passive_checks_enabled  1
}
# 'check_scheduler' command definition
define command {
    command_name    sos_check_scheduler
    command_line    /home/nagios/sos_check_scheduler.pl -i $ARG1$ -H $HOSTADDRESS$ -p $ARG2$ -m $ARG3$ -j $ARG4$ -c $ARG5$
}
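To see how the check_command value of the service maps onto the plugin options in command_line, the following Python sketch imitates, in simplified form, how Nagios splits the arguments at `!` and substitutes them for `$ARG1$` to `$ARG5$`. It is an illustration, not Nagios code: the function name `expand_check_command` and the host address 127.0.0.1 are assumptions.

```python
import re

COMMAND_LINE = (
    "/home/nagios/sos_check_scheduler.pl -i $ARG1$ -H $HOSTADDRESS$ "
    "-p $ARG2$ -m $ARG3$ -j $ARG4$ -c $ARG5$"
)


def expand_check_command(check_command, command_line, host_address):
    """Substitute $ARGn$ and $HOSTADDRESS$ in a command line (simplified)."""
    # The first field is the command name, the rest are $ARG1$, $ARG2$, ...
    name, *args = check_command.split("!")
    macros = {"HOSTADDRESS": host_address}
    for number, value in enumerate(args, start=1):
        macros[f"ARG{number}"] = value

    # In this sketch, macros without a value expand to an empty string.
    def replace(match):
        return macros.get(match.group(1), "")

    return name, re.sub(r"\$(\w+)\$", replace, command_line)


name, line = expand_check_command(
    "sos_check_scheduler!prodscheduler!4444!0!blacklist,test/job3!!",
    COMMAND_LINE,
    "127.0.0.1",
)
print(line)
```

For the service above, the fourth argument (the job list) ends up after `-j`, while the empty fifth argument leaves `-c` without a value, so this service monitors jobs only.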
Before restarting Nagios, check your configuration with:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Installation of the Log Analyser Job
To install the Log Analyser, copy the folder config/live/Nagios to your JobScheduler configuration directory. You can adjust the run time of the job JobSchedulerLogAnalyser. By default, the log file is analysed every 5 minutes, all messages are reset every day at 11:00 pm, and messages are deleted from the database every Monday at 7:00 am.
<?xml version="1.0" encoding="ISO-8859-1"?>
<job title="Analyse Job Logfile">
    <script java_class="sos.scheduler.logMessage.JobSchedulerLogAnalyser" language="java"/>
    <run_time>
        <period absolute_repeat="00:05" begin="00:00" end="24:00"/>
    </run_time>
</job>
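If you adjust the run time, it can be useful to read the repeat interval back from the job file, for example in a deployment check. A minimal Python sketch using only the standard library follows. The helper name `repeat_interval` is hypothetical, and the sketch assumes that absolute_repeat is written as hh:mm, optionally followed by :ss.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta

# The job definition from this article, as bytes so that the encoding
# declaration in the XML prolog is honoured by the parser.
JOB_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<job title="Analyse Job Logfile">
  <script java_class="sos.scheduler.logMessage.JobSchedulerLogAnalyser" language="java"/>
  <run_time>
    <period absolute_repeat="00:05" begin="00:00" end="24:00"/>
  </run_time>
</job>
"""


def repeat_interval(job_xml):
    """Return the absolute_repeat value of the first period as a timedelta."""
    period = ET.fromstring(job_xml).find("./run_time/period")
    # Assumption: the value has the form hh:mm or hh:mm:ss.
    parts = [int(p) for p in period.get("absolute_repeat").split(":")]
    hours, minutes, seconds = (parts + [0, 0])[:3]
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


print(repeat_interval(JOB_XML))
```

For the default job definition this yields an interval of five minutes.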
Parameter description: http://www.sos-berlin.com/doc/JITL/JobSchedulerLogAnalyser.xml
Installation of the table SCHEDULER_MESSAGES
If you are running a JobScheduler version later than 1.3.10, the table SCHEDULER_MESSAGES is already installed. Otherwise, you can find the CREATE TABLE statement in the directory ./nagios/db/yourdb. Create the table using your database client.
Testing your installation
- Execute the plugin in a shell
Example: perl ./sos_check_scheduler.pl -ischeduler_139 -Hur.sos -p4139 -m0 -j test/job1
- Make sure that the job Nagios/JobSchedulerLogAnalyser is running. You should see the job in JOC when opening host:port.
- Open your nagios console. You should see the configured services.
- Check your nagios configuration with
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
How it works
You define the parameters for monitoring in the Nagios configuration file jobscheduler.cfg.
For example, if you have several JobScheduler instances running, or you want to monitor different groups of jobs and job chains, you have to define one service for each JobScheduler instance or group. You cannot mix jobs and job chains in one group: configure one service for job chains and one for jobs.
Parameter | Long option | Default | Description |
---|---|---|---|
-H | --hostname | — | Name or IP address of the host on which JobScheduler is running |
-p | --port | — | Port on which JobScheduler listens |
-m | --max | — | optional: When the Log Analyser finds an entry in the log file, the message is stored in the database. If the Log Analyser finds the same message again, a counter for this message is incremented. With this parameter you can specify that only messages with a counter less than the given value are monitored. |
-i | --scheduler_id | — | optional: ID of the JobScheduler instance (for example prodscheduler in the service definitions below). |
-f | --file_configuration | — | optional: Name of a configuration file. All parameters can be specified in this file. |
-c | --job_chain | — | optional: Filter for the job chains to be monitored, given as a comma-separated list of job chain names. |
-j | --job | — | optional: Filter for the jobs to be monitored, given as a comma-separated list of job names. |
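The job and job chain filters and the counter threshold described in the table can be sketched in Python as follows. This is not the plugin's code (the plugin is a Perl script that reads the database). The `Message` record, its field names, and the helpers `parse_list` and `select` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical view of one stored message; field names are assumptions."""
    name: str    # job or job chain the message belongs to
    kind: str    # "job" or "job_chain"
    count: int   # how often the log analyser has seen this message


def parse_list(value):
    """Split a comma-separated option value into a set of names."""
    return {item.strip() for item in value.split(",") if item.strip()}


def select(messages, kind, names="", max_count=None):
    """Keep messages of one kind, filtered by name and by counter."""
    wanted = parse_list(names)
    result = []
    for msg in messages:
        if msg.kind != kind:
            continue
        if wanted and msg.name not in wanted:
            continue
        # Only messages whose counter is below the limit are monitored.
        if max_count is not None and msg.count >= max_count:
            continue
        result.append(msg)
    return result


messages = [
    Message("blacklist", "job", 1),
    Message("test/job3", "job", 7),
    Message("test/job1", "job", 2),
    Message("test/print_chain", "job_chain", 1),
]
selected = select(messages, "job", "blacklist,test/job3", max_count=5)
print([m.name for m in selected])
```

Each call handles either jobs or job chains, mirroring the rule that one service covers jobs and a separate service covers job chains.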
Example
Test job blacklist and test/job3
define service {
    use                     generic-service
    host_name               localhost
    service_description     SchedulerLog
    is_volatile             0
    check_period            24x7
    max_check_attempts      1
    normal_check_interval   1
    retry_check_interval    1
    contact_groups          admins
    notification_options    w,u,c,r
    notification_interval   960
    notification_period     24x7
    check_command           sos_check_scheduler!prodscheduler!4444!0!blacklist,test/job3!!
    active_checks_enabled   1
    passive_checks_enabled  1
}
Test job chain test/print_chain
define service {
    use                     generic-service
    host_name               localhost
    service_description     SchedulerLog
    is_volatile             0
    check_period            24x7
    max_check_attempts      1
    normal_check_interval   1
    retry_check_interval    1
    contact_groups          admins
    notification_options    w,u,c,r
    notification_interval   960
    notification_period     24x7
    check_command           sos_check_scheduler!prodscheduler!4444!0!!print_chain!
    active_checks_enabled   1
    passive_checks_enabled  1
}
# 'check_scheduler' command definition
define command {
    command_name    sos_check_scheduler
    command_line    /home/nagios/sos_check_scheduler.pl -i $ARG1$ -H $HOSTADDRESS$ -p $ARG2$ -m $ARG3$ -j $ARG4$ -c $ARG5$
}
Implementation
- Nagios plugin: Reads error messages and warnings from the database. You can start the plugin in a shell, for example as follows:
perl sos_check_scheduler.pl -H localhost -p 4444 -j jobname
- Job JobSchedulerLogAnalyser: Analyses the log files and writes the messages found to the database.
- Job JobSchedulerLogAnalyserReset: Resets all messages.
- Job JobSchedulerLogAnalyserDelete: Deletes all messages.
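A Nagios plugin reports its result through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and a line of text on standard output. The Python sketch below shows how messages read from the database could be turned into such a result. The mapping of errors to CRITICAL and warnings to WARNING, the output wording and the function name `evaluate` are assumptions; the real Perl plugin may decide differently.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(rows):
    """Map (severity, message) rows to a Nagios status and a one-line text.

    Assumption: errors map to CRITICAL and warnings to WARNING.
    """
    errors = [text for severity, text in rows if severity == "error"]
    warnings = [text for severity, text in rows if severity == "warning"]
    if errors:
        status, detail = CRITICAL, errors[0]
    elif warnings:
        status, detail = WARNING, warnings[0]
    else:
        status, detail = OK, "no messages found"
    summary = f"{len(errors)} error(s), {len(warnings)} warning(s)"
    return status, f"SCHEDULER {LABELS[status]} - {summary}: {detail}"


status, text = evaluate([("warning", "job test/job1 delayed")])
print(text)
# A real plugin would finish with sys.exit(status) so that Nagios
# receives the service state.
```

Keeping the evaluation separate from the exit call makes the status logic easy to check without a running Nagios.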