Deprecation Announcement
This feature is deprecated and has been replaced by the JobScheduler Monitoring Interface (see JobScheduler Monitoring Interface - Overview). The JobScheduler Monitoring Interface provides better integration with Nagios without modifications to your jobs and job chains, e.g. by providing recovery messages, performance checks and individual routing of job error messages to specific Nagios services. For the use of active checks, see How to perform active checks with a System Monitor such as Nagios/op5.
FEATURE AVAILABILITY ENDING WITH RELEASE 1.10
JobScheduler and Monitoring with Nagios
Introduction
JobScheduler executes jobs and can, for example, notify the responsible persons by e-mail in the event of an error. In an environment where Nagios is used to monitor processes, it is recommended to use the notification method provided by Nagios itself.
This document describes how JobScheduler messages can be written directly into the Nagios console.
For example:
- In case of error
- In case of success
- Monitoring of single jobs and job chains
- Monitoring whether a particular job has run (successfully) at a specific time
System Environment
Nagios and JobScheduler can run on different servers.
Communication takes place via the Nagios NSCA add-on.
When monitoring Windows systems, a Shell script is called via SSH and writes directly to the Nagios command pipe file.
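For orientation, the NSCA client (send_nsca) reads passive service check results from standard input, one per line, as tab-separated fields: host name, service description, return code and plugin output. The sketch below builds such a line. The function name and the sample values are illustrative assumptions and are not part of the JobScheduler package.

```python
# Nagios return codes for passive service checks.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nsca_line(host, service, return_code, message):
    """Build one tab-separated service check result for send_nsca's stdin.

    Hypothetical helper for illustration; not part of JobScheduler or NSCA.
    """
    if return_code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"invalid Nagios return code: {return_code}")
    # Tabs and newlines would break the field layout, so collapse them.
    clean = " ".join(str(message).split())
    return f"{host}\t{service}\t{return_code}\t{clean}\n"


if __name__ == "__main__":
    print(nsca_line("yourHostName", "Scheduler Errors", CRITICAL,
                    "sample_chain/4711 RC = 1"), end="")
```

The host name and service description must match a host and service defined in the Nagios configuration, otherwise Nagios discards the result.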
Preconditions
- The NSCA daemon has to be installed on the Nagios server.
- The NSCA client has to be installed on the JobScheduler server.
- A shell script for writing to the command pipe has to be installed on the Nagios server (only if Windows servers are monitored).
Job Scheduler Configuration
The following job chains have to be installed:
- nsca_communication - for communication with NSCA
  - send_nsca_nagios - calls send_nsca
  - set_order_state - marks the end status for the NSCA order
- CheckJobRun - checks whether a particular job has run
  - error/job_check_job_run - checks if a particular job has run successfully
  - error/add_nagios_alert - generates an order for nsca_communication
- sample_job_chain_with_errorHandling_at_the_end_by_nsca - an example job chain
  - sample_order:_jobr - sets an exit code
  - error/prepare_error.job - sets the error message for the order to NSCA
  - error/add_nagios_alert - creates an order for nsca_communication
  - error/set_error - sets the error state of the order
Installation:
The JobScheduler_nagios.tar.gz file has to be unpacked into the scheduler/config/live folder.
The following files are created and copied into the scheduler/config/live/nagios directory:
- nsca_communicationNSCA.job_chain.xml
- set_order_state.job.xml
- sendEvent2NagiosNSCA.job.xml
Job Chain for Communication with Nagios (Windows) SSH:
The following files are copied into the scheduler/config/live/nagios directory:
- nagios_communicationSSH.job_chain.xml
- set_order_state.job.xml
- sendEvent2NagiosSSH.job.xml
Testing if a Job has run successfully
The following files are copied into the scheduler/config/live/nagios/error directory:
- CheckJobRun.job_chain.xml
- job_check_job_run.job.xml
- add_nagios_alert.job.xml
Error Treatment for a Job
The handle_exit_code.js file is copied into the scheduler/config/live/nagios/error directory.
Error description for ..... in a Job Chain
The following files are copied into the scheduler/config/live/nagios/error directory:
- prepare_error.job.xml
- set_error.job.xml
Functionality:
Job Chain for Communication with Nagios (Linux) NSCA:
This job chain triggers the call of the NSCA client.
The sendEvent2NagiosNSCA job makes a parameterised call to the NSCA client.
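A parameterised call of this kind might look like the following sketch. It assumes the standard send_nsca command-line options -H (Nagios host), -p (port) and -c (client configuration file). The binary path, the configuration path, the default port 5667 and the function names are assumptions for illustration and do not reproduce the actual sendEvent2NagiosNSCA job.

```python
import subprocess


def send_nsca_command(nagios_host, port=5667,
                      binary="/usr/sbin/send_nsca",
                      config="/etc/nagios/send_nsca.cfg"):
    """Return the argument list for a send_nsca call (paths are assumed)."""
    return [binary, "-H", nagios_host, "-p", str(port), "-c", config]


def send_event(nagios_host, payload, **options):
    """Pipe one prepared check result line into send_nsca.

    `payload` is the tab-separated line host<TAB>service<TAB>code<TAB>text.
    Returns the exit code of the send_nsca process.
    """
    result = subprocess.run(send_nsca_command(nagios_host, **options),
                            input=payload, text=True,
                            capture_output=True, check=False)
    return result.returncode
```

Keeping the argument list separate from the process call makes the parameterisation easy to inspect without a running NSCA daemon.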
The set_order_state job sets the status of the order according to the par_severity parameter.
The order history will then show the type of job that has been executed in the operations GUI.
Job Chain for Communication with Nagios (Windows) SSH
This job chain triggers the Shell script call, which writes directly to the Nagios command pipe.
The sendEvent2NagiosSSH job is parameterised for the SSH connection.
The set_order_state job sets the status of the order according to the par_severity parameter.
The order history will then show the type of job that has been executed in the operations GUI.
Testing for a successful job run
This uses the CheckJobRun job chain. An order is created for each job run that has to be checked.
An order has been implemented as an example. This example checks if the Job Run SampeJobNscaNotification job has run successfully.
Error Treatment for a Job
Post-processing can be implemented to check individual jobs. In this case a message is sent to Nagios.
Error Treatments with Nodes in a Job Chain
- The sample_job_chain_with_errorHandling_at_the_end_by_nsca job chain shows error handling in which the messages are sent to Nagios.
- The sample_order_job2 job simulates an error.
- prepare_error: sets the order parameter "par_message" to the value <job_chain>/<order_id> RC = <exit_code>.
- add_nagios_alert: adds an order to the job chain for communication with Nagios.
  - par_severity is set according to the exit code. This mapping can be adapted individually for each customer.
  - If the par_service parameter is not set, then one of the services Scheduler Errors, Scheduler Warnings or Scheduler Success is selected according to the value of par_severity.
- set_error: sets the state of the order to error. This step is necessary so that orders with errors are marked correctly, even though the error handling itself was successful.
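The parameter handling described above can be sketched as follows. The service names and the layout of par_message are taken from the description. The severity values and the mapping from exit code to severity are purely an assumed example (0 is success, 1 is a warning, anything else is an error), and the function names are hypothetical.

```python
# Default service per severity, used when par_service is not set.
DEFAULT_SERVICES = {
    "error": "Scheduler Errors",
    "warning": "Scheduler Warnings",
    "success": "Scheduler Success",
}


def severity_for(exit_code):
    """Assumed mapping; adapt it to the exit codes of your own jobs."""
    if exit_code == 0:
        return "success"
    if exit_code == 1:
        return "warning"
    return "error"


def build_alert_params(job_chain, order_id, exit_code, par_service=None):
    """Assemble the order parameters for the communication job chain."""
    severity = severity_for(exit_code)
    return {
        "par_message": f"{job_chain}/{order_id} RC = {exit_code}",
        "par_severity": severity,
        "par_service": par_service or DEFAULT_SERVICES[severity],
    }
```

An explicitly passed par_service always wins, so messages for a particular job can be routed to their own Nagios service while everything else falls back to the three general services.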
Nagios Configuration
Setting up a Host for each JobScheduler:
Set up a host for each server that runs a JobScheduler. If several JobSchedulers are installed on one server, the same host configuration is used for all of them.
define host {
    use                    generic-host        ; Name of host template to use
    host_name              yourHostName
    alias                  aliasHost
    address                192.11.0.100
    check_command          check-host-alive
    max_check_attempts     10
    notification_interval  120
    notification_period    24x7
    notification_options   d,r
    contact_groups         admins
}
Services have to be set up within the Nagios configuration. A generic, reusable service template is set up first. It is important to activate passive checks and to deactivate active ones. The mandatory check command is provided by a dummy command.
define service {
    use                      generic-service
    name                     passive_service
    active_checks_enabled    0
    passive_checks_enabled   1               ; We want only passive checking
    flap_detection_enabled   0
    register                 0               ; This is a template, not a real service
    is_volatile              0
    check_period             24x7
    max_check_attempts       1
    normal_check_interval    5
    retry_check_interval     1
    check_freshness          0
    contact_groups           admins
    check_command            check_dummy!0
    notification_interval    120
    notification_period      24x7
    notification_options     w,u,c,r
    stalking_options         w,c,u
}
define command {
    command_name  check_dummy
    command_line  $USER1$/check_dummy $ARG1$
}
Whilst it is sufficient to set up a single service that collects all the messages from JobScheduler, we recommend distributing messages across several services so that, for example, success messages can be overwritten in the event of an error.
One possible criterion for dividing messages is the message type (error, warning, success).
It is also possible to set up individual services for particular jobs or job chains.
Messages for a particular Job or Job Chain
define service {
    use                  passive_service
    host_name            yourHostName
    service_description  Job Run SampeJobNscaNotification
}
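When many jobs or job chains need their own service, the definitions can be generated rather than written by hand. The following is a minimal sketch, assuming the passive_service template from this document and a "Job Run <job name>" naming pattern modelled on the example above. The function name and the second job name are hypothetical.

```python
def service_definition(host_name, job_name, template="passive_service"):
    """Render one Nagios service definition for a single job."""
    return (
        "define service {\n"
        f"    use                  {template}\n"
        f"    host_name            {host_name}\n"
        f"    service_description  Job Run {job_name}\n"
        "}\n"
    )


if __name__ == "__main__":
    for job in ["SampeJobNscaNotification", "another_job"]:
        print(service_definition("yourHostName", job))
```

The service_description generated here has to be identical to the service name that JobScheduler sends with its message, since Nagios matches passive results by host name and service description.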
Warnings, Errors and Success Messages
define service {
    use                  passive_service
    host_name            yourHostName
    service_description  Scheduler Warnings
}
define service {
    use                  passive_service
    host_name            yourHostName
    service_description  Scheduler Errors
}
define service {
    use                  passive_service
    host_name            yourHostName
    service_description  Scheduler Messages
}
Script to write into the command pipe (only if Windows servers are monitored)
Note that a shell script has to be installed on the Nagios server if messages from a JobScheduler running on Windows are to be written to the Nagios console. This script writes to the Nagios command pipe and is executed via SSH.
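Such a script essentially writes one external command line to the pipe. The sketch below shows the PROCESS_SERVICE_CHECK_RESULT format that Nagios accepts for passive service checks. It is written in Python for illustration and is not the sendEvent2Nagios shell script itself. The pipe path is an assumption that depends on the Nagios installation (the command_file setting), and the function names are hypothetical.

```python
import time

# Assumed location; the actual path is the command_file value in nagios.cfg.
COMMAND_PIPE = "/usr/local/nagios/var/rw/nagios.cmd"


def external_command(host, service, return_code, message, now=None):
    """Format a passive service check result as a Nagios external command."""
    timestamp = int(time.time() if now is None else now)
    # Fields are separated by semicolons; runs of whitespace, including
    # newlines, are collapsed so that the command stays on a single line.
    text = " ".join(str(message).split())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{text}\n")


def write_to_pipe(line, pipe_path=COMMAND_PIPE):
    """Write the command line to the Nagios command pipe."""
    with open(pipe_path, "w") as pipe:
        pipe.write(line)
```

The user that runs the script over SSH needs write permission on the command pipe, which is typically restricted to the Nagios command group.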
Sources:
- NSCA Installation: http://nagios.sourceforge.net/download/contrib/documentation/misc/NSCA_Setup.pdf
- NSCA Download: http://www.nagios.org/download/addons
- Shell Script sendEvent2Nagios: http://nagios.sourceforge.net/download/contrib/misc/sendevent2nagios/
- File with the examples: JobScheduler_nagios.tar.gz