Configuration
JobScheduler - SystemMonitorNotification files
Location: <scheduler_install>/config/notification
File | Description |
---|---|
SystemMonitorNotification_v1.0.xsd | The XML Schema file defines which values are allowed in your XML files for JobScheduler monitoring. To configure the System Monitor and the JobScheduler objects you want to monitor, you only have to modify your XML configuration files. |
SystemMonitorNotification_<MonitorSystem>.xml | Configuration file for each System Monitor. |
SystemMonitorNotificationTimers.xml | Configuration file for all System Monitors. This file is optional and contains the Timer definitions. |
SystemMonitorNotification Elements
The configuration element descriptions are organized into the following major categories:
Element | Element description | Description |
---|---|---|
SystemMonitorNotification | Top Level Element | Configuration for notifications to be sent to a system monitor. |
Notification | Once or more inside a SystemMonitorNotification element | Specifies a system monitor notification that includes a command line invocation and the JobScheduler objects. |
Timer | Optional, once or more inside a SystemMonitorNotification element | Performance measurement definition. |
SystemMonitorNotification
SystemMonitorNotification supports the following attributes:
Note:
- In the SystemMonitorNotificationTimers.xml file, the value of the system_id attribute is not significant and can be anything, e.g. timers.
Attribute | Usage | Description |
---|---|---|
system_id | required | System Monitor identifier. |
SystemMonitorNotification / Notification
The following elements may be nested inside a Notification element:
Element | Element description | Description |
---|---|---|
NotificationMonitor | Once inside a Notification element | Specifies the System Monitor interface that is being used for messages: either by a Plug-in Interface or by command line invocation |
NotificationObjects | Once inside a Notification element | Specifies the Job Chain and the Timer definitions |
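The nesting described in the tables above can be pictured with the following skeleton. It is a minimal sketch rather than a complete, schema-valid file: the system_id value is illustrative, the service name is borrowed from the ResetNotifications example further down, and the comments mark where the child elements described in the following sections would go.

```xml
<!-- Sketch only: attribute values are illustrative assumptions -->
<SystemMonitorNotification system_id="op5">
  <Notification>
    <NotificationMonitor service_name_on_error="JobScheduler Monitoring Error">
      <!-- one of NotificationInterface or NotificationCommand goes here -->
    </NotificationMonitor>
    <NotificationObjects>
      <!-- JobChain and/or Timer elements go here -->
    </NotificationObjects>
  </Notification>
</SystemMonitorNotification>
```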
SystemMonitorNotification / Notification / NotificationMonitor
NotificationMonitor supports the following attributes:
Note:
- attributes service_name_on_error and service_name_on_success
  - at least one of these attributes must be configured
  - both attributes can be configured together
Attribute | Usage | Description |
---|---|---|
service_name_on_error | Optional | This setting specifies the service that is configured in the System Monitor for messages about job runs with errors and for job recovery messages. The service name must match the corresponding setting in the System Monitor. |
service_name_on_success | Optional | This setting specifies the service that is configured in the System Monitor for receiving informational messages on successful job runs. The service name must match the corresponding setting in the System Monitor. |
service_status_on_error | Optional | This setting specifies the service status code for error messages. Default: |
service_status_on_success | Optional | This setting specifies the service status code for success messages. Default: |
One of the following elements must be nested inside a NotificationMonitor element:
Element | Element description | Description |
---|---|---|
NotificationInterface | Optional, once inside of NotificationMonitor element | Plug-in Interface to be executed for System Monitor notification |
NotificationCommand | Optional, once inside of NotificationMonitor element | Command line to be executed for System Monitor notification |
SystemMonitorNotification / Notification / NotificationMonitor / NotificationInterface
NotificationInterface supports the following attributes:
Attribute | Usage | Description |
---|---|---|
monitor_host | Required | This setting specifies the host name or IP address of the System Monitor host. |
monitor_port | Required | This setting specifies the TCP port that the System Monitor listens on. |
monitor_password | Optional | This setting specifies the password configured in the nsca.cfg file used by NSCA. |
monitor_connection_timeout | Optional | This setting specifies the connection timeout in ms. Default: |
monitor_response_timeout | Optional | This setting specifies the NSCA response timeout in ms. |
monitor_encryption | Optional | This setting specifies that the communication with the System Monitor is encrypted. By default no encryption is used. |
service_host | Required | This setting specifies the name of the host that executes the passive check. The name must match the corresponding setting in the System Monitor. |
plugin | Optional | Default: com.sos.scheduler.notification.plugins.notifier.SystemNotifierSendNscaPlugin |
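As an illustration of the attributes above, a NotificationInterface entry might look like the following sketch. The host names are hypothetical, the port 5667 is only the conventional NSCA port and is an assumption about your setup, and the message text reuses the %MON_N_SCHEDULER_ID% variable shown in the Message section. Only the three required attributes are set, so the default plugin would apply.

```xml
<!-- Sketch only: host names and port are assumptions, not shipped defaults -->
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Error">
  <NotificationInterface monitor_host="monitor.example.com"
                         monitor_port="5667"
                         service_host="jobscheduler.example.com"><![CDATA[scheduler id = %MON_N_SCHEDULER_ID%]]></NotificationInterface>
</NotificationMonitor>
```

The service_host value has to match the host name under which the passive check is registered in the System Monitor.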
SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand
NotificationCommand supports the following attributes:
Attribute | Usage | Description |
---|---|---|
plugin | Optional | Default: com.sos.scheduler.notification.plugins.notifier.SystemNotifierProcessBuilderPlugin |
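A NotificationCommand entry could be sketched as follows, assuming that the CDATA content of the element is the command line to be executed. The script path /opt/monitoring/notify.sh is hypothetical and stands for whatever wrapper script or notification client is installed on your system; the variable is the one used in the Message section.

```xml
<!-- Sketch only: the script path is a hypothetical placeholder -->
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Error">
  <NotificationCommand><![CDATA[/opt/monitoring/notify.sh "%MON_N_SCHEDULER_ID%"]]></NotificationCommand>
</NotificationMonitor>
```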
SystemMonitorNotification / Notification / NotificationObjects
One of the following elements must be nested inside a NotificationObjects element:
Element | Element description | Description |
---|---|---|
JobChain | Optional, once or more inside of NotificationObjects element | Restricts notifications for job chains |
Timer | Optional, once or more inside of NotificationObjects element | Restricts notifications for performance checks (Timer) |
SystemMonitorNotification / Notification / NotificationObjects / JobChain
JobChain supports the following attributes:
Attribute | Usage | Description |
---|---|---|
notifications | Optional Integer | Specifies the number of notifications that are sent to a System Monitor. Default: |
scheduler_id | Optional | Notifications are restricted to the JobScheduler instance with the given identification. By default notifications will be sent for all JobScheduler instances that log into the same database. Regular expression can be used. |
name | Optional | Job chain name including possible folder names. Regular expression can be used. |
step_from | Optional | Restricts notifications for job chains to a sequence of job nodes that are specified with the step_from and step_to attributes. |
step_to | Optional | Restricts notifications for job chains to a sequence of job nodes that are specified with the step_from and step_to attributes. |
excluded_steps | Optional | Specifies the steps that are excluded from the analysis (separated by semicolons). |
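The attributes above can be combined to narrow down which job chain steps trigger notifications. The following is a sketch under assumptions: the scheduler ID, job chain name and step names are hypothetical, and the step values are assumed to be the node names of the job chain.

```xml
<!-- Sketch only: all attribute values are hypothetical -->
<NotificationObjects>
  <JobChain scheduler_id="scheduler_prod"
            name="sales/daily_.*"
            step_from="download"
            step_to="archive"
            excluded_steps="cleanup;report"/>
</NotificationObjects>
```

Here the name attribute uses a regular expression, so every job chain in the sales folder whose name starts with daily_ would be covered.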
SystemMonitorNotification / Notification / NotificationObjects / Timer
Timer supports the following attributes:
Attribute | Usage | Description |
---|---|---|
notifications | Optional Integer | Specifies the number of notifications that are sent to a System Monitor. Default: |
name | Optional | Corresponds with Timer name setting defined in the SystemMonitorNotification / Timer element |
notify_on_error | Optional Boolean | Sends the timer check notification when the configured job chain contains error notifications. Default: |
SystemMonitorNotification / Timer
The following elements must be nested inside a Timer element:
Element | Element description | Description |
---|---|---|
JobChain | Once or more inside of Timer element | Restricts notifications for job chains |
Minimum | Optional, once inside of Timer element | Minimum time required for job or job chain execution. Allows script code to be executed that returns the minimum execution time in seconds. |
Maximum | Optional, once inside of Timer element | Maximum time allowed for job or job chain execution. Allows script code to be executed that returns the maximum execution time in seconds. |
Timer supports the following attributes:
Attribute | Usage | Description |
---|---|---|
name | Required | Corresponds to the Timer name used in the SystemMonitorNotification / Notification / NotificationObjects / Timer element. The name must be unique across all timer definitions. |
SystemMonitorNotification / Timer / JobChain
JobChain supports the following attributes:
Attribute | Usage | Description |
---|---|---|
scheduler_id | Optional | Notifications are restricted to the JobScheduler instance with the given identification. By default notifications will be sent for all JobScheduler instances that log into the same database. Regular expression can be used. |
name | Optional | Job chain name including possible folder names. Regular expression can be used. |
step_from | Optional | Restricts checks for job chains to a sequence of job nodes that are specified with the step_from and step_to attributes. |
step_to | Optional | Restricts checks for job chains to a sequence of job nodes that are specified with the step_from and step_to attributes. |
SystemMonitorNotification / Timer / Minimum
The following elements must be nested inside a Minimum element:
Element | Element description | Description |
---|---|---|
Script | Once inside of Minimum element | Script code in one of the supported languages |
SystemMonitorNotification / Timer / Maximum
The following elements must be nested inside a Maximum element:
Element | Element description | Description |
---|---|---|
Script | Once inside of Maximum element | Script code in one of the supported languages |
SystemMonitorNotification / Timer / Minimum|Maximum / Script
Script supports the following attributes:
Attribute | Usage | Description |
---|---|---|
language | Required | Script language name. Supported languages: |
The Script element can contain:
- a fixed value
- a calculation based on the job/order parameters
Fixed value
A fixed value is the time allowed in seconds for the specific Minimum or Maximum definition.
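A fixed-value timer could be sketched as shown below. Several points are assumptions: the timer and job chain names are hypothetical, the language value is a placeholder because the list of supported languages is not reproduced in this section, and the fixed value is assumed to be written as a plain number inside a CDATA section.

```xml
<!-- Sketch only: names, language value and limits are assumptions -->
<Timer name="daily_import_runtime">
  <JobChain name="sales/daily_import"/>
  <Minimum>
    <Script language="javascript"><![CDATA[10]]></Script>
  </Minimum>
  <Maximum>
    <Script language="javascript"><![CDATA[600]]></Script>
  </Maximum>
</Timer>
```

With these values the job chain would be expected to run for at least 10 seconds and at most 600 seconds.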
Calculation
The calculation must result in the time in seconds for the specific Minimum or Maximum definition.
This example calculates the execution time depending on the %file_size% parameter that was set by a specific job (see the example below).
This example job calculates and creates a new order parameter file_size.
To store the parameters in the database (table SCHEDULER_MON_RESULTS):
- set the scheduler_notification_result_parameters parameter (see the job documentation jobs/JobSchedulerNotificationStoreResultsJob.xml)
- set com.sos.scheduler.notification.jobs.result.StoreResultsJobJSAdapterClass as monitor
Message
Usage
The Message can be configured as a CDATA section on the following parent nodes:
SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand
SystemMonitorNotification / Notification / NotificationMonitor / NotificationInterface
The Message can contain:
- fixed values
- variables
Example: <![CDATA[ scheduler id = %MON_N_SCHEDULER_ID% ]]>
Variables
All variables must be referenced using the %<variable name>% syntax.
Variable values are substituted in the following order:
- Table variables.
- Service variables.
- OS environment variables.
Table variables
Service variables
OS environment variables
All existing system variables can be used in the message with the syntax %<variable name>% (Windows/Unix).
Examples
Notification environment variables
The default com.sos.scheduler.notification.plugins.notifier.SystemNotifierProcessBuilderPlugin plugin used by the SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand element sets the following variables as environment variables:
Table variables
Service variables
These variables can be used when the NotificationCommand does not call the notification client directly but via a shell script that implements the logic for sending the notification messages.
Table variables
Service variables
Examples
Examples OP5
NotificationInterface
The following is an excerpt from an XML file used to notify a specific System Monitor (OP5 Monitor) via the NotificationInterface:
NotificationCommand
The following is an excerpt from an XML file used to notify a specific System Monitor (OP5 Monitor) via the NotificationCommand on Windows:
Examples Zabbix
NotificationCommand
The following is an excerpt from an XML file used to notify a specific System Monitor (Zabbix Monitor) via the NotificationCommand:
JobScheduler - Job Chains
The following job chains are provided and should be configured accordingly:
sos / notification / CheckHistory
See <scheduler_install>/jobs/JobSchedulerNotificationCheckHistoryJob.xml
- This is the main job that checks the JobScheduler History for errors. It also executes timer checks to find slow performing jobs.
- Order Check
  - configure a repeat interval for the order run time, e.g. every two minutes.
sos / notification / SystemNotifier
See <scheduler_install>/jobs/JobSchedulerNotificationSystemNotifierJob.xml
- Sends notifications to a specific System Monitor.
- Order MonitorSystem
  - configure a repeat interval for the order run time that is not less than the interval chosen for triggering the job chain sos/notification/CheckHistory
sos / notification / CleanupNotifications
See <scheduler_install>/jobs/JobSchedulerNotificationCleanupNotificationsJob.xml
- Removes notifications that have expired.
- Order Cleanup
  - configure a start time for the order run time, e.g. 24:00
sos / notification / ResetNotifications
See <scheduler_install>/jobs/JobSchedulerNotificationResetNotificationsJob.xml
- Some System Monitors provide an "acknowledge" operation that signals that a problem is known.
- If an "acknowledge" operation has been performed for a specific service in the System Monitor, the job chain ResetNotifications stops JobScheduler from sending notifications for that service for errors that have already occurred.
- Do not configure an order run time for this job chain, as the job chain is triggered by the System Monitor's "acknowledge" operation via the add_order XML command.
Examples
Example ResetNotifications <add_order> XML command
The following example shows the XML command sent from a monitoring system to the JobScheduler to call the sos/notification/ResetNotifications job chain and set the relevant service name as acknowledged.
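A command of this kind can be sketched from the elements, attributes and parameter values listed in the key table. The id and title values are hypothetical; the job chain path and the three parameters are taken from that table.

```xml
<!-- Sketch only: id and title are hypothetical values -->
<add_order job_chain="sos/notification/ResetNotifications"
           id="acknowledge_op5"
           title="Acknowledge JobScheduler Monitoring Error">
  <params>
    <param name="service_name" value="JobScheduler Monitoring Error"/>
    <param name="system_id" value="op5"/>
    <param name="operation" value="acknowledge"/>
  </params>
</add_order>
```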
Key to the above code:
Element | Attribute | Value | Description |
---|---|---|---|
add_order | | | XML command that adds a new order to the specified job chain on the JobScheduler. |
| job_chain | sos/notification/ResetNotifications | The job chain path must correspond to the path of the ResetNotifications job chain installed on the JobScheduler. |
| id | | Order identifier. |
| title | | Order title. |
param | | | The following three parameters must be set: |
| name = service_name | JobScheduler Monitoring Error | Relevant service name. All service errors that have already occurred for this service are set as acknowledged in the JobScheduler Monitoring Interface. |
| name = system_id | op5 | System identification. Corresponds with the system_id attribute of the SystemMonitorNotification element. |
| name = operation | acknowledge | Fixed value. Operation name to execute the acknowledgement in the JobScheduler Monitoring Interface. |
Example ResetNotifications <add_order> XML command via Perl script for op5 monitor system
This example shows the integration of a Perl script into the op5 monitor system that automatically sends the above XML command to the JobScheduler sos/notification/ResetNotifications job chain.
The "Acknowledgment" on the op5 Monitor side works as follows:
- Contact "acknowledgment" + Event Handler:
  - First of all, a contact is required that receives the notifications in the same way as the other contacts. However, an event notification for this contact is not delivered by mail but handled by an Event Handler, i.e. an XML command is executed instead of a mail being sent. (See the next point, Notification Command.)
- The "svc_notify_ack_handle" Notification Command:
  - This command is always executed for the services that are specified for the contact. It is executed when the service status changes (for example, on a change from OK to Critical or on the Acknowledgment of an error).
  - The command executes the check_acknowledge.pl script.
- The check_acknowledge.pl script (see the example below): this script is executed by the command and first checks whether the command is a response to an Acknowledgment:
  - If the command is not a response to an Acknowledgment, nothing happens.
  - If the command is a response to an Acknowledgment, the script contacts the JobScheduler and sends an XML command that instructs the JobScheduler to start a specific job chain (the sos/notification/ResetNotifications chain).
JobScheduler - Job Chains customization
The default name of the monitor system used in the configuration files and stored in the JobScheduler database is "MonitorSystem".
The default configuration can be changed to allow better customization of the monitoring systems used.
Example customization for the OP5 system monitor:
- <scheduler_install>/config/notification/SystemMonitorNotification_MonitorSystem.xml
  - rename this file to SystemMonitorNotification_OP5.xml
  - set the system_id attribute to OP5, e.g. <SystemMonitorNotification system_id="OP5">
- <scheduler_install>/config/live/sos/notification/SystemNotifier,MonitorSystem.order.xml
  - rename this file to SystemNotifier,OP5.order.xml
  - set the system_configuration_file parameter to SystemMonitorNotification_OP5.xml, e.g. <param name="system_configuration_file" value="config/notification/SystemMonitorNotification_OP5.xml"/>
- <scheduler_install>/config/live/sos/notification/ResetNotifications,AcknowledgeMonitorSystem.order.xml
  - rename this file to ResetNotifications,AcknowledgeOP5.order.xml
  - set the system_id parameter to OP5, e.g. <param name="system_id" value="OP5"/>
JobScheduler - Cluster
In case of cluster operation, modify the job_chain element definition in all notification job chain files:
- add the distributed="yes" attribute, e.g. <job_chain distributed="yes" ...
- remove the orders_recoverable="no" attribute if it exists
The following job chain files in the notification directory <scheduler_install>/config/live/sos/notification/ must be modified:
CheckHistory.job_chain.xml
CleanupNotifications.job_chain.xml
ResetNotifications.job_chain.xml
SystemNotifier.job_chain.xml
Use Cases
Workflow Execution takes too long
Initial Situation
A Job Chain is triggered but does not complete: it hangs in a step and takes longer than expected.
Problem
The execution time is too long.
Handling
A timer is set for this Job Chain and the System Monitor is notified when the timer expires. The expiration times for the Job Chains are configured to allow enough time for processing. This is typically used for cases where the Job Chain could hang in a specific step.
Configuration
SystemMonitorNotification_<MonitorSystem>.xml
- Configure SystemMonitorNotification / Timer
- Configure SystemMonitorNotification / Notification / NotificationObjects / Timer
- Configure service_name_on_error (SystemMonitorNotification / Notification / NotificationMonitor)
System Monitor
- Services in the System Monitor have to be configured and named the same way as in service_name_on_error (SystemMonitorNotification / Notification / NotificationMonitor) above.
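The three configuration steps for this use case could fit together roughly as follows. This is a sketch under assumptions: the timer name and system_id are hypothetical, and the comments stand in for the parts described in the element reference. The point it illustrates is that the Timer name in NotificationObjects refers to the Timer defined at the top level.

```xml
<!-- Sketch only: names are hypothetical, comments mark omitted parts -->
<SystemMonitorNotification system_id="op5">
  <Notification>
    <NotificationMonitor service_name_on_error="JobScheduler Monitoring Error">
      <!-- NotificationInterface or NotificationCommand -->
    </NotificationMonitor>
    <NotificationObjects>
      <Timer name="daily_import_runtime"/>
    </NotificationObjects>
  </Notification>
  <Timer name="daily_import_runtime">
    <!-- JobChain and Maximum definition for the job chain that may hang -->
  </Timer>
</SystemMonitorNotification>
```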
SFTP connection refused
Initial Situation
Consider a Job Chain that uses SFTP for transferring files. You have a setback configured in this step of the Job Chain, so that if the connection to the SFTP server fails, this step is retried after a specified time.
Problem
The SFTP server is not available anymore.
Handling
The System Monitor is notified with the error message on the service related to the Job Chain. However, you do not want repeated notifications for a Job Chain when an external factor, the connection to the SFTP server, is producing the error.
Configuration
SystemMonitorNotification_<MonitorSystem>.xml
- Configure SystemMonitorNotification / Notification / NotificationObjects / JobChain for the relevant Job Chain.
- Configure service_name_on_error (SystemMonitorNotification / Notification / NotificationMonitor)
System Monitor
- Services in the System Monitor have to be configured and named the same way as in service_name_on_error (SystemMonitorNotification / Notification / NotificationMonitor) above.
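One way to keep the number of messages down in this situation is sketched below, assuming that the notifications attribute of JobChain is used to limit how many notifications are sent for the failing step. The job chain name is hypothetical and the limit of 1 is only an example value.

```xml
<!-- Sketch only: job chain name and notification limit are assumptions -->
<Notification>
  <NotificationMonitor service_name_on_error="JobScheduler Monitoring Error">
    <!-- NotificationInterface or NotificationCommand -->
  </NotificationMonitor>
  <NotificationObjects>
    <JobChain name="transfer/sftp_download" notifications="1"/>
  </NotificationObjects>
</Notification>
```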
Thresholds
Initial Situation
Consider the situation where a workflow has to be executed successfully a specific number of times before a specific point in time. This means that a specific value has to be monitored in order to determine whether this quota was reached.
Handling
A new History service is configured, so that the workflow executions (Job Chains in the JobScheduler vocabulary) send the information that they have been successfully executed to the System Monitor.
Configuration
SystemMonitorNotification_<MonitorSystem>.xml
- Configure SystemMonitorNotification / Notification / NotificationObjects / JobChain for the relevant Job Chain.
- Configure service_name_on_success (SystemMonitorNotification / Notification / NotificationMonitor)
System Monitor
- Services in the System Monitor have to be configured and named the same way as in service_name_on_success (SystemMonitorNotification / Notification / NotificationMonitor) above.
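For this use case the success service is the relevant setting. In the sketch below the service name "JobScheduler Monitoring History" and the job chain name are hypothetical; the service name would have to exist under exactly that name in the System Monitor.

```xml
<!-- Sketch only: service and job chain names are hypothetical -->
<Notification>
  <NotificationMonitor service_name_on_success="JobScheduler Monitoring History">
    <!-- NotificationInterface or NotificationCommand -->
  </NotificationMonitor>
  <NotificationObjects>
    <JobChain name="sales/daily_import"/>
  </NotificationObjects>
</Notification>
```

The System Monitor can then count the success messages received on that service and compare the count with the required threshold.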
Acknowledgment
Initial Situation
An alert for a Service has been sent to the System Monitor, which has sent a Mail to the Service Desk (Support Team) notifying them about the alert.
Handling
The problem is known to the Service Desk and they "acknowledge" the problem. The acknowledgment will cause the JobScheduler to be notified not to send any more notifications for this Service to the System Monitor until the Service has been recovered.
Configuration
System Monitor
- The JobScheduler is notified about the acknowledgment in the System Monitor by the execution of a script. See sos / notification / ResetNotifications
Recoverable Errors
Initial Situation
You have a setback configured in one of the steps of the Job Chain, so that if the step execution fails, this step is retried after a specified time.
Problem
The step has ended with an error, but recovered after the setback.
Handling
If the error message has been sent to the System Monitor, JobScheduler automatically sends a recovery message to the same service when the error recovers. The recovery message contains the same error message with the prefix RECOVERED.
Configuration
SystemMonitorNotification_<MonitorSystem>.xml
- Configure SystemMonitorNotification / Notification / NotificationObjects / JobChain for the relevant Job Chain.
- Configure service_name_on_error (SystemMonitorNotification / Notification / NotificationMonitor)
System Monitor
- Services in the System Monitor have to be configured and named the same way as in service_name_on_error (SystemMonitorNotification / Notification / NotificationMonitor) above.