Configuration
JobScheduler - SystemMonitorNotification files
Location: <scheduler_install>/config/notification
File | Description |
---|
SystemMonitorNotification_v1.0.xsd | The XML Schema file that defines which values are allowed in the XML files for JobScheduler monitoring. To configure the System Monitor and the JobScheduler objects to be monitored, modify only the SystemMonitorNotification_<MonitorSystem>.xml files, not the XML Schema file. |
SystemMonitorNotification_<MonitorSystem>.xml | Configuration file for each System Monitor. Specifies how notifications are delivered to the System Monitor, notifications for error or success conditions, and notifications to measure the performance of JobScheduler objects. |
SystemMonitorNotificationTimers.xml | Configuration file for all System Monitors. Specifies notifications to measure the performance of JobScheduler objects. This file is optional and contains the definitions of the SystemMonitorNotification / Timer elements. |
SystemMonitorNotification Elements
The configuration element descriptions are organized into the following major categories:
Element | Element description | Description |
---|
SystemMonitorNotification | Top Level Element | Configuration for notifications to be sent to a system monitor. |
Notification | Once or more inside a SystemMonitorNotification element | Specifies a system monitor notification that includes a command line invocation and the JobScheduler objects. |
Timer | Optional, once or more inside a SystemMonitorNotification element | Performance measurement definition. |
SystemMonitorNotification
SystemMonitorNotification
supports the following attributes:
Note:
<SystemMonitorNotification system_id="OP5">
...
SystemMonitorNotification / Notification
The following elements may be nested inside a Notification
element:
Element | Element description | Description |
---|
NotificationMonitor | Once inside a Notification element | Specifies the System Monitor interface that is being used for messages: either by a Plug-in Interface or by command line invocation |
NotificationObjects | Once inside a Notification element | Specifies the Job Chain and the Timer definitions |
SystemMonitorNotification / Notification / NotificationMonitor
NotificationMonitor
supports the following attributes:
Note:
- For the attributes service_name_on_error and service_name_on_success:
- at least one of these attributes must be configured
- both attributes can be configured together
Attribute | Usage | Description |
---|
service_name_on_error | Optional | This setting specifies the service that is configured in the System Monitor for messages about job runs with errors and for job recovery messages. The service name must match the corresponding setting in the System Monitor. |
service_name_on_success | Optional | This setting specifies the service that is configured in the System Monitor for receiving informational messages about successful job runs. The service name must match the corresponding setting in the System Monitor. |
service_status_on_error | Optional | This setting specifies the service status code for error messages. Default: CRITICAL |
service_status_on_success | Optional | This setting specifies the service status code for success messages. Default: OK |
<!-- Example
OP5 NSCA Status:
0 - OK
1 - WARNING
2 - CRITICAL
3 - UNKNOWN -->
...
<!-- Sending occurred errors as CRITICAL (default) -->
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors">
...
<!-- Sending occurred errors as WARNING -->
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors" service_status_on_error="1">
...
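Both service attributes can be set on the same element, as stated in the note above. The following is a minimal sketch, not a verified configuration: it reuses the service names from the examples in this document, assumes the numeric OP5 NSCA status codes listed above are accepted for both status attributes, and elides the nested element.

```xml
<!-- Sketch: errors are reported as CRITICAL (2), successful runs as OK (0) -->
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors"
                     service_name_on_success="JobScheduler Monitoring Success"
                     service_status_on_error="2"
                     service_status_on_success="0">
    ...
</NotificationMonitor>
```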
One of the following elements must be nested inside a NotificationMonitor
element:
Element | Element description | Description |
---|
NotificationInterface | Optional, once inside of NotificationMonitor element | Plug-in Interface to be executed for System Monitor notification |
NotificationCommand | Optional, once inside of NotificationMonitor element | Command line to be executed for System Monitor notification |
SystemMonitorNotification / Notification / NotificationMonitor / NotificationInterface
NotificationInterface supports the following attributes:
Attribute | Usage | Description |
---|
monitor_host | Required | This setting specifies the host name or IP address of the System Monitor host. |
monitor_port | Required | This setting specifies the TCP port that the System Monitor listens on. |
monitor_password | Optional | This setting specifies the password configured in the nsca.cfg file used by NSCA. |
monitor_connection_timeout | Optional | This setting specifies the connection timeout in ms. Default: 5000 |
monitor_response_timeout | Optional | This setting specifies the NSCA response timeout in ms. |
monitor_encryption | Optional | This setting specifies how the communication with the System Monitor is encrypted. By default no encryption is used. Possible values: NONE (no encryption), XOR (XOR encryption), TRIPLE_DES (encryption with the Triple DES algorithm). |
service_host | Required | This setting specifies the name of the host that executes the passive check. The name must match the corresponding setting in the System Monitor. |
plugin | Optional | Default: com.sos.scheduler.notification.plugins.notifier.SystemNotifierSendNscaPlugin |
...
<NotificationInterface monitor_host="monitor_host" monitor_port="5667" monitor_encryption="XOR" service_host="service_host"><![CDATA[
scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), step =%MON_N_ORDER_STEP_STATE%, error=%MON_N_ERROR_TEXT%
]]></NotificationInterface>
...
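The optional attributes can be combined with the required ones. The following sketch is an illustration only: the password, the timeout values and the host names are placeholders chosen for this example, not recommended settings.

```xml
<!-- Hypothetical values: host names, password and timeouts are placeholders -->
<NotificationInterface monitor_host="monitor_host" monitor_port="5667"
                       monitor_password="my_password"
                       monitor_encryption="TRIPLE_DES"
                       monitor_connection_timeout="5000"
                       monitor_response_timeout="5000"
                       service_host="service_host"><![CDATA[
scheduler id=%MON_N_SCHEDULER_ID%, error=%MON_N_ERROR_TEXT%
]]></NotificationInterface>
```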
SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand
NotificationCommand supports the following attributes:
Attribute | Usage | Description |
---|
plugin | Optional | Default: com.sos.scheduler.notification.plugins.notifier.SystemNotifierProcessBuilderPlugin |
...
<NotificationCommand><![CDATA[
echo scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), step =%MON_N_ORDER_STEP_STATE%, error=%MON_N_ERROR_TEXT% > D://errors.txt
]]></NotificationCommand>
...
SystemMonitorNotification / Notification / NotificationObjects
One of the following elements must be nested inside a NotificationObjects
element:
Element | Element description | Description |
---|
JobChain | Optional, once or more inside of NotificationObjects element | Restricts notifications for job chains |
Timer | Optional, once or more inside of NotificationObjects element | Restricts notifications for performance checks (Timer) |
<SystemMonitorNotification system_id="OP5">
<Notification>
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors">
...
</NotificationMonitor>
<NotificationObjects>
<!-- Sends job chain errors that occur in the "test/my_jobchain" job chain to the "JobScheduler Monitoring Errors" service. -->
<JobChain name="test/my_jobchain" />
</NotificationObjects>
</Notification>
</SystemMonitorNotification>
SystemMonitorNotification / Notification / NotificationObjects / JobChain
JobChain
supports the following attributes:
Attribute | Usage | Description |
---|
notifications | Optional Integer | Specifies the number of notifications that are sent to a System Monitor. Default: 1 |
scheduler_id | Optional | Notifications are restricted to the JobScheduler instance with the given identification. By default notifications will be sent for all JobScheduler instances that log into the same database. Regular expression can be used. |
name | Optional | Job chain name including possible folder names. Regular expression can be used. |
step_from | Optional | Restricts notifications for job chains to a sequence of job nodes that are specified with the step_from and step_to attributes. |
step_to | Optional | Restricts notifications for job chains to a sequence of job nodes that are specified with the step_from and step_to attributes. |
excluded_steps | Optional | Specifies the steps that are excluded from the analysis (separated by semicolons). |
...
<JobChain notifications="2" name="test/my_jobchain"/>
...
<JobChain scheduler_id="scheduler_4444" />
...
<JobChain scheduler_id="scheduler_4444" name="^(test/my)" />
...
<JobChain name="test/my_jobchain" step_from="200"/>
...
<JobChain name="test/my_jobchain" step_to="500"/>
...
<JobChain name="test/my_jobchain" step_from="300" step_to="300"/>
...
<JobChain name="test/my_jobchain" excluded_steps="200;300"/>
...
SystemMonitorNotification / Notification / NotificationObjects / Timer
Timer supports the following attributes:
Attribute | Usage | Description |
---|
notifications | Optional Integer | Specifies the number of notifications that are sent to a System Monitor. Default: 1 |
name | Optional | Corresponds with Timer name setting defined in the SystemMonitorNotification / Timer element |
notify_on_error | Optional Boolean | Specifies whether the timer check notification is also sent when the configured job chain has error notifications. Default: false |
<SystemMonitorNotification system_id="OP5">
<Notification>
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors">
...
</NotificationMonitor>
<NotificationObjects>
<!--
Send the job chain error, occurring in the "test/my_jobchain" job chain, to the "JobScheduler Monitoring Errors" service.
-->
<JobChain name="test/my_jobchain" />
</NotificationObjects>
</Notification>
<Notification>
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Performance">
...
</NotificationMonitor>
<NotificationObjects>
<!--
Sends performance check errors that occur in the "test/my_jobchain" job chain to the "JobScheduler Monitoring Performance" service.
The notification is not sent when "test/my_jobchain" has a job chain error (default notify_on_error = false).
-->
<Timer name="my_timer" />
</NotificationObjects>
</Notification>
<Timer name="my_timer">
<JobChain name="test/my_jobchain" />
</Timer>
</SystemMonitorNotification>
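To receive the performance check notification even when the job chain has failed, the notify_on_error attribute can be set on the Timer reference. The fragment below is a sketch that reuses the timer name from the example above; it assumes that "true" is the accepted Boolean spelling.

```xml
<NotificationObjects>
    <!-- Assumption: with notify_on_error="true" the timer check notification is also sent when "test/my_jobchain" has a job chain error -->
    <Timer name="my_timer" notify_on_error="true" />
</NotificationObjects>
```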
SystemMonitorNotification / Timer
The following elements must be nested inside a Timer
element:
Element | Element description | Description |
---|
JobChain | Once or more inside of Timer element | Restricts notifications for job chains |
Minimum | Optional, once inside of Timer element | Minimum time required for job or job chain execution. Allows script code to be executed that returns the minimum execution time in seconds. |
Maximum | Optional, once inside of Timer element | Maximum time allowed for job or job chain execution. Allows script code to be executed that returns the maximum execution time in seconds. |
<SystemMonitorNotification system_id="OP5">
...
<Timer name="my_timer_1">
<JobChain name="test/my_jobchain_1" />
<Maximum><Script language="javascript"><![CDATA[1000]]></Script></Maximum>
</Timer>
<Timer name="my_timer_2">
<JobChain name="test/my_jobchain_2" />
<JobChain name="test/my_jobchain_3" />
<Minimum><Script language="javascript"><![CDATA[500]]></Script></Minimum>
<Maximum><Script language="javascript"><![CDATA[1000]]></Script></Maximum>
</Timer>
</SystemMonitorNotification>
Timer supports the following attributes:
Attribute | Usage | Description |
---|
name | Required | Corresponds to the Timer name used in the SystemMonitorNotification / Notification / NotificationObjects / Timer element. The name must be unique across all timer definitions. |
...
<Timer name="my_timer">
...
SystemMonitorNotification / Timer / JobChain
JobChain supports the following attributes:
Attribute | Usage | Description |
---|
scheduler_id | Optional | Notifications are restricted to the JobScheduler instance with the given identification. By default notifications will be sent for all JobScheduler instances that log into the same database. Regular expression can be used. |
name | Optional | Job chain name including possible folder names. Regular expression can be used. |
step_from | Optional | Restricts checks for job chains to a sequence of job nodes that are specified with the step_from and step_to attributes. |
step_to | Optional | Restricts checks for job chains to a sequence of job nodes that are specified with the step_from and step_to attributes. |
...
<JobChain scheduler_id="scheduler_4444" />
...
<JobChain scheduler_id="scheduler_4444" name="^(test/my)" />
...
<JobChain name="test/my_jobchain" step_from="200"/>
...
<JobChain name="test/my_jobchain" step_to="500"/>
...
<JobChain name="test/my_jobchain" step_from="300" step_to="300"/>
...
SystemMonitorNotification / Timer / Minimum
The following elements must be nested inside a Minimum
element:
Element | Element description | Description |
---|
Script | Once inside of Minimum element | Script code in one of the supported languages |
...
<Timer name="my_timer">
...
<Minimum><Script language="javascript"><![CDATA[1000]]></Script></Minimum>
</Timer>
...
SystemMonitorNotification / Timer / Maximum
The following elements must be nested inside a Maximum
element:
Element | Element description | Description |
---|
Script | Once inside of Maximum element | Script code in one of the supported languages |
...
<Timer name="my_timer">
...
<Maximum><Script language="javascript"><![CDATA[1000]]></Script></Maximum>
</Timer>
...
SystemMonitorNotification / Timer / Minimum|Maximum / Script
Script
supports the following attributes:
Attribute | Usage | Description |
---|
language | Required | Script language name Supported languages: |
The Script element can contain:
- a fixed value
- a calculation based on the job/order parameters
Fixed value
A fixed value is the time allowed in seconds for the specific Minimum or Maximum definition.
...
<Script language="javascript"><![CDATA[1000]]></Script>
...
Calculation
The calculation must return the time in seconds for the specific Minimum or Maximum definition.
This example calculates the execution time depending on the %file_size% parameter that was set by a specific job (see the example below).
...
<Script language="javascript"><![CDATA[
function my_calculate(){
var fileSize = new java.lang.Double(%file_size%);
var timerExpiryFactor = 0.0025;
var timerExpiryTolerance = timerExpiryFactor*0.1;
var timerExpiry = new java.lang.Double(timerExpiryFactor+timerExpiryTolerance);
timerExpiry = timerExpiry*fileSize;
return timerExpiry;
}
my_calculate();
]]></Script>
...
This example job calculates and creates a new order parameter file_size.
To store the parameters in the database (table SCHEDULER_MON_RESULTS):
- set the scheduler_notification_result_parameters parameter (see job documentation jobs/JobSchedulerNotificationStoreResultsJob.xml)
- set the com.sos.scheduler.notification.jobs.result.StoreResultsJobJSAdapterClass as monitor
<?xml version="1.0" encoding="ISO-8859-1"?>
<job title="Sample Job with Store Result Monitor" order="yes" stop_on_error="no" tasks="1">
<params>
<!--
set the scheduler_notification_result_parameters parameter
-->
<param name="scheduler_notification_result_parameters" value="file_size"/>
</params>
<!--
calculate and create the new order parameter if necessary
-->
<script language="javascript"><![CDATA[
function spooler_process(){
var order = spooler_task.order;
var params = spooler.create_variable_set();
params.merge(spooler_task.params);
params.merge(order.params);
// parameter scheduler_file_path was set in the previous job chain step
var file = new java.io.File(params.value("scheduler_file_path"));
var fileSize = file.length()/1024;
order.params.set_var("file_size",fileSize.toString());
return true;
}]]>
</script>
<!--
set the com.sos.scheduler.notification.jobs.result.StoreResultsJobJSAdapterClass as a monitor
-->
<monitor name="notification_monitor" ordering="1">
<script java_class="com.sos.scheduler.notification.jobs.result.StoreResultsJobJSAdapterClass" language="java"/>
</monitor>
<run_time />
</job>
Message
Usage
The Message can be configured as a CDATA section on the following parent nodes:
SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand
SystemMonitorNotification / Notification / NotificationMonitor / NotificationInterface
The Message can contain:
Example: <![CDATA[ scheduler id = %MON_N_SCHEDULER_ID% ]]>
Variables
All variables must be specified using the %<variable name>% syntax.
Variable values are substituted in the following order:
- Table variables.
- Service variables.
- OS environment variables.
Table variables
Variables: table SCHEDULER_MON_NOTIFICATIONS
Table of the history of steps of processed orders.
Name | Description |
---|
%MON_N_ID% | Unique notification id |
%MON_N_SCHEDULER_ID% | Id of the JobScheduler |
%MON_N_TASK_ID% | Id of the JobScheduler task |
%MON_N_STEP% | Consecutive number of the order step |
%MON_N_ORDER_HISTORY_ID% | Id of the JobScheduler order |
%MON_N_JOB_CHAIN_NAME% | Name of the job chain of the order |
%MON_N_JOB_CHAIN_TITLE% | Title of the job chain of the order |
%MON_N_ORDER_ID% | Unique (within the job chain) id of the order |
%MON_N_ORDER_TITLE% | Title of the order |
%MON_N_ORDER_START_TIME% | Timestamp of the start of the order |
%MON_N_ORDER_END_TIME% | Timestamp of the end of the order |
%MON_N_ORDER_TIME_ELAPSED% | The elapsed time in seconds between the start and end times of the order |
%MON_N_ORDER_STEP_STATE% | State of the order inside the job chain |
%MON_N_ORDER_STEP_START_TIME% | Timestamp of the start of the order step |
%MON_N_ORDER_STEP_END_TIME% | Timestamp of the end of the order step |
%MON_N_ORDER_STEP_TIME_ELAPSED% | The elapsed time in seconds between the start and end times of the order step |
%MON_N_JOB_NAME% | Name of the job |
%MON_N_JOB_TITLE% | Title of the job |
%MON_N_TASK_START_TIME% | Timestamp of the job task start |
%MON_N_TASK_END_TIME% | Timestamp of the job task end |
%MON_N_TASK_TIME_ELAPSED% | The elapsed time in seconds between the start and end times of the job task |
%MON_N_RECOVERED% | 0 = ok or error was not recovered (depending on %MON_N_ERROR%), 1 = error was recovered |
%MON_N_ERROR% | 0 = ok, 1 = error |
%MON_N_ERROR_CODE% | Exception-code of the job error |
%MON_N_ERROR_TEXT% | Exception message of the job (that processed the order) |
%MON_N_CREATED% | Timestamp of the notification initial record |
%MON_N_MODIFIED% | Timestamp of the latest changes to this notification record |
scheduler id = %MON_N_SCHEDULER_ID%, history id = %MON_N_ORDER_HISTORY_ID%, job_chain = %MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), error = %MON_N_ERROR_TEXT%
Variables: table SCHEDULER_MON_SYSNOTIFICATIONS
Table of the history of notifications sent to a system monitor.
Name | Description |
---|
%MON_SN_ID% | Unique system notification id |
%MON_SN_NOTIFICATION_ID% | Reference to SCHEDULER_MON_NOTIFICATIONS.ID |
%MON_SN_CHECK_ID% | Reference to SCHEDULER_MON_CHECKS.ID |
%MON_SN_SYSTEM_ID% | Reference to the element attribute SystemMonitorNotification / @system_id defined in the XML configuration file |
%MON_SN_SERVICE_NAME% | Reference to one of the element attributes SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_error or SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_success defined in the XML configuration file |
%MON_SN_STEP_FROM% | Reference to the element attribute SystemMonitorNotification / Notification / NotificationObjects / JobChain / @step_from defined in the XML configuration file |
%MON_SN_STEP_TO% | Reference to the element attribute SystemMonitorNotification / Notification / NotificationObjects / JobChain / @step_to defined in the XML configuration file |
%MON_SN_STEP_FROM_START_TIME% | Timestamp for the start of the order step |
%MON_SN_STEP_TO_END_TIME% | Timestamp for the end of the order step |
%MON_SN_STEP_TIME_ELAPSED% | The elapsed time or the difference in seconds between the start and end times of the order step |
%MON_SN_NOTIFICATIONS% | Number of notifications already sent to a System Monitor |
%MON_SN_MAX_NOTIFICATIONS% | Reference to the element attribute SystemMonitorNotification / Notification / NotificationObjects / JobChain / @notifications defined in the XML configuration file |
%MON_SN_ACKNOWLEDGED% | 0 = not acknowledged, 1 = acknowledged |
%MON_SN_RECOVERED% | 0 = recovery not sent, 1 = recovery sent |
%MON_SN_SUCCESS% | 0 = success not sent, 1 = success sent |
%MON_SN_CREATED% | Timestamp of the initial system notification record |
%MON_SN_MODIFIED% | Timestamp of the latest changes to this system notification record |
step from = %MON_SN_STEP_FROM%, step to = %MON_SN_STEP_TO%, notification = %MON_SN_NOTIFICATIONS% (of %MON_SN_MAX_NOTIFICATIONS%)
Variables: table SCHEDULER_MON_CHECKS
Table of the history of executed checks (Timer)
Name | Description |
---|
%MON_C_ID% | Unique check id |
%MON_C_NOTIFICATION_ID% | Reference to SCHEDULER_MON_NOTIFICATIONS.ID |
%MON_C_NAME% | Reference to the element attribute SystemMonitorNotification / Timer / @name defined in the XML configuration file |
%MON_C_STEP_FROM% | Reference to the element attribute SystemMonitorNotification / Timer / JobChain / @step_from defined in the XML configuration file |
%MON_C_STEP_TO% | Reference to the element attribute SystemMonitorNotification / Timer / JobChain / @step_to defined in the XML configuration file |
%MON_C_STEP_FROM_START_TIME% | Timestamp of the start of the order step |
%MON_C_STEP_TO_END_TIME% | Timestamp of the end of the order step |
%MON_C_STEP_TIME_ELAPSED% | The elapsed time in seconds between the start and end times of the order step |
%MON_C_CHECK_TEXT% | Message of the check |
%MON_C_CREATED% | Timestamp of the check initial record |
%MON_C_MODIFIED% | Timestamp of the latest changes to this check record |
timer name = %MON_C_NAME%, text = %MON_C_CHECK_TEXT%
Service variables
Variables
Name | Description |
---|
%SERVICE_NAME% | Current service name. One of the element attributes SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_error or SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_success. |
%SERVICE_STATUS% | Current service status. One of the element attributes SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_error or SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_success, or the default: CRITICAL for errors, OK for success. |
%SERVICE_MESSAGE_PREFIX% | Message prefix: ERROR = error, RECOVERED = error recovery, TIMER = performance check. |
service name = %SERVICE_NAME%
OS environment variables
All existing OS environment variables can be used in a message with the syntax %<variable name>% (Windows/Unix).
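A message can mix all three variable types. The sketch below is illustrative only: it assumes a Windows host on which the standard COMPUTERNAME environment variable is set, and the output path D:/errors.txt is a placeholder. Any other environment variable name could be used instead.

```xml
<!-- Sketch: %COMPUTERNAME% is an OS environment variable, %SERVICE_NAME% a service variable, %MON_N_...% are table variables -->
<NotificationCommand><![CDATA[
echo host=%COMPUTERNAME%, service=%SERVICE_NAME%, scheduler id=%MON_N_SCHEDULER_ID%, error=%MON_N_ERROR_TEXT% > D:/errors.txt
]]></NotificationCommand>
```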
Examples
scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), step=%MON_N_ORDER_STEP_STATE%, error=%MON_N_ERROR_TEXT%
scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), steps(%MON_SN_STEP_FROM% to %MON_SN_STEP_TO%), order time elapsed = %MON_N_ORDER_TIME_ELAPSED%s
name = %MON_C_NAME%, scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), steps(%MON_C_STEP_FROM% to %MON_C_STEP_TO%), check = %MON_C_CHECK_TEXT%
Notification environment variables
The default com.sos.scheduler.notification.plugins.notifier.SystemNotifierProcessBuilderPlugin plugin used by the SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand element sets the following variables as environment variables:
- Table variables
- Service variables
These variables are useful when the NotificationCommand does not call the notification client directly but via a shell script that implements the logic for sending the notification messages.
Table variables
Variables
All table variables (see the Table variables explanation) are set as environment variables with the prefix SCHEDULER_MON_TABLE_, e.g.:
SCHEDULER_MON_TABLE_MON_N_ID
SCHEDULER_MON_TABLE_MON_N_SCHEDULER_ID
...
Service variables
Variables
Name | Description |
---|
SCHEDULER_MON_SERVICE_NAME | Current service name. One of the element attributes SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_error or SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_success. |
SCHEDULER_MON_SERVICE_STATUS | Current service status. One of the element attributes SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_error or SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_success, or the default: CRITICAL for errors, OK for success. |
SCHEDULER_MON_SERVICE_MESSAGE_PREFIX | Message prefix: ERROR = error, RECOVERED = error recovery, TIMER = performance check. |
SCHEDULER_MON_SERVICE_COMMAND | Content of the SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand element after substitution |
1) configured command in the SystemMonitorNotification_<MonitorSystem>.xml file
<NotificationCommand><![CDATA[/tmp/command.sh]]></NotificationCommand>
2) content of the /tmp/command.sh file
#! /bin/sh
# Note: "> /tmp/command_output.txt" is used to simulate the starting of the notification client
#
echo $SCHEDULER_MON_SERVICE_NAME:$SCHEDULER_MON_SERVICE_STATUS:$SCHEDULER_MON_SERVICE_MESSAGE_PREFIX history id = $SCHEDULER_MON_TABLE_MON_N_ORDER_HISTORY_ID > /tmp/command_output.txt
1) configured command in the SystemMonitorNotification_<MonitorSystem>.xml file
<NotificationCommand><![CDATA[C:/Temp/command.cmd]]></NotificationCommand>
2) content of the C:/Temp/command.cmd file
rem Note: "> C:/Temp/command_output.txt" is used to simulate the starting of the notification client
rem
echo %SCHEDULER_MON_SERVICE_NAME%:%SCHEDULER_MON_SERVICE_STATUS%:%SCHEDULER_MON_SERVICE_MESSAGE_PREFIX% history id = %SCHEDULER_MON_TABLE_MON_N_ORDER_HISTORY_ID% > C:/Temp/command_output.txt
Examples
Examples OP5
NotificationInterface
The following is an excerpt from an XML file used to notify a specific System Monitor (OP5 Monitor) via the NotificationInterface:
...
<!--
monitor_host The hostname or ip address of System Monitor host
monitor_port The TCP port that the System Monitor would listen to
monitor_encryption Encryption algorithm
service_host The host that executes the passive check. The name must match the corresponding setting in the System Monitor
%MON_N_SCHEDULER_ID% See explanation "Table variables"
...
-->
<NotificationInterface monitor_host="monitor_host" monitor_port="5667" monitor_encryption="XOR" service_host="service_host"><![CDATA[
scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), step =%MON_N_ORDER_STEP_STATE%, error=%MON_N_ERROR_TEXT%
]]></NotificationInterface>
...
NotificationCommand
The following is an excerpt from an XML file used to notify a specific System Monitor (OP5 Monitor) via the NotificationCommand on Windows:
...
<!--
service_host The host that executes the passive check. The name must match the corresponding setting in the System Monitor.
monitor_host The hostname or ip address of System Monitor host.
%SERVICE_NAME% See explanation "Service variables"
%SERVICE_STATUS% See explanation "Service variables"
%SERVICE_MESSAGE_PREFIX% See explanation "Service variables"
%MON_N_SCHEDULER_ID% See explanation "Table variables"
...
NotificationCommand after substitution (error case):
<![CDATA[echo service_host:JobScheduler Monitoring Errors:2:ERROR scheduler id=scheduler_4444, history id=123, job_chain=test/my_jobchain(order_id), step=100, error=error occurred | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]>
NotificationCommand after substitution (recovery case):
<![CDATA[echo service_host:JobScheduler Monitoring Errors:0:RECOVERED scheduler id=scheduler_4444, history id=123, job_chain=test/my_jobchain(order_id), step=100, error=error occurred | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]>
NotificationCommand after substitution (success case):
<![CDATA[echo service_host:JobScheduler Monitoring Success:0:scheduler id=scheduler_4444, history id=123, job_chain=test/my_jobchain(order_id), step=100, error= | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]>
-->
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors" service_name_on_success="JobScheduler Monitoring Success">
<NotificationCommand><![CDATA[echo service_host:%SERVICE_NAME%:%SERVICE_STATUS%:%SERVICE_MESSAGE_PREFIX%scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), step=%MON_N_ORDER_STEP_STATE%, error=%MON_N_ERROR_TEXT% | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]>
</NotificationCommand>
</NotificationMonitor>
...
Examples Zabbix
NotificationCommand
The following is an excerpt from an XML file used to notify a specific System Monitor (Zabbix Monitor) via the NotificationCommand:
...
<!--
zabbix_sender Zabbix sender installed on the JobScheduler host
localhost Hostname of the Zabbix server
zabbix_server JobScheduler Agent name (host name) that is registered in Zabbix
samples.job1 Item key in Zabbix (replace "/" with "." in JOB_NAME)
%MON_N_ERROR_TEXT% See explanation "Table variables"
-->
<NotificationCommand>
<![CDATA[zabbix_sender -z localhost -s zabbix_server -k samples.job1 -o %MON_N_ERROR_TEXT%]]>
</NotificationCommand>
...
JobScheduler - Job Chains
The following job chains are provided and should be configured accordingly:
sos / notification / CheckHistory
See <scheduler_install>/jobs/JobSchedulerNotificationCheckHistoryJob.xml
- This is the main job that checks the JobScheduler history for errors. It also executes timer checks to find slow-performing jobs.
- Order Check: configure a repeat interval for the order run time, e.g. every two minutes.
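A repeat interval of two minutes could be configured along the following lines. This is a sketch under assumptions: it presumes the usual JobScheduler order syntax with a run_time element containing a period with a repeat attribute, and the order title is hypothetical. Check the order documentation of your JobScheduler release before using it.

```xml
<!-- Sketch of an order configuration for the Check order; the title is hypothetical -->
<order title="Check JobScheduler history">
    <run_time>
        <!-- Assumption: repeats the order every two minutes throughout the day -->
        <period begin="00:00" end="24:00" repeat="00:02:00"/>
    </run_time>
</order>
```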
sos / notification / SystemNotifier
See <scheduler_install>/jobs/JobSchedulerNotificationSystemNotifierJob.xml
- Sends notifications to a specific System Monitor.
- Order MonitorSystem: configure a repeat interval for the order run time that is not less than the interval chosen for triggering the job chain sos/notification/CheckHistory.
sos / notification / CleanupNotifications
See <scheduler_install>/jobs/JobSchedulerNotificationCleanupNotificationsJob.xml
- Removes notifications that have expired.
- Order Cleanup: configure a start time for the order run time, e.g. 24:00.
sos / notification / ResetNotifications
See <scheduler_install>/jobs/JobSchedulerNotificationResetNotificationsJob.xml
- Some System Monitors provide an "acknowledge" operation that signals that a problem is known and that no further notifications should be sent by JobScheduler.
- If an "acknowledge" operation has been performed for a specific service in the System Monitor, this job chain stops JobScheduler from sending notifications to that service for errors that have already occurred.
- Order AcknowledgeMonitorSystem: do not configure the order run time, as this job chain is triggered by the System Monitor operation "acknowledge".
Examples
Example ResetNotifications OP5 add_order
<add_order job_chain ="sos/notification/ResetNotifications"
id ="OP5 MyServiceName acknowledgement"
title ="OP5 MyServiceName acknowledgement">
<params>
<param name="service_name" value="MyServiceName" />
<param name="system_id" value="OP5"/>
<param name="operation" value="acknowledge" />
</params>
</add_order>
Example ResetNotifications OP5 add_order perl
#!/usr/bin/perl -w
use strict;
use Net::HTTP;
use Getopt::Long;
use vars qw($opt_H $opt_f $opt_s $opt_p $opt_t $opt_h);
use vars qw(%ERRORS &support);
my $host;
my $type;
my $service;
my $port;
my $timeout = 30;
# Status codes follow the OP5 NSCA convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN)
%ERRORS = (
'OK' => 0,
'WARNING' => 1,
'CRITICAL' => 2,
'ERROR' => 2,
'UNKNOWN' => 3,
);
sub print_help ();
sub print_usage ();
Getopt::Long::Configure('bundling');
GetOptions
("h" => \$opt_h, "help" => \$opt_h,
"H=s" => \$opt_H, "hostname=s" => \$opt_H,
"f=s" => \$opt_f,
"s=s" => \$opt_s, "service=s" => \$opt_s,
"t=i" => \$opt_t, "timeout=i" => \$opt_t,
"p=i" => \$opt_p, "port=i" => \$opt_p);
if ($opt_h) {print_help(); exit 0;}
if ($opt_H) {
    # Accept only host names or addresses made of letters, digits, dots and hyphens.
    if ($opt_H =~ /^([-.A-Za-z0-9]+)$/) {
        $host = $1;
    }
    else {
        print("Invalid host: $opt_H\n");
    }
}
else {
    print("Host name/address not specified\n");
}
if (defined $opt_p) {
    # "p=i" already guarantees an integer, so only the range is checked here.
    if ($opt_p >= 1 && $opt_p <= 65535) {
        $port = $opt_p;
    }
    else {
        print("Invalid Port: $opt_p\n");
    }
}
else {
    print("Port not specified\n");
}
if ($opt_t) { $timeout = $opt_t; }
if( !$host || !$port ) {
print_usage();
exit 1;
}
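# The service name is required; encode spaces for use in the request.
if (!defined $opt_s) {
    print("Service name not specified\n");
    print_usage();
    exit 1;
}
$opt_s =~ s/ /%20/g;
print("service name:$opt_s\n");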
# Decoded example of $job_scheduler_request.
# MyServiceName stands for the service name passed in $opt_s.
# <add_order job_chain ="sos/notification/ResetNotifications"
# id ="OP5 MyServiceName acknowledgement"
# title ="OP5 MyServiceName acknowledgement">
# <params>
# <param name="system_id" value="OP5"/>
# <param name="service_name" value="MyServiceName" />
# <param name="operation" value="acknowledge" />
# </params>
# </add_order>
my $job_scheduler_request =
    "%3Cadd_order%20job_chain=%22sos/notification/ResetNotifications%22"
  . "%20id=%22OP5%20" . $opt_s . "%20acknowledgement%22"
  . "%20title=%22OP5%20" . $opt_s . "%20acknowledgement%22%3E"
  . "%3Cparams%3E"
  . "%3Cparam%20name=%22system_id%22%20value=%22OP5%22/%3E"
  . "%3Cparam%20name=%22service_name%22%20value=%22" . $opt_s . "%22/%3E"
  . "%3Cparam%20name=%22operation%22%20value=%22acknowledge%22/%3E"
  . "%3C/params%3E%3C/add_order%3E";
# Only acknowledgements are forwarded; any other notification type is ignored.
if (defined $opt_f && $opt_f =~ m/ACKNOWLEDGEMENT/) {
    my $jobscheduler_answer_xml = get_answer($job_scheduler_request);
    my $jobscheduler_state_element = get_state_elem($jobscheduler_answer_xml);
    my $jobscheduler_state = get_attribute_value("state", $jobscheduler_state_element);
    _report('OK', "OK: Service Name is " . $opt_s . " and notification type is " . $opt_f
        . " and JS request is " . $job_scheduler_request . "\n");
}
else {
    print("Sorry, but this is not an acknowledgement\n");
}
sub get_attribute_value {
my ($attr_name, $elem_xml) = @_;
$elem_xml =~ s/.*$attr_name\s*=\s*\"(.*?)\".*/$1/s;
return $elem_xml;
}
sub get_state_elem {
my $xml = shift;
$xml =~ s/.*<spooler.*?>\s*<answer.*?>\s*(<state.*?>).*/$1/s;
return $xml;
}
sub get_answer {
my $request = shift;
my $socket = Net::HTTP->new(Host => $host, PeerPort => $port, Timeout => $timeout);
my $xmlAnswer = "";
if ($socket) {
$socket->write_request(GET => $request);
my($code, $mess, %h) = $socket->read_response_headers;
if ($code == 200) {
while (1) {
my $buf;
my $n = $socket->read_entity_body($buf, 1024);
last unless $n;
$xmlAnswer .= $buf;
}
}
else {
_report('ERROR',"Connection to JobScheduler " . $host . ":" . $port . " failed: (" . $code . ") " .
$mess . "\n");
}
}
else {
_report('CRITICAL',"Connection to JobScheduler " . $host . ":" . $port . " failed: " . $@ . "\n");
}
return $xmlAnswer;
}
sub print_help () {
print $0. "\n";
print "Copyright (c) 2012 SOS GmbH, info\@sos-berlin.com
This script tries to connect to given Job Scheduler
";
print_usage();
print "
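-H, --hostname=HOST
Name or IP address of host to check
-p, --port=INTEGER
Port at host to check
-s, --service=NAME
Name of the service that has been acknowledged
-f TYPE
Notification type; the order is only added if it contains ACKNOWLEDGEMENT
-t, --timeout=INTEGER
Timeout for HTTP connection
-h, --help
This help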
";
}
sub print_usage () {
print "Usage: $0 -H <host> -p <port> -s <service> -f <notification type> [-t <timeout>]\n";
}
sub _report {
print $_[1];
if (defined($ERRORS{$_[0]})) { exit $ERRORS{$_[0]}; }
else { exit 0; }
}
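The hand-encoded request string in the script above is easy to get wrong. The following is a minimal sketch, not part of the original script, that builds the same add_order command from its parts and percent-encodes it with core Perl only. The helper names xml_attr, build_add_order and percent_encode are hypothetical, and the sketch assumes plain ASCII service names. Leaving "/" unencoded mirrors the original request string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special inside an XML attribute value.
sub xml_attr {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;
    $value =~ s/</&lt;/g;
    $value =~ s/>/&gt;/g;
    $value =~ s/"/&quot;/g;
    return $value;
}

# Build the add_order command for the ResetNotifications job chain
# (hypothetical helper; the order id and title follow the example above).
sub build_add_order {
    my ($system_id, $service_name) = @_;
    my $id = xml_attr("$system_id $service_name acknowledgement");
    return sprintf(
        '<add_order job_chain="sos/notification/ResetNotifications" id="%s" title="%s">'
      . '<params>'
      . '<param name="service_name" value="%s"/>'
      . '<param name="system_id" value="%s"/>'
      . '<param name="operation" value="acknowledge"/>'
      . '</params></add_order>',
        $id, $id, xml_attr($service_name), xml_attr($system_id));
}

# Percent-encode everything except unreserved characters and "/".
# Assumes an ASCII (byte) string.
sub percent_encode {
    my ($text) = @_;
    $text =~ s{([^A-Za-z0-9\-._~/])}{sprintf('%%%02X', ord($1))}ge;
    return $text;
}

my $command = build_add_order('OP5', 'MyServiceName');
print "request: ", percent_encode($command), "\n";
```

Building the XML first and encoding it in one step keeps the readable command and the transmitted request from drifting apart.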
JobScheduler - Job Chains customization
The default name of the monitor system used in the configuration files and stored in the JobScheduler database is "MonitorSystem".
The default configuration can be changed to allow better customization of the monitoring systems used.
Example customization for the OP5 system monitor:
- <scheduler_install>/config/notification/SystemMonitorNotification_MonitorSystem.xml
  - rename this file to SystemMonitorNotification_OP5.xml
  - set the system_id attribute to OP5, e.g. <SystemMonitorNotification system_id="OP5">
- <scheduler_install>/config/live/sos/notification/SystemNotifier,MonitorSystem.order.xml
  - rename this file to SystemNotifier,OP5.order.xml
  - set the system_configuration_file parameter to SystemMonitorNotification_OP5.xml, e.g. <param name="system_configuration_file" value="config/notification/SystemMonitorNotification_OP5.xml"/>
- <scheduler_install>/config/live/sos/notification/ResetNotifications,AcknowledgeMonitorSystem.order.xml
  - rename this file to ResetNotifications,AcknowledgeOP5.order.xml
  - set the system_id parameter to OP5, e.g. <param name="system_id" value="OP5"/>
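The renames listed above follow one pattern, which can be sketched for an arbitrary system id. This is only an illustration: the function name customization_plan is hypothetical, and the sketch merely prints the plan without touching any files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the renames and values needed to customize the default
# "MonitorSystem" configuration for a given system id (e.g. "OP5").
# Hypothetical helper; it computes names only and changes no files.
sub customization_plan {
    my ($install_dir, $system_id) = @_;
    my $notif = "$install_dir/config/notification";
    my $live  = "$install_dir/config/live/sos/notification";
    return (
        {
            from => "$notif/SystemMonitorNotification_MonitorSystem.xml",
            to   => "$notif/SystemMonitorNotification_${system_id}.xml",
            set  => qq{<SystemMonitorNotification system_id="${system_id}">},
        },
        {
            from => "$live/SystemNotifier,MonitorSystem.order.xml",
            to   => "$live/SystemNotifier,${system_id}.order.xml",
            set  => qq{<param name="system_configuration_file" value="config/notification/SystemMonitorNotification_${system_id}.xml"/>},
        },
        {
            from => "$live/ResetNotifications,AcknowledgeMonitorSystem.order.xml",
            to   => "$live/ResetNotifications,Acknowledge${system_id}.order.xml",
            set  => qq{<param name="system_id" value="${system_id}"/>},
        },
    );
}

for my $step (customization_plan('<scheduler_install>', 'OP5')) {
    print "rename: $step->{from}\n";
    print "    to: $step->{to}\n";
    print "   set: $step->{set}\n";
}
```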
WORK IN PROGRESS
Use Cases
Recoverable Errors
Initial Situation: A Job Chain is triggered by directory monitoring - i.e. the Job Chain starts when a certain file arrives in a monitored folder.
Problem: The Job Chain has ended with an error.
Handling: The System Monitor will be notified with the error message via the service specified for the Job Chain. If the Job Chain is then restarted by the arrival of a new file and ends without an error, this does not mean that the original error has been recovered, since the second run has processed a different file. Instead, the error message at the System Monitor should remain unchanged until the original file has been re-added to the monitored directory and the Job Chain has ended without an error.
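The rule described above, that an error is only recovered by a successful run for the same file, can be sketched as follows. This is an illustration of the rule only and makes no claim about how JobScheduler implements it; the function record_run and the file names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Files whose run ended with an error and has not been recovered yet.
my %failed_files;

# Record the result of one Job Chain run for a file and return the
# state the service should show afterwards (hypothetical helper).
sub record_run {
    my ($file, $ended_with_error) = @_;
    if ($ended_with_error) {
        $failed_files{$file} = 1;
    }
    else {
        # Only a successful run for the same file clears its error.
        delete $failed_files{$file};
    }
    return scalar(keys %failed_files) > 0 ? 'CRITICAL' : 'OK';
}

print "a.csv fails:     ", record_run('a.csv', 1), "\n";
print "b.csv succeeds:  ", record_run('b.csv', 0), "\n";
print "a.csv re-added:  ", record_run('a.csv', 0), "\n";
```

The second call still reports CRITICAL because the successful run processed a different file; only the third call, for the original file, returns the service to OK.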
Configuration:
- XML CheckConfigurationHistory.xml: Indicates the ID of the JobScheduler and the name of the Job Chain you want to monitor.
- XML SystemMonitorNotification.xml: Specifies the name of the Service (in the System Monitor) as a service_name_on_error, since you want to be notified when the Job Chain ends with an error.
- System Monitor: Services in the System Monitor have to be configured and named the same way as in the SystemMonitorNotification.xml file above.
Initial Situation: A Job Chain is triggered but does not end: it hangs in a step and takes longer than expected.
Problem: The execution time was too long.
Handling: A timer has been set for this Job Chain and the System Monitor is notified when it expires. The expiration times for the Job Chains are configured to leave enough time for processing. This is typically used for cases where the Job Chain could hang in a specific step.
Configuration:
- XML CheckConfigurationHistory.xml: As in the example above, indicates the ID of the JobScheduler and the name of the Job Chain you want to monitor. In addition, the timer for this specific Job Chain and the function for calculating the expiration time for the timer should be specified.
- XML SystemMonitorNotification.xml: As in the example above, specifies the name of the Service (in the System Monitor) as a service_name_on_error, since you want to be notified if the Job Chain ends with an error. For this case it is essential to specify how many times the timer should notify your System Monitor about its expiration.
- System Monitor: As in the example above, Services in the System Monitor have to be configured and named the same way as in the SystemMonitorNotification.xml file above.
SFTP connection refused
Initial Situation: Consider a Job Chain that uses SFTP for transferring files. You have a setback configured in this step of the Job Chain, so that if the connection to the SFTP server fails, this step is retried after a specified time.
Problem: The SFTP server is no longer available.
Handling: The System Monitor is notified of the error via the service related to the Job Chain. However, you do not want repeated notifications for the Job Chain when an external factor, here the connection to the SFTP server, is producing the error.
Configuration:
- XML CheckConfigurationHistory.xml: As in the example above, indicates the ID of the JobScheduler and the name of the Job Chain you want to monitor.
- XML SystemMonitorNotification.xml: As in the example above, specifies the name of the Service (in the System Monitor) as a service_name_on_error, since you want to be notified if the Job Chain ends with an error. In this case it is very important to specify how many times this Job Chain should notify your System Monitor about the error connecting to the SFTP server. You can use step_from and step_to to reduce the number of notifications for this specific step.
- System Monitor: As in the example above, Services in the System Monitor have to be configured and named the same way as in the SystemMonitorNotification.xml file above.
Thresholds
Initial Situation: Consider the situation where a workflow has to be executed successfully a specific number of times before a specific point in time. This means that a specific value has to be monitored in order to determine whether this quota has been reached.
Handling: A new History service is configured, so that the workflow executions (Job Chains in the JobScheduler vocabulary) send the information that they have been successfully executed to the System Monitor.
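The check that the System Monitor would perform on these success messages can be sketched as a simple count against a deadline. This is a minimal illustration under assumptions: the function quota_reached is hypothetical, and timestamps are taken to be epoch seconds.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if at least $required successful runs were reported at or
# before $deadline, otherwise 0 (hypothetical helper, epoch seconds).
sub quota_reached {
    my ($required, $deadline, @success_times) = @_;
    my $in_time = grep { $_ <= $deadline } @success_times;
    return $in_time >= $required ? 1 : 0;
}

my $deadline = 1_700_000_000;    # example deadline, assumed value
my @reported = ($deadline - 300, $deadline - 200, $deadline + 50);
my $ok       = quota_reached(3, $deadline, @reported);
my $message  = $ok ? 'quota reached' : 'quota missed';
print "$message\n";
```

In this example only two of the three reported successes arrive before the deadline, so a quota of three is missed.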
Configuration:
- XML CheckConfigurationHistory.xml: As in the example above, indicates the ID of the JobScheduler and the name of the Job Chain you want to monitor.
- XML SystemMonitorNotification.xml: Specifies the name of the Service (in the System Monitor), but note that here it is a service_name_on_success, since you want to be notified when the Job Chain ends successfully and not only when it ends with an error.
- System Monitor: As in the example above, Services in the System Monitor have to be configured and named the same way as in the SystemMonitorNotification.xml file above.
Acknowledgment
Initial Situation: An alert for a Service has been sent to the System Monitor, which has sent a Mail to the Service Desk (Support Team) notifying them about the alert.
Handling: The problem is known to the Service Desk and they "acknowledge" the problem. The acknowledgment will cause the JobScheduler to be notified not to send any more notifications for this Service to the System Monitor until the Service has been recovered.
Configuration:
- System Monitor: The JobScheduler is notified about the acknowledgment in the System Monitor by the execution of a script. This has parallels to other notifications such as sending a mail, but in this case the script adds an order to the Job Chain ResetNotifications described above.