Introduction
The JITL MonitoringJob template can be used to perform health checks of JS7 JOC Cockpit, Controller and Agents. Health check results can be forwarded, for example by mail.
- Users can use health status results for integration with their monitoring system.
- SOS offers a 24/7 Monitoring Service that receives health status results from customers who use a commercial license and subscribe to this support option, see JS7 - License.
The JITL MonitoringJob template can be used as a building block in a monitoring solution to:
- repeatedly run the MonitoringJob template using a JS7 - Cycle Instruction,
- forward health check results to a monitoring solution.
  - When used with a user's monitoring solution, this can include forwarding health check report files.
  - This can include sending e-mails containing a notice or an alert to SOS. Such notices do not include any data related to the user's JS7 environment; they only indicate a notice or an alert.
  - Alert mails are kept simple.
The job template makes use of the JS7 - REST Web Service API to retrieve information from the JOC Cockpit.
- The job template authenticates with the JS7 - REST Web Service API with a user account/password and/or with a certificate, for details see JS7 - Authentication.
- For details about configuration items see JS7 - JITL Common Authentication.
FEATURE AVAILABILITY STARTING FROM RELEASE 2.4.1
Usage
When defining the job either:
- invoke the Wizard that is available from the job properties tab in the Configuration view and select the JITL MonitoringJob and relevant arguments from the Wizard
or
- specify the JITL job class and the com.sos.jitl.jobs.monitoring.MonitoringJob Java class name and add the required arguments.
Example
Download (upload .json): pdmMonitoring.workflow.json
Using the Example
It is recommended to use the example as a starting point and to adjust the parameterization:
Explanation:
- A JS7 - Cycle Instruction is used in order to repeatedly perform health status checks.
- Users should adjust cycles to their monitoring needs.
- A JS7 - Retry Instruction is used in order to retry execution, for example of the included MailJob if an e-mail cannot be sent.
- The MonitoringJob is used to perform the health status check.
- The MailJob is used to send notices and alerts by mail. This is optional: users might apply other means to forward notices and alerts.
The Cycle Instruction is configured like this:
Explanation:
- A ticking cycle is used in order to perform health status checks precisely at the given hour and minute.
- The cycle runs at hourly intervals on every day of the week.
- The cycle period starts at midnight and lasts 24 hours.
- This example results in 24/7 coverage with the health status check being performed every hour.
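The schedule described above can be sketched in a few lines of Python. This is an illustration only: the function name `ticking_cycle_starts` and its parameters are hypothetical and not part of JS7. The sketch merely lists the start times that an hourly ticking cycle beginning at midnight and lasting 24 hours would produce.

```python
from datetime import datetime, timedelta


def ticking_cycle_starts(day, begin_hour=0, period_hours=24, interval_minutes=60):
    """Return the planned start times of a ticking cycle for one day.

    Hypothetical helper: a ticking cycle starts at fixed points in time,
    independently of how long the previous run took.
    """
    begin = datetime(day.year, day.month, day.day, begin_hour)
    end = begin + timedelta(hours=period_hours)
    starts = []
    tick = begin
    while tick < end:
        starts.append(tick)
        tick += timedelta(minutes=interval_minutes)
    return starts


starts = ticking_cycle_starts(datetime(2022, 8, 15))
print(len(starts), starts[0].time(), starts[-1].time())  # prints: 24 00:00:00 23:00:00
```

Changing `interval_minutes` to 30 in this sketch would double the number of checks per day, which mirrors how users might adjust the cycle to their monitoring needs.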
The Retry Instruction is configured like this:
Explanation:
- If any of the jobs included in the Retry Instruction fails then execution is resumed starting from the first job.
- Execution is repeated up to 3 times unless successful. The same interval of 1 minute is applied for each retry.
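The retry behavior described above can be illustrated with a minimal Python sketch. It is not the JS7 implementation: the function `run_with_retry` is hypothetical, and the sketch assumes that "up to 3 times" means three attempts in total with a fixed delay of 60 seconds between them.

```python
import time


def run_with_retry(jobs, max_tries=3, delay_seconds=60, sleep=time.sleep):
    """Run all jobs in order; if any job fails, start again from the first job.

    Returns the number of the attempt that succeeded and re-raises the last
    error once max_tries attempts have failed (assumed semantics).
    """
    for attempt in range(1, max_tries + 1):
        try:
            for job in jobs:
                job()
            return attempt
        except Exception:
            if attempt == max_tries:
                raise
            # The same interval is applied before each retry.
            sleep(delay_seconds)


attempts = {"count": 0}


def send_mail():
    # Stand-in for a MailJob that fails twice before it succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("mail server not reachable")


# The sleep function is replaced so that the demonstration does not wait.
print(run_with_retry([send_mail], sleep=lambda seconds: None))  # prints: 3
```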
The MonitoringJob makes use of arguments that are explained in the section Using the Job Wizard for the MonitoringJob.
The MailJob is explained in the JS7 - JITL MailJob article.
Using the Job Wizard for the MonitoringJob
You can use the job wizard like this:
Explanation:
- Add an empty job from the instruction panel.
- Specify a name and a label for the job.
- Select an Agent.
In the next step, invoke the job wizard from the upper right corner of the job property editor. The wizard opens the following popup window:
Explanation:
- From the list of available job templates select the MonitoringJob.
Then hit the "Next" button to make the job wizard display available arguments:
Explanation:
- controller_id: Optionally specifies the identification of the Controller to be checked. By default the current Controller is used.
- monitor_report_dir: Specifies the directory in which the job will store health status report files (.json). The directory has to exist prior to running the job and has to be in reach of the Agent that runs the job.
  - An absolute or a relative path can be specified.
  - An expression can be used. The example makes use of env('JS7_AGENT_DATA') ++ '/monitor' which translates to use of the JS7_AGENT_DATA environment variable created by the Agent's start script, see JS7 - Job Environment Variables. This environment variable can for example evaluate to /var/sos-berlin.com/js7/agent. The ++ operator indicates concatenation and is followed by the name of a sub-directory. In this example the report directory will be /var/sos-berlin.com/js7/agent/monitor.
- monitor_report_max_files: The number of report files created will be limited to this value. Older report files will be removed when this value is exceeded.
- from: Specifies the e-mail address that is used to send mail for notices and alerts. The argument is used by the job to create the subject and body return variables for use with a later MailJob.
- max_failed_orders: The maximum number of failed orders that are considered acceptable for a health status check. If this number is exceeded then the result return variable will carry a non-zero value indicating a failed health check.
- Select the check box provided with each argument if you want this argument to be added to the arguments of the MonitoringJob template.
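To make the monitor_report_dir expression more tangible, the following Python sketch mimics what env('JS7_AGENT_DATA') ++ '/monitor' evaluates to and checks that the directory exists, as the job requires. The helper names `resolve_report_dir` and `check_report_dir` are hypothetical and serve illustration only; JS7 evaluates the expression itself.

```python
import os


def resolve_report_dir(environ, subdir="/monitor"):
    """Python equivalent of the expression env('JS7_AGENT_DATA') ++ '/monitor'."""
    # ++ concatenates the value of the environment variable and the sub-directory.
    return environ["JS7_AGENT_DATA"] + subdir


def check_report_dir(path):
    """Fail early if the directory is missing, since it has to exist before the job runs."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"report directory does not exist: {path}")
    return path


path = resolve_report_dir({"JS7_AGENT_DATA": "/var/sos-berlin.com/js7/agent"})
print(path)  # prints: /var/sos-berlin.com/js7/agent/monitor
```

On an Agent host, `resolve_report_dir(os.environ)` followed by `check_report_dir(...)` could serve as a quick pre-flight check before the workflow is deployed.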
When hitting the Submit button the wizard adds the required arguments to the job which should look like this:
Using the Job Wizard for the MailJob
Find instructions in the JS7 - JITL MailJob article.
Use of JS7 - Job Resources to specify mail parameterization is encouraged.
Health Status Check
The health status check performed by the MonitoringJob makes use of the JS7 - REST Web Service API
- to retrieve health status information about JOC Cockpit, Controller and Agents,
- to write this information to a report file,
- to evaluate if the information indicates a healthy JS7 environment.
Report File
Find a sample report file for download that indicates an alert: monitor.2022-08-17.09-16-44.9Z.alert.json
Health Status Checks
The MonitoringJob performs the following health status checks:
- Controller
  - In volatileStatus the element connectionStates includes severity with a value 0.
  - In volatileStatus the element componentState includes severity with a value 0.
  - If role is present and does not carry the value STANDALONE in volatileStatus then the element clusterNodeState has to have severity with a value 0.
  - If role is present and does not carry the value STANDALONE in volatileStatus then the element isCoupled has to have the value true.
- Agents
  - In agentStatus the healthState is present and has severity with a value 0.
  - In agentStatus the state is present and has severity with a value 0.
  - For each enabled subAgent the state has severity with a value 0.
- JOC Cockpit
  - The connectionState has severity with a value 0.
  - The componentState has severity with a value 0.
  - If clusterNodeState is present it has severity with a value 0.
  - If controllerConnectionStates is present each connectionState has severity with a value 0.
The number of failed checks is reported by the result return variable, see the Return Variables section.
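The Controller and Agent rules listed above can be expressed as a small Python sketch that counts failed checks. This is not the job's implementation: the function names are hypothetical, and the sketch assumes that each status is available as a flat dictionary in which elements such as componentState are nested objects carrying a severity field. The real layout of the REST responses and report files may differ.

```python
def severity_ok(node, key):
    """True if node[key] is present and carries severity 0."""
    element = node.get(key)
    return isinstance(element, dict) and element.get("severity") == 0


def controller_failures(volatile_status):
    """Count failed Controller checks (assumed dictionary layout)."""
    failed = 0
    for key in ("connectionStates", "componentState"):
        if not severity_ok(volatile_status, key):
            failed += 1
    role = volatile_status.get("role")
    # Cluster checks apply only if a role is present and is not STANDALONE.
    if role is not None and role != "STANDALONE":
        if not severity_ok(volatile_status, "clusterNodeState"):
            failed += 1
        if volatile_status.get("isCoupled") is not True:
            failed += 1
    return failed


def agent_failures(agent_status):
    """Count failed Agent checks for healthState and state (assumed layout)."""
    return sum(1 for key in ("healthState", "state") if not severity_ok(agent_status, key))


standalone = {
    "role": "STANDALONE",
    "connectionStates": {"severity": 0},
    "componentState": {"severity": 0},
}
cluster = {
    "role": "PRIMARY",
    "connectionStates": {"severity": 0},
    "componentState": {"severity": 2},
    "clusterNodeState": {"severity": 0},
    "isCoupled": False,
}
agent = {"healthState": {"severity": 0}, "state": {"severity": 1}}

print(controller_failures(standalone))  # prints: 0
print(controller_failures(cluster))     # prints: 2
print(agent_failures(agent))            # prints: 1
```

The sum of such counters corresponds to the idea behind the result return variable: 0 indicates a healthy environment and any other value indicates problems.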
Documentation
The Job Documentation including the full list of arguments can be found under: https://www.sos-berlin.com/doc/JS7-JITL/MonitoringJob.xml
Authentication
The Job makes use of the JS7 - REST Web Service API that is available from JOC Cockpit.
- The job is executed with an Agent and requires a network connection to JOC Cockpit.
- The job has to authenticate with JOC Cockpit, for the related configuration see JS7 - JITL Common Authentication.
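For readers who want to verify connectivity and credentials outside of the job, the following Python sketch builds a login request with HTTP Basic authentication. The job handles authentication itself; the endpoint path /joc/api/authentication/login, the host name and the account shown here are assumptions for illustration and should be checked against the JS7 - REST Web Service API documentation. The sketch only constructs the request and does not send it.

```python
import base64
import urllib.request


def build_login_request(joc_url, account, password):
    """Build (but do not send) a login request for JOC Cockpit.

    Assumption: the login endpoint is /joc/api/authentication/login and
    accepts HTTP Basic authentication.
    """
    credentials = f"{account}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=joc_url.rstrip("/") + "/joc/api/authentication/login",
        method="POST",
        headers={"Authorization": "Basic " + token, "Accept": "application/json"},
    )


# Hypothetical host and account, for illustration only.
request = build_login_request("https://joc.example.com:4443", "monitor", "secret")
print(request.get_method(), request.full_url)
```

Sending the request, for example with `urllib.request.urlopen(request)`, requires the network connection from the Agent host to JOC Cockpit mentioned above and, for HTTPS, a certificate that the client trusts.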
Arguments
The MonitoringJob class accepts the following arguments:
Name | Required | Default Value | Purpose | Example |
---|---|---|---|---|
controller_id | no | | Optionally specifies the identification of the Controller to be checked. By default the current Controller is used. | controller_prod |
monitor_report_dir | yes | | Specifies the directory in which the job will store health status report files (.json). This directory has to exist prior to running the job and has to be in reach of the Agent that runs the job. | |
monitor_report_max_files | yes | | The number of report files created will be limited to this value. Older report files will be removed when this value is exceeded. | 25 |
from | yes | | Specifies the e-mail address that is used to send mail for notices and alerts. The argument is used by the job to create the subject and body return variables for use with a later MailJob. | js7@example.com |
max_failed_orders | no | | The maximum number of failed orders that are considered acceptable for a health status check. If this number is exceeded then the result return variable will carry a non-zero value indicating a failed health check. By default the number of failed orders is not considered for successful/unsuccessful health status checks. | 3 |
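The effect of monitor_report_max_files can be illustrated with a short Python sketch that removes the oldest report files once the limit is exceeded. This is an assumption about the general technique, not the job's actual code: the function `prune_reports` is hypothetical, and the sketch relies on the timestamp in the file name so that sorting by name equals sorting by age.

```python
import tempfile
from pathlib import Path


def prune_reports(report_dir, max_files):
    """Delete the oldest report files so that at most max_files remain.

    Assumes names such as monitor.2022-08-15.17-35-36.5.json, where the
    embedded timestamp makes alphabetical order equal chronological order.
    """
    reports = sorted(Path(report_dir).glob("monitor.*.json"))
    excess = reports[: max(0, len(reports) - max_files)]
    for path in excess:
        path.unlink()
    return [path.name for path in excess]


with tempfile.TemporaryDirectory() as tmp:
    for stamp in ("2022-08-15.17-35-36.5", "2022-08-16.17-35-36.5", "2022-08-17.17-35-36.5"):
        (Path(tmp) / f"monitor.{stamp}.json").write_text("{}")
    print(prune_reports(tmp, max_files=2))  # prints: ['monitor.2022-08-15.17-35-36.5.json']
```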
Return Variables
The MonitoringJob class returns the following variables for use by subsequent jobs:
Name | Data Type | Purpose | Example |
---|---|---|---|
monitor_report_date | String | The date and time for which the health status check has been performed. The date format is | controller_prod |
monitor_report_file | String | The path to the report file created for the health status check. | /var/sos-berlin.com/js7/agent/monitor/monitor.2022-08-15.17-35-36.5.json |
subject | String | The subject of an e-mail for use with a later MailJob. | JS7 Monitor: Notice from: js7@sos-berlin.com at: 2022-08-15.17-35-36.5 |
body | String | The body of an e-mail for use with a later MailJob. By default the value is the same as for the subject return variable. | JS7 Monitor: Notice from: js7@sos-berlin.com at: 2022-08-15.17-35-36.5 |
result | Number | The number of problems identified during the health status check. A value 0 indicates absence of problems, other values indicate existence of problems. | 0 |