...
controller_id
: Optionally specifies the identification of the Controller to be checked. By default the current Controller is used.
monitor_report_dir
: Specifies the directory to which the job will store health status report files (.json). The directory has to exist prior to running the job and has to be in reach of the Agent that runs the job.
- An absolute or relative path can be specified.
- An expression can be used. The example makes use of env('JS7_AGENT_DATA') ++ '/monitor' which translates to use of the JS7_AGENT_DATA environment variable created by the Agent's start script, see JS7 - Job Environment Variables. The environment variable can for example evaluate to /var/sos-berlin.com/js7/agent. The ++ operator indicates concatenation and is followed by the name of a sub-directory. In this example the report directory will be /var/sos-berlin.com/js7/agent/monitor.
monitor_report_max_files
: The number of report files created will be limited to this value. Older report files will be removed when this value is exceeded.
from
: Specifies the e-mail address that is used to send mail for notices and alerts. The argument is used by the job to create the subject and body return variables for use with a later MailJob.
max_failed_orders
: The maximum number of failed orders that are considered acceptable for a health status check. If this number is exceeded then the result return variable will carry a non-zero value indicating a failed health check.
- Select the check box provided with each argument if you want this argument to be added to the arguments of the MonitoringJob template.
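The report directory expression and the file limit can be illustrated with a minimal Python sketch. This is not the job's implementation: the function names are hypothetical, and the sketch assumes that report files end in .json and that "older" means an earlier modification time.

```python
import os
from pathlib import Path


def resolve_report_dir(environ=os.environ):
    """Mimic env('JS7_AGENT_DATA') ++ '/monitor': read the variable, append the sub-directory."""
    return environ["JS7_AGENT_DATA"] + "/monitor"


def prune_reports(report_dir, max_files):
    """Delete the oldest *.json files so that at most max_files remain.

    Assumption: age is judged by modification time. Returns the removed file names.
    """
    reports = sorted(Path(report_dir).glob("*.json"), key=lambda p: p.stat().st_mtime)
    excess = reports[: max(0, len(reports) - max_files)]
    for path in excess:
        path.unlink()
    return [path.name for path in excess]
```

With JS7_AGENT_DATA set to /var/sos-berlin.com/js7/agent, resolve_report_dir returns /var/sos-berlin.com/js7/agent/monitor, matching the example above.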
...
Find a sample report file for download that indicates an alert: monitor.2022-08-17.09-16-44.9Z.alert.json
{
"controllerStatus" : {
"active" : {
"id" : 3,
"surveyDate" : "2022-08-17T08:57:43.000+00:00",
"controllerId" : "testsuite",
"title" : "SECONDARY CONTROLLER",
"host" : "controller-2-0-secondary",
"url" : "https://controller-2-0-secondary:4443",
"clusterUrl" : "https://controller-2-0-secondary:4443",
"role" : "BACKUP",
"isCoupled" : false,
"startedAt" : "2022-08-16T18:09:27.000+00:00",
"version" : "2.5.0-SNAPSHOT+fd0eb39",
"javaVersion" : "17.0.4+8-alpine-r0",
"os" : {
"name" : "Linux",
"architecture" : "amd64",
"distribution" : "3.10.0-957.1.3.el7.x86_64"
},
"securityLevel" : "MEDIUM"
},
"volatileStatus" : {
"id" : 2,
"surveyDate" : "2022-08-17T09:16:45.064+00:00",
"controllerId" : "testsuite",
"title" : "PRIMARY CONTROLLER",
"host" : "controller-2-0-primary",
"url" : "https://controller-2-0-primary:4443",
"clusterUrl" : "https://controller-2-0-primary:4443",
"role" : "PRIMARY",
"isCoupled" : true,
"startedAt" : "2022-08-16T18:09:26.004+00:00",
"version" : "2.5.0-SNAPSHOT+fd0eb39",
"javaVersion" : "17.0.4+8-alpine-r0",
"os" : {
"name" : "Linux",
"architecture" : "amd64",
"distribution" : "3.10.0-957.1.3.el7.x86_64"
},
"securityLevel" : "MEDIUM",
"componentState" : {
"severity" : 0,
"_text" : "operational"
},
"connectionState" : {
"severity" : 0,
"_text" : "established"
},
"clusterNodeState" : {
"severity" : 0,
"_text" : "active"
}
},
"permanentStatus" : {
"id" : 2,
"surveyDate" : "2022-08-16T18:12:47.169+00:00",
"controllerId" : "testsuite",
"title" : "PRIMARY CONTROLLER",
"host" : "controller-2-0-primary",
"url" : "https://controller-2-0-primary:4443",
"clusterUrl" : "https://controller-2-0-primary:4443",
"role" : "PRIMARY",
"startedAt" : "2022-08-16T18:09:26.004+00:00",
"version" : "2.5.0-SNAPSHOT+fd0eb39",
"javaVersion" : "17.0.4+8-alpine-r0",
"os" : {
"name" : "Linux",
"architecture" : "amd64",
"distribution" : "3.10.0-957.1.3.el7.x86_64"
}
}
},
"jocStatus" : {
"active" : {
"id" : 2,
"memberId" : "joc-2-0-primary:97c88ccc3975703ebd0b7277d394ec8768f88b31775e8df038572d2547c240a0",
"title" : "PRIMARY JOC COCKPIT",
"current" : true,
"host" : "joc-2-0-primary",
"url" : "https://joc-2-0-primary:4443",
"startedAt" : "2022-08-16T18:10:27.000+00:00",
"version" : "2.5.0-SNAPSHOT",
"connectionState" : {
"severity" : 0,
"_text" : "established"
},
"componentState" : {
"severity" : 0,
"_text" : "operational"
},
"clusterNodeState" : {
"severity" : 0,
"_text" : "active"
},
"controllerConnectionStates" : [ {
"role" : "PRIMARY",
"state" : {
"severity" : 0,
"_text" : "established"
}
}, {
"role" : "BACKUP",
"state" : {
"severity" : 0,
"_text" : "established"
}
} ],
"os" : {
"name" : "Linux",
"architecture" : "amd64",
"distribution" : "3.10.0-957.1.3.el7.x86_64"
},
"securityLevel" : "MEDIUM",
"lastHeartbeat" : "2022-08-17T09:16:37.000+00:00"
},
"passive" : [ {
"id" : 1,
"memberId" : "joc-2-0-secondary:97c88ccc3975703ebd0b7277d394ec8768f88b31775e8df038572d2547c240a0",
"title" : "SECONDARY JOC COCKPIT",
"current" : false,
"host" : "joc-2-0-secondary",
"url" : "https://joc-2-0-secondary.sos:7543",
"startedAt" : "2022-08-16T18:10:27.000+00:00",
"version" : "2.5.0-SNAPSHOT",
"connectionState" : {
"severity" : 0,
"_text" : "established"
},
"componentState" : {
"severity" : 0,
"_text" : "operational"
},
"clusterNodeState" : {
"severity" : 1,
"_text" : "inactive"
},
"controllerConnectionStates" : [ {
"role" : "PRIMARY",
"state" : {
"severity" : 0,
"_text" : "established"
}
}, {
"role" : "BACKUP",
"state" : {
"severity" : 0,
"_text" : "established"
}
} ],
"os" : {
"name" : "Linux",
"architecture" : "amd64",
"distribution" : "3.10.0-957.1.3.el7.x86_64"
},
"securityLevel" : "MEDIUM",
"lastHeartbeat" : "2022-08-17T09:16:37.000+00:00"
} ]
},
"agentStatus" : [ {
"subagents" : [ ],
"controllerId" : "testsuite",
"agentId" : "agent_001",
"agentName" : "primaryAgent",
"url" : "https://agent-2-0-primary:4443",
"version" : "2.5.0-SNAPSHOT",
"state" : {
"severity" : 0,
"_text" : "COUPLED"
},
"healthState" : {
"severity" : 0,
"_text" : "ALL_SUBAGENTS_ARE_COUPLED_AND_ENABLED"
},
"orders" : [ ],
"runningTasks" : 1,
"isClusterWatcher" : true,
"disabled" : false
}, {
"subagents" : [ ],
"controllerId" : "testsuite",
"agentId" : "agent_002",
"agentName" : "secondaryAgent",
"url" : "https://agent-2-0-secondary:4443",
"version" : "2.5.0-SNAPSHOT",
"state" : {
"severity" : 0,
"_text" : "COUPLED"
},
"healthState" : {
"severity" : 0,
"_text" : "ALL_SUBAGENTS_ARE_COUPLED_AND_ENABLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : false
}, {
"subagents" : [ ],
"controllerId" : "testsuite",
"agentId" : "agent_004",
"agentName" : "wintestAgent",
"url" : "http://192.11.0.146:4245",
"version" : "2.4.0",
"state" : {
"severity" : 0,
"_text" : "COUPLED"
},
"healthState" : {
"severity" : 0,
"_text" : "ALL_SUBAGENTS_ARE_COUPLED_AND_ENABLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : false
}, {
"subagents" : [ ],
"controllerId" : "testsuite",
"agentId" : "agent_005",
"agentName" : "apmaccsAgent",
"url" : "http://192.11.3.3:4449",
"state" : {
"severity" : 2,
"_text" : "UNKNOWN"
},
"healthState" : {
"severity" : 2,
"_text" : "NO_SUBAGENTS_ARE_COUPLED_AND_ENABLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : true
}, {
"subagents" : [ ],
"controllerId" : "testsuite",
"agentId" : "agent_006",
"agentName" : "apmacwinAgent",
"url" : "http://192.11.2.2:4245",
"state" : {
"severity" : 2,
"_text" : "UNKNOWN"
},
"healthState" : {
"severity" : 2,
"_text" : "NO_SUBAGENTS_ARE_COUPLED_AND_ENABLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : true
}, {
"subagents" : [ ],
"controllerId" : "testsuite",
"agentId" : "agent_101",
"agentName" : "agent17",
"url" : "http://centostest_primary.sos:7775",
"version" : "2.4.0-beta.20220714",
"state" : {
"severity" : 0,
"_text" : "COUPLED"
},
"healthState" : {
"severity" : 0,
"_text" : "ALL_SUBAGENTS_ARE_COUPLED_AND_ENABLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : false
}, {
"subagents" : [ ],
"controllerId" : "testsuite",
"agentId" : "agent_009",
"agentName" : "oracleAgent",
"url" : "http://minos.sos:4445",
"version" : "2.4.0-beta.20220714",
"state" : {
"severity" : 0,
"_text" : "COUPLED"
},
"healthState" : {
"severity" : 0,
"_text" : "ALL_SUBAGENTS_ARE_COUPLED_AND_ENABLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : false
}, {
"subagents" : [ {
"isDirector" : "PRIMARY_DIRECTOR",
"agentId" : "agent_cluster_001",
"subagentId" : "director_primary_001",
"url" : "https://diragent-2-0-primary:4443",
"version" : "2.5.0-SNAPSHOT",
"state" : {
"severity" : 0,
"_text" : "COUPLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : false
}, {
"isDirector" : "NO_DIRECTOR",
"agentId" : "agent_cluster_001",
"subagentId" : "subagent_primary_001",
"url" : "https://subagent-2-0-primary:4443",
"version" : "2.5.0-SNAPSHOT",
"state" : {
"severity" : 0,
"_text" : "COUPLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : false
}, {
"isDirector" : "NO_DIRECTOR",
"agentId" : "agent_cluster_001",
"subagentId" : "subagent_secondary_001",
"url" : "https://subagent-2-0-secondary:4443",
"version" : "2.5.0-SNAPSHOT",
"state" : {
"severity" : 0,
"_text" : "COUPLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : false
}, {
"isDirector" : "NO_DIRECTOR",
"agentId" : "agent_cluster_001",
"subagentId" : "subagent_third_001",
"url" : "https://subagent-2-0-third:4443",
"version" : "2.5.0-SNAPSHOT",
"state" : {
"severity" : 0,
"_text" : "COUPLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : false
} ],
"controllerId" : "testsuite",
"agentId" : "agent_cluster_001",
"agentName" : "AgentCluster001",
"healthState" : {
"severity" : 0,
"_text" : "ALL_SUBAGENTS_ARE_COUPLED_AND_ENABLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : false
}, {
"subagents" : [ ],
"controllerId" : "testsuite",
"agentId" : "agent_014",
"agentName" : "winutf8Agent",
"url" : "http://192.11.0.146:4445",
"version" : "2.4.0",
"state" : {
"severity" : 0,
"_text" : "COUPLED"
},
"healthState" : {
"severity" : 0,
"_text" : "ALL_SUBAGENTS_ARE_COUPLED_AND_ENABLED"
},
"orders" : [ ],
"runningTasks" : 0,
"isClusterWatcher" : false,
"disabled" : false
} ],
"orderSnapshot" : {
"pending" : 0,
"scheduled" : 1262,
"inProgress" : 0,
"running" : 1,
"prompting" : 0,
"suspended" : 0,
"waiting" : 770,
"blocked" : 0,
"failed" : 0,
"terminated" : 1
},
"orderSummary" : {
"failed" : 0
}
}
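A report like the one above can be scanned for components that are not in a healthy state. The following Python sketch walks the JSON and lists every state object whose severity is not 0. The function name is hypothetical and the inline sample is a small excerpt modelled on the report. Whether the MonitoringJob counts each of these entries as a problem (for example, disabled Agents or an inactive JOC Cockpit instance) is not stated here, so the sketch only reports them.

```python
import json


def find_nonzero_severities(node, path=""):
    """Yield (path, severity, text) for every state object with severity != 0."""
    if isinstance(node, dict):
        # A state object carries both "severity" and "_text".
        if "severity" in node and "_text" in node:
            if node["severity"] != 0:
                yield path, node["severity"], node["_text"]
            return
        for key, value in node.items():
            yield from find_nonzero_severities(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_nonzero_severities(item, f"{path}[{index}]")


# Excerpt modelled on the sample report; not a complete report.
sample = json.loads("""
{
  "jocStatus": {"passive": [{"clusterNodeState": {"severity": 1, "_text": "inactive"}}]},
  "agentStatus": [
    {"agentId": "agent_001", "state": {"severity": 0, "_text": "COUPLED"}},
    {"agentId": "agent_005", "state": {"severity": 2, "_text": "UNKNOWN"}}
  ]
}
""")

findings = list(find_nonzero_severities(sample))
for finding in findings:
    print(finding)
```

For a real report file, replace the inline sample with json.load() on the file referenced by the report path.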
Health Status Checks
The MonitoringJob performs the following health status checks:
...
Name | Required | Default Value | Purpose | Example |
---|---|---|---|---|
controller_id | no | | Optionally specifies the identification of the Controller to be checked. By default the current Controller is used. | controller_prod |
monitor_report_dir | yes | | Specifies the directory to which the job will store health status report files (.json). The directory has to exist prior to running the job and has to be in reach of the Agent that runs the job. An absolute or relative path can be specified, or an expression can be used. | env('JS7_AGENT_DATA') ++ '/monitor' |
monitor_report_max_files | yes | | The number of report files created will be limited to this value. Older report files will be removed when this value is exceeded. | 25 |
from | yes | | Specifies the e-mail address that is used to send mail for notices and alerts. The argument is used by the job to create the subject and body return variables for use with a later MailJob. | js7@example.com |
max_failed_orders | no | | The maximum number of failed orders that are considered acceptable for a health status check. If this number is exceeded then the result return variable will carry a non-zero value indicating a failed health check. | 3 |
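The max_failed_orders threshold can be pictured as a simple comparison. The Python sketch below assumes that the count of failed orders is taken from the orderSummary.failed field of the report. That field choice and the function name are assumptions for illustration, not confirmed behavior of the MonitoringJob.

```python
def count_failed_order_problems(report, max_failed_orders):
    """Return 1 if the failed order count exceeds the accepted maximum, else 0.

    Assumption: the count is read from report["orderSummary"]["failed"].
    """
    failed = report.get("orderSummary", {}).get("failed", 0)
    return 1 if failed > max_failed_orders else 0


report = {"orderSummary": {"failed": 5}}
print(count_failed_order_problems(report, 3))  # prints 1: 5 exceeds 3
print(count_failed_order_problems(report, 5))  # prints 0: 5 does not exceed 5
```

"Exceeded" is read here as strictly greater than the configured value, so a count equal to max_failed_orders still passes.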
Return Variables
Name | Data Type | Purpose | Example |
---|---|---|---|
monitor_report_date | String | The date and time for which the health status check has been performed. The date format is | controller_prod |
monitor_report_file | String | The path to the report file created for the health status check. | /var/sos-berlin.com/js7/agent/monitor/monitor.2022-08-15.17-35-36.5.json |
subject | String | The subject of an e-mail for use with a later MailJob. | JS7 Monitor: Notice from: js7@sos-berlin.com at: 2022-08-15.17-35-36.5 |
body | String | The body of an e-mail for use with a later MailJob. By default the value is the same as for the subject return variable. | JS7 Monitor: Notice from: js7@sos-berlin.com at: 2022-08-15.17-35-36.5 |
result | Number | The number of problems identified during the health status check. A value 0 indicates absence of problems, other values indicate existence of problems. | 0 |
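To show how a later step might consume these return variables, the following Python sketch builds a mail message from subject and body when result is non-zero. It uses Python's standard email library, not the JS7 MailJob. The function name and the recipient address are hypothetical, and sending mail only for non-zero results is an assumption made for the example.

```python
from email.message import EmailMessage


def build_alert_mail(return_vars, sender, recipient):
    """Return an EmailMessage when result signals problems, otherwise None."""
    if int(return_vars["result"]) == 0:
        return None  # no problems identified, nothing to send in this sketch
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = return_vars["subject"]
    message.set_content(return_vars["body"])
    return message


# Values modelled on the examples in the table above.
return_vars = {
    "result": 2,
    "subject": "JS7 Monitor: Notice from: js7@sos-berlin.com at: 2022-08-15.17-35-36.5",
    "body": "JS7 Monitor: Notice from: js7@sos-berlin.com at: 2022-08-15.17-35-36.5",
}
mail = build_alert_mail(return_vars, "js7@example.com", "ops@example.com")
print(mail["Subject"])
```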
...