Introduction
For Agents, a Watchdog Script is provided that serves to:
- start and restart the Agent,
- terminate child processes if the Agent crashes,
- provide logging.
Watchdog Script
The Watchdog Script is provided for Unix and Windows. It is not used when the Agent is operated in a Container or as a Windows Service.
The Watchdog Script is available from the following locations:
- Unix
<agent-home>/bin/agent_watchdog.sh
- Windows
<agent-home>\bin\agent_watchdog.cmd
Starting Agent / Restarting Agent
Technically, the Agent's agent.sh|.cmd Start Script uses the Watchdog Script to start the Agent.
In the following situations the Watchdog Script will restart the Agent:
- For the command line operation: agent.sh|.cmd restart
- For the reset and reset forced operations on Agents that are available from the JOC Cockpit's Manage Controllers/Agents page.
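The restart behavior can be pictured as a loop that starts the Agent and starts it again when a restart is requested. The following is a minimal sketch under stated assumptions, not the code of agent_watchdog.sh: the function run_watchdog is hypothetical, and so is the convention that a restart is signalled by a specific exit code (RESTART_EXIT_CODE=97 is an arbitrary illustrative value, not the one used by the JS7 Agent).

```shell
#!/bin/sh
# Minimal sketch of a watchdog restart loop, NOT the actual agent_watchdog.sh code.
# RESTART_EXIT_CODE is a hypothetical value chosen for illustration only; it is
# not the exit code used by the JS7 Agent to request a restart.
RESTART_EXIT_CODE=97

# Runs the given command and starts it again for as long as it exits with
# RESTART_EXIT_CODE. Any other exit code ends the loop and is passed on.
run_watchdog() {
    starts=0
    while :; do
        starts=$((starts + 1))
        echo "watchdog: start #$starts: $*" >&2
        rc=0
        "$@" || rc=$?
        if [ "$rc" -ne "$RESTART_EXIT_CODE" ]; then
            echo "watchdog: exit code $rc, not restarting" >&2
            return "$rc"
        fi
        echo "watchdog: restart requested" >&2
    done
}
```

A call such as run_watchdog java ... would keep the JVM running across restart requests. Keeping the loop in a separate parent process is what allows a restart to survive the termination of the Agent's own JVM.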
Terminating Processes after Crash
If the Agent crashes, users might find a number of job processes and related child processes still running. Such processes continue to run, which is undesired behavior because the outcome of the jobs and their execution results would not be known.
The Agent keeps track of processes and child processes created for jobs. The Watchdog Script picks up this information and proceeds as follows:
- If a period is specified with the --sigkill-delay option of the Agent Start Script (see JS7 - Agent Command Line Operation):
  - send job processes and child processes the SIGTERM signal,
  - wait for termination of job processes and child processes.
- Send remaining processes and child processes the SIGKILL signal.
FEATURE AVAILABILITY STARTING FROM RELEASE 2.7.2
- JS-2148
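The termination sequence for job processes can be sketched in shell. This is a minimal illustration, not the Watchdog Script's actual implementation: the helpers terminate_processes and any_alive are hypothetical, the PIDs are passed as arguments instead of being read from the process information tracked by the Agent, and the delay argument merely stands in for the --sigkill-delay period (assumed here to be given in seconds).

```shell
#!/bin/sh
# Minimal sketch, NOT the actual agent_watchdog.sh code.
# terminate_processes is a hypothetical helper: it receives a delay in
# seconds (standing in for --sigkill-delay) followed by a list of PIDs.

# Succeeds if at least one of the given PIDs can still be signalled.
any_alive() {
    for p in "$@"; do
        kill -0 "$p" 2>/dev/null && return 0
    done
    return 1
}

terminate_processes() {
    delay="$1"
    shift
    if [ "$delay" -gt 0 ]; then
        # Ask processes to terminate gracefully.
        for p in "$@"; do
            kill -TERM "$p" 2>/dev/null || :
        done
        # Poll once per second until all processes are gone or the delay expires.
        waited=0
        while [ "$waited" -lt "$delay" ] && any_alive "$@"; do
            sleep 1
            waited=$((waited + 1))
        done
    fi
    # Forcibly kill whatever is left; a process may already be gone.
    for p in "$@"; do
        kill -KILL "$p" 2>/dev/null || :
    done
    return 0
}
```

With a delay of 0 the sketch sends SIGKILL right away; otherwise jobs get the chance to clean up after SIGTERM before they are killed. Polling at one-second granularity is a simplification chosen for portability of plain sh.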
Logging
The Watchdog Script captures output to the stdout/stderr channels throughout the lifetime of the Agent.
Log output is stored in the <agent-data>/logs/watchdog.log file.
- The log file reports the command line used to start the Agent.
- The log file holds information about the use of a JS7 - License.
- The log file is an important source for analysis in case of problems:
  - Any warnings and errors that do not make it into Log4j logging are reported to the log file.
  - The same applies to warnings and errors that occur before the JVM is initialized and before Log4j logging can start, for example if an incompatible Java version is used when starting the Agent.
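When analyzing problems, a quick first step can be to list warning and error lines from the log file. The following is a minimal sketch: the helper scan_watchdog_log is hypothetical, and it assumes that relevant messages contain the words "warn" or "error" (case-insensitive), which may not cover every message format written to watchdog.log.

```shell
#!/bin/sh
# Minimal sketch: list warning and error lines from the Watchdog log file.
# scan_watchdog_log is a hypothetical helper; it assumes that relevant
# messages contain the words "warn" or "error" (case-insensitive).
scan_watchdog_log() {
    if [ $# -ne 1 ]; then
        echo "usage: scan_watchdog_log <agent-data-directory>" >&2
        return 2
    fi
    log_file="$1/logs/watchdog.log"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # -n prefixes matches with line numbers to ease locating them in the file.
    # Note that grep also returns 1 if no line matches.
    grep -n -i -E 'warn|error' "$log_file"
}
```

The function is called with the Agent's data directory, i.e. the directory referred to as <agent-data> above.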
Watchdog Operation
Users can check the Agent's processes to verify that both the Watchdog process and the Agent process are running in parallel:
-bash-4.2$ ps -ef | grep 9545
sos 3877 1 0 13:56 pts/1 00:00:00 /bin/sh /home/sos/training/agent/agent.home/bin/agent_watchdog.sh -9545
sos 3879 3877 1 13:56 pts/1 00:00:53 /opt/java/jdk-21/bin/java -DJS7.Agent=9545 -Xmx100m -Djava.security.egd=file:///dev/urandom -Xms100m -Dfile.encoding=UTF-8 -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j2.asyncLoggerWaitStrategy=Block -classpath /home/sos/training/agent/agent.home/lib:/home/sos/training/agent/agent.home/var_9545/config/patches/*:/home/sos/training/agent/agent.home/var_9545/config/lib/*:/home/sos/training/agent/agent.home/lib/patches/*:/home/sos/training/agent/agent.home/lib/user_lib/*:/home/sos/training/agent/agent.home/lib/sos/*:/home/sos/training/agent/agent.home/lib/3rd-party/*:/home/sos/training/agent/agent.home/lib/jdbc/* js7.agent.main.AgentMain --http-port=9545 --config-directory=/home/sos/training/agent/agent.home/var_9545/config --data-directory=/home/sos/training/agent/agent.home/var_9545 --job-working-directory=/home/sos/training/agent/agent.home/var_9545/work
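The manual check with ps can be scripted for Unix. The following is a minimal sketch: the helper check_agent_processes is hypothetical, and its search patterns (agent_watchdog.sh -<port> for the Watchdog, -DJS7.Agent=<port> for the Agent JVM) are taken from the example listing for an Agent on port 9545, so they may need adjusting for other installations.

```shell
#!/bin/sh
# Minimal sketch: check `ps -ef` output (read from stdin) for the Watchdog
# and the Agent JVM. check_agent_processes is a hypothetical helper; the
# patterns follow the example listing and may need adjusting.
check_agent_processes() {
    port="$1"
    listing=$(cat)
    result=0
    # Watchdog: "agent_watchdog.sh -<port>" followed by a blank or end of line
    if printf '%s\n' "$listing" | grep -q -E -e "agent_watchdog\.sh -$port( |\$)"; then
        echo "watchdog process: running"
    else
        echo "watchdog process: NOT running"
        result=1
    fi
    # Agent JVM: "-DJS7.Agent=<port>" followed by a blank or end of line
    if printf '%s\n' "$listing" | grep -q -E -e "-DJS7\.Agent=$port( |\$)"; then
        echo "Agent process: running"
    else
        echo "Agent process: NOT running"
        result=1
    fi
    return "$result"
}
```

A typical call is ps -ef | check_agent_processes 9545. Reading the listing from stdin keeps the check independent of ps option differences between platforms.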
If the Watchdog process is not available, this affects operation of the Agent:
- The Agent cannot be restarted when operated on Unix or Windows.
- The reset and reset forced operations on Agents available from the JOC Cockpit GUI cannot be performed.
- In case of an Agent crash, no job processes and related child processes will be terminated.
Apart from the above effects, the Agent continues normal operation if the Watchdog process is not available.