Introduction

A Watchdog Script is provided for Agents. It serves to:

  • start and restart the Agent,
  • terminate child processes if the Agent crashes,
  • provide logging.

Watchdog Script

The Watchdog Script is provided for Unix and Windows. It is not used when operating the Agent from a Container or from a Windows Service.

The Watchdog Script is available from the following location:

  • Unix
    • <agent-home>/bin/agent_watchdog.sh
  • Windows
    • <agent-home>\bin\agent_watchdog.cmd

Starting Agent / Restarting Agent

Technically, the Agent's agent.sh|.cmd Start Script uses the Watchdog Script to start the Agent.

In the following situations the Watchdog Script will restart the Agent:

  • For the command line operation: agent.sh|.cmd restart
  • For the reset and reset forced operations on Agents, which are available from the JOC Cockpit's Manage Controllers/Agents page.
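
The restart behavior can be pictured as a supervising loop. The following is a minimal sketch only and not the Watchdog Script's actual code: the source does not state how the Agent signals a restart request to the Watchdog Script, so the sketch assumes a dedicated exit code. The name supervise and the value of RESTART_EXIT_CODE are hypothetical.

```shell
#!/bin/sh
# Sketch only: NOT the Watchdog Script's actual implementation.
# Assumption: the supervised program requests a restart by terminating
# with a dedicated exit code. The value below is hypothetical.
RESTART_EXIT_CODE=98

# supervise CMD [ARG...]
# Runs CMD and starts it again for as long as it exits with
# RESTART_EXIT_CODE. Any other exit code ends the loop and is returned.
supervise() {
    while :; do
        rc=0
        "$@" || rc=$?
        [ "$rc" -eq "$RESTART_EXIT_CODE" ] || return "$rc"
    done
}
```

With this sketch, supervise some_command would keep starting some_command until it terminates with an exit code other than the assumed restart code.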

Terminating Processes after Crash

If the Agent crashes, users might find that a number of job processes and related child processes are still running. This is undesired behavior, because the outcome of the jobs and their execution results cannot be known.

The Agent keeps track of processes and child processes created for jobs. The Watchdog Script will pick up this information and will proceed as follows:

  • if a period is specified with the --sigkill-delay option of the Agent Start Script (see JS7 - Agent Command Line Operation):
    • send job processes and child processes the SIGTERM signal,
    • wait for termination of job processes and child processes,
  • send remaining processes and child processes the SIGKILL signal.
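
The sequence above can be sketched in shell as follows. This is an illustration under stated assumptions and not the Watchdog Script's actual code: the function name terminate_pids is hypothetical, the delay is assumed to be given in whole seconds, and the process IDs are passed as arguments instead of being read from the information tracked by the Agent.

```shell
#!/bin/sh
# Sketch only: NOT the Watchdog Script's actual implementation.
# terminate_pids DELAY PID...
#   DELAY  seconds to wait after SIGTERM; 0 skips SIGTERM entirely
#   PID    process IDs of job processes and child processes
terminate_pids() {
    delay=$1
    shift
    if [ "$delay" -gt 0 ]; then
        # Step 1: ask all processes to terminate
        for pid in "$@"; do
            kill -TERM "$pid" 2>/dev/null || true
        done
        # Step 2: wait up to DELAY seconds for the processes to terminate
        waited=0
        while [ "$waited" -lt "$delay" ]; do
            remaining=0
            for pid in "$@"; do
                # kill -0 sends no signal, it only checks that the PID exists
                kill -0 "$pid" 2>/dev/null && remaining=1
            done
            [ "$remaining" -eq 0 ] && return 0
            sleep 1
            waited=$((waited + 1))
        done
    fi
    # Step 3: forcibly kill whatever is still running
    for pid in "$@"; do
        kill -KILL "$pid" 2>/dev/null || true
    done
    return 0
}
```

For example, terminate_pids 5 4711 4712 would send SIGTERM to both processes, wait up to five seconds, and then send SIGKILL to any process that is still present.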

FEATURE AVAILABILITY STARTING FROM RELEASE 2.7.2

JS-2148

Logging

The Watchdog Script captures output written to the stdout/stderr channels throughout the lifetime of the Agent.

Log output is written to the <agent-data>/logs/watchdog.log file.

  • The log file reports the command line used to start the Agent.
  • The log file holds information about use of a JS7 - License.
  • The log file is an important source for analysis in case of problems:
    • Any warnings and errors that do not reach Log4j logging are reported to the log file.
    • The same applies to warnings and errors that occur before the JVM is initialized and before Log4j logging can start, for example if an incompatible Java version is used when starting the Agent.
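
The idea of recording the command line and capturing both output channels can be sketched as follows. This is a minimal illustration and not the Watchdog Script's actual code: the function name run_logged and the format of the log line are assumptions.

```shell
#!/bin/sh
# Sketch only: NOT the Watchdog Script's actual implementation.
# run_logged LOGFILE CMD [ARG...]
# Records the command line in LOGFILE, then runs CMD and appends
# everything it writes to stdout and stderr to the same file.
run_logged() {
    logfile=$1
    shift
    # Record the command line (timestamp and wording are assumptions)
    printf '%s starting: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$logfile"
    # Capture stdout and stderr. This also catches messages issued
    # before the program's own logging framework is initialized.
    "$@" >> "$logfile" 2>&1
}
```

Because the redirection is applied by the shell before the program starts, early failures such as a JVM refusing to start still end up in the log file.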

Watchdog Operation

Users can check from the process list that the Watchdog process and the Agent process are both running, with the Watchdog process being the parent of the Agent process:

Example for display of Watchdog process and Agent process
-bash-4.2$ ps -ef | grep 9545
sos       3877     1  0 13:56 pts/1    00:00:00 /bin/sh /home/sos/training/agent/agent.home/bin/agent_watchdog.sh -9545
sos       3879  3877  1 13:56 pts/1    00:00:53 /opt/java/jdk-21/bin/java -DJS7.Agent=9545 -Xmx100m -Djava.security.egd=file:///dev/urandom -Xms100m -Dfile.encoding=UTF-8 -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j2.asyncLoggerWaitStrategy=Block -classpath /home/sos/training/agent/agent.home/lib:/home/sos/training/agent/agent.home/var_9545/config/patches/*:/home/sos/training/agent/agent.home/var_9545/config/lib/*:/home/sos/training/agent/agent.home/lib/patches/*:/home/sos/training/agent/agent.home/lib/user_lib/*:/home/sos/training/agent/agent.home/lib/sos/*:/home/sos/training/agent/agent.home/lib/3rd-party/*:/home/sos/training/agent/agent.home/lib/jdbc/* js7.agent.main.AgentMain --http-port=9545 --config-directory=/home/sos/training/agent/agent.home/var_9545/config --data-directory=/home/sos/training/agent/agent.home/var_9545 --job-working-directory=/home/sos/training/agent/agent.home/var_9545/work
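
The check shown in the listing can be automated with a small helper. The following is a sketch under assumptions taken from the example listing only: the Watchdog process is assumed to carry -<port> as its last argument, and the Agent's JVM is assumed to be started with -DJS7.Agent=<port>. The function name check_agent_pair is hypothetical.

```shell
#!/bin/sh
# Sketch only, based on the layout of the example listing.
# check_agent_pair PORT
# Reads `ps -ef` output from stdin and succeeds if a Watchdog process
# for PORT exists and is the parent of the Agent's Java process.
check_agent_pair() {
    awk -v port="$1" '
        # Watchdog line: agent_watchdog.sh with "-PORT" as last argument
        /agent_watchdog\.sh/ && $NF == ("-" port) { watchdog_pid = $2 }
        # Agent line: JVM option -DJS7.Agent=PORT; column 3 is the parent PID
        index($0, "-DJS7.Agent=" port " ") { agent_ppid = $3 }
        END {
            if (watchdog_pid != "" && agent_ppid == watchdog_pid) {
                print "ok"
                exit 0
            }
            print "watchdog/agent pair not found"
            exit 1
        }'
}
```

A possible invocation is ps -ef | check_agent_pair 9545, which prints ok and returns exit code 0 if both processes are found and related as parent and child.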

If the Watchdog process is not available then this will affect operation of the Agent:

  • The Agent cannot be restarted when operated on Unix or Windows.
  • The reset and reset forced operations on Agents available from the JOC Cockpit GUI cannot be performed.
  • If the Agent crashes, job processes and related child processes will not be terminated.

Apart from the above effects, the Agent will continue normal operation if the Watchdog process is not available.


