...

Terminating Child Processes starting from Release 2.7.2

When terminating a job process, the Agent performs the following steps:

    • collect child process PIDs of job process,
    • send SIGTERM to job process,
    • wait for one of the following events, whichever arrives first:
      • the Grace Timeout configured with the JS7 - Job Instruction expires
      • stdout/stderr are released by the job process and its child processes
    • send SIGKILL signal to job process if Grace Timeout is exceeded,
    • if the --sigkill-delay option of the Agent Start Script is used - see JS7 - Agent Command Line Operation - then
      • send SIGTERM signal to remaining child processes for which PIDs have previously been collected,
      • wait for the indicated delay,
    • send SIGKILL signal to remaining child processes recursively.
  • The Agent makes use of Java for process management.
  • Users are free to use traps as explained with the below chapter. However, there is no need to add a trap to job scripts as the Agent by default will terminate child processes.
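
The steps above can be sketched as follows. This is a minimal illustration in Python, not the Agent's implementation, which uses Java for process management. All function and parameter names are hypothetical. As a simplification, the sketch waits for the job process to disappear rather than for stdout/stderr to be released, and it signals only the PIDs collected up front instead of walking the process tree recursively.

```python
import os
import signal
import time
from typing import Callable, Iterable, Optional


def is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def send_signal(pid: int, sig: int) -> None:
    """Send a signal and ignore processes that have already gone."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        pass


def wait_until_gone(pid: int, timeout: float) -> bool:
    """Poll until the process has gone or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return True
        time.sleep(0.1)
    return not is_alive(pid)


def terminate_job(
    job_pid: int,
    child_pids: Iterable[int],
    grace_timeout: float,
    sigkill_delay: Optional[float] = None,
    *,
    send: Callable[[int, int], None] = send_signal,
    wait_for_job: Callable[[int, float], bool] = wait_until_gone,
    alive: Callable[[int], bool] = is_alive,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    children = list(child_pids)  # child PIDs collected before signalling
    send(job_pid, signal.SIGTERM)
    if not wait_for_job(job_pid, grace_timeout):
        send(job_pid, signal.SIGKILL)  # Grace Timeout exceeded
    remaining = [pid for pid in children if alive(pid)]
    if sigkill_delay is not None:  # only if --sigkill-delay is used
        for pid in remaining:
            send(pid, signal.SIGTERM)
        sleep(sigkill_delay)
        remaining = [pid for pid in remaining if alive(pid)]
    for pid in remaining:
        send(pid, signal.SIGKILL)
```

The keyword-only parameters exist so that the sequence can be exercised with stand-ins instead of real processes, for example by passing a `send` function that records the signals it would have sent.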

...

  • In order to more reliably kill child processes the Agent uses the kill_task.sh script from its var_<port>/work directory.
    • This script identifies the process tree created by the job script and kills any available child processes.
    • Download: kill_task.sh
  • Though the Agent is platform independent, retrieving a process tree does not necessarily use the same command (ps) and options on all Unix platforms.
    • The Agent therefore allows specification of an individual kill script from a command line option if the built-in kill_task.sh script is not applicable to your Unix platform, see JS7 - Agent Operation.
  • The OS commands used by the Agent to send signals include:

    • Termination signals

      Signal     Command
      SIGTERM    /bin/kill <pid>
      SIGKILL    /bin/kill -KILL <pid>
    • If required for your Agent platform, the commands to send signals can be modified - see the JS7 - Agent Configuration Items article.
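
To illustrate how the signal commands listed above expand for a given PID, here is a small Python sketch. The dictionary and function names are hypothetical and do not reflect the Agent's configuration format. Only the two command lines are taken from this article.

```python
import subprocess
from typing import List

# Command templates as listed in this article; the dictionary is an
# illustrative structure only, not the Agent's configuration format.
SIGNAL_COMMANDS = {
    "SIGTERM": ["/bin/kill", "{pid}"],
    "SIGKILL": ["/bin/kill", "-KILL", "{pid}"],
}


def signal_command(signal_name: str, pid: int) -> List[str]:
    """Expand the command template for the given signal and PID."""
    template = SIGNAL_COMMANDS[signal_name]
    return [part.format(pid=pid) for part in template]


def run_signal_command(signal_name: str, pid: int) -> int:
    """Run the expanded command and return its exit code."""
    completed = subprocess.run(signal_command(signal_name, pid), check=False)
    return completed.returncode
```

Keeping the commands in a table of templates mirrors the idea that they can be adjusted per platform without touching the calling code.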

...

  • The Agent uses the kill_task.cmd script which is available from its var_<port>/work directory.
    • The script uses the taskkill command to kill the job's process and its children.
    • Download: kill_task.cmd
  • An individual kill script can be specified with a command line option on Agent startup, see JS7 - Agent Operation.
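
As a rough illustration of killing a process together with its children on Windows, the following Python sketch builds a taskkill command line. The exact options used by kill_task.cmd are not stated in this article. The /T (include child processes) and /F (force) switches shown here are standard taskkill options and are an assumption about what such a script would typically pass. The function name is hypothetical.

```python
from typing import List


def taskkill_command(pid: int, force: bool = True) -> List[str]:
    """Build a taskkill command for a process and its child processes."""
    command = ["taskkill", "/PID", str(pid), "/T"]  # /T includes child processes
    if force:
        command.append("/F")  # /F terminates the processes forcefully
    return command
```

The resulting list can be passed to `subprocess.run` on a Windows host.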

Termination of Agent

Termination of the Agent takes running jobs into account:

  • For details see JS7 - Agent Command Line Operation, Stopping the Agent.
  • Depending on the command line options used, job processes will be forcibly terminated. Orders for workflows related to affected jobs will be put to the failed state.
  • Users are in control of failed orders, which can be cancelled or resumed from some other Agent in an Agent Cluster.

A crash of the Agent also results in termination of jobs:

  • A crash of the Agent differs from termination:
    • The Agent process is killed, for example using the command kill -9 <agent-pid> on Unix.
    • The JS7 - Agent Watchdog will terminate any running job processes. Orders for workflows related to affected jobs will be put to the blocked state.
  • A crash of the machine or of the container the Agent is operated in will crash the Agent and any running jobs. Related orders will be put to the blocked state.
  • Users have limited control of blocked orders as the Controller does not know the execution status.
    • Standalone Agents will restart crashed jobs on restart of the Agent unless the jobs are marked as not restartable. No operations on blocked orders can be performed until the Standalone Agent is restarted.
    • Cluster Agents allow users to confirm the loss of a crashed Subagent. In this situation crashed jobs will be restarted from some other Subagent unless they are marked as not restartable. Users control whether jobs will be restarted on restart of the crashed Subagent or from some other Agent.
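
The rules in this chapter can be summarised as a small decision sketch in Python. It only restates the behaviour described above. The enum, function and parameter names are hypothetical and do not correspond to any JS7 API.

```python
from enum import Enum


class AgentEvent(Enum):
    FORCED_TERMINATION = "forced termination of the Agent"
    AGENT_CRASH = "crash of the Agent process"
    MACHINE_CRASH = "crash of the machine or container"


def order_state(event: AgentEvent) -> str:
    """Order state for jobs that were running when the event occurred."""
    if event is AgentEvent.FORCED_TERMINATION:
        return "failed"  # users can cancel or resume the order
    return "blocked"     # the Controller does not know the execution status


def crashed_job_is_restarted(
    cluster: bool,
    restartable: bool,
    agent_restarted: bool,
    loss_confirmed: bool = False,
) -> bool:
    """Decide whether a crashed job is started again."""
    if not restartable:
        return False  # jobs marked as not restartable are never restarted
    if cluster:
        # Either the user confirms the loss of the Subagent, or the
        # crashed Subagent is restarted.
        return loss_confirmed or agent_restarted
    return agent_restarted  # Standalone Agent: only after its restart
```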