...

  • Jobs are executed with JS7 Agents which handle termination of jobs.
    • Shell Jobs and JVM Jobs are under control of the Agent which terminates running jobs.
    • Jobs that use an SSH client cannot guarantee that a job's child processes are terminated, as these are controlled by the remote SSH server. The JS7 - JITL SSHJob provides the means to reliably terminate child processes.
  • Termination of jobs can be triggered by users from the JOC Cockpit or performed automatically if jobs exceed a given timeout.
    • As a prerequisite for termination by the JOC Cockpit, the Controller has to be connected to the JOC Cockpit and the Agent has to be accessible to the Controller.

...

  • The job is configured with a timeout setting: if job execution exceeds the timeout then the job will be terminated by the Agent.
  • Jobs can be terminated using GUI operations and the JS7 - REST Web Service API:
    • The Cancel/Kill operation terminates a running job and fails the order.
    • The Suspend/Kill operation terminates a running job and suspends the order.
    • Failed and suspended orders can be resumed.

Terminating Jobs on Unix

In Unix environments, jobs receive the following signals from the Agent:

  • When a job is to be terminated, the Agent first sends a SIGTERM signal.
    • This signal can be ignored or it can be handled by a job script. For shell jobs a trap can be defined to, for example, perform cleanup tasks such as disconnecting from a database or removing temporary files.
    • Note that this applies to job scripts that directly include shell code. If the job script instead calls external shell scripts or programs, the Agent's SIGTERM signal is not forwarded to the child processes running these scripts or programs. To prevent this situation, external shell scripts or programs can be called like this:
      • exec /tmp/some_script.sh
      • The exec command replaces the process of the current job script with the external script or program (instead of creating a new child process), which guarantees that the SIGTERM signal is received by that script or program. Commands following exec in the job script are not executed.
  • The job configuration includes the Grace Timeout setting:
    • The Grace Timeout duration is applied after a SIGTERM signal (corresponding to the command kill -15) has been sent by the Agent. This allows the job to terminate on its own, for example after some cleanup has been performed.
  • Should the job still be running after the specified Grace Timeout duration then the Agent will send a SIGKILL signal (corresponding to the command kill -9) that kills the OS process.
  • It is recommended that job scripts which create child processes do not terminate on receipt of a SIGTERM signal before their child processes have terminated.
    • Job scripts can use the wait command, for example from a trap for SIGTERM, to wait for completion of child processes before the job script terminates.
    • Job scripts including any child processes will then be reliably killed by SIGKILL after the specified Grace Timeout.
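The trap and wait behavior described above can be sketched as follows. The script writes a small job script to a temporary file, starts it, and sends it SIGTERM as the Agent would. The job script's trap removes a temporary file (standing in for cleanup tasks), forwards SIGTERM to its background child, waits for the child and exits with code 143 (128 + 15). The file names, the sleep child and the one-second start-up delay are illustrative assumptions, not part of JS7.

```bash
#!/usr/bin/env bash
# Demo: a job script that handles SIGTERM by cleaning up and waiting for its child.
# File names, durations and the sleep child are illustrative assumptions.

job_script=$(mktemp)
marker=$(mktemp)          # stands in for a temporary file the job has to remove

cat > "$job_script" <<'EOF'
#!/usr/bin/env bash
marker="$1"

on_term() {
    rm -f "$marker"                                             # cleanup task
    [ -n "$child_pid" ] && kill -TERM "$child_pid" 2>/dev/null  # forward SIGTERM to the child
    wait                                                        # wait for the child to complete
    exit 143                                                    # 128 + 15 (SIGTERM)
}
trap on_term TERM

sleep 30 &                # child process standing in for real work
child_pid=$!
wait "$child_pid"         # interrupted by SIGTERM, then the trap runs
EOF

bash "$job_script" "$marker" &
job_pid=$!
sleep 1                   # give the job time to start and install its trap
kill -TERM "$job_pid"     # the Agent sends SIGTERM first
wait "$job_pid"
rc=$?

if [ -e "$marker" ]; then cleaned=no; else cleaned=yes; fi
echo "job exit code: $rc, temporary file removed: $cleaned"
rm -f "$job_script" "$marker"
```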

The OS commands used by the Agent to send signals include:

  • Termination signals

      Signal     Command
      SIGTERM    /bin/kill <pid>
      SIGKILL    /bin/kill -KILL <pid>
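To illustrate how the two commands combine with the Grace Timeout, the following sketch sends SIGTERM, polls for up to a given number of seconds and sends SIGKILL only if the process is still running. The helper name terminate_with_grace and the one-second polling interval are hypothetical; the Agent implements this sequence itself.

```bash
#!/usr/bin/env bash
# Sketch of the SIGTERM / Grace Timeout / SIGKILL sequence using the commands above.
# terminate_with_grace is a hypothetical helper; the 1s polling interval is an assumption.

# Returns 0 if the process ended within the Grace Timeout, 2 if SIGKILL was needed.
terminate_with_grace() {
    local pid=$1 grace=$2 waited=0
    /bin/kill "$pid" 2>/dev/null || return 0      # SIGTERM (kill -15); fails if already gone
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
        sleep 1                                   # poll once per second
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        /bin/kill -KILL "$pid"                    # SIGKILL (kill -9) after the Grace Timeout
        return 2
    fi
    return 0
}

# A process that ignores SIGTERM is killed only after a Grace Timeout of 2s.
# exec keeps the ignored SIGTERM disposition for the sleep process.
bash -c 'trap "" TERM; exec sleep 30' &
stubborn=$!
sleep 1                                           # let it install the ignore disposition
terminate_with_grace "$stubborn" 2
result=$?
wait "$stubborn" 2>/dev/null
echo "terminate_with_grace returned $result"
```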

Job scripts frequently spawn child processes that have to be terminated together with their parent process.

Whether the OS terminates child processes when the parent process is killed depends on the way the child processes have been spawned; in many situations child processes continue to run after their parent process has been killed.

Terminating Child Processes starting from Release 2.7.2

  • When terminating a job process, the Agent performs the following steps:
    • collect the child process PIDs of the job process,
    • send SIGTERM to the job process,
    • wait for one of the following events, whichever occurs first:
      • expiry of the Grace Timeout configured with the job, see JS7 - Job Instruction,
      • release of stdout/stderr by the job process and its child processes,
    • send SIGKILL to the job process if the Grace Timeout is exceeded,
    • send SIGTERM to the child processes for which PIDs have previously been collected; send SIGTERM recursively to child processes of a child process,
    • wait for the delay specified with the --sigkill-delay option of the Agent Start Script, see JS7 - Agent Command Line Operation,
    • send SIGKILL to remaining child processes recursively.
  • The Agent makes use of Java for process management.
  • Users are free to use traps as explained in the Use of Exit Traps chapter. However, there is no need to add traps to job scripts, as the Agent terminates child processes by default.
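The steps above can be approximated in shell for experimentation. This is a sketch under assumptions, not the Agent's implementation (which uses Java): it collects descendants with pgrep -P, treats zombie processes as ended, models only the Grace Timeout (not the release of stdout/stderr), and takes the sigkill delay as a plain argument. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Shell sketch of the Release 2.7.2 termination steps; not the Agent's code.
# Function names, the 1s polling interval and the use of pgrep/ps are assumptions.

# A process counts as running if it exists and is not a zombie.
alive() {
    local s
    s=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    [ -n "$s" ] && [ "${s#Z}" = "$s" ]
}

# Print all descendant PIDs of $1 (children, grandchildren, ...).
descendants() {
    local child
    for child in $(pgrep -P "$1"); do
        echo "$child"
        descendants "$child"
    done
}

# Wait up to $2 seconds for PID $1 to end.
wait_for() {
    local i=0
    while alive "$1" && [ "$i" -lt "$2" ]; do
        sleep 1
        i=$((i + 1))
    done
}

terminate_job() {
    local job_pid=$1 grace=$2 sigkill_delay=$3 children pid
    children=$(descendants "$job_pid")             # collect child PIDs before killing anything
    kill -TERM "$job_pid" 2>/dev/null              # SIGTERM to the job process
    wait_for "$job_pid" "$grace"                   # Grace Timeout
    if alive "$job_pid"; then kill -KILL "$job_pid"; fi             # SIGKILL if still running
    for pid in $children; do kill -TERM "$pid" 2>/dev/null; done    # SIGTERM to collected children
    sleep "$sigkill_delay"                         # delay before SIGKILL to children
    for pid in $children; do
        if alive "$pid"; then kill -KILL "$pid" 2>/dev/null; fi     # SIGKILL to remaining children
    done
}

# Demo: the job's background child would otherwise survive the job process.
bash -c 'sleep 30 & wait' &
job=$!
sleep 1                                            # let the job spawn its child
kids=$(descendants "$job")
terminate_job "$job" 2 1
wait "$job" 2>/dev/null
```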

Terminating Child Processes starting from Release 2.1.1

  • To kill child processes more reliably, the Agent uses the kill_task.sh script from its var_<port>/work directory.
    • This script identifies the process tree created by the job script and kills any available child processes.
    • Download: kill_task.sh
  • Though the Agent is platform independent, the command (ps) and options used to retrieve a process tree are not the same for all Unix platforms.
    • The Agent therefore allows specification of an individual kill script from a command line option if the built-in kill_task.sh script is not applicable to your Unix platform, see JS7 - Agent Operation.
  • The OS commands used by the Agent to send signals include:

    • Termination signals

        Signal     Command
        SIGTERM    /bin/kill <pid>
        SIGKILL    /bin/kill -KILL <pid>
    • If required for your Agent platform, the commands to send signals can be modified - see the JS7 - Agent Configuration Items article.
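Where kill_task.sh does not suit a Unix platform, an individual kill script has to retrieve the process tree itself. The following sketch reads a single snapshot from ps using only the POSIX options -e and -o, so that it avoids platform-specific flags, and kills the job process and its descendants with SIGKILL. The assumption that the script receives the job's PID as its first argument is hypothetical; check kill_task.sh for the arguments the Agent actually passes.

```bash
#!/usr/bin/env bash
# Sketch of an individual kill script. Hypothetical interface: the job's PID is $1.
# Uses only POSIX ps options (-e, -o pid=, -o ppid=) to build the process tree.

# Print all descendant PIDs of $1 from one ps snapshot.
process_tree() {
    ps -e -o pid= -o ppid= | awk -v root="$1" '
        { parent[$1] = $2; pids[NR] = $1 }
        END {
            want[root] = 1
            do {                              # repeat passes until no new descendants appear
                found = 0
                for (i = 1; i <= NR; i++) {
                    p = pids[i]
                    if (!(p in want) && (parent[p] in want)) {
                        want[p] = 1
                        print p
                        found = 1
                    }
                }
            } while (found)
        }'
}

# Kill the job process and every descendant found in the snapshot.
kill_tree() {
    local root=$1 tree pid
    tree=$(process_tree "$root")              # snapshot taken before anything is killed
    kill -KILL "$root" 2>/dev/null
    for pid in $tree; do
        kill -KILL "$pid" 2>/dev/null
    done
}

# When called with a PID, act as the kill script.
if [ $# -gt 0 ]; then
    kill_tree "$1"
fi
```

Called as `./my_kill_task.sh 12345` (a hypothetical script name), the sketch would kill process 12345 and its descendants.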

Use of Exit Traps

The Short Version

Users can add the following two traps to their Shell Jobs:

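Example for concise use of traps for script termination:

```bash
#!/usr/bin/env bash

# Single quotes defer expansion of $? until the trap runs.
trap 'wait && exit 143' TERM           # 143 = 128 + 15 (SIGTERM)
trap 'rc=$? && wait && exit $rc' EXIT  # wait for child processes, keep the script's exit code
```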

For explanations see the long version.

The Long Version

If a Shell Job script starts a background process and completes (with or without error) without waiting for the child process to terminate, the Agent cannot identify the running child process because its parent process is gone. It is therefore recommended to add a trap to the shell script that is triggered on termination of the script, independently of whether the script terminates normally or with an error. The trap prevents the script from terminating immediately while child processes are running. Instead, in the event of forced termination, the script continues as its trap waits for child processes, and the Agent executes the kill_task.sh script, which identifies the Shell Job script process and kills the running child processes.
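The effect of an EXIT trap that waits for child processes can be demonstrated with a short sketch: a job script (written to a temporary file for the demonstration) starts a background process and exits with code 3 while the child is still running. The trap keeps the script's process alive until the child has completed and then preserves the original exit code. The durations and the exit code are arbitrary choices for illustration.

```bash
#!/usr/bin/env bash
# Demo: the EXIT trap keeps the job script's process alive until its background child ends.
# Durations and the exit code 3 are arbitrary illustration values.

job_script=$(mktemp)
cat > "$job_script" <<'EOF'
trap 'rc=$? && wait && exit $rc' EXIT
sleep 2 &      # background child process
exit 3         # the script completes (here with an error) while the child still runs
EOF

start=$SECONDS
bash "$job_script" >/dev/null 2>&1
rc=$?
elapsed=$((SECONDS - start))
rm -f "$job_script"
echo "job exit code $rc after ${elapsed}s"
```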

...

Example for calling shell scripts with exit traps:

```bash
#!/usr/bin/env bash

exec /tmp/some_script.sh
```
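To see why exec guarantees that the SIGTERM signal reaches the called script, the following sketch compares the PID of a shell before and after exec. Identical values show that no new child process is created, so a signal sent to the job's PID is received by the exec'd program. sh serves only as a demo shell; nothing here is specific to JS7.

```bash
#!/usr/bin/env bash
# Demo: exec replaces the current process instead of creating a child, so the PID stays the same.

# The outer sh prints its PID, then exec's a second sh that prints its own PID.
pids=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')
before=$(printf '%s\n' "$pids" | sed -n 1p)
after=$(printf '%s\n' "$pids" | sed -n 2p)
echo "PID before exec: $before, PID after exec: $after"
```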

Automation of Exit Traps

JS7 provides an option for applying traps such as those described in the examples above to any number of Shell Job scripts via JS7 - Script Includes.

  • The trap and the trap function are added to a Script Include like this:




  • The Script Include is embedded into any Shell Job scripts from a single line similar to a shebang:



Terminating Jobs on Windows

Windows environments do not support termination signals: a process that is terminated is killed immediately.

Terminating Child Processes starting from Release 2.7.2

  • When terminating a job process, the Agent performs the following steps:
    • collect the child process PIDs of the job process recursively,
    • kill the job process and any child processes recursively.
  • The Agent makes use of Java for process management.

Terminating Child Processes starting from Release 2.1.1

  • The Agent uses the kill_task.cmd script which is available from its var_<port>/work directory.
    • The script uses the taskkill command to kill the job's process and its children.
    • Download: kill_task.cmd
  • An individual kill script can be specified with a command line option on Agent startup, see JS7 - Agent Operation.

...