Introduction

  • Jobs are executed with JS7 Agents which handle termination of jobs.
    • Shell Jobs and JVM Jobs are under control of the Agent which terminates running jobs.
    • Jobs implementing use of an SSH Client or use of the JS7 - JITL SSHJob cannot guarantee that a job's child processes are terminated as they are controlled by the remote SSHD server. The JS7 - JITL SSHJob provides the means to reliably terminate child processes.
  • Termination of jobs can be triggered by users from the JOC Cockpit and can be performed automatically if a job exceeds a given timeout.
    • As a prerequisite for termination by the JOC Cockpit, the Controller has to be connected to the JOC Cockpit and the Agent has to be accessible to the Controller.
  • Feature availability:
    • Starting from release 2.1.1: JS-1965 (SOS JIRA)
    • Starting from release 2.7.2: JS-2148 (SOS JIRA)

Termination of Jobs

Jobs can be terminated in one of the following ways:

  • The job is configured with a timeout setting: if job execution exceeds the timeout then the job will be terminated by the Agent.
  • Jobs can be terminated using GUI operations and by use of the JS7 - REST Web Service API:
    • The Cancel/force operation terminates a running job and fails the order.
    • The Suspend/force operation terminates a running job and suspends the order.
    • Failed and suspended orders can be resumed.
  • For restart capabilities of jobs see JS7 - FAQ - Can JobScheduler restart failed Jobs.

Terminating Jobs on Unix

In Unix environments, jobs receive the following signals from the Agent:

  • When a job is to be terminated then the Agent first sends a SIGTERM signal.
    • The signal can be ignored or it can be handled by a job script. For shell jobs a trap can be defined, for example, to perform cleanup tasks such as disconnecting from a database or removing temporary files.
    • Note that this applies to job scripts that directly include shell code. If instead the job script includes calls to external shell scripts or programs then the Agent's SIGTERM signal is not forwarded to child processes running for external scripts or programs. To prevent this situation external shell scripts or programs can be called like this:
      • exec /tmp/some_script.sh
      • The exec command causes any external script or program to be executed with the process of the current job script (instead of creating a new child process) and guarantees that the SIGTERM signal is received by the process.
  • The job configuration includes the Grace Timeout setting:
    • The Grace Timeout duration is applied after a SIGTERM signal (corresponding to the command kill -15) has been sent by the Agent. This allows the job to terminate on its own, for example after some cleanup has been performed.
  • Should the job still be running after the specified Grace Timeout duration then the Agent will send a SIGKILL signal (corresponding to the command kill -9) that forces termination of the OS process.

  • Note that it is recommended for job scripts that create child processes not to terminate on receipt of a SIGTERM signal before child processes are terminated.
    • Job scripts can use the wait command to wait for completion of child processes as this command prevents termination of the job script on receipt of SIGTERM.
    • Job scripts including any child processes will be forcibly terminated by SIGKILL after the specified Grace Timeout.

The OS commands used by the Agent to send signals include:

    Termination signals

    Signal     Command
    SIGTERM    /bin/kill <pid>
    SIGKILL    /bin/kill -KILL <pid>

If required for your Agent platform then the commands can be adjusted, see JS7 - Agent Configuration Items.
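
The use of a trap for cleanup on receipt of SIGTERM can be sketched as follows. This is a minimal illustration and not taken from the JS7 distribution: the function name job_with_cleanup, the temporary file and the sleep command standing in for long-running work are assumptions made for this example. The first argument is a file that receives the path of the temporary file so that a caller can check the cleanup.

```bash
#!/usr/bin/env bash

# Hypothetical job body (assumption for illustration, not part of JS7).
# Intended to run as a background job or from a job script of its own,
# since the trap remains active in the shell that executes the function.
job_with_cleanup() {
    tmp_file=$(mktemp)

    # On SIGTERM: stop the child process, remove the temporary file, then exit.
    # Exit code 143 (128 + 15) is a common convention for termination by SIGTERM.
    trap 'kill "$child_pid" 2>/dev/null; rm -f "$tmp_file"; exit 143' TERM

    sleep 30 &                         # stands in for long-running work
    child_pid=$!
    printf '%s\n' "$tmp_file" > "$1"   # report the temporary file (demo only)

    # the wait builtin returns early when a trapped signal arrives,
    # so the trap is executed without waiting for the child process
    wait "$child_pid"
}
```

When the function runs in the background and its process receives SIGTERM, for example from kill -TERM, the trap stops the child process and removes the temporary file before the process exits within the Grace Timeout.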

Job scripts frequently spawn child processes that have to be terminated in line with their parent process.

By default the OS forces termination of child processes if the parent process is forcibly terminated. However, this mechanism is not applicable for all situations, depending on the way child processes have been spawned.

Terminating Child Processes starting from Release 2.7.2

When terminating a job process, the Agent performs the following steps:

    • collect child process PIDs of the job process,
    • send SIGTERM to the job process,
    • wait for one of the following events, whichever occurs first:
      • the Grace Timeout configured with the JS7 - Job Instruction elapses,
      • stdout/stderr are released by the job process and child processes,
    • send SIGKILL signal to the job process,
    • if the --sigkill-delay option of the Agent Start Script is used, see JS7 - Agent Command Line Operation, then
      • send SIGTERM signal to remaining child processes for which PIDs have previously been collected,
      • wait for the indicated delay or for child processes releasing stdout/stderr, whichever comes earlier,
    • send SIGKILL signal to remaining child processes.
  • The Agent makes use of Java for process management.
  • Users are free to use traps as explained in the chapter below. However, there is no need to add a trap to job scripts as the Agent by default will terminate child processes.
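
For illustration, the core of this sequence can be approximated from a shell. The sketch below is an assumption-laden simplification and not the Agent's implementation, which uses Java: the function name terminate_job is hypothetical, only direct child processes are collected, the release of stdout/stderr is not observed, and the optional --sigkill-delay step is omitted.

```bash
#!/usr/bin/env bash

# Hypothetical helper, not part of JS7: terminate_job <pid> <grace_timeout_seconds>
terminate_job() {
    local job_pid=$1 grace=$2 waited=0 children pid

    # collect PIDs of direct child processes before signalling the job process
    children=$(ps -A -o pid= -o ppid= | awk -v p="$job_pid" '$2 == p { print $1 }')

    # send SIGTERM to the job process
    kill -TERM "$job_pid" 2>/dev/null || true

    # wait until the job process has gone or the grace timeout has elapsed
    while kill -0 "$job_pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # send SIGKILL to the job process in case it is still running
    kill -KILL "$job_pid" 2>/dev/null || true

    # send SIGKILL to the child processes that were collected initially
    for pid in $children; do
        kill -KILL "$pid" 2>/dev/null || true
    done
}
```

A call such as terminate_job 12345 15 would apply a grace period of 15 seconds to the process with the PID 12345. Collecting the PIDs before the first signal is sent matters: once the job process has gone, its child processes are re-parented and can no longer be found from the job's PID.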

Terminating Child Processes starting from Release 2.1.1

  • In order to force termination of child processes the Agent uses the kill_task.sh script from its var_<port>/work directory.
    • This script identifies the process tree created by the job script and forces termination of any available child processes.
    • Download: kill_task.sh
  • Though the Agent is platform independent it is evident that retrieval of a process tree does not necessarily use the same command (ps) and options for all Unixes.
    • The Agent therefore allows specification of an individual kill script from a command line option if the built-in kill_task.sh script is not applicable to your Unix platform, see JS7 - Agent Operation.
  • The OS commands used by the Agent to send signals include:

    • Termination signals

      Signal     Command
      SIGTERM    /bin/kill <pid>
      SIGKILL    /bin/kill -KILL <pid>

    • If required for your Agent platform, the commands to send signals can be modified, see the JS7 - Agent Configuration Items article.
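
To illustrate what retrieval of a process tree with the ps command can look like, the following sketch lists all descendants of a given process. It is not the content of kill_task.sh: the function name list_descendants is hypothetical, and the ps options used here are assumed to be available on the platform.

```bash
#!/usr/bin/env bash

# Hypothetical helper, not part of JS7: prints the PIDs of all descendants
# (child processes, their child processes etc.) of the given PID, one per line.
list_descendants() {
    local parent=$1 child
    for child in $(ps -A -o pid= -o ppid= | awk -v p="$parent" '$2 == p { print $1 }'); do
        echo "$child"
        list_descendants "$child"   # descend into the child's own child processes
    done
}
```

An individual kill script could pass the resulting PIDs to the kill command. On platforms where ps does not offer these options the command line has to be adjusted, which is the reason why the Agent allows an individual kill script to be specified.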

Use of Exit Traps

The Short Version

Users can add the following two traps to their Shell Jobs:

Example for concise use of traps for script termination (bash):

#!/usr/bin/env bash

# single quotes defer evaluation of $? until the trap is executed
trap 'wait; exit 1' TERM
trap 'rc=$?; wait; exit $rc' EXIT

For explanations see the long version.
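
The effect of an EXIT trap that waits for child processes can be observed with a small experiment. This is a sketch under assumptions: the function name run_with_exit_trap is hypothetical, a subshell stands in for the job script, and sleep 2 stands in for a child process.

```bash
#!/usr/bin/env bash

# Hypothetical demo, not part of JS7: runs a stand-in for a job script in a
# subshell and reports its exit code and how long the subshell stayed alive.
run_with_exit_trap() {
    local start=$SECONDS rc=0
    (
        trap 'rc=$?; wait; exit $rc' EXIT   # wait for child processes on exit
        sleep 2 &                           # background child process
        exit 3                              # the script body ends immediately
    ) || rc=$?
    echo "exit code $rc, alive for $((SECONDS - start))s"
}

run_with_exit_trap
```

Although the script body ends immediately, the subshell stays alive until its child process has completed, and the exit code of the script body is preserved.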

The Long Version

In a situation when a Shell Job script starts a background process and does not wait for termination of the child process but instead completes (with or without error), then the Agent cannot identify the running child process as its parent process has gone. It is therefore recommended that a trap is added to the shell script. This will be triggered on termination of the script independently of whether the script terminates normally or with an error. This prevents the script from terminating immediately while child processes are running. Instead, in the event of forced termination, the script will continue due to its trap waiting for child processes and the Agent will execute the kill_task.sh script. This script identifies the process of the Shell Job script and forces termination of the running child processes.

Download (upload .json): jduExitTrap.workflow.json

Example for talkative use of exit traps for script termination (bash):

#!/usr/bin/env bash

JS7Trap()
{
    rc=$?
    # wait for completion of child processes or let kill_task.sh force termination of child processes
    echo "($(date +%T.%3N)) $(basename "$0"): JS7Trap for signal $1: waiting for completion of child processes ..."
    wait
    echo "($(date +%T.%3N)) $(basename "$0"): JS7Trap for signal $1: leaving trap, exit code $rc"
    exit $rc
}

# define trap for script completion
trap 'JS7Trap EXIT' EXIT
trap 'JS7Trap TERM' TERM
trap 'JS7Trap INT' INT

# create three child processes
sleep 100 &
sleep 110 &
sleep 120 &

# this is what the script normally should do:
#   echo "waiting for completion of child processes"
#   wait

echo "script completed"

Explanation:

  • Line 3 - 11: implement the JS7Trap() function including the wait command. This either waits for termination of child processes or otherwise continues immediately.
    • The exit code returned from the trap in the event of script termination is reported by the task log and order log.
    • However, job execution will be considered to have failed regardless of the exit code value as the Cancel/force or Suspend/force operation has been performed.
  • Line 14 - 16: define traps calling the JS7Trap() function in the event of the following signals being received:
    • EXIT is a summary for a number of signals that terminate a script. However, this is available for the bash shell only. For use with other shells users instead have to state the list of signals such as TERM, INT etc.
    • TERM is the termination signal sent by the Agent if the Cancel/force or Suspend/force operation is invoked.
    • INT is added in case OS processes external to the JS7 Agent send this signal, which usually corresponds to hitting Ctrl+C in a terminal session.
  • Line 19 - 21: start background processes.
  • Line 23 - 25: a script should normally wait for child processes. However, if this cannot be guaranteed, for example if set -e is used to abort a script in case of error, then the use of a trap is an appropriate measure.
  • The following sequence of actions is performed:
    • The job script listed above does not wait for child processes and therefore terminates, triggering the EXIT pseudo-signal. The trap function is executed and waits for child processes to be completed. During this period the task process for the job remains alive.
    • If subsequently the Cancel/force or Suspend/force operation is invoked, then the Agent will send a SIGTERM signal which:
      • interrupts the wait command in the currently executed JS7Trap() function,
      • triggers execution of the JS7Trap() function once more and performs the wait operation for child processes.
    • Having applied the Grace Timeout the Agent executes the kill_task.sh script which sends a STOP signal to the task process, forces termination of any child processes and finally sends a SIGKILL signal to forcibly terminate the task process.
    • The crucial point is that the job script does not terminate with child processes running but remains active due to triggering of a trap which allows the Agent to force termination of any child processes from the process tree. If the task process for the job script terminates with child processes running then the Agent cannot identify the process tree and cannot force termination of child processes.

If the job script in the above example is executed from a script file then the exec command should be used to call the script file like this:

Example for calling shell scripts with exit traps (bash):

#!/usr/bin/env bash

exec /tmp/some_script.sh
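
That exec replaces the current process instead of creating a child process can be checked by comparing process IDs. The following sketch is an illustration only: the variable names are arbitrary and sh stands in for an external script.

```bash
#!/usr/bin/env bash

# Each command runs a wrapper shell that prints its own PID and then calls a
# second shell that prints the PID it is running with.
with_exec=$(bash -c 'echo $$; exec sh -c "echo \$\$"')
without_exec=$(bash -c 'echo $$; sh -c "echo \$\$"; true')

# word splitting is intended here: each variable holds two PIDs
printf 'with exec:    %s %s\n' $with_exec      # both PIDs are identical
printf 'without exec: %s %s\n' $without_exec   # the called shell has its own PID
```

With exec the called shell runs with the PID of the wrapper, which is why a SIGTERM signal sent to that PID reaches the called script and its traps.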

Automation of Exit Traps

JS7 provides an option for applying traps such as those described in the examples above. These can be applied to a number of Shell Job scripts via JS7 - Script Includes.

  • The trap and the trap function are added to a Script Include like this:




  • The Script Include is embedded into any Shell Job scripts from a single line similar to a shebang:



Terminating Jobs on Windows

For Windows environments the following applies when terminating jobs:

Windows environments have no concept of termination signals. A process that is terminated ends immediately.

Terminating Child Processes starting from Release 2.7.2

  • When terminating a job process, the Agent performs the following steps:
    • collect child process PIDs of job process recursively,
    • force termination of a job process and child processes recursively.
  • The Agent makes use of Java for process management.

Terminating Child Processes starting from Release 2.1.1

  • The Agent uses the kill_task.cmd script which is available from its var_<port>/work directory.
    • The script uses the taskkill command to force termination of the job's process and its child processes.
    • Download: kill_task.cmd
  • An individual kill script can be specified with a command line option on Agent startup, see JS7 - Agent Command Line Operation.

Termination of Agent

The following applies to releases starting from 2.7.2.

Termination of the Agent affects termination and restart of jobs as follows:

  • For details see JS7 - Agent Command Line Operation, Stopping the Agent.
  • Depending on the command line options in use, job processes will be forcibly terminated. Orders for workflows related to affected jobs will be set to the failed state.
  • Users are in control of failed orders that can be cancelled or resumed. Use of an Agent Cluster allows orders to be resumed without waiting for the terminated Subagent to be restarted.

Crash of the Agent affects termination and restart of jobs as follows:

  • Crash of the Agent is different to termination:
    • The Agent process is forcibly terminated, for example using the OS command kill -9 <agent-pid> on Unix.
    • The JS7 - Agent Watchdog will terminate any running job processes. Orders for workflows related to affected jobs will be set to the blocked state.
  • Crash of the machine or of the container that the Agent is operated in will crash the Agent and running jobs. Related orders will be set to the blocked state.
  • Users have limited control of blocked orders as the Controller does not know the execution status.
    • Standalone Agents will restart crashed jobs on restart of the Agent unless jobs are marked as not restartable. No operations on blocked orders can be performed until the Standalone Agent is restarted.
    • Cluster Agents allow loss of a crashed Subagent to be confirmed. In this situation crashed jobs will be restarted from some other Subagent unless they are marked as not restartable. Users control whether jobs will be restarted on restart of the crashed Subagent or from some other Subagent.
    • For details about restart capabilities of jobs see JS7 - FAQ - Can JobScheduler restart failed Jobs.

Resources