...
- The job is configured with a timeout setting: if job execution exceeds the timeout then the job will be terminated by the Agent.
- Jobs can be terminated using GUI operations and by use of the JS7 - REST Web Service API:
- The Cancel/force operation terminates a running job and fails the order.
- The Suspend/force operation terminates a running job and suspends the order.
- Failed and suspended orders can be resumed.
...
- When a job is to be terminated, the Agent first sends a `SIGTERM` signal.
  - This signal can be ignored or it can be handled by a job script. For shell jobs a `trap` can be defined to, for example, perform cleanup tasks such as disconnecting from a database or removing temporary files.
  - Note that this applies to job scripts that directly include shell code. If instead the job script includes calls to external shell scripts or programs, then the Agent's `SIGTERM` signal is not forwarded to the child processes running those external scripts or programs. To prevent this situation, external shell scripts or programs can be called like this: `exec /tmp/some_script.sh`
  - The `exec` command causes any external script or program to be executed within the process of the current job script (instead of creating a new child process) and guarantees that the `SIGTERM` signal is received by the process.
- The job configuration includes the Grace Timeout setting:
  - The Grace Timeout duration is applied after a `SIGTERM` signal (corresponding to the command `kill -15`) has been sent by the Agent. This allows the job to terminate on its own, for example after some cleanup has been performed.
- Should the job still be running after the specified Grace Timeout duration, then the Agent will send a `SIGKILL` signal (corresponding to the command `kill -9`) that forces termination of the OS process.
  - Note that it is recommended that job scripts which create child processes do not terminate on receipt of a `SIGTERM` signal before their child processes have terminated.
    - Job scripts can use the `wait` command to wait for completion of child processes as this command prevents termination of the job script on receipt of `SIGTERM`.
    - Job scripts including any child processes will be reliably terminated by `SIGKILL` after the specified Grace Timeout.
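The recommendations above (a `trap` for cleanup, and not terminating before child processes are gone) can be combined in a shell job roughly as follows. This is a minimal sketch for bash, not code taken from JS7: the names `tmp_file`, `children`, `cleanup`, `on_term` and `run_job` are hypothetical, and the two `sleep` commands stand in for real work.

```shell
#!/usr/bin/env bash
# Sketch only: tmp_file, children, cleanup, on_term and run_job are hypothetical names.

tmp_file="${TMPDIR:-/tmp}/js7_example.$$"   # temporary file owned by the job
children=()                                 # PIDs of background child processes

cleanup() {
    rm -f "$tmp_file"
}

on_term() {
    # pass SIGTERM on to the children and wait until they have terminated
    if [ "${#children[@]}" -gt 0 ]; then
        kill -TERM "${children[@]}" 2>/dev/null || true
        wait "${children[@]}" 2>/dev/null || true
    fi
    cleanup
    exit 143    # 128 + 15, the conventional exit code after SIGTERM
}

run_job() {
    trap on_term TERM
    : > "$tmp_file"

    # hypothetical workload running in two child processes
    sleep 30 & children+=($!)
    sleep 30 & children+=($!)

    # a trapped SIGTERM interrupts this wait and runs on_term
    wait
    cleanup
}
```

To use the sketch as a job script, `run_job` would be called as its last line. The cleanup has to finish within the Grace Timeout, otherwise the job is terminated by `SIGKILL` before the trap completes.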
Job scripts frequently spawn child processes that have to be terminated in line with their parent process.
By default the OS forces termination of child processes if the parent process is forcibly terminated. However, this mechanism is not applicable to all situations, depending on the way child processes have been spawned.
...
Terminating Child Processes starting from Release 2.1.1
- In order to force termination of child processes more reliably, the Agent uses the `kill_task.sh` script from its `var_<port>/work` directory.
  - This script identifies the process tree created by the job script and forces termination of any available child processes.
  - Download: kill_task.sh
- Though the Agent is platform independent, retrieval of a process tree does not necessarily use the same command (`ps`) and options for all Unixes.
  - The Agent therefore allows specification of an individual kill script from a command line option if the built-in `kill_task.sh` script is not applicable to your Unix platform, see JS7 - Agent Operation.
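To give an idea of what identifying a process tree involves, the following sketch lists all descendants of a process using `ps`. It is not the content of `kill_task.sh`; the function names `children_of` and `descendants` are hypothetical, and the `ps -e -o pid= -o ppid=` invocation is assumed to be available on the platform, which is exactly the kind of detail that can differ between Unixes.

```shell
#!/usr/bin/env bash
# Sketch only: children_of and descendants are hypothetical helper names.

# print the PIDs of the direct children of the given PID
children_of() {
    ps -e -o pid= -o ppid= | awk -v parent="$1" '$2 == parent { print $1 }'
}

# print the PIDs of all descendants of the given PID, one per line
descendants() {
    local child
    for child in $(children_of "$1"); do
        echo "$child"
        descendants "$child"
    done
}
```

A call such as `descendants <pid>` prints one PID per line, which a custom kill script could then pass to `kill`.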
The OS commands used by the Agent to send signals include:
Termination signals

Signal | Command
---|---
`SIGTERM` | `/bin/kill <pid>`
`SIGKILL` | `/bin/kill -KILL <pid>`
- If required for your Agent platform, the commands to send signals can be modified - see the JS7 - Agent Configuration Items article.
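The interplay of `SIGTERM`, Grace Timeout and `SIGKILL` described earlier can be pictured with the two commands from the table. The sketch below is an illustration only and does not reproduce the Agent's implementation: the function name `terminate_with_grace` and its arguments are hypothetical, the shell's built-in `kill` is used instead of `/bin/kill`, and the polling with `kill -0` assumes that the target is a child of the calling bash shell.

```shell
#!/usr/bin/env bash
# Sketch only: terminate_with_grace is a hypothetical name, not an Agent function.

# usage: terminate_with_grace <pid> <grace-timeout-in-seconds>
# returns 0 if the process ended after SIGTERM, 1 if SIGKILL was required
terminate_with_grace() {
    local pid=$1 grace=$2 waited=0

    # step 1: ask the process to terminate
    kill -TERM "$pid" 2>/dev/null || return 0

    # step 2: give the process time to terminate on its own
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # step 3: force termination if the process is still running
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null
        return 1
    fi
    return 0
}
```

A process that handles `SIGTERM` promptly ends in step 2, while a process that ignores the signal is terminated in step 3 once the timeout has elapsed.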
...
If a Shell Job script starts a background process and does not wait for termination of the child process but instead completes (with or without error), then the Agent cannot identify the running child process as its parent process has gone. It is therefore recommended that a trap is added to the shell script. The trap is triggered on termination of the script, independently of whether the script terminates normally or with an error. This prevents the script from terminating immediately while child processes are running. Instead, in the event of forced termination, the script will continue due to its trap waiting for child processes and the Agent will execute the `kill_task.sh` script. This script identifies the Shell Job script process and forces termination of the running child processes.
Download (upload .json): jduExitTrap.workflow.json
```bash
#!/usr/bin/env bash

JS7Trap()
{
    rc=$?
    # wait for completion of child processes or let kill_task.sh force termination of child processes
    echo "($(date +%T.%3N)) $(basename $0): JS7Trap for signal $1: waiting for completion of child processes ..."
    wait
    echo "($(date +%T.%3N)) $(basename $0): JS7Trap for signal $1: leaving trap, exit code $rc"
    exit $rc
}

# define trap for script completion
trap 'JS7Trap EXIT' EXIT
trap 'JS7Trap TERM' TERM
trap 'JS7Trap INT' INT

# create three child processes
sleep 100 &
sleep 110 &
sleep 120 &

# this is what the script normally should do:
# echo "waiting for completion of child processes"
# wait

echo "script completed"
```
...
- Line 3 - 11: implements the `JS7Trap()` function including the `wait` command. This either waits for termination of child processes or continues immediately.
  - The exit code returned from the trap in the event of script termination is reported by the task log and order log.
  - However, job execution will be considered to have failed regardless of the exit code value as the Cancel/force or Suspend/force operation has been performed.
- Line 14-16: define traps calling the `JS7Trap()` function in the event of the following signals being received:
  - `EXIT` is a summary for a number of signals that terminate a script, however, this is available for the bash shell only.
  - `TERM` is the termination signal sent by the Agent if the Cancel/force or Suspend/force operation is invoked.
  - `INT` is added in case OS processes external to the JS7 Agent send this signal, which usually corresponds to hitting Ctrl+C in a terminal session.
- The three `sleep` commands that follow the traps start background processes.
- The commented-out `wait` at the end of the script: a script should normally `wait` for child processes. However, if this cannot be guaranteed, for example if `set -e` is used to abort a script in case of error, then the use of a trap is an appropriate measure.
- The following sequence of actions is performed:
  - The job script listed above does not wait for child processes and therefore terminates, triggering the EXIT pseudo-signal. The trap function is executed and waits for child processes to be completed. During this period the task process for the job remains alive.
  - If subsequently the Cancel/force or Suspend/force operation is invoked, then the Agent will send a `SIGTERM` signal which:
    - interrupts the `wait` command in the currently executed `JS7Trap()` function,
    - triggers execution of the `JS7Trap()` function once more and performs the `wait` operation for child processes.
  - Having applied the Grace Timeout, the Agent executes the `kill_task.sh` script which sends a `STOP` signal to the task process, forces termination of any child processes and finally sends a `SIGKILL` signal to forcibly terminate the task process.
  - The crucial point is that the job script does not terminate with child processes running but remains active due to triggering of a trap, which allows the Agent to force termination of any child processes from the process tree. If the task process for the job script terminates with child processes running, then the Agent cannot identify the process tree and cannot force termination of child processes.
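The order of operations in the last steps (stop the task process, terminate its children, then terminate the task process itself) can be sketched as below. This is an assumption-laden illustration and not the content of `kill_task.sh`: the function names `children_of` and `kill_tree` are hypothetical, the sketch stops every process in the tree rather than the task process only, and it relies on `ps -e -o pid= -o ppid=` being supported on the platform.

```shell
#!/usr/bin/env bash
# Sketch only: children_of and kill_tree are hypothetical helper names.

# print the PIDs of the direct children of the given PID
children_of() {
    ps -e -o pid= -o ppid= | awk -v parent="$1" '$2 == parent { print $1 }'
}

# stop a process, terminate its descendants, then terminate the process
kill_tree() {
    local pid=$1 child

    # a stopped process cannot spawn further children while its tree is walked
    kill -STOP "$pid" 2>/dev/null || return 0

    for child in $(children_of "$pid"); do
        kill_tree "$child"
    done

    # SIGKILL is delivered to stopped processes as well
    kill -KILL "$pid" 2>/dev/null || true
}
```

Stopping the process before its children are looked up is what closes the gap in which a new child could be started between listing and terminating the tree.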
If the job script from the `JS7Trap()` example above is executed from a script file, then the `exec` command should be used to call the script file like this:
...
Windows environments do not know about termination signals. When a process is terminated, it is terminated immediately.
Terminating Child Processes starting from Release 2.7.2
- When terminating a job process, the Agent performs the following steps:
- collect the child process PIDs of the job process recursively,
- force termination of the job process and any child processes recursively.
- The Agent makes use of Java for process management.
...
- The Agent uses the `kill_task.cmd` script which is available from its `var_<port>/work` directory.
  - The script uses the `taskkill` command to force termination of the job's process and its children.
  - Download: kill_task.cmd
- An individual kill script can be specified with a command line option on Agent startup, see JS7 - Agent Operation.
...
- Crash of the Agent is different to termination:
  - The Agent process is forcibly terminated, for example using the command `kill -9 <agent-pid>` on Unix.
  - The JS7 - Agent Watchdog will terminate any running job processes. Orders for workflows related to affected jobs will be put to the blocked state.
- Crash of the machine or of the container the Agent is operated in will crash the Agent and any running jobs. Related orders will be put to the blocked state.
- Users have limited control of blocked orders as the Controller does not know the execution status.
- Standalone Agents will restart crashed jobs on restart of the Agent unless jobs are marked as not restartable. No operations on blocked orders can be performed until the Standalone Agent is restarted.
- Cluster Agents allow the loss of a crashed Subagent to be confirmed. In this situation crashed jobs will be restarted from some other Subagent unless they are marked as not restartable. Users control whether jobs will be restarted on restart of the crashed Subagent or whether they should be restarted from some other Agent.
...