Introduction

  • Traps are used in shell jobs when a job should not be aborted immediately, but should terminate after performing a cleanup operation such as:
    • removing temporary files created by the job,
    • disconnecting from a database.
  • Traps are available for the Unix Shell, not for JVM Jobs and not for Windows Shell Jobs.
  • FEATURE AVAILABILITY STARTING FROM RELEASE 2.1.1

Example

Download (.json upload): jduCleanupTrap.json

The implementation of a cleanup trap in a JS7 job script can look like this:

Example for a cleanup trap
#!/bin/bash

# define trap to forward receipt of the SIGTERM signal to a child process
trap 'kill -TERM $CHILD_PID' SIGTERM

# create shell script for background execution
TRAP_SCRIPT="$HOME/test-trap.sh"
cat << 'EOF' > "$TRAP_SCRIPT"
#!/bin/bash

exitOnSigterm()
{
    exec &> /dev/tty
    local signal="$1"
    echo "($(date +%T.%3N)) $(basename $0): trap received signal \"$signal\", cleaning up..."
    if [[ "$CHILD_PID" != "" && -d /proc/$CHILD_PID ]]; then
        procInfo="$(ps -ef | /bin/grep -e PPID -e $CHILD_PID | /bin/grep -v /bin/grep)"
        echo -e "($(date +%T.%3N)) $(basename $0): trap found child process $CHILD_PID:\n$procInfo\n"
        cleanupTemporaryFiles "trap" "$CHILD_PID"
        # traps are not required to terminate child processes: the Agent terminates child processes
        # /bin/kill -TERM "$CHILD_PID"
    fi
}

cleanupTemporaryFiles()
{
    exec &> /dev/tty
    tempFile="/tmp/temporary_file.$2"
    if [ -f "$tempFile" ]; then
        rm -f "$tempFile"
        echo "($(date +%T.%3N)) $(basename $0): cleanup performed by $1 for temporary file: $tempFile"
    fi
}

# add trap to call function on receipt of SIGTERM signal
trap 'exitOnSigterm "SIGTERM"' SIGTERM

# run sleep command in background and create a temporary file for later removal
sleep 120 &
CHILD_PID="$!"
touch /tmp/temporary_file.$CHILD_PID

# wait for completion of shell script or for execution of trap
wait "$CHILD_PID"

# cleanup is performed by trap or by shell script
cleanupTemporaryFiles "shell script" "$CHILD_PID"

exit
EOF

# run shell script in background
chmod +x "$TRAP_SCRIPT"
"$TRAP_SCRIPT" &

# wait for completion of shell script or for execution of trap
CHILD_PID="$!"
echo "waiting for completion of child process with pid $CHILD_PID"
wait "$CHILD_PID"

exit $?


Explanation:

  • Line 6 - 50: a sample shell script $HOME/test-trap.sh is created by the job shell script. This is started later on for background execution in line 54.
    • Line 11 - 23: the exitOnSigterm() function is defined. This is called if the trap is triggered - see line 36.
    • Line 25 - 33: the cleanupTemporaryFiles() function is defined. It removes a temporary file that has previously been created by the sample shell script.
      • This function is called by the exitOnSigterm() function of the trap in line 19. The intention is to perform a cleanup if a SIGTERM signal triggers the trap.
      • This function is also called in line 47. This call is intended for normal termination when no trap is triggered.
    • Line 36: a trap is defined which is triggered by a SIGTERM signal and calls the exitOnSigterm() function.
    • Line 39 - 41: the sample shell script starts a sleep command which is executed in the background and touches a temporary file which should be removed by the cleanupTemporaryFiles() function.
    • Line 44: the sample shell script waits for termination of the sleep command.
    • Line 47: the cleanupTemporaryFiles() function is called to remove temporary files in case of normal termination without the trap being triggered.
  • Line 53 - 54: the sample shell script is made executable and is started in the background.
  • Line 59: the job shell script waits for termination of the sample shell script.

Agent Operations for Termination

In Unix environments jobs receive the following signals from the Agent:

  • When a job is to be terminated, the Agent sends a SIGTERM signal.
    • This signal can be ignored or can be handled by a job. For shell scripts a trap can be defined to, for example, perform cleanup tasks such as disconnecting from a database or removing temporary files.
  • The job configuration includes the Grace Timeout setting:
    • The Grace Timeout duration is applied after a SIGTERM signal (corresponding to kill -15) has been sent by the Agent. This allows the job to terminate on its own, for example after a cleanup has been performed.
    • If the job is still running after the specified Grace Timeout duration then the Agent sends a SIGKILL signal (corresponding to kill -9) that aborts the OS process.
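
The sequence of SIGTERM, Grace Timeout and SIGKILL described above can be sketched as follows. This is a simplified illustration and not the Agent's implementation: the terminate_with_grace() function, the variable names and the two sample jobs are assumptions made for this example.

```shell
#!/bin/bash

# hypothetical helper, not part of JS7: mimics SIGTERM, grace timeout, SIGKILL
terminate_with_grace() {
    pid="$1"
    grace="$2"
    waited=0

    # step 1: ask the process to terminate (corresponds to kill -15)
    kill -TERM "$pid" 2>/dev/null

    # step 2: allow the process up to $grace seconds to terminate on its own
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # step 3: abort the process if it is still running (corresponds to kill -9)
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null
        TERMINATION_RESULT="aborted by SIGKILL"
    else
        TERMINATION_RESULT="terminated on its own"
    fi
}

# sample job that ignores SIGTERM and therefore has to be aborted
( trap '' TERM; while :; do sleep 1; done ) &
STUBBORN_PID=$!
sleep 1
terminate_with_grace "$STUBBORN_PID" 2
STUBBORN_RESULT="$TERMINATION_RESULT"

# sample job that handles SIGTERM with a trap and then exits
( trap 'kill "$child"; exit 0' TERM; sleep 30 & child=$!; wait "$child" ) &
COOPERATIVE_PID=$!
sleep 1
terminate_with_grace "$COOPERATIVE_PID" 3
COOPERATIVE_RESULT="$TERMINATION_RESULT"

echo "job ignoring SIGTERM: $STUBBORN_RESULT"
echo "job with cleanup trap: $COOPERATIVE_RESULT"
```

The job that ignores SIGTERM is still running when the grace period ends and is aborted, whereas the job with a trap terminates within the grace period.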

Job scripts frequently spawn child processes that have to be terminated in line with their parent process.

  • By default the OS terminates child processes if the parent process is terminated. However, this mechanism is not applicable for all situations, depending on the way the child processes have been spawned.
  • For details see JS7 - FAQ - How does JobScheduler terminate Jobs.
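
One situation in which a child process is not terminated along with its parent can be reproduced with the following sketch: a parent shell starts a child in the background and is then aborted with SIGKILL. The file and variable names are assumptions made for this example and the sketch does not reflect how the Agent handles child processes.

```shell
#!/bin/bash

# hypothetical file used by the parent to report the PID of its child
PID_FILE=$(mktemp)

# the parent shell spawns a background child, records its PID and waits for it
bash -c 'sleep 30 & echo $! > "$0"; wait' "$PID_FILE" &
PARENT_PID=$!
sleep 1
CHILD_PID=$(cat "$PID_FILE")

# abort the parent process only; the child is not signalled
kill -KILL "$PARENT_PID"
sleep 1

# check whether the child process has survived its parent
if kill -0 "$CHILD_PID" 2>/dev/null; then
    CHILD_STATE="still running"
else
    CHILD_STATE="terminated"
fi
echo "child process $CHILD_PID is $CHILD_STATE"

# clean up the surviving child process and the temporary file
kill "$CHILD_PID" 2>/dev/null
rm -f "$PID_FILE"
```

This is the reason why the job shell script in the example forwards the SIGTERM signal to the script that it started in the background.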

Trap Operations on Termination

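It is important to keep in mind that a trap interrupts the command currently executed by a script, but does not terminate the script.

  • When the relevant OS signal is received then the current command of the job shell script is cancelled and the trap is executed instead. In bash this applies immediately to the wait command only: the trap for a command executed in the foreground runs after that command has ended. For this reason the example runs its workload in the background and waits for it.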
    • This is why we find two trap definitions in the above example:
      • When a job is cancelled, the Agent sends the SIGTERM signal to the process running the job shell script.
      • As the job shell script process spawns another shell script to be executed in the background, the trap in line 4 of the example above is added:
        • the job shell script's trap forwards the SIGTERM signal to the sample shell script.
        • the sample shell script defines its own trap in line 36. This trap then receives the signal forwarded by the job shell script.
  • After execution of the trap the sample shell script is resumed with the next command after line 44.
    • The assumption is that the SIGTERM signal is received while the script waits in line 44 for the sleep command to complete.
    • As the wait command has been interrupted, the sample shell script calls the cleanupTemporaryFiles() function in line 47.
  • As a result the sample shell script is completed with line 49 and control is returned to the job shell script. The job shell script continues with the line following the wait command in line 59. This line exits the job shell script and provides the exit code of the most recently executed command.
    • This exit code is only informational as the Agent will set the job's exit code to the value 1 to indicate failure of the job independently of whether the trap has been completed successfully or not.
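
The behavior described above, in which the trap interrupts the wait command and the script is resumed afterwards, can be observed with a minimal sketch. The demo script, its messages and the subshell that stands in for the Agent by sending the SIGTERM signal are assumptions made for this example.

```shell
#!/bin/bash

# hypothetical demo script, created in a temporary file
DEMO_SCRIPT=$(mktemp)
cat << 'EOF' > "$DEMO_SCRIPT"
trap 'echo "trap executed"' TERM

# run the workload in the background to allow wait to be interrupted
sleep 30 &
CHILD_PID=$!

# stand-in for the Agent: send SIGTERM to this script after one second
( sleep 1; kill -TERM $$ ) &

wait "$CHILD_PID"
WAIT_RC=$?

# the script is not terminated by the signal: execution is resumed here
[ "$WAIT_RC" -gt 128 ] && echo "wait was interrupted"
kill "$CHILD_PID"
echo "script resumed after trap"
EOF

# run the demo script and collect its messages
DEMO_OUTPUT=$(bash "$DEMO_SCRIPT")
rm -f "$DEMO_SCRIPT"
echo "$DEMO_OUTPUT"
```

The exit code of an interrupted wait command is greater than 128, which a script can use to distinguish an interruption from normal completion of the background process.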

Resources