Introduction

The underlying scenario is that users patch hosts used by a JS7 environment. This is not related to JS7 - Patch Management, which covers patching of JS7 products, but to patching of a host at OS level or application level.

In many situations patching includes rebooting the host. Users would like to know in advance to what extent a reboot will affect JS7 scheduling operation. This implies use of clustering for JOC Cockpit, Controller and Agents, see JS7 - Cluster Architecture: in a JS7 cluster, an outage of one or two hosts allows operation to continue, whereas outages of more hosts can make the cluster non-functional and can require manual intervention for fail-over and restart.

Examples of fatal outages in a cluster:

  • If both Primary and Secondary JOC Cockpit instances are shut down, then the Controller Cluster will continue to work. However, fail-over and restart of Controller instances will require user intervention.
  • If both Primary and Secondary Controller instances are shut down, then an Agent Cluster will continue to work. However, fail-over and restart of Director Agent instances will require user intervention.

Impact Check Script

The script makes use of the JS7 - Unix Shell CLI for JOC Cockpit Status Operations that offers the health-check command with the --whatif-shutdown option, see Examples for Health Checks.

...

  • The script is available for Linux and macOS® using bash shell.
  • The script terminates with exit code 0 to signal that the host shutdown scenario will not have a fatal impact. Other exit codes signal a fatal impact on JS7 scheduling operation.
  • The script is intended as a baseline example for customization by JS7 users and by SOS within the scope of professional services. Examples make use of JS7 Release 2.7.2, bash 4.2.

The script below checks hosts from a list, one after the other, for the impact of a shutdown.

  • Users can limit health checks to clustered JS7 products. Shutdown of a Standalone Agent's host always results in unavailability. Limiting health checks to clustered Agents using the --agent-cluster switch is recommended.
  • Users can improve performance
    • by checking (and later patching) more than one host at the same time, using for example: --whatif-shutdown=joc-2-0-primary,joc-2-0-secondary.
    • by executing health checks for hosts in parallel.
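
Parallel execution of health checks could be sketched as follows. This is a minimal sketch and not part of the JS7 CLI: the check_host function is a hypothetical stand-in that always succeeds, and its body would have to be replaced by the call to operate-joc.sh health-check with the connection options of your environment. Each check evaluates the shutdown of a single host only. The combined impact of shutting down several hosts at the same time is not covered by independent checks.

```shell
#!/bin/bash

# hypothetical stand-in for the health check (assumption), replace the body with e.g.:
#   ./bin/operate-joc.sh health-check "${request_options[@]}" --whatif-shutdown="$1"
check_host() {
    sleep 1
    return 0
}

# hosts to be checked independently
hosts=(joc-2-0-primary controller-2-0-primary diragent-2-0-primary)

# start one background check per host and remember its process ID
pids=()
for host in "${hosts[@]}"; do
    check_host "$host" &
    pids+=("$!")
done

# collect the exit code of each background check
failed=0
for i in "${!hosts[@]}"; do
    wait "${pids[$i]}"
    rc=$?
    echo "HOST: ${hosts[$i]}, Exit Code: $rc"
    [ "$rc" -eq 0 ] || failed=1
done

echo "FAILED: $failed"
```

The variable failed is 0 only if all checks returned exit code 0. It can be used to decide whether patching should proceed.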


#!/bin/bash

# set common options for connection to the JS7 REST Web Service
request_options=(--url=http://joc-2-0-primary.sos:7446 --user=root --password=root --ca-cert=./root-ca.crt --controller-id=controller --agent-cluster)

# hosts to be patched
hosts=(joc-2-0-primary joc-2-0-secondary controller-2-0-primary controller-2-0-secondary diragent-2-0-primary diragent-2-0-secondary)

# max. number of tries in case of non-fatal problems
tries=3

# delay in seconds between retries after non-fatal problems
delay=10

for host in "${hosts[@]}"; do
    echo "--------------------------------------------------------"
    echo "CHECKING IMPACT OF HOST SHUTDOWN: $host"
    echo "--------------------------------------------------------"

    try=1
    while [ "$try" -le "$tries" ]; do
        echo ""
        echo "TRY $try/$tries: ./bin/operate-joc.sh health-check ${request_options[*]} --whatif-shutdown=$host"
        echo ""
        ./bin/operate-joc.sh health-check "${request_options[@]}" --whatif-shutdown="$host"
        rc=$?
        echo -n ""

        case "$rc" in
            0)  break;
                ;;
            3)  sleep "$delay"
                ;;
            *)  exit "$rc"
                ;;
        esac

        try=$((try+1))
    done

    if [ "$rc" -eq 0 ]
    then
        echo "PATCH CAN BE APPLIED TO HOST: $host"
        # add your code for patching
    else
        echo "PATCH CANNOT BE APPLIED TO HOST: $host, Exit Code: $rc"
        # add your code for error handling
    fi

    echo ""
done
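
To check more than one host in a single call, the host names are passed as a comma-separated list to the --whatif-shutdown option, as in the example --whatif-shutdown=joc-2-0-primary,joc-2-0-secondary. A minimal sketch, assuming the hosts are held in a bash array as in the script above, shows how the option value could be built. The variable name whatif_hosts is chosen for illustration only.

```shell
#!/bin/bash

# hosts that should be checked (and later patched) at the same time
hosts=(joc-2-0-primary joc-2-0-secondary)

# join the array elements with commas; the IFS change is limited to the subshell
whatif_hosts=$(IFS=,; echo "${hosts[*]}")

# print the resulting option for use with operate-joc.sh health-check
echo "--whatif-shutdown=$whatif_hosts"
```

The resulting option can be appended to the health-check call in place of the per-host option used in the loop.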