Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Users can limit health checks to clustered JS7 products. Shutdown of a Standalone Agent's host always has results in unavailability. Limiting health checks to clustered Agents using the --agent-cluster switch is recommended.
  • Users can improve performance
    • by checking (and later patching) more than one host at the same time, using for example: --whatif-shutdown=joc-2-0-primary,joc-2-0-secondary.
    • by executing health checks for hosts in parallel.


Code Block
languagebash
linenumberstrue
#!/bin/bash

# set common options for connection to the JS7 REST Web Service
request_options=(--url=http://joc-2-0-primary.sos:7446 --user=root --password=root --ca-cert=./root-ca.crt --controller-id=controller --agent-cluster)

# hosts to be patched
hosts=(joc-2-0-primary joc-2-0-secondary controller-2-0-primary controller-2-0-secondary diragent-2-0-primary diragent-2-0-secondary)

# max. number of tries in case of non-fatal problems
tries=3

# delay in seconds between retries after non-fatal problems
delay=1015

for host in "${hosts[@]}"; do
    echo "--------------------------------------------------------"
    echo "CHECKING IMPACT OF HOST SHUTDOWN: $host"
    echo "--------------------------------------------------------"

    try=1
    while [ "$try" -le "$tries" ]; do
        echo ""
        echo "TRY $try/$tries: ./bin/operate-joc.sh health-check "${request_options[@]}" --whatif-shutdown=$host"
        echo ""
        ./bin/operate-joc.sh health-status "${request_options[@]}" --whatif-shutdown="$host"
        rc=$?
        echo -n ""

        case "$rc" in
            0)  break;
                ;;
            3)  sleep "$delay"
                ;;
            *)  exit "$rc"
                ;;
        esac

        try=$((try+1))
    done

    if [ "$rc" -eq 0 ]
    then
        echo "PATCH CAN BE APPLIED TO HOST: $host"
        # add your code for patching
    else
        echo "PATCH CANNOT BE APPLIED TO HOST: $host, Exit Code: $rc"
        # add your code for error handling
    fi

    echo ""
done


Explanations:

  • Line 7: specifies the list of hostnames used by clustered JS7 products
  • Line 10: specifies the maximum number of tries to perform the health-check. After reboot of a host it can take a few seconds until a cluster is re-established. 
  • Line 13: specifies the delay between tries. The value should be adjusted if it takes the cluster more time to recouple.
  • Line 29-36: evaluates the health check result,
    • exit code 0 signals an operational cluster,
    • exit code 3 signals that the cluster is not (yet) functional,
    • other exit codes signal component status errors, for example an unavailable Agent.