Introduction

  • The JS7 Controller and Agent make use of a journal to store infrastructure information and transactional data.
    • The journal is located in the state sub-directory of the Controller's or Agent's data directory: <data>/state.
    • The journal consists of a number of files.
  • There can be situations when changes to a journal are required:
    • Examples
      • If users find inconsistent journals due to problems with the storage layer.
      • If users find orders holding an inconsistent state that cannot be cancelled from the JOC Cockpit GUI.
      • If users did not properly decommission an Agent but dropped the Agent's machine. In this situation the Controller still holds information about the unreachable Agent.
    • Changes to a journal are considered critical and must be performed with care.
  • The procedure includes the following steps:
    • shut down the Controller or Agent instance for which the journal should be updated,
    • take a backup of the journal,
    • apply changes to the journal by use of the Journal Update Script,
    • start the Controller or Agent instance.
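The backup step can also be performed manually, for example before trying out the script. The following is a minimal sketch, not part of the Journal Update Script: the function name backup_journal is hypothetical, and it assumes that the <host> part of the backup directory name can be taken from uname -n. It mimics the <backup-directory>/update-journal.<agent|controller>.<host>.<timestamp> naming that the script uses for its --backup-dir option, and it should be run only while the instance is shut down.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a journal (state) directory to a timestamped backup
# directory. Assumption: <host> is taken from "uname -n".
# Usage: backup_journal <state-dir> <backup-root> <controller|agent>
backup_journal() {
    local state_dir="$1" backup_root="$2" kind="$3"
    local target
    target="$backup_root/update-journal.$kind.$(uname -n).$(date +%Y-%m-%dT%H-%M-%S)"
    mkdir -p "$target" || return 1
    # -R copies recursively, -P copies symlinks such as controller-journal as symlinks
    cp -R -P "$state_dir"/. "$target"/ || return 1
    # print the created backup directory so callers can record it
    echo "$target"
}
```

For example, backup_journal /var/sos-berlin.com/js7/controller/state /var/backups/js7 controller prints the path of the new backup directory.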

Journal Update Script

The Journal Update Script is provided for download and can be used to update Controller and Agent journals.

  • The script is available for Linux, macOS®, AIX® and Solaris® using the POSIX-compatible shells bash, dash, ksh and zsh.
  • The script is intended as a baseline example for customization by JS7 users and by SOS within the scope of professional services.

Download

Download: update-journal.sh

Make the script executable using the command: chmod +x ./update-journal.sh

Usage

Invoking the script without arguments displays the usage clause:

Usage
Usage: update-journal.sh [Options] [Switches]

  Options:
    --controller-id=<identifier>          | required: Controller ID for which the journal is updated, default: controller
    --journal-dir=<directory>             | required: directory that holds the journal, default: .
    --backup-dir=<directory>              | required: directory to hold journal backups, default: /tmp
    --agent-id=<agent-id>[,<agent-id>]    | optional: Agent ID to be removed from a Controller's journal
    --order-id=<order-id>[,<order-id>]    | optional: Order ID to be removed from a Controller's or Agent's journal
    --workflow=<name>[,<name>]            | optional: Workflow to be removed from a Controller's or Agent's journal

    Switches:
    -h | --help                           | displays usage
    -a | --agent                          | specifies that an Agent's journal will be updated
    -c | --controller                     | specifies that a Controller's journal will be updated
    -k | --check                          | checks if the journal includes the indicated Agent ID and Order ID
    -r | --remove                         | removes the indicated Agent ID or Order ID from the journal
    -p | --problem                        | removes problems from a Controller journal

Options

  • --controller-id
    • Specifies the Controller ID for which changes are applied.
    • The Controller ID must be specified when updating either a Controller's or an Agent's journal.
  • --journal-dir
    • Specifies the directory in which a Controller's or Agent's journal is available. Typically the state sub-directory of the data directory is used.
    • Read and write permissions for the directory and its files are required.
  • --backup-dir
    • Specifies an existing directory to which backups of journal files will be added. Write permissions for the directory are required.
    • A sub-directory in the indicated directory will be created following the scheme: <backup-directory>/update-journal.<agent|controller>.<host>.<yyyy-MM-ddThh-mm-ss>
    • Example: /var/backups/js7/update-journal.controller.centostest_primary.2023-12-06T02-14-23
  • --agent-id
    • Specifies the Agent ID of an Agent that should be removed from a Controller's journal. More than one Agent ID can be specified, separated by commas.
    • Removing an Agent ID from the journal also removes orders and workflows related to the Agent.
      • Workflows are removed only if the first job of the workflow is assigned the Agent to be removed.
      • To remove workflows in which later jobs are assigned the Agent, use the --workflow option.
    • One of the options --agent-id, --order-id, --workflow or the switch --problem has to be specified.
  • --order-id
    • Specifies the Order ID that should be removed from the journal. More than one Order ID can be specified, separated by commas.
    • One of the options --agent-id, --order-id, --workflow or the switch --problem has to be specified.
  • --workflow
    • Specifies the workflow that should be removed from the journal. More than one workflow can be specified, separated by commas.
    • This option expects the name of a workflow, not its path. Regular expression syntax can be used to specify a number of workflows, for example:
      • my-workflow.*   removes any workflows with a name starting with my-workflow followed by any characters (right truncation).
      • .*my-workflow   removes any workflows with a name consisting of any characters followed by my-workflow (left truncation).
      • my-.*workflow   removes any workflows with a name starting with my- followed by any characters and ending with workflow.
    • One of the options --agent-id, --order-id, --workflow or the switch --problem has to be specified.
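Before removing workflows by pattern it can help to preview which workflow names a pattern selects. The following sketch is not part of the Journal Update Script: the function name preview_workflows is hypothetical, and it assumes that the script matches the pattern against the complete workflow name (anchored at both ends, as the truncation examples above suggest) and that grep extended regular expressions behave like the script's regular expressions for such simple patterns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the workflow names (one per line on stdin) that a
# --workflow pattern would select, assuming whole-name regular expression matching.
# Exits non-zero if no name matches.
# Usage: preview_workflows <pattern> < workflow-names.txt
preview_workflows() {
    # -E: extended regular expressions, -x: the whole line must match
    grep -E -x -e "$1"
}
```

With the input names my-workflow-1, other-my-workflow and my-special-workflow, the pattern my-workflow.* selects my-workflow-1, .*my-workflow selects other-my-workflow, and my-.*workflow selects my-special-workflow.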

Switches

  • -h | --help
    • Displays usage.
  • -a | --agent
    • Specifies that an Agent journal will be updated.
    • If an Agent Cluster is used, consider applying updates to the journals of both Director Agent instances.
  • -c | --controller
    • Specifies that a Controller journal will be updated.
    • If a Controller Cluster is used, consider applying updates to the journals of both Controller instances.
  • -k | --check
    • Checks if the indicated Agent ID, Order ID or workflow is available from the journal. The operation does not modify the journal.
  • -r | --remove
    • Removes the indicated Agent ID, Order ID or workflow from the journal.
  • -p | --problem
    • Removes problems from the Controller journal. Such problems can include a missing confirmation of the loss of an instance in a Controller Cluster or in an Agent Cluster.

Scenarios

For any scenario, a restart of JOC Cockpit is required because the Proxy Service might hold information about related Agents and Order IDs in its cache.

Removing an Order or Workflow

In this scenario, changes are applied by the Journal Update Script:

  • to the Controller:
    • If a Controller Cluster is used, then changes have to be applied to both Active and Standby Controller instances.
  • to the Agent:
    • The same changes as for the Controller have to be applied to Standalone Agents and Cluster Agents.
    • If an Agent Cluster is used, then changes have to be applied to both Active and Standby Director Agent instances.

Removing an Agent

If an Agent should be decommissioned, then the standard procedure includes the steps explained in the JS7 - How to take an Agent out of Operation article.

If the standard procedure was not performed while the Agent was running and connected, and instead the Agent's server or installation directory was removed, then changes to the Controller's journal are required in order to remove the Agent from the JS7 inventory.

In this scenario, changes are applied to the Controller by the Journal Update Script. If a Controller Cluster is used, then changes have to be applied to both Active and Standby Controller instances.

Removing a Problem

The Controller keeps track of problems such as a missing confirmation for the loss of a node in a Controller Cluster or Agent Cluster. Such confirmations can be provided from the JOC Cockpit GUI. Alternatively, the Journal Update Script can drop such problems from a Controller's journal.

In a Controller Cluster, remove the problem from the journal of the active Controller instance, then copy the journal directory to the standby Controller instance.

Fallback

To revert changes made by the Journal Update Script, apply the following procedure:

  • remove the contents of the state sub-directory that holds the updated journal,
  • copy from the backup directory for a Controller: 
    • cp -P /var/backups/js7/update-journal.controller.centostest_primary.2023-12-06T02-14-23/* /var/sos-berlin.com/js7/controller/state
  • copy from the backup directory for an Agent:
    • cp -P /var/backups/js7/update-journal.agent.centostest_primary.2023-12-06T04-12-53/* /var/sos-berlin.com/js7/agent/state
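The two fallback steps can be combined in a small function. This is a minimal sketch, not part of the Journal Update Script: the name restore_journal is hypothetical, it refuses to run against a missing or empty backup directory, and it assumes the Controller or Agent instance has been shut down beforehand. It copies recursively with -P so that symlinks in the backup stay symlinks.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the fallback: empty the state directory, then restore
# the backup with symlinks preserved. Run only while the instance is stopped.
# Usage: restore_journal <backup-dir> <state-dir>
restore_journal() {
    local backup_dir="$1" state_dir="$2"
    # refuse to continue if the backup is missing or empty
    if [ ! -d "$backup_dir" ] || [ -z "$(ls -A "$backup_dir")" ]; then
        echo "backup directory missing or empty: $backup_dir" >&2
        return 1
    fi
    # remove the contents of the state directory, including hidden files
    find "$state_dir" -mindepth 1 -delete || return 1
    # -R copies recursively, -P copies symlinks as symlinks
    cp -R -P "$backup_dir"/. "$state_dir"/
}
```

For example: restore_journal /var/backups/js7/update-journal.controller.centostest_primary.2023-12-06T02-14-23 /var/sos-berlin.com/js7/controller/state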

The journal directory holds symlinks:

  • the symlinks controller-journal or agent-journal will be included in backups.
    • The symlinks point to the journal file with the earliest timestamp in its file name.
    • It is important that the command option cp -P is used to preserve symlinks when copying from a backup directory.
  • For your information: if a symlink has to be recreated for the Controller journal, use the commands:
    • cd /var/sos-berlin.com/js7/controller/state
    • ln -s -r controller--1701814807215000.journal controller-journal
  • For your information: if a symlink has to be recreated for the Agent journal, use the commands:
    • cd /var/sos-berlin.com/js7/agent/state
    • ln -s -r agent--1701814807215000.journal agent-journal
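If several journal files are present, the target of the symlink has to be selected first. The following sketch is hypothetical (function name relink_journal) and relies on the statement above that the symlink points to the journal file with the earliest timestamp in its file name; it assumes file names of the form controller--<timestamp>.journal or agent--<timestamp>.journal. Creating the link from inside the state directory with a bare file name yields a relative link, as ln -s -r does in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: (re)create <kind>-journal so that it points to the
# <kind>--<timestamp>.journal file with the smallest timestamp.
# Usage: relink_journal <state-dir> <controller|agent>
relink_journal() {
    local state_dir="$1" kind="$2"
    local f name ts earliest_ts="" earliest_file=""
    for f in "$state_dir/$kind"--*.journal; do
        [ -f "$f" ] || continue                        # no match: glob stays literal
        name=$(basename "$f")
        ts=${name#"$kind"--}
        ts=${ts%.journal}
        case "$ts" in ''|*[!0-9]*) continue ;; esac   # skip unexpected names
        if [ -z "$earliest_ts" ] || [ "$ts" -lt "$earliest_ts" ]; then
            earliest_ts=$ts
            earliest_file=$name
        fi
    done
    if [ -z "$earliest_file" ]; then
        echo "no $kind journal files found in $state_dir" >&2
        return 1
    fi
    # a bare file name as link target is resolved relative to the link's directory
    (cd "$state_dir" && ln -sf "$earliest_file" "$kind-journal") || return 1
    echo "$kind-journal -> $earliest_file"
}
```

For example, relink_journal /var/sos-berlin.com/js7/agent/state agent recreates the agent-journal symlink.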

Examples

The following examples illustrate typical use cases.

Remove Orders

Check if Order is available from Controller and Agent Journal

Example for Update of Journal
# run on Controller server
./update-journal.sh \
    --controller-id=controller \
    --journal-dir=/var/sos-berlin.com/js7/controller/state \
    --order-id="#2023-11-28#T19891282200-root" \
    --controller \
    --check

# run on Agent server
./update-journal.sh \
    --controller-id=controller \
    --journal-dir=/var/sos-berlin.com/js7/agent/state \
    --order-id="#2023-11-28#T19891282200-root" \
    --agent \
    --check

# checks if the indicated order is available from the Controller's and Agent's journal on the respective servers
# specifies the path to the journal directory and the quoted Order ID

Remove Order from Controller and Agent Journal

Example for Update of Journal
# run on Controller server
./update-journal.sh \
    --controller-id=controller \
    --journal-dir=/var/sos-berlin.com/js7/controller/state \
    --backup-dir=/var/backups/js7 \
    --order-id="#2023-11-28#T19891282200-root,#2023-11-30#T37072277904-root" \
    --controller \
    --remove

# run on Agent server
./update-journal.sh \
    --controller-id=controller \
    --journal-dir=/var/sos-berlin.com/js7/agent/state \
    --backup-dir=/var/backups/js7 \
    --order-id="#2023-11-28#T19891282200-root" \
    --agent \
    --remove

# removes the indicated orders from the Controller's and Agent's journal on the respective servers
# specifies the path to the journal directory
# creates backups of journal files in the related backup directories
# specifies a quoted, comma separated list of Order IDs to be removed
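When many orders have to be removed, the comma-separated value for --order-id can be built from a file that holds one Order ID per line. This is a minimal sketch with a hypothetical function name join_ids; the same approach works for the comma-separated values of --agent-id and --workflow. Empty lines are skipped, and a missing newline at the end of the file is tolerated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join IDs read from stdin (one per line) with commas.
# Usage: ids=$(join_ids < order-ids.txt)
join_ids() {
    local line list=""
    # "|| [ -n "$line" ]" also handles a last line without trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue          # skip empty lines
        list="${list:+$list,}$line"         # add a comma only between items
    done
    printf '%s\n' "$list"
}
```

The result is passed quoted, for example: --order-id="$(join_ids < order-ids.txt)"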

Remove Workflows

Check if Workflow is available from Controller Journal

Example for Update of Journal
# run on Controller server
./update-journal.sh \
    --controller-id=controller \
    --journal-dir=/var/sos-berlin.com/js7/controller/state \
    --workflow=some-workflow \
    --controller \
    --check

# checks if the indicated workflow is available from the journal
# specifies the path to the journal directory and the workflow to be checked

Remove Workflow from Controller Journal

Example for Update of Journal
# run on Controller server
./update-journal.sh \
    --controller-id=controller \
    --journal-dir=/var/sos-berlin.com/js7/controller/state \
    --backup-dir=/var/backups/js7 \
    --workflow=some-workflow,some-other-workflow \
    --controller \
    --remove

# removes the indicated workflows from the Controller's journal
# specifies the path to the journal directory 
# specifies a comma separated list of workflows to be removed
# creates a backup of journal files in the indicated directory

Remove Agent

Check if Agent is available from Controller Journal

Example for Update of Journal
# run on Controller server
./update-journal.sh \
    --controller-id=controller \
    --journal-dir=/var/sos-berlin.com/js7/controller/state \
    --agent-id=agent_001 \
    --controller \
    --check

# checks if the indicated Agent is available from the journal
# specifies the path to the journal directory and the Agent ID to be checked

Remove Agent from Controller Journal

Example for Update of Journal
# run on Controller server
./update-journal.sh \
    --controller-id=controller \
    --journal-dir=/var/sos-berlin.com/js7/controller/state \
    --backup-dir=/var/backups/js7 \
    --agent-id=agent_001,agent_002 \
    --controller \
    --remove

# removes the indicated Agents, related orders and workflows from the Controller's journal
# specifies the path to the journal directory 
# specifies a comma separated list of Agent IDs to be removed
# creates a backup of journal files in the indicated directory

Remove Problem

Remove problem from Controller Journal

Example for Update of Journal
# run on Controller server
./update-journal.sh \
    --controller-id=controller \
    --journal-dir=/var/sos-berlin.com/js7/controller/state \
    --controller \
    --remove  \
    --problem

# removes problems from the Controller's journal
# specifies the path to the journal directory

Resources