Introduction

The Agent Cluster is designed to provide horizontal scalability and fail-over capabilities for Agents in high availability environments, see JS7 - Agent Cluster. It works without a single point of failure.

Use of a JS7 - Agent Cluster is subject to the JS7 - License.

The architecture of Agent Clusters includes two separate tiers for clustering of Agents, see JS7 - System Architecture:

  • Controller (Cluster) → Director Agent (Cluster)
  • Director Agent (Cluster) → Subagent Cluster

We find separate layers for operation and use of Agent Clusters:

  • Operational Layer: Subagents and Director Agent Instances
    • Subagents and Director Agent instances are similarly installed.
    • Director Agent instances orchestrate Subagents. They include a Subagent that can be used if users wish to execute jobs from a Director Agent.
  • Functional Layer: Subagent Cluster and Director Agent Cluster
    • Jobs are assigned Subagent Clusters to specify that the jobs can be executed by any Subagent that is a member of the Subagent Cluster. The Subagent Cluster determines whether a different Subagent will be chosen in case of fail-over only (fixed-priority scheduling, active-passive cluster) or for each next execution of a job (round-robin scheduling, active-active cluster).
    • The Director Agent Cluster is independent from Subagent Clusters. The purpose of clustering is to provide high availability for the role of orchestrating Subagents.
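
The difference between the two scheduling modes of a Subagent Cluster can be illustrated with a minimal sketch. It is not taken from JS7: the names pick_fixed_priority and RoundRobinPicker and the Subagent names are hypothetical, and the sketch only models which Subagent is chosen for the next job execution.

```python
def pick_fixed_priority(subagents, available):
    """Active-passive: return the first available Subagent in priority order."""
    for name in subagents:
        if name in available:
            return name
    return None  # no Subagent of the cluster is available


class RoundRobinPicker:
    """Active-active: rotate to the next available Subagent on each execution."""

    def __init__(self, subagents):
        self.subagents = list(subagents)
        self._next = 0  # index of the Subagent to try first

    def pick(self, available):
        n = len(self.subagents)
        for offset in range(n):
            idx = (self._next + offset) % n
            name = self.subagents[idx]
            if name in available:
                self._next = (idx + 1) % n
                return name
        return None  # no Subagent of the cluster is available


# Hypothetical Subagent names, listed in priority order
cluster = ["subagent-1", "subagent-2", "subagent-3"]
picker = RoundRobinPicker(cluster)
```

With fixed-priority scheduling a different Subagent is chosen only if the preferred one is unavailable, whereas the round-robin picker moves on after every execution.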

...

This article focuses on fail-over of a Director Agent. For fail-over scenarios with Subagent Clusters see JS7 - How to perform fail-over of Subagents in an Agent Cluster.

For command line references see the JS7 - Agent - Command Line Operation article.

Architecture

The JS7 - Agent Cluster documentation provides an in-depth explanation of the architecture. For guidance on planning Agent Clusters, refer to the JS7 - Strategies for Agent Clustering resources.

The Agent Cluster architecture comprises several facets related to clustering, which can be classified into two distinct layers:

...

  • This layer includes setting up any number of Subagent Clusters, each of which includes one or more Subagents and specifies fixed-priority or round-robin scheduling of Subagents.
  • Jobs are assigned Subagent Clusters to specify that they can be executed on any assigned Subagent. The Subagent Cluster determines whether a different Subagent will be chosen in case of fail-over only (fixed-priority) or for each next execution of a job (round-robin).
  • For testing of Subagent Clusters see JS7 - How to perform fail-over of Subagents in an Agent Cluster.

...

...

Manage Director Agent Clusters

The JS7 - Agent Installation On Premises and JS7 - Agent Installation for Containers articles explain the installation procedure, which is approximately the same for Director Agents and for Subagents. Director Agent instances require a license key to be assigned, see JS7 - How to apply a JS7 License Key.

...

  • The view is grouped in Controllers.
  • For each Controller separate lists of Standalone Agents and Cluster Agents are displayed.


Add Director Agent Cluster

The Agent Cluster is situated in the operational layer and includes specification of Director Agents.

...

Explanation:

For an explanation of the input fields, see JS7 - Management of Agent Clusters.

Status of Agent Cluster

To check the Agent Cluster status users can navigate to the Resources->Agents view:

High Availability Setup

For a high availability setup with two server nodes, the following distribution of active and standby JS7 products should be applied:

Server 1                            Server 2
Active Controller Instance          Standby Controller Instance
Standby Director Agent Instance     Active Director Agent Instance

Operations on Director Agent Cluster

Fail-over

Fail-over occurs when an Active Director Agent instance is terminated abnormally. Fail-over means that any tasks currently being executed by the Director Agent instance are considered to have failed and that the related orders are set to the failed state. An inactive Director Agent instance is no longer a member of the Director Agent Cluster:

  • The previous Standby Director Agent instance will take the active role.
  • Subagent Clusters will continue to execute jobs. They are not affected by a Director Agent's fail-over operation.
  • If the Agent Cluster is assigned to a File Order Source for JS7 - File Watching then the active Director Agent instance will pick up file watching. This happens regardless of whether the Subagent included with a Director Agent instance is enabled or disabled.
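
The effects of fail-over described above can be summarized in a small model. This is a sketch for illustration only and does not reflect JS7 internals: the DirectorInstance class, the fail_over function, the role label "inactive" and all instance and order names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DirectorInstance:
    name: str
    role: str  # "active", "standby" or "inactive"
    running_orders: list = field(default_factory=list)


def fail_over(cluster, order_states):
    """Model abnormal termination of the active Director Agent instance."""
    active = next(d for d in cluster if d.role == "active")
    standby = next(d for d in cluster if d.role == "standby")
    # Tasks executed by the terminated instance are considered failed,
    # and the related orders are set to the failed state.
    for order_id in active.running_orders:
        order_states[order_id] = "failed"
    active.running_orders.clear()
    active.role = "inactive"
    # The previous standby instance takes the active role.
    standby.role = "active"
    # The inactive instance is no longer a member of the cluster.
    return [standby]


primary = DirectorInstance("director-1", "active", ["order-1"])
secondary = DirectorInstance("director-2", "standby")
# order-2 is assumed to run in a Subagent Cluster and is not affected
orders = {"order-1": "running", "order-2": "running"}
members = fail_over([primary, secondary], orders)
```

After the call, only order-1 is marked as failed, while the order executed by the Subagent Cluster keeps its state.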

Fail-over can be caused by the following actions:

...

  • the Active Director Agent instance is stopped normally from the command line:
    • agent_<port>.sh | .cmd stop
  • the operating system is shut down and systemd / init.d or a Windows Service are in place to stop the Director Agent instance normally.
  • no Active Controller instance is running as it holds the Cluster Watch role required for fail-over.

Fail-over happens within a short period of time, typically within 2 to 3 seconds.

Switch-Over

Switch-over is an operation that is caused by user intervention in JOC Cockpit or by use of the JS7 - REST Web Service API. The switch-over procedure does not require termination of an Active Director Agent. Instead, it shifts the active role to the standby Director Agent.

...

  • The active and standby Director Agent instances will switch roles.
  • As a prerequisite for switch-over
    • an active Controller instance has to be up and running,
    • the Director Agent Cluster has to be coupled,
    • the Subagent in a Director Agent instance must not run jobs.
  • After switch-over the previous Standby Director Agent instance will become active and the previously active Director Agent instance will be restarted.
  • If the Agent Cluster is assigned to a File Order Source for JS7 - File Watching then the active Director Agent instance will pick up file watching. This happens regardless of whether the Subagent included with a Director Agent instance is enabled or disabled.
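
The prerequisites for switch-over listed above can be expressed as a simple check. The following sketch is an illustration under stated assumptions, not the JOC Cockpit or REST Web Service API implementation: the function names, parameters and messages are hypothetical.

```python
def can_switch_over(controller_active, cluster_coupled, director_running_jobs):
    """Return (ok, reasons) for the switch-over prerequisites."""
    reasons = []
    if not controller_active:
        reasons.append("no active Controller instance is up and running")
    if not cluster_coupled:
        reasons.append("the Director Agent Cluster is not coupled")
    if director_running_jobs > 0:
        reasons.append("the Subagent in a Director Agent instance runs jobs")
    return (not reasons, reasons)


def switch_over(roles):
    """Swap the active and standby roles of two Director Agent instances."""
    swap = {"active": "standby", "standby": "active"}
    return {name: swap[role] for name, role in roles.items()}


# Hypothetical instance names
roles = {"director-1": "active", "director-2": "standby"}
ok, reasons = can_switch_over(True, True, 0)
if ok:
    roles = switch_over(roles)
```

Collecting all unmet prerequisites rather than stopping at the first one makes it easier to report to the user why a switch-over request is refused.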

Confirm loss of a Director Agent instance

The operation to Confirm loss of a Director Agent instance is performed in the following situation:

  • Assume that fail-over between Director Agent instances has occurred and that, after fail-over, both the Controller (Standalone Controller or Controller Cluster) and the remaining Director Agent instance are shut down at the same point in time. After restart of the Controller and the Director Agent instances, the Controller cannot act as a witness to the previous Director Agent fail-over due to its own restart. As a result, the Controller holding the Cluster Watch role cannot determine which of the newly started Director Agent instances should receive the active role, as both Director Agent instances will claim the active role after restart.
  • In this situation the user is asked to decide which Director Agent instance should be considered lost. This includes verifying that the now standby Director Agent instance is shut down at the point in time when the user takes this decision. Users can start the now standby Director Agent instance later on to re-establish the Director Agent Cluster.
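
The reasoning behind this operation can be sketched as a decision function. It is a simplified model under assumptions, not the Cluster Watch implementation: resolve_active, its parameters and the instance names are hypothetical.

```python
def resolve_active(claims, witnessed_active=None, confirmed_lost=None):
    """Decide which Director Agent instance receives the active role.

    claims:           names of instances that claim the active role
    witnessed_active: instance the Cluster Watch saw becoming active, if any
    confirmed_lost:   instance the user confirmed as lost, if any
    Returns the instance name, or None if the decision needs user input.
    """
    claims = set(claims)
    if len(claims) == 1:
        return next(iter(claims))  # no conflict
    if witnessed_active in claims:
        return witnessed_active  # the Cluster Watch witnessed the fail-over
    if confirmed_lost in claims:
        remaining = claims - {confirmed_lost}
        if len(remaining) == 1:
            return next(iter(remaining))  # the user resolved the conflict
    return None  # both instances claim the active role, no witness


# After a simultaneous restart both instances claim the active role
undecided = resolve_active({"director-1", "director-2"})
decided = resolve_active({"director-1", "director-2"}, confirmed_lost="director-1")
```

In this model the conflict remains undecided until the user confirms the loss of one instance, which corresponds to the situation described above.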

Further Resources