Introduction

The Agent Cluster provides horizontal scalability and fail-over capabilities for Agents in high availability environments, see JS7 - Agent Cluster. Subagents on different servers share the workload within the Agent Cluster, which corresponds to active-active clustering. Automated fail-over provides high availability and restart capabilities, which corresponds to active-passive clustering. The Agent Cluster works without a single point of failure.

Use of a JS7 - Agent Cluster is subject to the JS7 - License.

Agents are clustered in two separate layers, see JS7 - Agent Cluster:

  • Functional Layer: Subagent Cluster
    • Jobs are assigned Subagent Clusters to specify that they can be executed by any Subagent that is a member of the Subagent Cluster. The Subagent Cluster determines whether a different Subagent is chosen in case of fail-over only (fixed-priority) or for each next execution of a job (round-robin).
    • For testing of Subagent Clusters see JS7 - How to test fail-over of Subagents in an Agent Cluster.
  • Operational Layer: Director Agent Cluster
    • Director Agents orchestrate Subagents, therefore a Director Agent Cluster is independent from Subagent Clusters.

Consider the wording in this article:

  • Fail-over is an automated operation that occurs when a Director Agent instance terminates abnormally, for example when it is aborted or killed.
  • Switch-over is a manual operation performed by users on a Director Agent Cluster.

For command line references see the JS7 - Agent - Command Line Operation article.

Architecture

The JS7 - Agent Cluster documentation provides an in-depth explanation of the architecture. For guidance on planning Agent Clusters, refer to the JS7 - Strategies for Agent Clustering resources.

The Agent Cluster architecture comprises several facets related to clustering, which can be classified into two distinct layers:

  • Functional Layer: Subagent Cluster
    • This layer covers the setup of any number of Subagent Clusters, each of which includes one or more Subagents and specifies fixed-priority or round-robin scheduling of Subagents.
    • Jobs are assigned Subagent Clusters to specify that they can be executed on any assigned Subagent. The Subagent Cluster determines whether a different Subagent is chosen in case of fail-over only (fixed-priority) or for each next execution of a job (round-robin).
    • For testing of Subagent Clusters see JS7 - How to test fail-over of Subagents in an Agent Cluster.
  • Operational Layer: Director Agent Cluster
    • Director Agents orchestrate Subagents, therefore a Director Agent Cluster is independent from Subagent Clusters. Users can operate a single Director Agent instance and they can operate two Director Agent instances that act as a cluster.
    • The Controller connects to Director Agents to exchange order information. The Controller does not connect to Subagents.
    • Technically each Director Agent instance includes a Subagent that can be used in a Subagent Cluster. However, for high availability it is recommended to limit a Director Agent's role to the orchestration of Subagents and not to use its Subagent for the execution of jobs.
  • The Operational Layer of the Agent Cluster architecture involves the setup and configuration of Cluster Agents. This layer handles the installation procedure, which closely resembles that of Standalone Agents. Furthermore, it involves the crucial step of registering Agents as active members within the Agent Cluster, enabling their seamless integration and participation in various cluster operations.
  • The Functional Layer defines multiple logical Subagent Clusters. These Subagent Clusters form the basis for organizing and managing the workload distribution within the Agent Cluster.
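The difference between fixed-priority and round-robin scheduling in a Subagent Cluster can be illustrated with a minimal sketch. This is a toy model, not JS7 code: the class `SubagentCluster`, its `select` method and the Subagent names are hypothetical, and it assumes that a fixed-priority cluster returns to the highest-priority Subagent as soon as that Subagent is available again.

```python
class SubagentCluster:
    """Toy model of a Subagent Cluster (hypothetical, not the JS7 implementation)."""

    MODES = ("fixed-priority", "round-robin")

    def __init__(self, subagents, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.subagents = list(subagents)  # ordered by descending priority
        self.mode = mode
        self._next = 0  # round-robin position

    def select(self, available):
        """Return the Subagent for the next job execution, or None if none is available."""
        n = len(self.subagents)
        # fixed-priority always scans from the top, round-robin from the last position
        start = self._next if self.mode == "round-robin" else 0
        for offset in range(n):
            index = (start + offset) % n
            candidate = self.subagents[index]
            if candidate in available:
                if self.mode == "round-robin":
                    self._next = (index + 1) % n
                return candidate
        return None
```

With all Subagents available, a fixed-priority cluster keeps returning the first Subagent and moves on only when that Subagent is unavailable, while a round-robin cluster returns a different Subagent for each next execution.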

Manage Director Agent Clusters

The JS7 - Agent Installation On Premises and JS7 - Agent Installation for Containers articles explain the installation procedure, which is approximately the same for Director Agents and for Subagents. Director Agent instances require a license key to be assigned, see JS7 - How to apply a JS7 License Key.

The icon in the JOC Cockpit main menu is used to navigate to the Manage Controllers/Agents view:

Image Modified


This brings forward the following view:

  • The view is grouped in Controllers.
  • For each Controller separate lists of Standalone Agents and Cluster Agents are displayed.


Image Modified

Add Director Agent Cluster

The Agent Cluster is situated in the operational layer and includes specification of Director Agents.

To add a Director Agent Cluster, users can start from the action menu of the Controller:

Image Modified

This brings forward the following popup window:

Image Modified


Explanation:

For an explanation of the input fields, see JS7 - Management of Agent Clusters.

Status of Agent Cluster

To check the Agent Cluster status, users can navigate to the Resources->Agents view, see JS7 - Resources - Agent Job Executions:

Image Added

Operations on Director Agent Cluster

Fail-over

Fail-over occurs when an Active Director Agent instance is terminated abnormally. Fail-over implies that the task currently being executed by the Director Agent instance is considered to have failed and that the related order is set to a failed state. An Inactive Director Agent instance is no longer a member of the Director Agent Cluster:

  • The previous Standby Director Agent instance will take the active role.
  • Subagent Clusters will continue to execute jobs. They are not affected by a Director Agent's fail-over operation. 

Fail-over can be caused by the following actions:

  • The Active Director Agent instance is killed, for example:
    • for Unix with a SIGKILL signal corresponding to the command: kill -9
    • for Windows with the command: taskkill /F
  • From the command line the Agent's Instance Start Script can be used like this:
    • agent_<port>.sh | .cmd abort
    • agent_<port>.sh | .cmd kill

Fail-over will not occur when:

  • the Active Director Agent instance is stopped normally from the command line:
    • agent_<port>.sh | .cmd stop
  • the operating system is shut down and systemd / init.d or a Windows Service are in place to stop the Director Agent instance normally.

Fail-over happens within a short period of time, typically within 2-3 seconds.
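The rules above can be summarized in a minimal sketch that models which Instance Start Script commands lead to fail-over. This is a toy model, not JS7 code: the class `DirectorAgentCluster` and the instance names are hypothetical. The behavior when no standby instance is left is an assumption of the sketch and is not described in this article.

```python
from dataclasses import dataclass
from typing import Optional

# Instance Start Script commands named in this article
ABNORMAL_COMMANDS = {"abort", "kill"}  # abnormal termination: fail-over
NORMAL_COMMANDS = {"stop"}             # normal termination: no fail-over


@dataclass
class DirectorAgentCluster:
    """Toy model of a Director Agent Cluster (hypothetical, not the JS7 implementation)."""
    active: Optional[str]
    standby: Optional[str]

    def terminate_active(self, command: str) -> bool:
        """Apply a command to the active instance and return True if fail-over occurs."""
        if command in NORMAL_COMMANDS:
            return False  # normal stop: roles are left unchanged in this model
        if command not in ABNORMAL_COMMANDS:
            raise ValueError(f"unknown command: {command}")
        if self.standby is None:
            # Assumption of this sketch: without a standby instance nobody can take over.
            self.active = None
            return False
        # The previous standby instance takes the active role;
        # the terminated instance is no longer a member of the cluster.
        self.active, self.standby = self.standby, None
        return True
```

For example, calling `terminate_active("kill")` on a cluster with the instances `primary` (active) and `secondary` (standby) leaves `secondary` as the active instance, whereas `terminate_active("stop")` reports that no fail-over occurred.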

Switch-over

Switch-over is an operation that is caused by user intervention in JOC Cockpit or by use of the JS7 - REST Web Service API. The switch-over procedure does not require termination of an Active Director Agent, instead it shifts the active role to the standby Director Agent.

In the Resources->Agents view users can perform the switch-over operation from the Agent Cluster's action menu:

  • The active and standby Director Agent instances will switch roles.
  • As a prerequisite for switch-over
    • the Director Agent Cluster has to be coupled,
    • the Subagent in a Director Agent instance must not have running jobs.
  • After switch-over the Standby Director Agent instance becomes active and the now standby Director Agent instance is restarted.
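The prerequisites for switch-over can be expressed as a short check. The following is a minimal sketch, not JS7 code: the function `switch_over` and its parameters are hypothetical, and `running_jobs` stands for the number of jobs currently running in the Subagents of the Director Agent instances.

```python
def switch_over(active, standby, coupled, running_jobs):
    """Return the new (active, standby) pair after a switch-over (toy model)."""
    if not coupled:
        raise RuntimeError("Director Agent Cluster is not coupled")
    if running_jobs > 0:
        raise RuntimeError("a Director Agent's Subagent has running jobs")
    # Both instances switch roles; the restart of the now standby instance is not modeled.
    return standby, active
```

Unlike fail-over, the sketch does not remove an instance from the cluster: both instances remain members and only exchange roles.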

Confirm loss of a Director Agent instance

The operation to Confirm loss of a Director Agent instance is performed in the following situation:

  • Assume that fail-over between Director Agent instances occurred. Assume that after fail-over both the Controller (Standalone Controller or Controller Cluster) and the remaining Director Agent instance are shut down at the same point in time. In this situation, after restart of the Controller and the Director Agent, the Controller cannot act as a witness to the previous Director Agent fail-over due to its own restart. As a result, the Controller holding the role of the Cluster Watch cannot determine which of the newly started Director Agent instances should receive the active role, as both Director Agent instances will claim the active role after restart.
  • In this situation the user is asked to decide which Director Agent should be considered lost. This includes verifying that the now standby Director Agent instance is shut down at the point in time when the user takes this decision. Users can start the now standby Director Agent instance later on to re-establish the Director Agent Cluster.
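The decision described above can be sketched as follows. This is a toy model under stated assumptions, not the JS7 implementation: the function `decide_active` and its parameters are hypothetical. `witnessed_active` stands for what the Cluster Watch remembers about the previous fail-over, and `confirmed_lost` for the instance that the user confirms as lost.

```python
from typing import Optional, Tuple


def decide_active(instances: Tuple[str, str],
                  witnessed_active: Optional[str] = None,
                  confirmed_lost: Optional[str] = None,
                  lost_is_shut_down: bool = False) -> Optional[str]:
    """Return the instance that receives the active role, or None if undecided."""
    first, second = instances
    if witnessed_active is not None:
        return witnessed_active  # the Cluster Watch witnessed the fail-over
    if confirmed_lost is None:
        return None  # both instances claim the active role; the user has to decide
    if confirmed_lost not in instances:
        raise ValueError(f"unknown instance: {confirmed_lost}")
    if not lost_is_shut_down:
        raise RuntimeError("the instance considered lost must be shut down")
    return second if confirmed_lost == first else first
```

The check on `lost_is_shut_down` mirrors the requirement that the instance considered lost is verified to be shut down when the user takes the decision.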

Further Resources

...

...