...

Use of a Controller Cluster provides high availability and is a feature subject to the JS7 - License.

  • Fail-over is an automated operation that occurs when the Primary Controller is aborted or killed. Fail-over is applied in case of abnormal termination only.
  • Switch-over is an operation that is caused by user intervention in JOC Cockpit or by use of the JS7 - REST Web Service API. The procedure includes normal termination of an Active Controller Instance.

...

  • Cluster terminology refers to the Active Controller Instance and the Standby Controller Instance, independently of whether the Primary or the Secondary Controller Instance is currently active.
  • A Controller Cluster is an active-passive cluster. However, the term passive is misleading, as the Standby Controller Instance is not passive at all: it records every order state transition that occurs in the Active Controller Instance. Both Controller instances hold a journal of order state transitions that is actively synchronized. Fail-over and switch-over occur only if both Controller instances' journals are in sync.
  • The Cluster presents itself as a single unit to the outside world, i.e. to JOC Cockpit and to Agents.
    • Any operations performed in JOC Cockpit are automatically applied to the Active Controller Instance.
    • At any point in time only one Controller instance is active and the other instance is in standby mode.
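The distinction between the fixed installation names (Primary, Secondary) and the changing cluster roles (Active, Standby) can be sketched as a small model. This is an illustration only, not JS7 code: the class names, the `journal_position` counter and the `switch_roles` method are hypothetical and merely mirror the rules stated above.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class ControllerInstance:
    name: str                  # "primary" or "secondary", fixed at installation
    role: Role                 # changes with fail-over or switch-over
    journal_position: int = 0  # hypothetical count of recorded order state transitions


@dataclass
class ControllerCluster:
    primary: ControllerInstance
    secondary: ControllerInstance

    def active(self) -> ControllerInstance:
        # Exactly one instance is active at any point in time.
        return self.primary if self.primary.role is Role.ACTIVE else self.secondary

    def standby(self) -> ControllerInstance:
        return self.secondary if self.primary.role is Role.ACTIVE else self.primary

    def journals_in_sync(self) -> bool:
        return self.primary.journal_position == self.secondary.journal_position

    def record_transition(self) -> None:
        # The Active instance records the transition, the Standby instance replicates it.
        self.active().journal_position += 1
        self.standby().journal_position += 1

    def switch_roles(self) -> bool:
        # Roles are exchanged only if both journals are in sync.
        if not self.journals_in_sync():
            return False
        self.primary.role, self.secondary.role = self.secondary.role, self.primary.role
        return True


cluster = ControllerCluster(
    primary=ControllerInstance("primary", Role.ACTIVE),
    secondary=ControllerInstance("secondary", Role.STANDBY),
)
cluster.record_transition()
switched = cluster.switch_roles()
```

After the role change the Secondary Controller Instance is the Active Controller Instance, while the installation names remain unchanged.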

...

Primary and Secondary Controller instances require a single dedicated Standalone Agent to be available that acts as an arbitrator in case of fail-over and switch-over.

...

  • The Cluster Watch Agent knows immediately when the Active Controller Instance is down due to a connection loss from this instance.
  • The Standby Controller Instance holds a connection to the Active Controller Instance and knows immediately when this connection is lost.
  • Failure of the Active Controller Instance is the point in time at which the Standby Controller Instance and the Cluster Watch Agent seek agreement on a cluster fail-over operation: they determine whether to declare the Active Controller Instance inoperable and, after a short period of 2s, they proceed to cast their votes on whether the Standby Controller Instance should become the Active Controller Instance.
  • As a prerequisite for fail-over, both the Cluster Watch Agent and the Standby Controller Instance have to confirm that the Standby Controller Instance's journal was in sync with the Active Controller Instance at the point in time of failure.
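The agreement described above can be expressed as a simple two-party decision. The following is a minimal sketch under stated assumptions, not the Controller's actual implementation: the `Observation` type and the `should_fail_over` function are hypothetical names, and the short waiting period before the votes are cast is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """What one voter (Standby instance or Cluster Watch Agent) has observed."""
    lost_connection_to_active: bool
    confirms_journal_in_sync: bool


def should_fail_over(standby: Observation, cluster_watch: Observation) -> bool:
    # Both voters must have lost their connection to the Active instance ...
    both_lost = (standby.lost_connection_to_active
                 and cluster_watch.lost_connection_to_active)
    # ... and both must confirm that the Standby journal was in sync at failure time.
    both_in_sync = (standby.confirms_journal_in_sync
                    and cluster_watch.confirms_journal_in_sync)
    return both_lost and both_in_sync


agreed = should_fail_over(
    Observation(lost_connection_to_active=True, confirms_journal_in_sync=True),
    Observation(lost_connection_to_active=True, confirms_journal_in_sync=True),
)
```

In this sketch a single dissenting voter is enough to prevent fail-over, which reflects the requirement that both parties confirm the same state.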

...

The explanations above imply that a Cluster Watch Agent must never run on the hosts on which the Primary and Secondary Controller instances are operated.

  • If the Cluster Watch Agent is terminated at the same time as a failed Active Controller Instance then no fail-over can occur.
  • If the Cluster Watch Agent is terminated at the same time as one of the Controller instances then the Controller Cluster cannot start up as this requires operational readiness of the Cluster Watch Agent.
  • A Cluster Watch Agent that is started after failure of the Active Controller Instance is disqualified from casting its vote as it has no knowledge of whether the Controller instances' journals are in sync.
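The timing rules for the Cluster Watch Agent can be illustrated with a small check. This is a sketch only: the function `watch_agent_may_vote` and its parameters are hypothetical, and timestamps are assumed to be plain numbers in seconds.

```python
from typing import Optional


def watch_agent_may_vote(started_at: float,
                         stopped_at: Optional[float],
                         failure_at: float) -> bool:
    """Return True if the Cluster Watch Agent is qualified to vote on fail-over.

    started_at: time at which the Agent was started
    stopped_at: time at which the Agent terminated, or None if it is still running
    failure_at: time at which the Active Controller Instance failed
    """
    # An Agent started after the failure does not know whether the journals were in sync.
    if started_at >= failure_at:
        return False
    # An Agent that terminated together with (or before) the Active instance cannot vote.
    if stopped_at is not None and stopped_at <= failure_at:
        return False
    return True


# Agent running since t=0 and still running when the Active instance fails at t=100.
qualified = watch_agent_may_vote(started_at=0.0, stopped_at=None, failure_at=100.0)
```

An Agent located on the same host as a Controller instance is likely to fail the second check when that host goes down, which is why a separate host is required.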

...