...

Primary and Secondary Controller instances require JOC Cockpit or an Agent to act as Cluster Watch, i.e. as an arbitrator in case of fail-over and switch-over.

...

  • If JOC Cockpit is assigned the Cluster Watch role then fail-over capabilities of JOC Cockpit apply.
  • If an Agent is assigned the Cluster Watch role (available for earlier releases of JS7 until branch 2.5) then the above explanations suggest that the Agent should never be run on the hosts that the Primary and Secondary Controller instances are operated on.

...

  • If the Cluster Watch is terminated at the same time as a failed Active Controller Instance then no fail-over can occur.
  • If the Cluster Watch is terminated at the same time as one of the Controller instances then the Controller Cluster cannot start up as this requires operational readiness of the Cluster Watch.
  • A Cluster Watch that is started after failure of the Active Controller Instance is disqualified from casting its vote as it has no knowledge of whether the Controller instances' journals are in sync.
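
The conditions above can be read as a small decision rule. The following is a minimal sketch that only restates the list and is not JS7 code: the function name can_fail_over and both of its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of JS7: restates the conditions listed above.
# $1: is the Cluster Watch running? (yes|no)
# $2: was the Cluster Watch already running before the
#     Active Controller Instance failed? (yes|no)
can_fail_over() {
    watch_running=$1
    watch_started_before_failure=$2

    # Without a running arbitrator no fail-over can occur.
    if [ "$watch_running" != "yes" ]; then
        echo "no"
    # A Cluster Watch started after the failure does not know whether
    # the journals were in sync, so it is disqualified from voting.
    elif [ "$watch_started_before_failure" != "yes" ]; then
        echo "no"
    else
        echo "yes"
    fi
}
```

For example, can_fail_over yes no prints "no", which corresponds to a Cluster Watch that was started only after the Active Controller Instance had failed.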

High Availability Setup

For a high availability setup with two server nodes, the following distribution of active and standby JS7 products should be applied:

Server 1                       Server 2
Active JOC Cockpit Instance    Standby JOC Cockpit Instance
Standby Controller Instance    Active Controller Instance
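
One way to read this distribution is that the Active JOC Cockpit Instance, which holds the Cluster Watch role, and the Active Controller Instance run on different servers, so that losing one server does not take down both at once. The sketch below checks this for two host names. It is an illustration under that assumption: the function placement_ok and the host names are hypothetical and not part of JS7.

```shell
#!/bin/sh
# Hypothetical check, not part of JS7.
# $1: host of the Active JOC Cockpit Instance (Cluster Watch)
# $2: host of the Active Controller Instance
placement_ok() {
    if [ "$1" != "$2" ]; then
        echo "ok"
    else
        # Both active instances share a host: a crash of that host
        # removes the Cluster Watch together with the Active Controller.
        echo "same-host"
    fi
}
```

With the distribution shown above, placement_ok server1 server2 prints "ok".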

Cluster Operations

Cluster operations include an automated fail-over and a manual switch-over of the Active Controller Instance.

...

  • The Active Controller Instance is killed, for example:
    • on Unix with a SIGKILL signal corresponding to the command: kill -9
    • on Windows with the command: taskkill /F
  • The operating system crashes.
  • In the JS7 - Dashboard the user performs one of the operations:
    • Active Controller Instance action menu: Abort -> With fail-over
    • Active Controller Instance action menu: Abort and restart -> With fail-over
  • From the command line the user performs one of the operations:
    • controller_instance.sh | .cmd abort
    • controller_instance.sh | .cmd kill

Fail-over will not occur when:

  • the Active Controller Instance is stopped normally from the command line:
    • controller_instance.sh | .cmd stop
  • the Active Controller Instance is restarted normally from the command line:
    • controller_instance.sh | .cmd restart
  • the operating system is shut down normally and systemd / init.d or a Windows Service are in place to stop the Controller normally.
  • the Active JOC Cockpit Instance is not running, because it holds the Cluster Watch role that is required for fail-over.

Fail-over happens within a short period of time, typically within 2-3 seconds.

...

  • In the JS7 - Dashboard the user performs one of the operations:
    • Active Controller Instance action menu: Terminate -> With switch-over
    • Active Controller Instance action menu: Terminate and restart -> With switch-over
    • Cluster action menu: Switch-over
  • From the command line the user performs the operation:
    • controller_instance.sh | .cmd switch-over

Switch-over will not occur when:

  • the Active Controller Instance is stopped normally from the command line:
    • controller_instance.sh | .cmd stop
  • the Active Controller Instance is restarted normally from the command line:
    • controller_instance.sh | .cmd restart
  • the operating system is shut down normally and systemd / init.d or a Windows Service are in place to stop the Controller normally.
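
The command line cases for fail-over and switch-over can be summarized in a small lookup. This is a sketch that only restates the lists in this section. The function expected_cluster_operation is hypothetical and does not call controller_instance.sh. It assumes that the command is applied to the Active Controller Instance and, for fail-over, that the Cluster Watch is available.

```shell
#!/bin/sh
# Hypothetical helper, not part of JS7: maps a controller_instance.sh
# command to the cluster operation described in this section.
expected_cluster_operation() {
    case "$1" in
        abort|kill)   echo "fail-over" ;;    # instance ends abnormally
        switch-over)  echo "switch-over" ;;  # explicit manual switch-over
        stop|restart) echo "none" ;;         # normal stop, no role change
        *)            echo "unknown" ;;      # not covered by this section
    esac
}
```

For example, expected_cluster_operation restart prints "none", which matches the statement that a normal restart triggers neither fail-over nor switch-over.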

...

The best advice is not to apply automated clustering mechanisms, but to perform manual switch-over. Reasons include but are not limited to the following issues:

...