Purpose

  • The Cluster Service is used to operate a passive cluster of JOC Cockpit instances and to perform fail-over and switch-over operations between cluster members.
  • All other JS7 - Services depend on the Cluster Service, which starts them in the active JOC Cockpit instance.

Fail-over

All JOC Cockpit instances that are running and connected to the same JS7 - Database form a passive cluster. The active cluster member runs the Cluster Service. The passive cluster members monitor the active cluster member's ongoing operation by checking its heartbeats. If the active cluster member fails, one of the passive cluster members becomes active and starts its Cluster Service.
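
The heartbeat check described above can be sketched as follows. This is a minimal illustration and not the JOC Cockpit implementation: the function names, the member names and the rule that the first passive member takes over are assumptions. Only the 60 second default of the heart_beat_exceeded_interval setting is taken from this article.

```python
from datetime import datetime, timedelta
from typing import Sequence

# Default of the "heart_beat_exceeded_interval" setting, in seconds.
HEART_BEAT_EXCEEDED_INTERVAL = 60


def heartbeat_exceeded(last_heartbeat: datetime, now: datetime,
                       exceeded_interval: int = HEART_BEAT_EXCEEDED_INTERVAL) -> bool:
    """Return True if the last heartbeat is older than the allowed interval."""
    return now - last_heartbeat > timedelta(seconds=exceeded_interval)


def next_active_member(active: str, passive: Sequence[str],
                       last_heartbeat: datetime, now: datetime) -> str:
    """Hypothetical selection: keep the active member unless its heartbeat is overdue."""
    if heartbeat_exceeded(last_heartbeat, now) and passive:
        return passive[0]  # assumption: the first passive member takes over
    return active


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    # Heartbeat is 90 seconds old, so the passive member takes over.
    print(next_active_member("joc-1", ["joc-2"], t0, t0 + timedelta(seconds=90)))
```

A passive member would run such a check periodically against the heartbeat timestamp stored in the database. How the real Cluster Service selects the next member is not described here.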

Switch-over

The switch-over operation is technically similar to the fail-over operation. However, switch-over is triggered from the GUI or the API and allows the Cluster Service to stop any running background services in an orderly manner.
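
The difference from fail-over can be illustrated with a small sketch: in a switch-over the active member is still running, so its background services can be stopped before the active role is released. The Service class, the service names and the reverse stop order are assumptions made for illustration and do not reflect the actual JOC Cockpit code.

```python
from typing import List


class Service:
    """Hypothetical stand-in for a background service started by the Cluster Service."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True

    def stop(self) -> None:
        self.running = False


def switch_over(services: List[Service]) -> List[str]:
    """Stop all running services, then release the active role.

    Assumption: services are stopped in reverse start order.
    Returns the list of actions taken, in order.
    """
    actions = []
    for service in reversed(services):
        if service.running:
            service.stop()
            actions.append(f"stopped {service.name}")
    actions.append("active role released")
    return actions


if __name__ == "__main__":
    for action in switch_over([Service("service-a"), Service("service-b")]):
        print(action)
```

In a fail-over no such orderly stop is possible, because the previously active member is no longer operational.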

Configuration

Location

  • The "Settings" are built-in and cannot be modified from the JOC Cockpit GUI.

Configuration Items

| Section | Setting | Default | Required | Purpose |
|---|---|---|---|---|
| cluster | heart_beat_exceeded_interval | 60 | no | The duration in seconds after which heartbeats from the active cluster member to the database are considered overdue and fail-over to the next passive cluster member should occur. |
| cluster | polling_interval | 30 | no | The interval in seconds between sending heartbeats to the database. |
| cluster | polling_wait_interval_on_error | 2 | no | The interval in seconds to wait before polling continues after an error has occurred, e.g. due to transactional concurrency. |
| cluster | switch_member_wait_counter_on_success | 10 | no | The number of retries to wait for the answer from the last active cluster member after its deactivation/activation. |
| cluster | switch_member_wait_interval_on_success | 5 | no | The maximum number of seconds to wait for a cluster member to become active. max wait time = switch_member_wait_counter_on_success * switch_member_wait_interval_on_success + execution time |
| cluster | switch_member_wait_counter_on_error | 10 | no | The maximum number of retries to switch the cluster to a different member in case of errors, e.g. due to transactional concurrency. |
| cluster | switch_member_wait_interval_on_error | 2 | no | The maximum number of seconds to wait for a cluster member to become active after an error. max wait time = switch_member_wait_counter_on_error * switch_member_wait_interval_on_error + execution time |
| cluster | current_is_cluster_member | true | no | Enables the cluster to switch to this instance. |
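
As a worked example of the wait time formulas, the following sketch evaluates them with the default values listed above. The execution time is not known in advance and is therefore left out; the function and variable names are chosen for illustration only.

```python
def max_wait_seconds(counter: int, interval: int) -> int:
    """Upper bound of the wait time in seconds, excluding execution time."""
    return counter * interval


# Default values from the "cluster" section.
defaults = {
    "heart_beat_exceeded_interval": 60,
    "polling_interval": 30,
    "switch_member_wait_counter_on_success": 10,
    "switch_member_wait_interval_on_success": 5,
    "switch_member_wait_counter_on_error": 10,
    "switch_member_wait_interval_on_error": 2,
}

wait_on_success = max_wait_seconds(
    defaults["switch_member_wait_counter_on_success"],
    defaults["switch_member_wait_interval_on_success"],
)  # 10 * 5 = 50 seconds

wait_on_error = max_wait_seconds(
    defaults["switch_member_wait_counter_on_error"],
    defaults["switch_member_wait_interval_on_error"],
)  # 10 * 2 = 20 seconds

# Number of polling intervals that fit into the heartbeat limit: 60 // 30 = 2
intervals_before_fail_over = (
    defaults["heart_beat_exceeded_interval"] // defaults["polling_interval"]
)

if __name__ == "__main__":
    print(wait_on_success, wait_on_error, intervals_before_fail_over)
```

With the defaults, a switch-over waits at most 50 seconds plus execution time, a switch after errors waits at most 20 seconds plus execution time, and the heartbeat limit corresponds to two polling intervals.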

Logging

  • The Cluster Service logs general messages, warnings and errors in the joc.log file.
  • More detailed information is additionally logged in the Main Log service-cluster.log file.
  • In addition to the Main Log, detailed debug information is logged in the Debug Log service-cluster-debug.log file.
  • For details see the JS7 - Log Files and Locations article.