Purpose
- The Cluster Service is used to operate a passive cluster of JOC Cockpit instances and to perform fail-over and switch-over operations between cluster members.
- All other JS7 - Services depend on the Cluster Service to start them for the active JOC Cockpit instance.
Fail-over
All running JOC Cockpit instances that are connected to the same JS7 - Database form a passive cluster. The active cluster member runs the Cluster Service; the passive cluster members monitor the active cluster member's ongoing operation by checking its heartbeats. If the active cluster member fails, one of the passive cluster members becomes active and starts its Cluster Service.
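The heartbeat check described above can be sketched as follows. This is a minimal illustration, not the JOC Cockpit implementation: the function names, the member names and the rule of picking the first passive member are assumptions made for the example. Only the 60 second threshold is taken from the default of heart_beat_exceeded_interval.

```python
from datetime import datetime, timedelta

# Default of heart_beat_exceeded_interval (seconds), taken from the settings table.
HEART_BEAT_EXCEEDED_INTERVAL = 60


def heartbeat_exceeded(last_heartbeat, now, exceeded_interval=HEART_BEAT_EXCEEDED_INTERVAL):
    """Return True if the last heartbeat is older than the allowed interval."""
    return (now - last_heartbeat) > timedelta(seconds=exceeded_interval)


def next_active_member(active, passive_members, last_heartbeat, now):
    """Keep the active member while its heartbeat is fresh.

    Otherwise return a passive member. Picking the first list entry is a
    hypothetical policy for this sketch, not documented behavior.
    """
    if not heartbeat_exceeded(last_heartbeat, now):
        return active
    return passive_members[0] if passive_members else None


now = datetime(2024, 1, 1, 12, 0, 0)
# Heartbeat 30 seconds old: the active member stays active.
print(next_active_member("joc-1", ["joc-2"], now - timedelta(seconds=30), now))
# Heartbeat 90 seconds old: a passive member takes over.
print(next_active_member("joc-1", ["joc-2"], now - timedelta(seconds=90), now))
```

In this sketch a heartbeat that is exactly 60 seconds old does not yet count as exceeded; whether the product treats the boundary the same way is not stated in this article.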
Switch-over
The switch-over operation is technically similar to the fail-over operation. However, switch-over is triggered from the GUI or by the API and allows the Cluster Service to stop any running background services in an orderly manner.
Configuration
Location
- The "Settings" are built-in and cannot be modified from the JOC Cockpit GUI.
Configuration Items
Section | Setting | Default | Required | Purpose |
---|---|---|---|---|
cluster | heart_beat_exceeded_interval | 60 | no | The duration in seconds after which heartbeats from the active cluster member to the database are considered overdue and fail-over to the next passive cluster member should occur. |
cluster | polling_interval | 30 | no | The interval in seconds between sending heartbeats to the database. |
cluster | polling_wait_interval_on_error | 2 | no | The interval in seconds to wait before polling continues after an error has occurred, e.g. due to transactional concurrency. |
cluster | switch_member_wait_counter_on_success | 10 | no | The number of retries to wait for the answer from the last active cluster member after its deactivation/activation. |
cluster | switch_member_wait_interval_on_success | 5 | no | The maximum number of seconds to wait for a cluster member to become active. Maximum wait time = switch_member_wait_counter_on_success * switch_member_wait_interval_on_success + execution time. |
cluster | switch_member_wait_counter_on_error | 10 | no | The maximum number of retries to switch the cluster to a different member in case of errors, e.g. due to transactional concurrency. |
cluster | switch_member_wait_interval_on_error | 2 | no | The maximum number of seconds to wait for a cluster member to become active after an error. Maximum wait time = switch_member_wait_counter_on_error * switch_member_wait_interval_on_error + execution time. |
cluster | current_is_cluster_member | true | no | Allows the cluster to switch to this instance. |
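As a worked example of the wait time formulas in the table, the following sketch evaluates them with the default values. The function name is hypothetical, and the execution time term is left out because it depends on the environment and is not specified here.

```python
def max_switch_wait(counter, interval):
    """Upper bound in seconds for waiting on a switch, excluding execution time."""
    return counter * interval


# Defaults from the settings table.
wait_on_success = max_switch_wait(counter=10, interval=5)
wait_on_error = max_switch_wait(counter=10, interval=2)

# Number of heartbeat intervals that fit into the exceeded interval:
# heart_beat_exceeded_interval (60) divided by polling_interval (30).
heartbeats_per_exceeded_interval = 60 // 30

print(wait_on_success)                   # 50
print(wait_on_error)                     # 20
print(heartbeats_per_exceeded_interval)  # 2
```

With the defaults, a switch waits up to 50 seconds on success and up to 20 seconds after an error, each plus execution time. Two heartbeat intervals fit into the exceeded interval before fail-over is signalled.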
Logging
- The Cluster Service logs general messages, warnings and errors in the joc.log file.
- More detailed information is additionally logged in the Main Log service-cluster.log file.
- In addition to the Main Log, detailed debug information is logged in the Debug Log service-cluster-debug.log file.
- For details see the JS7 - Log Files and Locations article.