Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Fail-over is an automated operation that occurs when the Primary Controller JOC Cockpit is aborted or killed. Fail-over is applied in case of normal and abnormal termination only.
  • Switch-over is an operation that is caused by user intervention in JOC Cockpit or by use of the JS7 - REST Web Service API. The procedure includes normal termination of an Active Controller JOC Cockpit Instance.

...

JOC Cockpit plays a limited role when it comes to high availability:

  • Any workflows and jobs are executed with the Controller and Agents only. Orders are started and executed independently from JOC Cockpit once they are submitted by the JS7 - Daily Plan.
  • In case that JOC Cockpit becomes unavailable
    • users cannot control the status of Controllers and Agents,
    • users cannot deploy scheduling objects to Controllers and Agents,
    • users cannot access the JS7 - History for execution of orders and tasks.

For command line references see the JS7 - Controller JOC Cockpit - Command Line Operation article.

Cluster Roles

The documentation frequently indicates a Primary JOC Cockpit Instance and any number of Secondary JOC Cockpit Instances. The names suggest that one JOC Cockpit Instance is primarily used and other instances are used for backup purposes.

  • The wording in cluster terms suggests to indicate the Active JOC Cockpit Instance and the Standby JOC Cockpit Instances independently from the fact if the Primary or a Secondary JOC Cockpit Instance is active.
  • JOC Cockpit implements an active-passive cluster, however, the term passive is misleading as the Standby JOC Cockpit Instances are not necessarily passive:
    • Users can login to a Standby JOC Cockpit Instance to perform any operations as available from the Active JOC Cockpit Instance.
    • The following limitations apply to use of Standby JOC Cockpit Instances:
      • The JS7 - Dashboard does not automatically update order states, history information and cluster status information. Such information is updated when navigating to the page.
      • No running log is available for jobs that are currently executed. In addition, no access to order logs and task logs is available for the time that an order is not completed. 
  • Any JOC Cockpit instances operated as a cluster use the same JS7 - Database.
    • The database is used to synchronize cluster operations. 
    • Unavailability of the database prevents the JOC Cockpit cluster from working.

Cluster Operations

Cluster operations include an automated fail-over and a manual switch-over of the Active JOC Cockpit Instance.

Fail-over

Fail-over occurs when the Active JOC Cockpit Instance is terminated abnormally. 

Fail-over can be invoked by the following actions:

  • The Active JOC Cockpit Instance is killed, for example
    • for Unix with a SIGKILL signal corresponding to the command: kill -9
    • for Windows with the command: taskkill /F
  • The operating system crashes.

Fail-over happens within a period of time that depends on the number of orders that started but that did not yet complete. The minimum duration of a fail-over is 30s and can last for 2-3 minutes if thousands of active orders are in place.

Switch-over

Switch-over occurs exclusively when invoked by user intervention.

Switch-over can be invoked by the following actions:

  • In the JS7 - Dashboard the user performs the operation: 
    • Standby JOC Cockpit Instance action menu: Switch-over
  • the Active JOC Cockpit Instance is stopped normally from the command line:
    • jetty.sh | .cmd stop
  • the operating system is shut down and systemd / init.d or a Windows Service are in place to stop the JOC Cockpit normally.

Switch-over happens within a similar period of time as fail-over.

Further Resources

...