Scope
- Fault Tolerance, Resilience and Redundancy provide high availability of JobScheduler for a number of outage scenarios:
  - High Availability requires the whole system, including JobScheduler, database, storage etc., to be available, not just one component.
  - High Availability is oriented towards specific outage scenarios, not towards every possible failure.
- Master / Agent Resilience includes a number of measures for operational robustness:
  - Master / Agent Reconciliation allows continued execution of tasks in case of a recoverable Network Connection Loss.
  - Master Service Recovery includes supported measures after a Master Service Failure.
  - Database Service Recovery includes the capability to recover in case of a Database Connection Loss.
- Master / Agent Redundancy includes a number of architecture decisions:
  - Master Clusters provide redundancy of Master instances in a network.
  - Agent Clusters can be used to compensate for the outage of a server that runs an Agent.
- Recovery Strategies provide an overview of the means to restore the scheduling service.
Master / Agent Redundancy
Master Cluster
Feature
- JobScheduler Master supports Cluster Operation with redundancy of the involved cluster members.
- Clustering is frequently used for high-availability and in some cases for improved performance.
Agent Cluster
Feature
- The JobScheduler allows multiple Agents to be specified for a single Process Class.
- The JobScheduler Master contacts Agents in round-robin mode (see JS-1188):
  - the first Agent that is configured to execute jobs for the process class will be contacted.
  - if the first Agent is not available then the next Agent listed in the process class configuration will be contacted.
  - this procedure will be repeated until an Agent is found that can execute the job.
- Use cases for this scenario include:
  - all Agents running on different server nodes: the switch to the next available Agent implements a fail-over to the next server node.
  - a number of Agents running on the same server node: the switch to the next available Agent implements redundancy of Agents within a single server node.
- Feature Availability: starting from release 1.9.
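The contact procedure described for Agent Clusters can be sketched as an ordered fallback over the Agents listed for a process class. This is a minimal illustration in Python, not JobScheduler code: the function `select_agent`, the `is_available` check and the Agent URLs are hypothetical names introduced for the example, and the sketch makes a single pass over the list where a real Master might keep retrying.

```python
from typing import Callable, Optional, Sequence


def select_agent(
    agents: Sequence[str],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Return the first available Agent in configured order, or None.

    `agents` is assumed to be ordered as in the process class configuration.
    `is_available` is a hypothetical health check supplied by the caller.
    """
    for agent in agents:
        if is_available(agent):
            return agent
    return None  # no Agent could be reached in this pass


# Hypothetical Agent URLs, in the order they appear in the process class.
agents = [
    "http://agent-a:4445",
    "http://agent-b:4445",
    "http://agent-c:4445",
]

# Simulate an outage of the first Agent.
down = {"http://agent-a:4445"}

selected = select_agent(agents, lambda agent: agent not in down)
print(selected)
```

Because the search always starts at the head of the list, a healthy first Agent receives every job, which is why this behaviour gives fail-over and not load sharing.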
Delimitation
- This feature is not intended for load sharing as the JobScheduler will always use the first available Agent.
- This feature is not intended for scalability as it does not allow the execution of jobs in parallel on a number of Agents (clustering).
Change Management References