...

  • Resilience includes support for different outage scenarios with automated and manual fail-over.
  • Outage Scenarios
    • Network Connection Loss
      • A recoverable, short-term connection loss between Master and Agent, during which attempts to re-establish the connection are retried a configurable number of times.
      • With a connection loss, the JobScheduler Master and Agent initially cannot tell whether the connection failed or whether a Master Service Failure occurred.
      • This scenario is intended for a connection failure that can be recovered by re-establishing the connection; it is not intended for an ongoing network outage.
    • Master Service Failure
      • Either a loss of the connection between Master and Agent that cannot be re-established within the number of retry attempts specified for the Network Connection Loss scenario
        • due to a server crash or
        • due to a JobScheduler Master crash.
      • Or an unplanned JobScheduler Master restart or server restart.
    • Database Connection Loss
      • A recoverable, short-term connection loss between Master and database:
        • for a JobScheduler Active Cluster this scenario covers a period of not more than 50s during which a cluster member tries to re-establish the connection.
        • for a JobScheduler Passive Cluster this scenario has no restriction on duration: it can be configured to retry re-connecting to the database endlessly.
      • With a connection loss, the JobScheduler Master cannot tell whether the database service failed or whether the connection failed.
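The distinction between the Network Connection Loss and Master Service Failure scenarios above can be sketched as a bounded retry loop. This is a minimal illustration, not JobScheduler code: the names `Outage`, `classify_outage`, `try_connect` and `max_retries` are hypothetical, waiting between attempts is omitted, and the Master restart case (which also counts as a Master Service Failure) is not modelled.

```python
from enum import Enum
from typing import Callable


class Outage(Enum):
    NONE = "no outage"
    NETWORK_CONNECTION_LOSS = "Network Connection Loss"
    MASTER_SERVICE_FAILURE = "Master Service Failure"


def classify_outage(try_connect: Callable[[], bool], max_retries: int) -> Outage:
    """Classify an outage by whether a bounded number of retries succeeds.

    try_connect is a hypothetical callable that returns True when the
    connection between Master and Agent is available.
    """
    # The first call is the regular connection check, not a retry.
    if try_connect():
        return Outage.NONE
    for _ in range(max_retries):
        if try_connect():
            # Recovered within the configured number of retry attempts.
            return Outage.NETWORK_CONNECTION_LOSS
    # Retry attempts exhausted: no longer treated as a short-term loss.
    return Outage.MASTER_SERVICE_FAILURE
```

For example, a connection that fails twice and then recovers is classified as a Network Connection Loss when three retries are allowed:

```python
attempts = iter([False, False, True])
print(classify_outage(lambda: next(attempts), max_retries=3))
```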

Master / Agent Reconciliation

Scenario

  • Outage Scenario
    • Network Connection Loss
      • A recoverable, short-term connection loss between Master and Agent, during which attempts to re-establish the connection are retried a configurable number of times.
  • Supported Scenario
    • Master/Agent Reconciliation addresses the Network Connection Loss scenario, not the Master Service Failure and Database Connection Loss scenarios.

...

  • Reconciliation Scenario
    • applies after a Network Connection Loss between Master and Agent.
    • includes re-establishing the normal relationship between Master and Agent after the connection has been lost for a number of retry attempts.
  • Agent Behavior
    • By default an Agent will kill any running tasks immediately if the connection to the Master gets lost, i.e. none of the above scenarios is supported (JS-1523). The reasons for this are:
      • If a Master were not available for a longer period then the Agent could not report back the execution history and log information for tasks. As a result, the Master would have no information on whether the job execution was successful.
      • The primary goal is to prevent duplicate simultaneous execution of jobs. Without further information from a Master the respective Agent instance cannot know whether it will later be contacted for re-execution of the same job (in which case a currently running task could continue on the Agent) or whether the Master will choose a different Agent (see Agent Bundle).
    • With a Network Connection Loss setting configured with the Agent's process class the Agent will show the following behavior (JS-1524):
      • For the number of unsuccessful connection attempts specified, the Agent will assume the Network Connection Loss scenario.
      • The Agent will continue any running tasks until the specified retry attempts to establish the connection with the Master are exhausted.
        • Reconciliation will take place if the connection between Master and Agent can be re-established within the number of retries and if the Master has not been restarted.
        • Otherwise the Agent will assume the Master Service Failure scenario and will kill any running tasks.
      • This behavior applies to tasks that are executed for the specific Master to which the connection has been lost. Tasks for other JobScheduler Master instances will be continued.
  • Master/Agent Reconciliation
    • After connection loss the Master will regularly attempt to re-establish the HTTP connection to the Agent. This communication allows the Agent to report the execution status of running jobs back to the Master.
    • After a successful re-connect within the Network Connection Loss scenario the Master will repeat its request for execution of the respective jobs. Each new request includes an identifier for the previous execution request that allows the Agent to identify repeated requests:
      • for a job that has been completed within the time required to re-establish the connection, the Agent will report the execution result back to the Master and will not re-execute the job.
      • for a job that is still running, the Agent will report the appropriate information back to the Master, which will note the running tasks and update JOC accordingly.
  • Delimitation
    • This feature is not intended to support a Master Service Failure scenario or Database Connection Loss scenario.
  • Feature Availability
    • Display feature availability
      Starting from Release 1.10.2
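The Agent behavior and the handling of repeated execution requests described above can be sketched as follows. This is a simplified model under stated assumptions, not the JobScheduler protocol: the names `Task`, `Agent`, `handle_request`, `complete` and `agent_action`, the string request identifiers and the returned status strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Task:
    state: str = "running"          # "running" or "completed"
    result: Optional[int] = None    # exit code once completed


def agent_action(reconnected_within_retries: bool, master_restarted: bool) -> str:
    """Decide what the Agent does once the retry attempts are resolved."""
    if reconnected_within_retries and not master_restarted:
        return "reconcile"
    # Otherwise the Master Service Failure scenario is assumed.
    return "kill_tasks"


@dataclass
class Agent:
    tasks: Dict[str, Task] = field(default_factory=dict)

    def handle_request(self, request_id: str, previous_id: Optional[str] = None) -> str:
        """Start a task, or report on the task of a repeated request."""
        known = self.tasks.get(previous_id) if previous_id else None
        if known is None:
            # First request for this job: start a new task.
            self.tasks[request_id] = Task()
            return "started"
        if known.state == "completed":
            # Report the stored result instead of re-executing the job.
            return f"completed:{known.result}"
        # The task is still running: report that and do not start a duplicate.
        return "running"

    def complete(self, request_id: str, result: int) -> None:
        """Record the result of a finished task."""
        task = self.tasks[request_id]
        task.state, task.result = "completed", result
```

The identifier of the previous request is what lets the Agent avoid a duplicate execution: a repeated request never creates a second task for the same job.

```python
agent = Agent()
agent.handle_request("r1")                          # task starts
status = agent.handle_request("r2", previous_id="r1")  # repeated after re-connect
agent.complete("r1", 0)
final = agent.handle_request("r3", previous_id="r1")   # result reported, no re-run
```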

...

  • Outage Scenario
    • Master Service Failure
      • a loss of the connection between Master and Agent that cannot be re-established within the number of retry attempts specified for the Network Connection Loss scenario (see Master / Agent Reconciliation) or
      • a JobScheduler Master restart or server restart.
  • Supported Scenario
    • Master Service Recovery addresses the Master Service Failure scenario, not the Network Connection Loss or Database Connection Loss scenarios.

...