
Introduction

JobScheduler restarts jobs in the following situations:

  • Restart a job after it terminated with an error.
  • Restart a job after an Agent restart.
  • Restart a job on a different Subagent in an Agent Cluster if the Subagent running the job becomes unreachable.

Restart Job after Failure

If a job terminates with a failure, this implies that the Agent is available and has witnessed the job's failure.

For this situation, users can apply the JS7 - Retry Instruction, which specifies the number of tries and the intervals at which the job is restarted.

  • For Standalone Agents it will be the same Agent that restarts the job.
  • In an Agent Cluster a Subagent will be selected based on the Subagent Cluster configuration to restart the job.

If a job fails, the order is put into the failed state. While waiting for the next try of a Retry Instruction, the order is put into the waiting state.
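The retry behaviour described above can be modelled in a few lines. The following Python sketch is a hypothetical illustration, not the JS7 API: `RetryPolicy`, `max_tries`, `retry_delays` and `run_with_retry` are made-up names. It assumes that the number of tries includes the first execution and that the last interval is reused when fewer intervals than retries are given. Waiting is only recorded, not performed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryPolicy:
    """Hypothetical model of a Retry Instruction's settings (not the JS7 API)."""
    max_tries: int           # assumption: total tries, including the first execution
    retry_delays: list[int]  # seconds to wait before each retry

    def delays(self) -> list[int]:
        """Return the wait in seconds before each retry.

        Assumption: if fewer delays than retries are given,
        the last delay is reused for the remaining retries.
        """
        retries = max(self.max_tries - 1, 0)
        if not self.retry_delays:
            return [0] * retries
        last = len(self.retry_delays) - 1
        return [self.retry_delays[min(i, last)] for i in range(retries)]


def run_with_retry(job: Callable[[int], bool], policy: RetryPolicy) -> tuple[str, list[str]]:
    """Run `job` (called with the attempt number) until it succeeds or tries are exhausted.

    Returns the final order state and the states the order passed through.
    """
    history: list[str] = []
    delays = policy.delays()
    for attempt in range(policy.max_tries):
        if job(attempt):
            history.append("finished")
            return "finished", history
        if attempt < len(delays):
            # The order waits for the next try instead of failing.
            history.append(f"waiting {delays[attempt]}s")
    # All tries are used up: the order is put into the failed state.
    history.append("failed")
    return "failed", history
```

For example, with `RetryPolicy(max_tries=3, retry_delays=[10, 60])` a job that always fails passes through two waiting periods of 10 and 60 seconds before the order ends up in the failed state.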

Restart Job after Agent Restart

If an Agent becomes unreachable while executing a job, this can mean that

  • the Agent is not running, for example due to a crash.
  • the Agent continues to run, but no connection can be established, for example in case of network errors.

In this situation

  • for Standalone Agents the Controller does not know the execution status of the job as long as the Agent is unreachable.
  • for Subagents in an Agent Cluster the Director Agent does not know the execution status of the job as long as the Subagent is unreachable.

As long as a job's execution status is unknown, the job cannot be restarted. This prevents double execution of the job if the Agent is unreachable but continues to run it.

If the Agent is restarted after a crash, it will restart any jobs that were running at the point in time when the Agent crashed. This applies to Standalone Agents and to Subagents in an Agent Cluster.

While an Agent is unreachable, related orders are put into the blocked state. No operation is available for such orders until the Agent can be reached again.
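The rules above can be summarized as a small decision function. The following Python sketch is a hypothetical model, not JS7 code: `AgentView`, `order_is_blocked` and `restart_decision` are made-up names. It only encodes what this page states: an unreachable Agent blocks its orders and prevents a restart, and a restarted Agent restarts the jobs that were running when it crashed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgentView:
    """What the Controller (or Director Agent) knows about an Agent (hypothetical model)."""
    reachable: bool
    crashed_and_restarted: bool = False


def order_is_blocked(view: AgentView) -> bool:
    # While the Agent is unreachable the job may still be running,
    # so the order is blocked and no operation is offered.
    return not view.reachable


def restart_decision(view: AgentView) -> str:
    """Decide whether and by whom a job that was running on the Agent is restarted."""
    if not view.reachable:
        # A crash and a network failure look the same from outside:
        # restarting now could execute the job twice.
        return "wait"
    if view.crashed_and_restarted:
        # The restarted Agent itself restarts the jobs that were running at the time of the crash.
        return "agent restarts job"
    return "no restart needed"
```

The key design point is that the decision depends only on what is known: as long as the Agent cannot be reached, the answer is always to wait.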

Restart Job in Agent Cluster

In an Agent Cluster, if a Subagent becomes unreachable, users have the option to confirm the loss of the Subagent and to restart its jobs on a different Subagent.

  • This option should be handled with care, as it can cause double job execution if the original Subagent is unreachable but still running the job. Before using this option, users should verify that the Subagent is down.
  • Jobs that must not be restarted after a user confirmed the loss of the Subagent can be marked as such in the job inventory. This applies to jobs that must exclude any risk of double job execution.
  • The next Subagent is selected based on the type of Subagent Cluster, for example fixed-priority or round-robin.
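The restart after a confirmed Subagent loss can be sketched as follows. The Python code is a hypothetical model, not the JS7 implementation: `SubagentCluster`, `select` and `restart_after_loss` are made-up names, and the two selection strategies are simplified to "highest-priority available Subagent" for fixed-priority and "next available Subagent in turn" for round-robin.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SubagentCluster:
    """Hypothetical model of a Subagent Cluster (not the JS7 API)."""
    subagents: list[str]          # ordered by priority, highest first
    kind: str = "fixed-priority"  # or "round-robin"
    _next: int = 0                # round-robin cursor

    def select(self, available: set[str]) -> Optional[str]:
        if self.kind == "fixed-priority":
            # Always pick the highest-priority Subagent that is available.
            return next((s for s in self.subagents if s in available), None)
        # Round-robin: continue after the Subagent selected last time.
        n = len(self.subagents)
        for offset in range(n):
            idx = (self._next + offset) % n
            if self.subagents[idx] in available:
                self._next = (idx + 1) % n
                return self.subagents[idx]
        return None


def restart_after_loss(job_restartable: bool, loss_confirmed: bool,
                       cluster: SubagentCluster, lost: str,
                       available: set[str]) -> Optional[str]:
    """Return the Subagent that restarts the job, or None if the job is not restarted."""
    if not loss_confirmed:
        return None  # the lost Subagent might still run the job: the order stays blocked
    if not job_restartable:
        return None  # job is marked in the inventory as not restartable
    return cluster.select(available - {lost})
```

Both guards come before the selection step, so neither an unconfirmed loss nor a job marked as not restartable can lead to a second execution.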


JS-2141   JS-2151

FEATURE AVAILABILITY STARTING FROM RELEASE 2.7.3