  • Reconciliation Scenario
    • Applies after a Network Connection Loss between Master and Agent.
    • Includes retry attempts by the Master to send requests to the Agent after a connection loss. If the connection can be re-established then running tasks are continued with the Agent, otherwise running tasks are killed.
  • Objectives
    • If a Master is unavailable for a longer period then the Agent cannot report back the execution history and log information for tasks. As a result, the Master has no information on whether a job execution was successful.
    • The primary goal is to prevent duplicate simultaneous execution of jobs. Without further information from a Master, the respective Agent instance cannot know whether it will later be contacted for re-execution of the same job (which would allow a currently running task to continue on that Agent) or whether the Master will choose a different Agent (see RedundancyAgent Bundle).
    • The secondary goal is to support re-establishing the communication between Master and Agent and to continue running tasks. Tasks that make use of the JobScheduler API cannot run independently of the Master and are delayed within the scope of this feature.
  • Master/Agent Heartbeats

    • The Master and Agent send heartbeats to each other.
      • The Agent receives HTTP POST requests from the Master and will respond within 5s, regardless of whether the command requested by the Master has completed.
      • The Master will continue to send HTTP POST requests and to accept acknowledgements until the Agent sends the final response, i.e. after completion of a task.
    • If the Agent does not receive a heartbeat from the Master within twice that period (10s) then the Agent will assume the connection to be lost and will kill the task.
    • If the Master does not receive a heartbeat from the Agent then the Master will consider the task lost and will assign it an error state.
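The Agent-side timeout rule above can be sketched as follows. This is a minimal illustration only: `HeartbeatWatchdog` and its methods are hypothetical names, not part of the JobScheduler API, and the sketch assumes that the 10s limit is measured from the most recent request received from the Master.

```python
from dataclasses import dataclass

HEARTBEAT_PERIOD_S = 5.0                      # response period stated in the text
HEARTBEAT_TIMEOUT_S = 2 * HEARTBEAT_PERIOD_S  # twice the period, i.e. 10s


@dataclass
class HeartbeatWatchdog:
    """Hypothetical Agent-side helper; not part of the JobScheduler API."""

    last_heartbeat: float  # time (in seconds) of the most recent Master request

    def record_heartbeat(self, now: float) -> None:
        # Called whenever a request from the Master arrives.
        self.last_heartbeat = now

    def connection_lost(self, now: float) -> bool:
        # Assumption: the connection counts as lost once more than 10s
        # have passed since the last heartbeat.
        return now - self.last_heartbeat > HEARTBEAT_TIMEOUT_S
```

Passing the current time in as an argument keeps the rule independent of the system clock, so the 10s boundary can be checked without waiting.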
  • Master/Agent Reconciliation
    • For a Network Connection Loss scenario the Master and Agent provide reconciliation capabilities:
      Jira: JS-1524 (SOS JIRA)
      • The Agent will continue any running tasks while the Master makes up to the specified number of retry attempts to re-establish the communication.
        • Reconciliation will take place if the connection between Master and Agent can be established within the number of retries and if the Master has not been restarted.
        • Otherwise the Agent will assume the Master Service Failure scenario and will kill any running tasks.
          Jira: JS-1523 (SOS JIRA)
      • This behavior applies to tasks that are executed by an Agent for a specific Master to which a connection has been lost. Tasks for other JobScheduler Master instances will be continued.
      • After a connection loss the Master will regularly attempt to re-establish the HTTP connection to the Agent. This communication allows the Agent to report the execution status of running jobs back to the Master.
    • After a successful re-connect within the Network Connection Loss scenario the Master will repeat its request for execution of the respective jobs. Each new request includes an identifier for the previous execution request that allows the Agent to identify repeated requests:
      • For a job that has completed within the time required to re-establish the connection, the Agent will report the execution result back to the Master and will not re-execute the job.
      • For a job that is still running, the Agent will report the appropriate information back to the Master, which will note the running tasks and update JOC accordingly.
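The handling of repeated requests after a re-connect can be sketched as below. This is an illustration under stated assumptions, not the Agent's actual implementation: `AgentTaskRegistry`, `handle_request`, `previous_request_id` and the response dictionaries are all hypothetical names, and the real request format is not described in this text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class TaskState(Enum):
    RUNNING = "running"
    COMPLETED = "completed"


@dataclass
class Task:
    job: str
    state: TaskState = TaskState.RUNNING
    exit_code: Optional[int] = None


@dataclass
class AgentTaskRegistry:
    """Hypothetical Agent-side bookkeeping; names are illustrative only."""

    tasks: Dict[str, Task] = field(default_factory=dict)

    def handle_request(self, request_id: str, job: str,
                       previous_request_id: Optional[str] = None) -> dict:
        # Look up the task started by the earlier request, if one is named.
        known = self.tasks.get(previous_request_id) if previous_request_id else None
        if known is None:
            # Not a repeated request: start a new task for this job.
            self.tasks[request_id] = Task(job=job)
            return {"action": "started", "request_id": request_id}
        if known.state is TaskState.COMPLETED:
            # Completed during the outage: report the result, do not re-execute.
            return {"action": "report_result", "exit_code": known.exit_code}
        # Still running: report this so that the Master can note the task.
        return {"action": "report_running", "request_id": previous_request_id}

    def complete(self, request_id: str, exit_code: int) -> None:
        # Record the final result of a task.
        task = self.tasks[request_id]
        task.state = TaskState.COMPLETED
        task.exit_code = exit_code
```

Because a repeated request is matched on the identifier of the previous request, it never creates a second task for the same job, which is what prevents duplicate simultaneous execution in this sketch.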
  • Feature Availability
    • Starting from release 1.10.2