...

Jira issue: JS-1524

Concepts

  • Heartbeat Period:
    • The period after which the Agent sends a heartbeat to the Master if no other HTTP operation is executed on behalf of the Master.
    • Default: 10s
  • Heartbeat Timeout:
    • The overall timeout that determines if a connection is considered to be lost permanently.
    • Includes the heartbeat period and the delay after which the Master will send its heartbeat.
    • Default: 60s
  • Heartbeat Delay:
    • The additional time that the Master waits for the Agent's heartbeat after the Heartbeat Period has elapsed.
    • Value: 2s
    • This is a fixed parameter and cannot be customized.
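The three values relate as follows: the Master waits Heartbeat Period + Heartbeat Delay (10s + 2s = 12s) for a heartbeat, and what remains of the Heartbeat Timeout (60s - 12s = 48s) is left for repeated requests. A minimal sketch in Python, assuming all values are expressed in seconds; the class and property names are hypothetical and are not JobScheduler configuration keys.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class HeartbeatSettings:
    """Heartbeat timing values (hypothetical names), in seconds."""

    heartbeat_period: float = 10.0   # default, configurable
    heartbeat_timeout: float = 60.0  # default, configurable
    HEARTBEAT_DELAY: ClassVar[float] = 2.0  # fixed, not a constructor argument

    @property
    def heartbeat_deadline(self) -> float:
        """How long the Master waits for a heartbeat: period + delay."""
        return self.heartbeat_period + self.HEARTBEAT_DELAY

    @property
    def retry_window(self) -> float:
        """Part of the Heartbeat Timeout left after the heartbeat deadline."""
        return self.heartbeat_timeout - self.heartbeat_deadline


defaults = HeartbeatSettings()
print(defaults.heartbeat_deadline)  # 12.0
print(defaults.retry_window)        # 48.0
```

Modelling the delay as a class-level constant rather than a field reflects that it cannot be customized, while the period and timeout can be overridden per instance.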

Behavior

Let's suppose an existing connection between a Master and an Agent. The Master and the Agent will behave as follows:

  • In case there is no connection loss:
    • the Master sends an HTTP Request to the Agent.
    • the Agent sends to the Master:
      • a heartbeat after 10s if no other HTTP operation is executed on behalf of the Master.
      • an HTTP response when an operation is executed on behalf of the Master.
  • In case of connection loss after the Master has sent a first HTTP Request:
    • the Master waits 12s for the heartbeat from the Agent to arrive.
      • The Agent should answer with a heartbeat after 10s. This is the Heartbeat Period specified above.
      • The Master waits 2s more just in case. This is the Heartbeat Delay specified above.
    • If a heartbeat from the Agent arrives between 10s and 12s (= 10s Heartbeat Period + 2s Heartbeat Delay), any running tasks will be continued and completed by the Agent.
    • If the Master has not received the heartbeat from the Agent after 12s, the Master repeats the first HTTP Request until the Agent is able to answer:
      • If the Agent is able to answer before 60s have elapsed, that is, within 48s after the Master starts repeating the HTTP Request, any running tasks will be continued and completed by the Agent. Even though the Master sent more than one HTTP Request, the tasks are executed just once.
      • If the Agent is not able to answer before 60s have elapsed, that is, within 48s after the Master starts repeating the HTTP Request, the Master will kill any running tasks on the Agent.
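The Master's decision in the connection-loss case above can be sketched as a small function. This is a simplified model, not the JobScheduler implementation: `answer_at` (a hypothetical parameter) is the time in seconds after the Master's first request at which the Master first receives any answer from the Agent, or `None` if it never does. The `Agent` class is likewise hypothetical and assumes that the Agent recognizes a repeated request for a task it is already running, which is one way to obtain the "executed just once" behavior.

```python
from typing import Optional, Set

HEARTBEAT_PERIOD = 10.0   # default, seconds
HEARTBEAT_DELAY = 2.0     # fixed, seconds
HEARTBEAT_TIMEOUT = 60.0  # default, seconds


def master_decision(answer_at: Optional[float]) -> str:
    """Return what the Master does with running tasks on the Agent."""
    heartbeat_deadline = HEARTBEAT_PERIOD + HEARTBEAT_DELAY  # 12s
    if answer_at is not None and answer_at <= heartbeat_deadline:
        return "continue"  # heartbeat arrived in time
    # No heartbeat within 12s: the Master repeats its request until the Agent
    # answers or the Heartbeat Timeout is reached (60s - 12s = 48s of retries).
    if answer_at is not None and answer_at < HEARTBEAT_TIMEOUT:
        return "continue after retry"
    return "kill"


class Agent:
    """Hypothetical Agent that starts each task at most once,
    no matter how often the Master repeats the request."""

    def __init__(self) -> None:
        self.running: Set[str] = set()
        self.executions = 0

    def handle_request(self, task_id: str) -> None:
        if task_id not in self.running:  # repeated requests are ignored
            self.running.add(task_id)
            self.executions += 1


for t in (10.0, 12.0, 30.0, 60.0, None):
    print(t, master_decision(t))
# 10.0 continue
# 12.0 continue
# 30.0 continue after retry
# 60.0 kill
# None kill

agent = Agent()
for _ in range(3):  # the original request plus two repeats
    agent.handle_request("task-1")
print(agent.executions)  # 1
```

Treating an answer at exactly 60s as too late is an assumption; the text above only says the Agent must answer "before 60s".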

...

  • the Master could kill any running task on the Agent if the connection loss exceeds 48s. This limit case happens if the connection loss starts exactly when the Master should receive the Agent's heartbeat, i.e. 12s after its request, so that only 60s - 12s = 48s of the Heartbeat Timeout remain.
  • the Master will always kill any running task if the connection loss exceeds 60s. This is the Heartbeat Timeout specified above.
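The two limits above follow from when the connection loss starts. A minimal sketch, assuming the loss starts `loss_start` seconds after the Agent's last message (somewhere between 0s and the 12s heartbeat deadline) and that the Agent answers the Master's repeated request as soon as the connection is back; both function names are hypothetical, and how the exact boundary is treated is an assumption.

```python
HEARTBEAT_PERIOD = 10.0   # default, seconds
HEARTBEAT_DELAY = 2.0     # fixed, seconds
HEARTBEAT_TIMEOUT = 60.0  # default, seconds


def kill_threshold(loss_start: float) -> float:
    """Shortest connection loss (seconds) after which the Master kills
    running tasks, for a loss starting loss_start seconds after the
    Agent's last message."""
    deadline = HEARTBEAT_PERIOD + HEARTBEAT_DELAY  # 12s
    if not 0.0 <= loss_start <= deadline:
        raise ValueError("loss_start must lie between 0s and the 12s heartbeat deadline")
    # Only the part of the Heartbeat Timeout not yet used is available.
    return HEARTBEAT_TIMEOUT - loss_start


def tasks_killed(loss_start: float, loss_duration: float) -> bool:
    """True if a loss of this duration makes the Master kill running tasks."""
    return loss_duration >= kill_threshold(loss_start)


print(kill_threshold(12.0))      # 48.0 (worst case: loss starts at the heartbeat deadline)
print(kill_threshold(0.0))       # 60.0 (the Heartbeat Timeout)
print(tasks_killed(12.0, 50.0))  # True
print(tasks_killed(0.0, 50.0))   # False
```

Any loss longer than 60s exceeds every possible threshold, which is why the Master always kills running tasks in that case.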

Use Case

Kill Tasks in case of Connection Loss

...