Target Architecture

...

 

Cockpit / Master / Agent

  • Master and Agent do not require a permanent connection
    • each component works asynchronously and independently
    • a connection should be established at least once per day
  • Master
    • control a number of Agents
    • act as the central access point to Agents, e.g. for the GUI
    • can be terminated and restarted during ongoing operations
    • control the daily plan (calendar): what to run, when and where (but not how)
    • operate as a single instance
      • multiple Master instances are operated independently of each other
      • multiple Agent instances can be shared across a number of Master instances
  • Agent
    • execute tasks from the daily plan independently
    • operate in a high-availability cluster
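
The decoupling described above can be illustrated with a minimal sketch. It is not the product's implementation: the class `Agent`, its methods and the one-day threshold parameter `max_gap` are hypothetical names chosen for illustration. The sketch assumes an Agent receives a plan once, executes it without any connection, buffers results locally and hands them over at the next contact.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Agent:
    """Hypothetical Agent that runs planned tasks without a live Master link."""
    plan: list = field(default_factory=list)     # task names received from the Master
    outbox: list = field(default_factory=list)   # results not yet reported to the Master
    last_contact: Optional[datetime] = None      # time of the last Master connection

    def receive_plan(self, tasks, now):
        # Called while a connection to the Master exists.
        self.plan = list(tasks)
        self.last_contact = now

    def run_all(self):
        # Execution needs no connection; results are buffered locally.
        for task in self.plan:
            self.outbox.append((task, "done"))
        self.plan = []

    def sync(self, now):
        # On reconnect, hand all buffered results to the Master and clear the buffer.
        delivered, self.outbox = self.outbox, []
        self.last_contact = now
        return delivered

    def contact_overdue(self, now, max_gap=timedelta(days=1)):
        # Assumption: "at least once per day" is modelled as a 24-hour gap.
        return self.last_contact is None or now - self.last_contact > max_gap
```

In this sketch the Master can be stopped between `receive_plan` and `sync` without affecting `run_all`, which mirrors the requirement that the Master can be terminated and restarted during ongoing operations.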

Components

Cockpit

  • manage job configuration, optionally with a repository service
  • manage release procedure for job configuration to Master
  • accept job events and job history from Master
  • report job events to event queue
  • report job history to reporting database
  • run authentication and authorization service
  • run web server and web services
  • run JobScheduler Controller Web GUI
  • bundle a number of Masters and delegate commands to Master

Master

  • control the daily plan (calendar): what to run, when and where
  • forward the daily plan to the Autonomous Agent Cluster
  • accept task execution results, job history and log information from Agents
  • optionally operate in an active-passive cluster
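
As a rough sketch of the "what, when and where" responsibility listed above, a daily plan could be modelled as a list of entries that the Master splits per target Agent before forwarding. The names `PlanEntry` and `split_plan_by_agent` and the "HH:MM" time format are assumptions made for this example only; the actual plan format is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanEntry:
    job: str     # what to run
    start: str   # when to run, "HH:MM" for simplicity (assumed format)
    agent: str   # where to run: name of the Agent or Agent cluster


def split_plan_by_agent(plan):
    """Group plan entries by target Agent, each group ordered by start time."""
    per_agent = defaultdict(list)
    for entry in plan:
        per_agent[entry.agent].append(entry)
    # Zero-padded "HH:MM" strings sort chronologically as plain strings.
    return {
        agent: sorted(entries, key=lambda e: e.start)
        for agent, entries in per_agent.items()
    }
```

Note that an entry carries no information on how a job is executed; in line with the list above, that part is left to the Agent.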

Autonomous Agent

  • implement a fault-tolerant peer-to-peer network of Agents
  • accept daily plan from Master
  • available for active-passive and active-active clustering:
    • fixed priority scheduling
    • round-robin scheduling
  • execute job chains independently from Master availability
  • resolve more complex dependencies with the Master, e.g. checks of the job history or of external events from other machines
  • report job history and log information back to Master
  • maintain distributed recovery files for recovery purposes
  • allow access by a number of Master instances
  • provide resilience features for reconciliation after Master connection loss
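
The two scheduling modes named above can be sketched as follows. This is an illustration under assumptions, not the cluster's actual selection logic: the function `fixed_priority`, the class `RoundRobin` and the `is_available` callback are hypothetical, and pairing fixed priority with active-passive operation and round-robin with active-active operation is an interpretation of the list rather than something it states.

```python
def fixed_priority(agents, is_available):
    """Return the first available Agent in priority order, or None."""
    for agent in agents:
        if is_available(agent):
            return agent
    return None


class RoundRobin:
    """Rotate through the Agents, skipping any that are unavailable."""

    def __init__(self, agents):
        self._agents = list(agents)
        self._next = 0  # index of the Agent to try first on the next pick

    def pick(self, is_available):
        n = len(self._agents)
        for offset in range(n):
            idx = (self._next + offset) % n
            if is_available(self._agents[idx]):
                # Continue after the chosen Agent on the next call.
                self._next = (idx + 1) % n
                return self._agents[idx]
        return None
```

With fixed priority, a lower-ranked Agent only receives work while all higher-ranked ones are unavailable; with round-robin, work is spread across all available Agents in turn.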

...