Functioning of the Watchdog (heart_beat_watchdog_thread)
A watchdog is started automatically each time a JobScheduler is started as part of a Cluster. Each watchdog runs as a separate thread alongside its JobScheduler and monitors that JobScheduler's heartbeat. The watchdog stops its JobScheduler once the heartbeat has been missing for a predefined length of time.
...
This behaviour cannot be configured because it is an "emergency" procedure that ensures the cluster continues to function reliably.
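The watchdog mechanism described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not JobScheduler's implementation: the class `HeartbeatWatchdog`, its `beat()`, `check()` and `start()` methods, the `stop_scheduler` callback and the `max_delay` and `poll_interval` parameters are all hypothetical. Only the thread name `heart_beat_watchdog_thread` is taken from the heading above.

```python
import threading
import time
from typing import Callable, Optional


class HeartbeatWatchdog:
    """Hypothetical sketch: a watchdog thread that stops its scheduler
    once the scheduler's heartbeat has been missing for too long."""

    def __init__(self, max_delay: float, stop_scheduler: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic,
                 poll_interval: float = 1.0) -> None:
        self.max_delay = max_delay            # assumed: tolerated gap in seconds
        self.stop_scheduler = stop_scheduler  # assumed: callback that stops the scheduler
        self.clock = clock                    # injectable so the logic can be tested
        self.poll_interval = poll_interval
        self._last_beat = clock()
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def beat(self) -> None:
        """Called by the scheduler each time it emits a heartbeat."""
        with self._lock:
            self._last_beat = self.clock()

    def check(self) -> bool:
        """Stop the scheduler (only once) if the heartbeat is overdue.
        Returns True once the scheduler has been stopped."""
        with self._lock:
            missing_for = self.clock() - self._last_beat
            fire = missing_for > self.max_delay and not self._stopped.is_set()
            if fire:
                self._stopped.set()
        if fire:
            self.stop_scheduler()  # called outside the lock
        return self._stopped.is_set()

    def _run(self) -> None:
        # Event.wait doubles as the sleep between polls and ends the loop
        # as soon as the watchdog has fired.
        while not self._stopped.wait(self.poll_interval):
            self.check()

    def start(self) -> None:
        self._thread = threading.Thread(
            target=self._run, name="heart_beat_watchdog_thread", daemon=True)
        self._thread.start()
```

The sketch measures gaps with `time.monotonic`, which does not jump when the system time is changed; this is a design choice of the sketch only.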
Possible reasons for a missing heartbeat
- Database problems
- Problems with the SMTP mail server
- DNS problems
- A heavily overloaded computer (e.g. lack of memory)
- A change in system time
Output to the log file scheduler.log
JobScheduler determines that its own heartbeat is missing 31 seconds after it was due. The warning is issued after a further delay of 3 seconds. The maximum tolerated delay is 55 seconds.
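Read literally, these figures give three thresholds measured from the moment the heartbeat was due: 31 s, 31 + 3 = 34 s and 55 s. A minimal sketch of that reading follows; the constant names, the `HeartbeatState` enum and `classify_delay` are hypothetical, and whether each boundary is inclusive is an assumption.

```python
from enum import Enum

# Thresholds in seconds after the heartbeat was due (assumed reading of the text).
DETECT_AFTER = 31                # heartbeat is considered missing
WARN_AFTER = DETECT_AFTER + 3    # warning issued 3 s later, i.e. at 34 s
MAX_TOLERATED = 55               # maximum tolerated delay


class HeartbeatState(Enum):
    ON_TIME = "on time"
    MISSING = "missing, no warning yet"
    WARNED = "warning issued"
    EXCEEDED = "maximum delay exceeded"


def classify_delay(seconds_late: float) -> HeartbeatState:
    """Map a heartbeat delay onto the thresholds above (boundaries assumed)."""
    if seconds_late > MAX_TOLERATED:
        return HeartbeatState.EXCEEDED
    if seconds_late >= WARN_AFTER:
        return HeartbeatState.WARNED
    if seconds_late >= DETECT_AFTER:
        return HeartbeatState.MISSING
    return HeartbeatState.ON_TIME
```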
...
```
2013-09-12 12:28:20.546 [WARN] (Cluster) SCHEDULER-827 Own heart beat is late: next_heart_beat has been announced for 2013-09-12 12:27:03 (this is 77 seconds late)
```
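When monitoring scheduler.log, the SCHEDULER-827 warning can be picked out with a regular expression. The sketch below is written against the single example line above; the helper `parse_late_heartbeat` and the `LateHeartbeat` tuple are hypothetical, and other JobScheduler versions may word or format the message differently.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Pattern derived from the one SCHEDULER-827 example line (assumed format).
_LATE_HEARTBEAT = re.compile(
    r"(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[WARN\] \(Cluster\) "
    r"SCHEDULER-827 Own heart beat is late: next_heart_beat has been announced for "
    r"(?P<due>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(this is (?P<late>\d+) seconds late\)"
)


class LateHeartbeat(NamedTuple):
    logged_at: datetime   # timestamp of the log entry
    due_at: datetime      # when the heartbeat had been announced
    seconds_late: int     # lateness as reported in the message


def parse_late_heartbeat(line: str) -> Optional[LateHeartbeat]:
    """Return the parsed warning, or None if the line is not a SCHEDULER-827 entry."""
    m = _LATE_HEARTBEAT.match(line)
    if m is None:
        return None
    return LateHeartbeat(
        logged_at=datetime.strptime(m["logged"], "%Y-%m-%d %H:%M:%S.%f"),
        due_at=datetime.strptime(m["due"], "%Y-%m-%d %H:%M:%S"),
        seconds_late=int(m["late"]),
    )
```

For the example line, `logged_at - due_at` comes to 77.546 seconds, consistent with the 77 seconds reported in the message itself.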
See also
- JobScheduler Backup Cluster, which describes in more detail the monitoring carried out by the other JobSchedulers.
...