Table of Contents | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
Functioning of the Watchdog (heart_beat_watchdog_thread)
...
JobScheduler determines that its own heartbeat is missing 31 seconds after it was due. The warning is issued after a further delay of 3 seconds. The maximum delay that is tollerated is 55 seconds.
Code Block |
---|
2013-09-12 12:26:18.230 [WARN] (Cluster)
SCHEDULER-827 Own heart beat is late: next_heart_beat has been announced for 2013-09-12 12:25:47
(this is 31 seconds late)
|
This message comes from the Watchdog. It has realsed that the last heartbeat from its JobScheduler was 136 seconds ago. The tollerance of 55 seconds delay (115s since the last heartbeat) has been exceeded.
The Watchdog terminates the JobScheduler before another cluster member can take over operation to ensure that the JobScheduler taking over does not end up, possibly later, operating in parallel with the original JobScheduler.
Code Block |
---|
2013-09-12 12:28:19.393 [ERROR] (Heart_beat_watchdog_thread)
SCHEDULER-386 Last heart beat was 2013-09-12 12:26:03, 136 seconds ago. Something is delaying
Scheduler execution, the Scheduler is aborted immediately
|
The following message will be issued if a JobScheduler should realise it is late and manage to send out a heartbeat during the termination process:
Code Block |
---|
2013-09-12 12:28:20.546 [WARN] (Cluster)
SCHEDULER-827 Own heart beat is late: next_heart_beat has been announced for 2013-09-12 12:27:03
(this is 77 seconds late)
|
...