Introduction

This example uses a simple job chain which starts shell jobs to demonstrate the different behaviors that can be configured for JobScheduler if an error occurs in one of the jobs.

In particular, it demonstrates the effect of the stop_on_error and on_error parameters, along with the use of suspended orders and setbacks to retry a job.

Downloads

Instructions

Behavior with stop_on_error="no"

  • Unzip all files in the download into the ./config/live folder of your JobScheduler installation.
  • Open the JobScheduler Operating Center, JOC, in your browser using http://scheduler_host:scheduler_port
  • Open the JOB CHAINS tab and enable Show orders.
  • Find the job chain samples/shell_error/simple_error_chain.
  • Find the order simple_error_order, open the order menu and choose Start order now.
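For scripted runs, the same start can presumably be triggered without the JOC menu. The following is a sketch of the JobScheduler XML command that corresponds to Start order now. It assumes the job chain path and order ID named in the steps above; how the command is delivered to the JobScheduler depends on your installation and is not shown here.

```xml
<!-- Sketch: start the order immediately, comparable to "Start order now" in JOC. -->
<!-- Assumption: the job chain is deployed as /samples/shell_error/simple_error_chain. -->
<modify_order job_chain="/samples/shell_error/simple_error_chain"
              order="simple_error_order"
              at="now"/>
```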

...

Alternatively, the error can be blamed on the job, as described in the next section.

Behavior with stop_on_error="yes"

  • Edit the job configuration file simple_chained_job2.job.xml
  • If you have changed the exit code (which caused the error) to exit 0, change it back to exit 5 to simulate an error again
  • Change stop_on_error="no" to stop_on_error="yes" and save
  • Run the order again
  • Look at the order history
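After these changes, simple_chained_job2.job.xml might look roughly like the sketch below. Only the stop_on_error attribute and the exit 5 line are taken from the steps above; the echo line and the rest of the layout are assumptions for illustration, and the file in the download may differ.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch of simple_chained_job2.job.xml; the actual file in the download may differ. -->
<!-- stop_on_error="yes": a failing script is blamed on the job, which is then stopped. -->
<job order="yes" stop_on_error="yes">
    <script language="shell">
        <![CDATA[
echo "simulating an error in the second job"
exit 5
        ]]>
    </script>
    <run_time/>
</job>
```

Changing exit 5 back to exit 0 makes the script succeed, so the job no longer reports an error.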

...

This example used stop_on_error="yes" to blame the error on the job.

Suspending Orders

Another option in the event of an error is to suspend the order:

  • First of all, ensure that stop_on_error is set to "no" for both jobs
  • Then edit the job chain configuration file simple_error_chain.job_chain.xml:
    • On the job_chain_node for the second job, add a new on_error="suspend" attribute and save
  • Run the order again
  • When the error occurs now, the order is put back into the order queue of the second job, but it is suspended.
    This means that the order will not run again until somebody manually chooses "resume" from the order menu.
  • Fix the job, i.e. change exit 5 to exit 0
  • Choose "resume" from JOC's order menu
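The job chain edit described above could look like the following sketch. The on_error="suspend" attribute and the name simple_chained_job2 come from this page; the state names and the name of the first job are assumptions, so match them to the file from the download.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch of simple_error_chain.job_chain.xml. -->
<!-- Assumptions: state names "first", "second", "success", "error" and job name simple_chained_job1. -->
<job_chain>
    <job_chain_node state="first" job="simple_chained_job1" next_state="second" error_state="error"/>
    <!-- on_error="suspend": if this job fails, the order stays at this node in a suspended state -->
    <job_chain_node state="second" job="simple_chained_job2" next_state="success" error_state="error" on_error="suspend"/>
    <job_chain_node state="success"/>
    <job_chain_node state="error"/>
</job_chain>
```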

Retry using "setback"

Alternative Example:

Note that a dedicated example also shows the use of setbacks: How to use setbacks to make a job retry in the event an error

...

If the job is fixed during the retries, the order will move on to the node named in next_state.
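As a rough illustration only, a setback configuration for the second job might resemble the sketch below. It assumes that the job_chain_node for this job carries on_error="setback" in the job chain file; the retry count and delay are arbitrary example values and are not taken from this page.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch of a job that retries an order via setbacks; all values are illustrative assumptions. -->
<job order="yes" stop_on_error="no">
    <script language="shell">
        <![CDATA[
exit 5
        ]]>
    </script>
    <!-- From the first setback on, wait 10 seconds before the order is retried -->
    <delay_order_after_setback setback_count="1" is_maximum="no" delay="10"/>
    <!-- Give up after the third setback; the order then counts as failed -->
    <delay_order_after_setback setback_count="3" is_maximum="yes"/>
    <run_time/>
</job>
```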

How it works

The main "switch" for controlling the error handling of shell jobs is the stop_on_error attribute of a job. If stop_on_error is set to yes, the job is blamed for the error and is stopped. If stop_on_error is set to no, the order is blamed for the error. For more information on stop_on_error, see http://www.sos-berlin.com/doc/en/scheduler.doc/xml/job.xml#attribute_stop_on_error

...