Introduction

Relevant information is also available in the JS7 - Impact of a Controller outage article

A note on wording: a Controller can be operated as a Standalone Controller instance or as a Controller Cluster. The term Controller instance refers to either the standalone instance or a Controller Cluster member instance.
The outage of a Controller instance will not stop the execution of workflows with Agents.
However, if a workflow includes jobs that are executed with different Agents then the workflow will not be completed and will be put on hold as switching of Agents during workflow execution is performed by the Controller.
Testing by SOS includes the scenario in which a Controller is not available for 24 hours while the Agent executes any scheduled orders. Once the Controller instance is started again, job execution results are updated to the JOC Cockpit History and become visible in the GUI.
For information about the behavior in case of outages see the JS7 - FAQ - What happens to workflows in case of outage of a Controller?

Controller Cluster

If you operate a Controller Cluster then an automated fail-over takes place should the active Controller instance fail. A fail-over typically occurs within 3-5s, after which the standby Controller instance picks up operations. Should the standby Controller instance fail then this does not affect the active Controller instance. A failed Controller instance can be started later on and will automatically synchronize with the currently active Controller instance. Running a JS7 high availability cluster therefore gives you the relaxed option not to have to take immediate action if one of the instances fails. However, should you intend to immediately make available the failed Controller Cluster member instance then the steps explained below similarly apply to the failed instance.

The troubleshooting hints below are intended for users operating a Standalone Controller. The steps explained are not required for users operating a Controller Cluster.

Troubleshooting

The Controller is the component in JS7 that holds the job-related configurations and orchestrates the Agents. The outage of a Controller instance does not prevent the execution of workflows with jobs running on the same Agent. However, it affects, for example, the execution of workflows that include jobs running on a number of Agents, as switching of Agents during workflow execution is performed by the Controller.

Troubleshooting starts with reproducing and locating a problem in order to better understand its nature:

As a first step check the Controller's log files controller.log and watchdog.log, see JS7 - Log Files and Locations.
- Warnings and errors can be found in log files with the output qualifiers WARN and ERROR.
- Example:
  - 2021-10-10T09:53:04,939 WARN js7.base.session.SessionApi - HttpControllerApi(https://apmacwin:4344): HTTP 401 Unauthorized: POST https://apmacwin:4344/controller/api/session => InvalidLogin: Login: unknown user or invalid password
Due to log rotation, log files from previous days are available in a compressed .gz format on a daily basis, see the JS7 - Log Rotation article for details.
- For Unix the zcat command can be used to directly access compressed log files.
- For Windows compressed files have to be extracted, for example using 7-zip.
Note that a Controller instance can report problems related to other products such as Agents and the JOC Cockpit. In this situation it is recommended that the log files of the respective product are checked.
If warnings or error messages do not make the cause evident then users should do some research: the Product Knowledge Base and the Change Management System each offer a search box, and browsers offer access to search engines.
Having completed analysis of a problem and being certain that the problem is related to a product defect and not to resources of the IT environment:
- customers with a commercial license should use the Support Resources including the SOS ticket system.
- users with the open source license are invited to use Community Resources.
Should the controller.log file not provide sufficient information to reproduce a problem then the log level should be increased, see the JS7 - Log Levels and Debug Options article.
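The log check described above can be sketched with a short Python script. This is a minimal sketch, not part of JS7: it assumes that log lines follow the layout of the example above (timestamp, then the output qualifier), and the function names read_log and find_problems are hypothetical. Compressed .gz files from log rotation are read directly, which avoids the extraction step on Windows.

```python
import gzip
import re
import sys
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Assumed line layout, taken from the example above:
# 2021-10-10T09:53:04,939 WARN js7.base.session.SessionApi - ...
PROBLEM_LINE = re.compile(r"^\S+\s+(WARN|ERROR)\s")


def read_log(path: Path) -> Iterator[str]:
    """Yield lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")


def find_problems(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (qualifier, line) for lines with the qualifier WARN or ERROR."""
    for line in lines:
        match = PROBLEM_LINE.match(line)
        if match:
            yield match.group(1), line


if __name__ == "__main__":
    # Log file names are passed as arguments, plain or compressed.
    for name in sys.argv[1:]:
        for _, line in find_problems(read_log(Path(name))):
            print(f"{name}: {line}")
```

The script takes one or more log file paths as arguments, for example controller.log and any rotated .gz files, and prints each matching line prefixed with the file name.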

In some situations, for example if computer memory is not sufficient for the heap size of the Controller instance's Java Virtual Machine, the outage of a Controller instance can be handled by restarting the instance. However, problems indicating insufficient resources typically require better sizing of resources.

If the problem is related to server resources and if operation of the Controller cannot be continued on the same server then relocation of the Controller instance can be a last resort to handle an outage. The Controller instance's journal is stored in the JS7_CONTROLLER_DATA/state directory. To relocate a Controller instance, copy or move the journal files in this directory to the state directory of a Controller instance on the new server. Refer to the JS7 - How to relocate a Controller article for the steps to apply.
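The copy step of a relocation could look like the following Python sketch. It only illustrates copying the journal files between two state directories that are reachable from the same machine. The function name copy_journal and the paths in the usage comment are hypothetical, and the sketch does not cover any other steps, such as stopping instances, for which the relocation article remains the reference.

```python
import shutil
from pathlib import Path
from typing import List


def copy_journal(source_state: Path, target_state: Path) -> List[str]:
    """Copy all files from the source state directory to the target one.

    Returns the names of the copied files in sorted order.
    """
    if not source_state.is_dir():
        raise FileNotFoundError(f"state directory not found: {source_state}")
    target_state.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(source_state.iterdir()):
        if item.is_file():
            # copy2 preserves file timestamps along with the content
            shutil.copy2(item, target_state / item.name)
            copied.append(item.name)
    return copied


# Hypothetical usage, both paths are placeholders:
# copy_journal(Path("/old/JS7_CONTROLLER_DATA/state"),
#              Path("/new/JS7_CONTROLLER_DATA/state"))
```

Copying rather than moving leaves the original journal in place until the new Controller instance has been verified.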
