KBI 311151 How To Ensure High Availability In Argent Job Scheduler

Version

Argent Job Scheduler all versions

Date

Wednesday, 14 Jan 2015

Summary

This article describes the following:

  1. The Perfect Installation method and the Best Practices for ensuring High Availability in Argent Job Scheduler
  2. How the Argent Job Scheduler High Availability architecture handles the following unlikely server downtime events
    1. Main Engine goes down
    2. Main Engine comes back online when Backup Engine runs on Main Database
    3. Main Database goes down
    4. Switching back to Main Engine manually after Main Database is up
    5. One of the Queue Engines goes down

Technical Background

High Availability of Argent Job Scheduler can be ensured by strictly following the Argent Job Scheduler High Availability architecture during the installation and configuration

The explanations assume an environment with one Scheduling Engine and many Queue Engines

Where several Main Scheduling Engines are installed, the architecture prescribed for a single Main Scheduling Engine applies to each of them

Perfect Installation

A perfect installation of Argent Job Scheduler for High Availability is laid out as follows:

  1. Main Scheduling Engine on one server
  2. Main Database (SQL Server) on a second server
  3. Backup Scheduling Engine on a third server, installed using the same database as the Main Scheduling Engine
  4. Backup Database on a fourth server
  5. Two Queue Engines (the second a replica of the first) set up to run the same Job, with the first available server selected for execution
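
As an illustration only, the layout above can be written down as a small check that the four core roles sit on separate servers and that at least two Queue Engines exist. This is a sketch under stated assumptions: the role names, the host names and the check_layout function are hypothetical and are not part of Argent Job Scheduler.

```python
# Hypothetical sketch: these names are illustrative, not an Argent Job Scheduler API.
CORE_ROLES = ("main_engine", "main_database", "backup_engine", "backup_database")


def check_layout(layout):
    """Return a list of problems found in a role -> server name mapping."""
    problems = []
    for role in CORE_ROLES:
        if not layout.get(role):
            problems.append(f"missing server for {role}")
    # Server names are compared case-insensitively.
    hosts = [layout[role].lower() for role in CORE_ROLES if layout.get(role)]
    if len(set(hosts)) != len(hosts):
        problems.append("core roles share a server")
    queue_engines = layout.get("queue_engines", [])
    if len({name.lower() for name in queue_engines}) < 2:
        problems.append("fewer than two distinct Queue Engines")
    return problems
```

An empty list means the mapping matches the five points above; anything else names the point that is not met.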

Best Practices For Setting Up High Availability

Backup Scheduling Engine Settings

  • The Backup Scheduling Engine should be installed on a server in the same network as the Main Scheduling Engine and should use the same user credentials

    This ensures that the Backup Scheduling Engine can reach the dependent databases on other servers and the dependent files specified using UNC paths

Backup Database Settings

  • The Backup Database should be installed on a different server from the Main Database
  • No mirroring tools should be configured for the Backup Database, because Argent Job Scheduler itself synchronises the databases automatically

Replica Queue Engine Settings

  • The replica Queue Engine should be in the same network as the original Queue Engine, and its Argent Queue Engine service should run under the same user credentials
  • Job files specified locally on the original Queue Engine should also be available locally on the replica Queue Engine
  • If Job files are specified using UNC paths on the original Queue Engine, those paths should be accessible from the replica Queue Engine as well
  • In the J20C screen of Argent Job Scheduler, both the Queue Engine and its replica should be added and the first radio option ‘First Available Server’ should be selected as shown in the screenshot below

How Argent Job Scheduler High Availability Architecture Handles Server Down Times

The sections below explain the behaviour of Argent Job Scheduler under the following scenarios:

  1. Main Engine goes down
  2. Main Engine comes back online when Backup Engine runs on Main Database
  3. Main Database goes down
  4. Switching back to Main Engine manually after Main Database is up
  5. One of the Queue Engines goes down

Scenario 1: Main Engine Down

Consider the situation where the Main Scheduling Engine goes down

Backup Engine Takes Over

  1. The Backup Scheduling Engine detects within the specified time that the Main Scheduling Engine is down, and checks whether the Main Database is available
  2. Once the Backup Scheduling Engine finds that the Main Database is up, it takes over scheduling from the Main Scheduling Engine, using the Main Database

Scenario 2: Main Engine Back Online While Backup Engine Runs On Main Database

  1. The Backup Scheduling Engine detects within the specified time that the Main Scheduling Engine is up, and transfers service back to the Main Scheduling Engine
  2. The Backup Scheduling Engine switches back to backup mode

Scenario 3: Main Database Down

Consider the situation where the Main Database goes down

Main Engine Shuts Down And Backup Engine Starts Using Backup Database

  1. The Main Scheduling Engine detects that the Main Database is down and waits 15 minutes for it to come back online
  2. If the Main Database is still down after 15 minutes, the Main Scheduling Engine service stops automatically
  3. The Backup Scheduling Engine detects that the Main Scheduling Engine is down and checks whether the Main Database is available before taking over
  4. When the Backup Scheduling Engine finds that the Main Database is down, it takes over scheduling using the Backup Database
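
The behaviour described in Scenarios 1 to 3 can be summarised as a small decision sketch. It is a simplified model written for this article, not the product's implementation: the Mode names and both functions are hypothetical, treating the 15-minute mark itself as the stop point is an assumption, and combinations the article does not describe (for example the Main Database failing while the Backup Scheduling Engine is already active) are also assumptions.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for what the Backup Scheduling Engine is doing."""
    STANDBY = "backup mode"
    ACTIVE_MAIN_DB = "scheduling on the Main Database"
    ACTIVE_BACKUP_DB = "scheduling on the Backup Database"


# Scenario 3: the Main Scheduling Engine waits 15 minutes for the Main Database.
MAIN_DB_WAIT_SECONDS = 15 * 60


def main_engine_should_stop(seconds_db_down):
    """True once the Main Database has been down for the full waiting period."""
    return seconds_db_down >= MAIN_DB_WAIT_SECONDS


def next_backup_mode(mode, main_engine_up, main_db_up):
    """Return the Backup Scheduling Engine's next mode."""
    if mode is Mode.ACTIVE_BACKUP_DB:
        # Leaving the Backup Database requires user intervention (Scenario 4).
        return mode
    if main_engine_up:
        # Scenario 2: service goes back to the Main Scheduling Engine.
        return Mode.STANDBY
    # Scenario 1 if the Main Database is up, Scenario 3 if it is down.
    return Mode.ACTIVE_MAIN_DB if main_db_up else Mode.ACTIVE_BACKUP_DB
```

For example, next_backup_mode(Mode.STANDBY, False, False) yields Mode.ACTIVE_BACKUP_DB, which corresponds to step 4 above.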

Scenario 4: Switching Back To Main Engine After Main Database Is Up

Note:

Switching back to the Main Scheduling Engine is possible only through user intervention

  1. Stop the Backup Scheduling Engine service
  2. Bring up the Main Database
  3. Run the AJS_FP/REVERSE executable from a command prompt to synchronise the Backup Database with the Main Database
  4. Bring up the Main Scheduling Engine service
  5. Bring up the Backup Scheduling Engine service, which then works in backup mode
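
Because the order of these steps matters, an operator might track them with a simple checklist that refuses out-of-order steps. The sketch below is only a runbook aid under that assumption: the SwitchbackChecklist class and the step labels are hypothetical, and it does not run any Argent command or service.

```python
# Hypothetical runbook helper: it records progress only and executes nothing.
SWITCHBACK_STEPS = (
    "stop Backup Scheduling Engine service",
    "bring up Main Database",
    "run AJS_FP/REVERSE",
    "bring up Main Scheduling Engine service",
    "bring up Backup Scheduling Engine service",
)


class SwitchbackChecklist:
    """Tracks the manual switch-back steps in the order listed above."""

    def __init__(self):
        self.done = 0  # number of steps completed so far

    def next_step(self):
        """Return the next step to perform, or None when all are complete."""
        if self.done < len(SWITCHBACK_STEPS):
            return SWITCHBACK_STEPS[self.done]
        return None

    def complete(self, step):
        """Mark a step as done; reject it if it is not the next one due."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.done += 1
```

Calling complete() with the steps in the listed order leaves next_step() returning None; any other order raises ValueError before anything is recorded.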

Scenario 5: Queue Engine Is Down

Consider the scenario where one of the Queue Engines goes down

Job Is Executed On The First Available Queue Engine

Because the Job is configured to execute on the first available of the two Queue Engines, it is executed on the second Queue Engine when the first one goes down
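
A minimal sketch of the 'First Available Server' idea follows, assuming the Queue Engines are tried in the order they are listed. The first_available function, the is_up callback and the engine names are hypothetical; how Argent Job Scheduler actually probes a Queue Engine is not described here.

```python
def first_available(queue_engines, is_up):
    """Return the first engine in listed order for which is_up(engine) is true.

    Returns None when no engine is available.
    """
    for engine in queue_engines:
        if is_up(engine):
            return engine
    return None


# Usage: the original Queue Engine is down, so its replica is selected.
engines = ["QUEUE-ORIGINAL", "QUEUE-REPLICA"]  # hypothetical names
running = {"QUEUE-REPLICA"}
selected = first_available(engines, lambda name: name in running)
```

With both engines running, the same call selects the original Queue Engine, since it is listed first.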

Resolution

N/A