KBI 311151 How To Ensure High Availability In Argent Job Scheduler

Version

Argent Job Scheduler all versions

Date

Wednesday, 14 Jan 2015

Summary

This article describes the following:

The Perfect Installation method and the Best Practices to ensure High Availability in Argent Job Scheduler
How the Argent Job Scheduler High Availability architecture handles the following unlikely server downtime events:

Main Engine goes down
Main Engine comes back online when Backup Engine runs on Main Database
Main Database goes down
Switching back to Main Engine manually after Main Database is up
One of the Queue Engines goes down

Technical Background

High Availability of Argent Job Scheduler can be ensured by strictly following the Argent Job Scheduler High Availability architecture during installation and configuration

The explanations assume an environment with one Main Scheduling Engine and many Queue Engines

Where multiple Main Scheduling Engines are installed, the architecture prescribed for a single Main Scheduling Engine applies to each of them

Perfect Installation

A perfect installation of Argent Job Scheduler for High Availability should be as follows:

Main Scheduling Engine on one server
Main Database (SQL Server) on a second server
Backup Scheduling Engine on a third server, installed using the same database as the Main Scheduling Engine
Backup Database on a fourth server
Two Queue Engines (the second a replica of the first) able to run the same Job, with the first available server selected for execution
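The layout above can be expressed as a small check that each engine and database role sits on its own server and that two distinct Queue Engines exist. This is only an illustrative sketch: the host names and the validate_layout helper are hypothetical and are not part of Argent Job Scheduler.

```python
# Hypothetical host names; substitute the servers in your own environment.
layout = {
    "main_scheduling_engine": "SRV-ENGINE-A",
    "main_database": "SRV-SQL-A",
    "backup_scheduling_engine": "SRV-ENGINE-B",
    "backup_database": "SRV-SQL-B",
}
queue_engines = ["SRV-QUEUE-1", "SRV-QUEUE-2"]  # the second is a replica of the first


def validate_layout(layout, queue_engines):
    """Return a list of problems with a proposed High Availability layout."""
    problems = []
    hosts = [host.lower() for host in layout.values()]
    # Each of the four roles should be on a different server.
    if len(set(hosts)) != len(hosts):
        problems.append("engine and database roles must each be on a separate server")
    # A replica only helps if it is a different machine from the original.
    if len({name.lower() for name in queue_engines}) < 2:
        problems.append("at least two distinct Queue Engines are required")
    return problems


print(validate_layout(layout, queue_engines))  # prints [] for the layout above
```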


Best Practices For Setting Up High Availability

Backup Scheduling Engine Settings

The Backup Scheduling Engine should be installed on a server in the same network, and with the same user credentials, as the Main Scheduling Engine

This ensures that the Backup Scheduling Engine can reach dependent databases on other servers and dependent files specified using UNC paths

Backup Database Settings

The Backup Database should be installed on a different server from the Main Database
No mirroring tools should be configured for the Backup Database, as Argent Job Scheduler itself synchronises the databases automatically

Replica Queue Engine Settings

The replica Queue Engine should be in the same network as the original Queue Engine, and its Argent Queue Engine service should run under the same user credentials
Job files specified locally on the original Queue Engine should also be available locally on the replica Queue Engine
If the Job files are specified using UNC paths on the original Queue Engine, the paths should be accessible from the replica Queue Engine as well
In the J20C screen of Argent Job Scheduler, both the Queue Engine and its replica should be added, and the first radio option ‘First Available Server’ should be selected as shown in the screenshot below

(Screenshot: J20C screen with ‘First Available Server’ selected)

How Argent Job Scheduler High Availability Architecture Handles Server Down Times

The sections below explain the behaviour of Argent Job Scheduler under the following scenarios:

Main Engine goes down
Main Engine comes back online when Backup Engine runs on Main Database
Main Database goes down
Switching back to Main Engine manually after Main Database is up
One of the Queue Engines goes down

Scenario 1: Main Engine Down

Consider the situation where the Main Scheduling Engine goes down


Backup Engine Takes Over

The Backup Scheduling Engine detects within the specified time that the Main Scheduling Engine is down, and checks whether the Main Database is available
Once the Backup Scheduling Engine finds that the Main Database is up, it takes over scheduling from the Main Scheduling Engine, using the Main Database


Scenario 2: Main Engine Back Online While Backup Engine Runs On Main Database

When the Backup Scheduling Engine detects within the specified time that the Main Scheduling Engine is up again, it transfers service back to the Main Scheduling Engine
The Backup Scheduling Engine then switches back to backup mode


Scenario 3: Main Database Down

Consider the situation where the Main Database goes down


Main Engine Shuts Down And Backup Engine Starts Using Backup Database

The Main Scheduling Engine detects that the Main Database is down and waits 15 minutes for it to come back online
If the Main Database is still down after 15 minutes, the Main Scheduling Engine service shuts down automatically
The Backup Scheduling Engine detects that the Main Scheduling Engine is down and checks whether the Main Database is available before taking over
When the Backup Scheduling Engine finds that the Main Database is down, it takes over scheduling using the Backup Database
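The decisions described in Scenarios 1 and 3 can be summarised in a few lines of code. This is a minimal sketch of the logic as this article states it, not Argent's actual implementation: every name below is hypothetical, and only the 15 minute wait comes from the text above.

```python
from datetime import datetime, timedelta
from enum import Enum

# The 15 minute wait is taken from this article; all names are hypothetical.
DATABASE_WAIT = timedelta(minutes=15)


class BackupAction(Enum):
    STAY_IN_BACKUP_MODE = "stay in backup mode"
    TAKE_OVER_ON_MAIN_DATABASE = "take over using the Main Database"
    TAKE_OVER_ON_BACKUP_DATABASE = "take over using the Backup Database"


def main_engine_should_stop(database_down_since, now):
    """True once the Main Database has been unreachable for the full wait."""
    return now - database_down_since >= DATABASE_WAIT


def backup_engine_action(main_engine_up, main_database_up):
    """Choose what the Backup Scheduling Engine does next."""
    if main_engine_up:
        return BackupAction.STAY_IN_BACKUP_MODE
    if main_database_up:
        # Scenario 1: only the Main Scheduling Engine is down.
        return BackupAction.TAKE_OVER_ON_MAIN_DATABASE
    # Scenario 3: the Main Database is down as well.
    return BackupAction.TAKE_OVER_ON_BACKUP_DATABASE


down_since = datetime(2015, 1, 14, 9, 0)
print(main_engine_should_stop(down_since, datetime(2015, 1, 14, 9, 16)))  # True
print(backup_engine_action(False, False).value)
```

Keeping the database check separate from the engine check mirrors the order described above: the Backup Scheduling Engine first establishes that the Main Scheduling Engine is down, and only then decides which database to use.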


Scenario 4: Switching Back To Main Engine After Main Database Is Up

Note:

Switching back to the Main Scheduling Engine is possible only through user intervention, in the following order:

Stop the Backup Scheduling Engine service
Bring up the Main Database
Run the AJS_FP/REVERSE executable from a command prompt; this synchronises the Backup Database with the Main Database
Start the Main Scheduling Engine service
Start the Backup Scheduling Engine service, which then works in backup mode
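Because the order of these steps matters, it can help to walk through them as a checklist that stops at the first step not confirmed. The sketch below is purely illustrative: run_checklist and its confirm callback are hypothetical helpers, and the code does not stop or start any service itself.

```python
# The steps are copied from this article; the helper names are hypothetical.
SWITCH_BACK_STEPS = [
    "Stop the Backup Scheduling Engine service",
    "Bring up the Main Database",
    "Run AJS_FP/REVERSE from a command prompt",
    "Start the Main Scheduling Engine service",
    "Start the Backup Scheduling Engine service (backup mode)",
]


def run_checklist(steps, confirm):
    """Walk the steps in order and stop at the first one that is not confirmed.

    confirm(number, step) should return True once the operator has completed
    the step. The list of completed steps is returned.
    """
    completed = []
    for number, step in enumerate(steps, start=1):
        if not confirm(number, step):
            break
        completed.append(step)
    return completed


def announce(number, step):
    """Example callback that prints each step and treats it as done."""
    print(f"{number}. {step}")
    return True


completed = run_checklist(SWITCH_BACK_STEPS, announce)
```

In practice the confirm callback would prompt the operator rather than confirm automatically.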

Scenario 5: Queue Engine Is Down

Consider the scenario where one of the Queue Engines goes down


Job Is Executed On The First Available Queue Engine

Because the Job is configured to execute on the first available of the two Queue Engines, it is executed on the second Queue Engine when the first one goes down
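The ‘First Available Server’ behaviour amounts to trying the configured Queue Engines in order and using the first one that responds. The sketch below illustrates the idea only: the server names and the is_available check are hypothetical, and how Argent Job Scheduler actually probes a Queue Engine is not described in this article.

```python
def first_available(queue_engines, is_available):
    """Return the first Queue Engine that is available, or None if all are down."""
    for engine in queue_engines:
        if is_available(engine):
            return engine
    return None


# Hypothetical names: the original Queue Engine followed by its replica.
engines = ["QUEUE-1", "QUEUE-1-REPLICA"]
down = {"QUEUE-1"}  # simulate the first Queue Engine being down

print(first_available(engines, lambda engine: engine not in down))  # prints QUEUE-1-REPLICA
```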

Resolution

N/A