KBI 310260 Outages for Argent Job Scheduler Primary Engines

Version

Argent Queue Engine 7.1A-1004-B and above

Date

10 Oct 2011

Summary

Argent Job Scheduler customers sometimes experience unwanted takeovers by an Argent Job Scheduler secondary Scheduling Engine during extended outages on the primary Scheduling Engine.

This article discusses how customers can manage planned outages of a primary Argent Job Scheduler Scheduling Engine to avoid unwanted takeovers by the secondary Scheduling Engine.

Technical Background

Recap of Argent Job Scheduler ‘Fail-Over’ Technology

Argent Job Scheduler lets customers define two Scheduling Engines in a primary/secondary configuration, which enables ‘fail-over’ from the primary Scheduling Engine to the secondary Scheduling Engine.

Using this feature, customers can achieve non-stop operations of Argent Job Scheduler when the primary Scheduling Engine platform fails or halts unexpectedly.

Argent’s fail-over technology uses the date/time stamps of internal ‘heartbeat’ disk files as a way for the primary and secondary Scheduling Engines to communicate. The primary Scheduling Engine updates the date/time stamp of the internal heartbeat file frequently, typically every thirty (30) seconds.

When the date/time stamp of the internal heartbeat file has not changed for a configurable length of time, the secondary Scheduling Engine concludes that the primary Scheduling Engine is no longer active and takes over.
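The staleness test described above can be illustrated with a minimal Python sketch. This is not Argent's implementation: the function names and the 120-second threshold are assumptions chosen for illustration, and the article states only that the limit is configurable. The sketch reads the file's modification time and compares its age against the threshold.

```python
import os
import time

# Assumed value for illustration only; the real limit is configurable.
TAKEOVER_THRESHOLD_SECONDS = 120


def heartbeat_age_seconds(heartbeat_path, now=None):
    """Return how many seconds have passed since the heartbeat file was last updated."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(heartbeat_path)


def primary_looks_inactive(heartbeat_path, threshold=TAKEOVER_THRESHOLD_SECONDS, now=None):
    """Return True when the heartbeat is older than the threshold (hypothetical check)."""
    return heartbeat_age_seconds(heartbeat_path, now) > threshold
```

With a 30-second update interval, a healthy primary keeps the age well below such a threshold. During a long patching outage the age grows past it, which is why the secondary must be stopped before the primary goes down.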

System Maintenance on Windows

Customers often plan outages on Windows servers to perform routine system maintenance and to apply Microsoft service packs, patches, and fixes. These planned outages are commonly taken at weekly, monthly, or quarterly intervals.

In many cases, Microsoft requires multiple re-boots on machines where updates have been applied. For service packs, the installation itself can take a long time.

Argent recommends that customers who have deployed Argent Job Scheduler in a primary/secondary Scheduling Engine configuration apply service packs, patches, and fixes to both the primary and secondary Scheduling Engine servers during the same outage window.

Without proper planning, customers can experience unwanted or unexpected takeovers by the secondary Scheduling Engine during these extended outages.

The following section outlines the steps needed to avoid unwanted re-starts of either the primary or secondary Argent Scheduling Engine during planned outages.

Steps to Avoid Unwanted Re-Starts and Takeovers

  1. Begin outage window

  2. End Scheduling Engine service on secondary Scheduling Engine server

    1. Prevents secondary Scheduling Engine from taking over

  3. Rename C:\ARGENT\SchedulingEngine\Logs\SVC_LOG.TXT on the secondary Scheduling Engine server

    1. This makes sure the Argent Job Scheduler service log file does not span the outage window

    2. Helps define a clear ‘before-and-after’ in the log file

  4. Use Windows Service Control Manager to temporarily change the secondary Argent Job Scheduler Scheduling Engine service Startup Type from ‘Automatic’ to ‘Manual’

    1. This prevents the secondary Scheduling Engine service from starting when Microsoft requires multiple re-boots during the outage window

  5. End Scheduling Engine service on primary Scheduling Engine server

  6. Rename C:\ARGENT\SchedulingEngine\Logs\SVC_LOG.TXT on the primary Scheduling Engine server

    1. This makes sure the Argent Job Scheduler service log file does not span the outage window

    2. Helps define a clear ‘before-and-after’ in the log file

  7. Use Windows Service Control Manager to temporarily change the primary Argent Job Scheduler Scheduling Engine service Startup Type from ‘Automatic’ to ‘Manual’

    1. This prevents the primary Scheduling Engine service from starting when Microsoft requires multiple re-boots during the outage window

  8. Apply service packs, patches, and fixes on primary Scheduling Engine server

    1. Multiple re-boots may be required by Microsoft

    2. Since the Scheduling Engine service on the secondary server is not active, it cannot take over (by definition)

    3. Since the Scheduling Engine service on the primary server is set to ‘Manual’, the service does not start after a re-boot, so no scheduling or job dispatching takes place

  9. Apply service packs, patches, and fixes on secondary Scheduling Engine server

    1. Multiple re-boots may be required by Microsoft

  10. Use Windows Service Control Manager to change the Argent Job Scheduler Scheduling Engine service Startup Type on the primary Argent Job Scheduler server from ‘Manual’ back to ‘Automatic’

  11. Use Windows Service Control Manager to change the Argent Job Scheduler Scheduling Engine service Startup Type on the secondary Argent Job Scheduler server from ‘Manual’ back to ‘Automatic’

  12. Start the Argent Job Scheduler Scheduling Engine service on the primary Scheduling Engine server

  13. Start the Argent Job Scheduler Scheduling Engine service on the secondary Scheduling Engine server

    1. Since the Scheduling Engine service on the primary server is back up, the secondary does not take over because the internal heartbeat file is current

  14. Review status of primary and secondary Scheduling Engine services

  15. End outage window
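The service-control steps above (2, 4, 5, 7 and 10 through 13) could also be scripted instead of performed by hand. The following is a hedged sketch that builds the corresponding Windows `sc` command lines in the order the procedure prescribes. The service name `ArgentSchedulingEngine` and the server names are hypothetical placeholders, not values taken from Argent documentation; check the actual service name on the servers before use. The log renames in steps 3 and 6 are not covered here.

```python
import subprocess

# Hypothetical placeholder: look up the real service name on your servers.
SERVICE_NAME = "ArgentSchedulingEngine"


def sc(server, *args):
    """Build an sc.exe command line that targets a remote server."""
    return ["sc", "\\\\" + server, *args]


def pre_patch_commands(primary, secondary, service=SERVICE_NAME):
    """Steps 2, 4, 5, 7: stop the service and set it to Manual, secondary first."""
    commands = []
    for server in (secondary, primary):
        commands.append(sc(server, "stop", service))
        # sc expects the value as a separate token after "start="
        commands.append(sc(server, "config", service, "start=", "demand"))
    return commands


def post_patch_commands(primary, secondary, service=SERVICE_NAME):
    """Steps 10-13: restore Automatic on both servers, then start the primary first."""
    commands = [sc(s, "config", service, "start=", "auto") for s in (primary, secondary)]
    commands += [sc(s, "start", service) for s in (primary, secondary)]
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling `run_all(pre_patch_commands("PRIMARY01", "SECONDARY01"))` prints the commands without running them, which makes the ordering easy to review. Note that `sc stop` returns before the service has fully stopped, so a real script would need to confirm the stopped state before continuing.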

Following these steps helps avoid unwanted or unexpected restarts of the Argent Job Scheduler Scheduling Engine service, as well as unwanted or unexpected takeovers by the secondary Argent Job Scheduler Scheduling Engine, during extended, planned outage windows.
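Steps 3 and 6 rename SVC_LOG.TXT but do not prescribe a new name. One possible convention, shown here purely as an assumption, is to append a date/time stamp so that each outage window leaves its own archived log. The helper name `archive_service_log` is hypothetical.

```python
from datetime import datetime
from pathlib import Path


def archive_service_log(log_path, stamp=None):
    """Rename the service log to <name>_<stamp><ext> and return the new path."""
    log_path = Path(log_path)
    if stamp is None:
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = log_path.with_name(f"{log_path.stem}_{stamp}{log_path.suffix}")
    # Fails if the log is still held open, so stop the service first (steps 2 and 5).
    log_path.rename(target)
    return target
```

For example, `archive_service_log(r"C:\ARGENT\SchedulingEngine\Logs\SVC_LOG.TXT")` would be run on each server after its Scheduling Engine service has been stopped.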

Resolution

N/A