KBI 310893 Issue Addressed: Group Of Relators Cancel When Monitoring Engine Is Overloaded
Version
Argent Advanced Technology 3.1A-1401-E or below
Date
Tuesday, 25 Mar 2014
Summary
A heavily saturated Monitoring Engine Process Pool may cause a group of Relators to cancel
Technical Background
This issue may occur on a heavily utilized Monitoring Engine processing numerous tasks during the Monitoring Engine Recycle Process
By default, every 30 minutes the Monitoring Engine Processes are recycled
If many Monitoring Engine processes happen to recycle at almost the same time, their tasks can be reassigned to the few Monitoring Engine processes still available in the pool
As a result, the few running Monitoring Engine processes can be overloaded
After being stuck in the queue for more than 5 minutes, the pending tasks are cancelled by the Supervising Engine
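The effect of a shrinking pool on queue wait time can be sketched with a toy model. This is only an illustration under simplifying assumptions: all tasks are queued at the same moment, every task takes the same time, and the task counts, worker counts, and durations are made-up numbers, not measurements from an Argent installation. The function name `count_cancelled` is hypothetical.

```python
# Illustrative model only: the task counts, worker counts, and durations
# are made-up numbers, not measurements from an Argent installation.

PENDING_LIMIT_SECONDS = 5 * 60  # the 5-minute pending limit described in this article


def count_cancelled(num_tasks, num_workers, seconds_per_task,
                    limit=PENDING_LIMIT_SECONDS):
    """Count tasks that would still be pending after `limit` seconds.

    Assumes every task is queued at time 0 and each worker runs one task
    at a time, so task i would start at (i // num_workers) * seconds_per_task.
    """
    cancelled = 0
    for i in range(num_tasks):
        wait = (i // num_workers) * seconds_per_task
        if wait > limit:
            cancelled += 1
    return cancelled


if __name__ == "__main__":
    # Full pool: 10 processes share 100 tasks of 10 seconds each.
    print(count_cancelled(100, 10, 10))
    # Most processes recycling: only 2 processes share the same 100 tasks.
    print(count_cancelled(100, 2, 10))
```

With these made-up numbers, ten processes clear the queue with no task waiting past the limit, while two processes leave 38 tasks pending for more than 5 minutes.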
The Supervising Engine log files will have multiple repeated lines such as:
“About to cancel task (Relator: REL_AIX_DISK_SPACE, server: ARGENTSRV)
Reason: it has been pending for more than 5 minutes”
An occasional occurrence of these cancelled tasks is likely due to temporary network congestion or unusually high load on the Monitoring Engine server and does not indicate an issue
However, 20 or more of these cascaded cancelled-task lines in the log file warrant the following resolution
The issue is addressed in Argent AT 3.1A-1401-T4
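The log check described above can be sketched as a small script. It assumes the cancelled-task lines look exactly like the sample quoted in this article; the real log layout, file name, and location may differ, and the function names are hypothetical. As a simplification it counts all matching lines in the file rather than testing whether they appear consecutively.

```python
import re
import sys
from collections import Counter

# Pattern taken from the sample log line quoted in this article.
CANCEL_PATTERN = re.compile(
    r"About to cancel task \(Relator: (?P<relator>[^,]+), server: (?P<server>[^)]+)\)"
)
CASCADE_THRESHOLD = 20  # "20 or more" lines, per this article


def count_cancellations(lines):
    """Return a Counter of (relator, server) pairs found in cancel lines."""
    counts = Counter()
    for line in lines:
        match = CANCEL_PATTERN.search(line)
        if match:
            key = (match.group("relator").strip(), match.group("server").strip())
            counts[key] += 1
    return counts


def needs_resolution(counts, threshold=CASCADE_THRESHOLD):
    """True when the total number of cancel lines reaches the threshold."""
    return sum(counts.values()) >= threshold


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the user; no default location is assumed.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        found = count_cancellations(handle)
    for (relator, server), n in found.most_common():
        print(f"{n:5d}  {relator}  {server}")
    print("Resolution warranted" if needs_resolution(found) else "Below threshold")
```

Run it with the path to a Supervising Engine log file as the only argument.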
Resolution
1: Upgrade to Argent AT 3.1A-1401-T4
OR
2: For customers unable to upgrade immediately, use the following workaround:
Disable the Monitoring Engine Recycle process:
To disable it, add the following new DWORD value to the Windows Registry:
RULE_ENGINE_MAX_RUN_SECONDS in ALL affected Argent AT products
Set the Decimal value to 0 on EACH Monitoring Engine
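When the workaround has to be applied on several Monitoring Engines, the registry change can be prepared as text in the standard .reg file format. The sketch below only builds that text. This article does not state which registry key holds the value, so `key_path` is a placeholder that must be replaced with the correct key for each affected Argent AT product; the function name `build_reg_file` is hypothetical.

```python
# Placeholder only: this article does not give the registry key location.
PLACEHOLDER_KEY = r"<registry key for the affected Argent AT product>"


def build_reg_file(key_path, value_name="RULE_ENGINE_MAX_RUN_SECONDS", value=0):
    """Build .reg file text that sets a DWORD value under `key_path`.

    .reg files store DWORD data as 8 hexadecimal digits, so decimal 0
    is written as dword:00000000.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 unsigned bits")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key_path}]\r\n"
        f'"{value_name}"=dword:{value:08x}\r\n'
    )


if __name__ == "__main__":
    print(build_reg_file(PLACEHOLDER_KEY))
```

Review the generated text and confirm the key path before importing it on each Monitoring Engine.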