KBI 311155 Issue Addressed: Pre-requisite SLA Rule Not Working When Running Down Rule Internally

Version

Argent Advanced Technology 3.1A-1410-A or earlier

Date

Friday, 16 Jan 2015

Summary

The pre-requisite SLA Rule in a Relator is used so that down-stream servers in a Monitoring Group are monitored only if the upstream switch/router is accessible

This is done so the Argent Console is not flooded with spurious Alerts about the servers being down when, in fact, it is the switch/router in front of the group of servers that is actually down
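The gating behaviour described above can be sketched in a few lines. This is only an illustration of the idea, not Argent AT's implementation; the function name `run_monitoring_group`, its parameters, and the alert strings are all hypothetical

```python
from typing import Callable, Dict, List


def run_monitoring_group(
    prerequisite_ok: Callable[[], bool],
    server_checks: Dict[str, Callable[[], bool]],
) -> List[str]:
    """Return the alerts produced by one monitoring pass (hypothetical model)."""
    if not prerequisite_ok():
        # Upstream switch/router is unreachable: raise one alert and
        # skip every downstream server check.
        return ["Prerequisite failed: upstream switch/router unreachable"]
    # Upstream device is reachable, so each server is checked individually.
    return [
        f"Server down: {name}"
        for name, is_up in server_checks.items()
        if not is_up()
    ]
```

With the pre-requisite honoured, a failed switch yields a single alert however many servers sit behind it; without it, every server behind the switch would raise its own alert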

Argent AT has an advanced feature that runs the System Down Rule internally

When turned on, it may reduce the load on a heavily used system: Argent AT normally uses Work Order files to exchange information between the Supervising Engine and Monitoring Engine processes

The associated CPU and I/O overhead can sometimes stress the system and reduce throughput; typical symptoms include high CPU usage, skipped Relator executions, and delayed Alerts

With an Argent Engineer’s help, this feature can be turned on to possibly increase system performance

DO NOT TURN ON THIS FEATURE WITHOUT FIRST *SPEAKING* TO AN ARGENT ENGINEER

This is considered an advanced feature

The majority of customers, with adequate hardware, do not need it

However, versions prior to 1501-A have an issue with running the System Down Rule internally: the pre-requisite SLA Rule is ignored when the feature is turned on

As a result, the down-stream servers are always monitored, and a large number of spurious Alerts are wrongly generated about the servers being down when, in fact, it is the upstream switch/router that is actually down

The issue is addressed in Argent AT 3.1A-1501-A

Technical Background

A customer can turn on the feature of running the System Down Rule internally by setting the registry value ‘RUN_DOWN_RULE_THREAD_LIMIT’ to a non-zero value

It is usually set to 128

To confirm the current setting, check the ‘System Down Rule Execution’ line in the Service Log

If the value that follows is ‘Normal’, the feature is OFF

If the value is ‘Internal (Thread Limit: xxx)’, the feature is ON
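The log check can be automated with a short script. The sketch below is hedged: only the label ‘System Down Rule Execution’ and the two values come from this article, while the separator between label and value, the rest of the line layout, and the function name `down_rule_status` are assumptions

```python
import re
from typing import Optional, Tuple

# Assumption: the value follows the label after punctuation and/or spaces.
_PATTERN = re.compile(
    r"System Down Rule Execution\W*(Normal|Internal \(Thread Limit: (\d+)\))"
)


def down_rule_status(line: str) -> Optional[Tuple[bool, Optional[int]]]:
    """Return (feature_on, thread_limit), or None if the line does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    if match.group(1) == "Normal":
        return (False, None)  # feature is OFF
    return (True, int(match.group(2)))  # feature is ON


# Hypothetical log line, for illustration only
print(down_rule_status("System Down Rule Execution: Internal (Thread Limit: 128)"))
```

Run against each line of the Service Log, the first non-None result shows whether the feature is ON and, if so, the thread limit in use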

When the feature is ON, instead of using Work Orders and the Monitoring Engine process, the Argent AT Supervising Engine runs the System Down Rule and the SLA Rule within the Supervising Engine process

Customers usually turn this ON to deal with performance issues on a heavily used system

Note: Customers should not turn ON this advanced feature without consulting Argent first

Resolution

Upgrade to Argent AT 3.1A-1501-A or later
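To decide whether an installation needs the upgrade, the version string can be compared with the fixed build. This is a minimal sketch that assumes version strings always follow the ‘3.1A-NNNN-X’ pattern seen in this article and that builds are ordered by the four-digit number and then the letter; the function name `is_affected` is hypothetical

```python
import re

# First build containing the fix, per this article: 3.1A-1501-A
FIXED_BUILD = (1501, "A")


def is_affected(version: str) -> bool:
    """Return True if the 3.1A build predates 1501-A."""
    match = re.fullmatch(r"3\.1A-(\d{4})-([A-Z])", version.strip())
    if match is None:
        # Assumption: anything outside the 3.1A pattern is not handled here.
        raise ValueError(f"Unrecognised version string: {version!r}")
    build = (int(match.group(1)), match.group(2))
    return build < FIXED_BUILD


print(is_affected("3.1A-1410-A"))  # True
print(is_affected("3.1A-1501-A"))  # False
```

An affected build only shows the defect when the feature of running the System Down Rule internally is ON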

Customers who cannot upgrade immediately should reset the registry value ‘RUN_DOWN_RULE_THREAD_LIMIT’ to zero