KBI 311132 Use High Frequency SLA Rule

Version

Argent Advanced Technology all versions

Date

Friday, 19 Dec 2014

Summary

Argent AT SLA Rule checks server/device accessibility by simple ICMP Ping

It is the same as the TCP/IP Ping in System Down Rule if default option is used

Argent AT Engine pings the target IP with the default timeout and retry count specified in Licensed Node Properties

Click For Full Size

The option ‘Run Lightweight SLA Checking With Interval Of xxxx Seconds’ is used to implement High Frequency SLA Checking

Normally Rules are executed in the interval specified in Relator

Argent AT Relator is typically scheduled as running once every one or a few minutes

Customer may want to find out down nodes sooner than minutes

High Frequency SLA Checking can be used

The Rule is submitted to Monitoring Engine just like other Rules

However, instead of returning immediately after single execution, the High Frequency SLA Rule will be executed multiple times with the interval specified in the Rule

It returns if the target node is found down, or Relator interval is used up

As a result, in the eye of Supervising Engine, the Relator is executed with interval specified in Relator; in the eye of end user, SLA Rule is executed consecutively with interval specified in the Rule, which is generally a few to 30 seconds, much shorter than the Relator interval
The option ‘If TCP/IP Ping Failed, Wait and Check Again’ can be used to more adaptive recheck

If using System Down Rule, customer usually uses the option ‘Post Event Only After Rule Is Broken xxx Times’ to achieve similar

The advantages of the option in SLA Rule include:
- Lighter resource usage
- Faster response within seconds instead of full Relator interval for the double check
- Adaptive timeout To get faster response, the default time out is usually shorter than 10 seconds
- When doing the double check, customer may want to use a longer timeout to be sure
- The longer timeout can be specified in the SLA Rule

Note: The double check is only done once for the single execution of SLA Rule

It can be combined with the option ‘Post Event Only After Rule Is Broken xxx Times’ to do additional double checks

Technical Background

The main reason to use SLA Rule instead of System Down Rule is the advantage of being lightweight with High Frequency Checking

Argent AT Engine runs Rules using Work Order (WO) files

If System Down Rule is scheduled to run once every 10 seconds against 1,000 server/devices, the I/O burden can easily exceeds the system limit

Using Lightweight High Frequency SLA Rule can handle this elegantly

On the other hand, SLA Rule only does ICMP Ping

When customer needs to check up/down status of Windows Network, WMI, Remote Desktop service etc, ‘System Down Rule’ should be used

To summarize:

If doing simple one time ping for up/down status, SLA Rule and System Down Rule are equivalent
If doing ping checks multiple ping checks per minute, use SLA Rule with ‘High Frequency SLA Checking’ option turned ON
If doing system up/down check other than simple ICMP ping, use ‘System Down Rule’

Resolution

N/A