KBI 310230 False Positives From TCP/IP Ping Rules

Version

Argent Extended Technology — All versions

Date

20 Oct 2010

Summary

Customers may report false positives from the TCP/IP Ping Rule option in the System Down and SLA Rules.

Technical Background

The TCP/IP Ping Rule has special characteristics compared with other Rules.

By default, each execution of the TCP/IP Ping Rule sends a single ICMP ping request (a standard 32-byte packet).

Because a ping is a single lightweight packet, intermittent packet loss on a busy or unreliable network can cause that one request to time out even though the server is up, which results in a false positive.

Resolution

The solution for almost all false positives related to TCP/IP ping is to increase the Attempt Count for the affected servers from the default of 1 to, say, 5.

This causes the TCP/IP Ping Rule to do 5 successive pings (with an interval of 1 second in between), instead of just 1.

Reply from SVRARGENT01: bytes=32 time<1ms TTL=128

Reply from SVRARGENT01: bytes=32 time<1ms TTL=128

Reply from SVRARGENT01: bytes=32 time<1ms TTL=128

Reply from SVRARGENT01: bytes=32 time<1ms TTL=128

Reply from SVRARGENT01: bytes=32 time<1ms TTL=128

Argent will mark the Rule as “PASS” once a single successful ping is encountered.

Similarly, Argent will mark the Rule as “BROKEN” only if ALL attempted pings fail.
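Since the Rule is BROKEN only when every attempt fails, a higher Attempt Count sharply reduces the chance of a false positive. The following is a rough sketch of that arithmetic, not Argent code. It assumes each ping is lost independently with the same probability, which is a simplification because real packet loss often comes in bursts. The function name and the 5% loss figure are illustrative only.

```python
def false_positive_probability(packet_loss: float, attempt_count: int) -> float:
    """Chance that every ping to a healthy server is lost.

    Assumption: each ping is lost independently with probability packet_loss.
    """
    if not 0.0 <= packet_loss <= 1.0:
        raise ValueError("packet_loss must be between 0 and 1")
    if attempt_count < 1:
        raise ValueError("attempt_count must be at least 1")
    # The Rule is BROKEN only if all attempts fail, so multiply the
    # per-ping loss probability once per attempt.
    return packet_loss ** attempt_count


if __name__ == "__main__":
    for attempts in (1, 3, 5):
        chance = false_positive_probability(0.05, attempts)
        print(f"Attempt Count {attempts}: false positive chance {chance:.3e}")
```

Under these assumptions, with 5% packet loss an Attempt Count of 1 gives a 5% chance of a false positive per execution, while an Attempt Count of 5 brings it down to about 0.00003%.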

Argent does not “waste” pings: it stops as soon as one ping succeeds. The Attempt Count is therefore best thought of as the maximum number of attempts.

Here are some examples to help illustrate:

PASS after 2 attempts (Attempt Count: 5)

Request Timed Out

Reply from SVRARGENT01: bytes=32 time<1ms TTL=128

PASS after 5 attempts (Attempt Count: 5)

Request Timed Out

Request Timed Out

Request Timed Out

Request Timed Out

Reply from SVRARGENT01: bytes=32 time<1ms TTL=128

BROKEN after 5 attempts (Attempt Count: 5)

Request Timed Out

Request Timed Out

Request Timed Out

Request Timed Out

Request Timed Out
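The three examples above can be reproduced with a short simulation. This is only a sketch of the behavior described in this article, not Argent's implementation. The function name `evaluate_ping_rule` and the use of True/False to stand for a reply or a timeout are assumptions made for illustration.

```python
from itertools import islice
from typing import Iterable, Tuple


def evaluate_ping_rule(replies: Iterable[bool], attempt_count: int) -> Tuple[str, int]:
    """Return (status, pings_used) for one execution of the Rule.

    replies yields True for "Reply from ..." and False for
    "Request Timed Out", in the order the pings are sent.
    """
    if attempt_count < 1:
        raise ValueError("attempt_count must be at least 1")
    pings_used = 0
    # Never look at more results than the Attempt Count allows.
    for replied in islice(replies, attempt_count):
        pings_used += 1
        if replied:
            # Stop at the first success; remaining attempts are not used.
            return "PASS", pings_used
    # Every attempted ping timed out.
    return "BROKEN", pings_used


if __name__ == "__main__":
    print(evaluate_ping_rule([False, True], 5))
    print(evaluate_ping_rule([False, False, False, False, True], 5))
    print(evaluate_ping_rule([False] * 5, 5))
```

The three calls correspond, in order, to the PASS after 2 attempts, PASS after 5 attempts, and BROKEN after 5 attempts cases.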

The Attempt Count is changed in the License Manager. You can select multiple servers at once (so you don’t have to change each server one by one) and then select TCP/IP & SNMP.

Go to the TCP/IP & SNMP tab, uncheck “Use Default Configuration Option”, and change the Attempt Count there.

There are also other options available in the actual Rule itself:

Run Lightweight SLA Checking With Interval Of X Seconds

This option is for customers who want a more granular check. Instead of following the Relator’s typical execution frequency, the ping Rule can be set to execute continuously every X seconds.

Note that each execution of the ping still applies the Attempt Count described above.

For example, if this option is set to 30 seconds and the Attempt Count is 5, Argent sends up to 5 consecutive pings, one second apart, and then repeats this every 30 seconds.
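To get a feel for how the interval and the Attempt Count combine, the sketch below estimates how many pings a single server could receive per hour. It is an illustration only. It assumes the interval is measured from the start of one execution to the start of the next, which this article does not state, and the function name is hypothetical.

```python
def pings_per_hour(interval_seconds: int, pings_per_execution: int) -> int:
    """Estimate pings sent to one server in an hour.

    Assumption: executions start every interval_seconds, regardless of
    how long the previous execution took.
    """
    if interval_seconds < 1 or pings_per_execution < 1:
        raise ValueError("both arguments must be at least 1")
    executions = 3600 // interval_seconds
    return executions * pings_per_execution


if __name__ == "__main__":
    # Healthy server: the first ping succeeds, so one ping per execution.
    print(pings_per_hour(30, 1))
    # Server down: every execution uses the full Attempt Count of 5.
    print(pings_per_hour(30, 5))
```

With a 30-second interval this gives 120 pings per hour for a server that always answers the first ping, and at most 600 per hour for a server that is down.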

If TCP/IP Ping Failed, Wait X Seconds And Check Again With Timeout Y Seconds

This option is used to prevent false positives caused by timeout settings that are too low. (The timeout settings are in the same place as the Attempt Count: the server properties in the License Manager.)

If this option is used and the ping Rule fails, the Rule is not immediately marked as BROKEN. Instead, Argent waits X seconds and then sends one extra ping with a new timeout of Y seconds.

For example, if the default timeout for the server is 10 seconds, the extra ping can use a longer timeout of 30 seconds.

If this extra ping test also fails, then the Rule is truly considered broken.
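The flow of this option can be sketched as follows. This is not Argent code. The `ping` callable is a hypothetical stand-in that takes a timeout in seconds and returns True on a reply. The sketch also assumes that a successful extra ping marks the Rule as PASS, and it omits the one-second pause between normal attempts for brevity.

```python
import time
from typing import Callable


def check_with_retry(
    ping: Callable[[float], bool],
    attempt_count: int,
    timeout_seconds: float,
    wait_seconds: float,
    retry_timeout_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run the normal attempts, then one extra ping with a longer timeout."""
    # Normal execution: up to attempt_count pings with the server's timeout.
    for _ in range(attempt_count):
        if ping(timeout_seconds):
            return "PASS"
    # All attempts failed: wait X seconds, then try once with timeout Y.
    sleep(wait_seconds)
    if ping(retry_timeout_seconds):
        return "PASS"
    return "BROKEN"


def slow_host(timeout_seconds: float) -> bool:
    """Fake server that always needs 20 seconds to answer."""
    return timeout_seconds >= 20


if __name__ == "__main__":
    no_sleep = lambda seconds: None  # skip real waiting in the demo
    # A 10-second timeout fails, but the 30-second retry succeeds.
    print(check_with_retry(slow_host, 5, 10, 5, 30, sleep=no_sleep))
    # A 15-second retry is still too short for this fake server.
    print(check_with_retry(slow_host, 5, 10, 5, 15, sleep=no_sleep))
```

In the demo, the fake server is reachable but slow, so the first call ends in PASS thanks to the longer retry timeout, while the second call ends in BROKEN.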