System Down And SLA Rules

System Down Rules and SLA Rules combine to build an extremely powerful set of tests to determine if a server or device is up or offline.

First: what is the right way to check a machine or device with Argent?

The answer is: Yes.

In other words, it’s horses for courses – in your environment, different servers and devices need to be checked in different ways.

Typically the type of testing is based on the criticality of the server – more critical servers need more comprehensive and more frequent testing.

Argent provides the tools needed for complete SLA analysis, regardless of the depth.

Pings, Opening Files And NetRemoteTOD

A ping only exercises the low-level layers of the TCP/IP stack – a Windows 200x server may be showing a blue screen, yet pings can still return a valid status.

To verify that the operating system itself is running correctly, do not rely on a ping; use one of the more powerful options that check the health of the operating system.

So what can be used?

The best – and the most expensive – way to check the operating system is to test the operating system.

Sounds simple.

Here’s the best check you can do: check the existence of a file.

The beauty of this approach is it exercises many critical components of the operating system – the File System, the Disk System, and the Kernel.

But because this Argent option does test the operating system so well, it’s also the most expensive in terms of resources.
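As a rough illustration of what a file-existence check involves, here is a minimal Python sketch. It is not Argent's implementation; the function name `file_exists` and the UNC path in the usage note are assumptions made for the example.

```python
import os


def file_exists(path):
    """Return True if `path` can be stat'ed, False otherwise.

    Asking the target for file metadata forces a round trip through its
    file system and disk stack, which is why this kind of check says more
    about operating system health than a ping does.
    """
    try:
        os.stat(path)
        return True
    except OSError:
        # Covers a missing file, an unreachable share, or denied access.
        return False
```

For a remote Windows server the path would typically be a UNC path, for example `file_exists(r"\\SERVER01\C$\Windows\win.ini")`, where `SERVER01` is a hypothetical server name.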

Another very good test is the Windows NetRemoteTOD.

This Windows API uses a lot of resources and needs to touch a number of different parts of the operating system.

It is not as comprehensive as the file check, but it is faster.

Check Cluster

You can use the Microsoft Cluster API to determine the status (online or offline) of a cluster object. Of course, this test only works for Windows Clusters.

Scan Specific TCP Port

This is a universal test.

By default, it is a simple port scan: it connects to a TCP port, then disconnects.

Using the Advanced feature, you can specify actual text-based handshaking routines.

It is good for testing text-based protocols such as SMTP and POP3.
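The two modes described above, a plain connect-and-disconnect and a text-based handshake, can be sketched in Python as follows. This is an illustrative sketch only, not Argent's implementation; the function name `check_tcp_port` and its parameters are assumptions made for the example.

```python
import socket


def check_tcp_port(host, port, timeout=5.0, send=None, expect=None):
    """Connect to host:port, optionally performing a one-step text handshake.

    With no `expect`, success simply means the TCP connection opened.
    With `expect`, the first chunk of the server's reply must contain
    that text (for example, "220" for an SMTP greeting).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            if send is not None:
                conn.sendall(send.encode("ascii"))
            if expect is None:
                return True
            # Sketch simplification: a single recv is assumed to hold the banner.
            reply = conn.recv(1024).decode("ascii", errors="replace")
            return expect in reply
    except OSError:
        # Refused connections, timeouts, and resets all count as failure.
        return False
```

A plain scan would be `check_tcp_port("mail.example.com", 25)`, while `check_tcp_port("mail.example.com", 25, expect="220")` also confirms the SMTP greeting; `mail.example.com` is a placeholder host.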

Secure Shell Logon/Logoff

PLINK is used to log on to the SSH server, execute the command ‘exit’, then terminate.

It tests the availability of the SSH server as well as the correctness of the user/password pair.
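A logon/logoff test of this kind can be approximated by calling PLINK from a script. The sketch below is an assumption about how such a call might look, not the command line Argent actually issues; the function names are hypothetical, and it assumes `plink` is on the PATH and the server's host key is already cached (otherwise `-batch` makes PLINK fail rather than prompt).

```python
import subprocess


def build_plink_command(host, user, password, plink_path="plink"):
    """Build a PLINK command line that logs on, runs 'exit', and terminates."""
    # -batch disables interactive prompts so the check can never hang on input.
    return [plink_path, "-ssh", "-batch", "-l", user, "-pw", password, host, "exit"]


def check_ssh_logon(host, user, password, plink_path="plink", timeout=30):
    """Return True if PLINK logs on and the remote 'exit' completes cleanly."""
    cmd = build_plink_command(host, user, password, plink_path)
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # PLINK missing, not executable, or the session did not finish in time.
        return False
    # Assumption: a zero exit code means the logon and the command succeeded.
    return result.returncode == 0
```

Passing the password on the command line exposes it to other local users who can list processes, so a real deployment may prefer key-based authentication.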

How Do I Check A Server Or Device Every Second?

This is harder than it seems, because without careful planning the monitoring product consumes huge amounts of resources, especially bandwidth.

Argent’s solution is to provide a very lightweight ping. It only checks if the server or device is online, but you can use it in conjunction with more powerful tests – you can use the Advanced Features tab on the Relator to run a second Relator if the lightweight ping Rule fails.
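The escalation idea, a cheap check run very often with a heavier check run only when the cheap one fails, can be sketched in Python as follows. This illustrates the pattern only and does not show how Relators are configured in Argent; `tiered_check` and its arguments are hypothetical names.

```python
def tiered_check(light_check, deep_check):
    """Run the lightweight check first; escalate only when it fails.

    Both arguments are callables returning True (healthy) or False.
    Returns (light_ok, deep_ok), where deep_ok is None if the deep
    check was never run.
    """
    light_ok = light_check()
    if light_ok:
        # Common case: the cheap check passed, so no further resources are spent.
        return True, None
    # The cheap check failed: spend resources on the more powerful test.
    return False, deep_check()
```

Because the expensive test runs only on failure, the per-second cost stays close to that of the lightweight ping alone.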