Best Practices

Monitoring Argent AT Performance And Best Practices

Argent Enterprise bundles a customizable CHECK_AT.XML to monitor Argent AT performance and best practices.

Argent Enterprise automatically detects servers that have Argent AT components installed. These components include the Main Engine, the Daughter Engine, and Trusted Agents for ALL production Argent AT products.

Besides baseline monitoring, such as checking the status of Argent AT services and restarting stopped ones, Argent Enterprise also does the following:

  1. Checks Windows performance metrics to identify over-taxed systems
  2. Checks that vital system performance settings conform to best practices

The behavior is controlled by CHECK_AT.XML at the Control Center. The XML file is propagated to all remote Domain Observers.

A sample is shown as follows:

[Figure: sample CHECK_AT.XML]

The section <PERFDATA> controls the performance monitoring:

  • Each tag <PERF> defines one performance counter.
  • Attribute 'counter' is the performance counter path.
  • Attribute 'min' defines the minimum metric value. A value lower than this is deemed a bad condition.
  • Attribute 'max' defines the maximum metric value. A value higher than this is deemed a bad condition.
  • Optional attribute 'duration' specifies how many seconds the condition must persist before it is deemed bad.
  • Optional attribute 'summary' is a brief description of the condition. It is used in Network Real-Time Status and Argent Console events.

The section <REGDATA> controls the registry checking for best practices.

  • Tag <PRODUCT> groups registry checks by product.
  • Attribute 'name' is the matching product name, such as Argent Guardian Ultra. Wildcards are supported. For example, * means all Argent AT products.
  • Attribute 'installation_type' determines the matching installation. It should match the registry entry INSTALL_TYPE for the specific product. For example, 0 for Main Engine, 3 for Trusted Agent, 6 for Daughter Engine.
  • Each tag <REG> defines one registry key check.
  • Attribute 'path' is the registry key path.
  • Attribute 'min' defines the minimum metric value. A value lower than this is deemed a bad condition.
  • Attribute 'max' defines the maximum metric value. A value higher than this is deemed a bad condition.
  • Optional attribute 'summary' is a brief description of the condition. It is used in Network Real-Time Status and Argent Console events.
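The way a <PRODUCT> entry selects installations can be pictured with a short shell sketch. The helper name is hypothetical, and the sketch assumes the wildcard behaves like a shell glob and that both the name and the installation type must match; Argent's actual matching rules may differ.

```shell
#!/bin/sh
# Hypothetical helper: does a <PRODUCT> entry apply to an installed product?
# Arguments: name-pattern wanted-installation-type product-name INSTALL_TYPE
# Returns 0 (true) when both the name pattern and the installation type match.
product_matches() {
    pattern=$1 wanted_type=$2 product=$3 install_type=$4
    # $pattern is left unquoted so that `case` treats * as a wildcard.
    case $product in
        $pattern) ;;
        *) return 1 ;;
    esac
    [ "$wanted_type" = "$install_type" ]
}

# A pattern of * with installation type 3 selects any product installed
# as a Trusted Agent.
if product_matches '*' 3 'Argent Guardian Ultra' 3; then
    echo "registry checks apply"
fi
```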

Custom Logic

Users with a special need for additional monitoring can create a WMI script, PowerShell script, or Linux/Unix shell script to accomplish the task.

Script Session Timeout: The maximum time allowed for running the script.

Apply Custom Logic To Following Node Types With Roles: This qualifies the node type and role. The roles include:

  • Any
  • Center
  • Domain Observer
  • Segment Inspector
  • Argent Transport Backbone
  • Argent Products
  • Managed Switch
  • Windows Machines
  • UNIX Servers
  • Generic IP Devices

Matching Applications: This further qualifies the matching nodes by checking the Installed Applications defined within CMDB-X.

Exclude Nodes: Users can optionally exclude additional nodes from checking.

If some nodes require logon credentials other than the service account, users can specify them in 'Other Credentials'. This is especially common for Linux/Unix servers.

The sample output in device status (see Network Real-Time Status) looks like the following:

[Figure: sample device status output]

Sample WMI Logic

'
' Copyright (c) 2012 ArgSoft Pacific Intellectual Property Holdings (HK), Limited
'

' Method 'WriteStatus' logs useful output

' Property 'AGNodeName' is the Windows machine being checked

' Property 'AGExitCode' is the check result.
'
'   0 - Success
'   1 - OK
'   2 - Internal Error
'   3 - Warning
'   4 - Serious
'   5 - Critical
'   6 - AT service down
'   7 - Down because of others, such as nodes behind an off-line switch
'   8 - Off-line

WriteStatus "Sample custom WMI script. Checking " & AGNodeName

AGExitCode = 1

Sample PowerShell Logic

#
# Copyright (c) 2012 ArgSoft Pacific Intellectual Property Holdings (HK), Limited
#

# Method '$PSPlayer.WriteStatus' logs useful output

# Property '$PSPlayer.AGNodeName' is the Windows machine being checked

# Property '$PSPlayer.AGExitCode' is the check result.
#
#   0 - Success
#   1 - OK
#   2 - Internal Error
#   3 - Warning
#   4 - Serious
#   5 - Critical
#   6 - AT service down
#   7 - Down because of others, such as nodes behind an off-line switch
#   8 - Off-line

$message = [System.String]::Format("Sample custom PowerShell script. Checking {0}", $PSPlayer.AGNodeName)

$PSPlayer.WriteStatus($message)

$PSPlayer.AGExitCode = 1

Sample UNIX Shell Logic

#!/bin/sh
#
# Copyright (c) 2012 ArgSoft Pacific Intellectual Property Holdings (HK), Limited
#
# Use exit code to specify the check result.
#
#   0 - Success
#   1 - OK
#   2 - Internal Error
#   3 - Warning
#   4 - Serious
#   5 - Critical
#   6 - AT service down
#   7 - Down because of others, such as nodes behind an off-line switch
#   8 - Off-line

echo "Sample custom UNIX shell script"

exit 1
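As a slightly fuller illustration of the exit-code convention, the following sketch maps file-system usage to the documented result codes. It is not part of the product: the function names are hypothetical, the 80 and 90 percent thresholds are arbitrary assumptions, and it assumes the script's standard output is what appears as status text.

```shell
#!/bin/sh
# Map a usage percentage to a documented result code:
# 1 = OK, 3 = Warning, 5 = Critical. Thresholds are passed by the caller.
usage_to_code() {
    used=$1 warn=$2 crit=$3
    if [ "$used" -ge "$crit" ]; then
        echo 5
    elif [ "$used" -ge "$warn" ]; then
        echo 3
    else
        echo 1
    fi
}

# Read the Capacity column of POSIX-format df output for one mount point.
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Check one mount point, print a status line, and return the result code.
# Returns 2 (Internal Error) when the usage cannot be read.
check_disk() {
    mount=$1
    used=$(disk_used_percent "$mount")
    if [ -z "$used" ]; then
        echo "Could not read disk usage for $mount"
        return 2
    fi
    code=$(usage_to_code "$used" 80 90)   # assumed thresholds
    echo "Disk usage on $mount is ${used}%"
    return "$code"
}
```

A custom script built on these helpers would finish with something like `check_disk /; exit $?` so that the result code becomes the script's exit code.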

Abnormal Condition And Alerts

Argent Enterprise can also alert on issues found, and automatically resolve the event when the condition is corrected. The alerts used are defined in the Domain Configuration.

Because the node an event concerns may not be a licensed node, users may have to select the option 'Show All Nodes In CMDB-X Database' to see the event.

Network Real-Time Status

Network Real-Time Status is the console for Argent Enterprise. Users can see the status of the whole Enterprise Network here.

The upper part displays the real-time status of each node in the topology database including Main Network and remote Network Domains.

The Device Status column shows red for offline devices, orange for issues other than being offline, and green for OK status.

Process Time is the total time spent running the monitoring logic.

The lower part displays the history of status changes. A new history row is added only when device status changes. In other words, one row is for one status change, NOT for one execution of monitoring logic. When there is no status change, the detail of execution is added to the current record.
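The one-row-per-status-change rule can be illustrated with a small shell sketch. It is only a model of the described behavior, not how Argent Enterprise stores its history; the function name and the "status xN" output format are assumptions made for the example.

```shell
#!/bin/sh
# Collapse a stream of check results (one status per line on stdin) into
# history rows: a new row starts only when the status changes, and repeated
# results are folded into the current row as an execution count.
summarize_history() {
    awk '
        NR == 1       { current = $0; runs = 1; next }
        $0 == current { runs++; next }
                      { print current " x" runs; current = $0; runs = 1 }
        END           { if (NR > 0) print current " x" runs }
    '
}

# Six executions of the monitoring logic produce only three history rows.
printf 'OK\nOK\nWarning\nWarning\nWarning\nOK\n' | summarize_history
```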

The Root Cause column indicates when the issue is caused by an off-line switch that the node depends on.

The Summary column is a brief description of the issue.

To view the detail of a status row, double-click it or use the context menu item 'View Status Detail by Notepad'.

To view the trend of process time, users can also select the tab 'Event Correlation'. When the process time is far outside its normal range, it generally indicates degraded network connectivity.