KBI 311124 Relator Not Running And/Or Pending Alert Has Thousands Of Files

Version

Argent AT All versions

Date

Wednesday, 3 Dec 2014

Summary

The Argent Console service stops when a SQL Server outage lasts longer than the allowed timeout period, which is 3 minutes by default

The same applies to other Argent products, each with its own timeout

If, for example, Argent Guardian Ultra has a SQL Server timeout of 60 minutes, its service remains running during the outage and everything may appear to be fine

Subsequently, the PENDING_EVENTS directory may fill (if other Argent services are still running) because Events cannot be processed, which in turn causes the Argent Guardian Ultra service to stall upon a service restart or maintenance recycle

Technical Background

Upon a SQL Server outage, the Argent Console service (and potentially other Argent services) stops once the outage exceeds the SQL Server timeout threshold set in the Windows Registry

If the Argent Console service stops and any other Argent services continue running, the <DRIVE>:\ARGENT\{PRODUCT}\PENDING_EVENTS directory may fill

The PENDING_EVENTS directory temporarily stores, as files, Events (Alerting via the Argent Console) and Argent Predictor Metrics

These files are processed first-in-first-out (FIFO); therefore, if the Argent Console service is stopped, the other Argent services cannot process beyond that point and the directory fills with files

Once the PENDING_EVENTS directory reaches a critical mass (which depends on system performance) and takes longer than 5 minutes to process, the service reports ‘Failed to see check point Event’ and an internal recycle of the process is triggered

As a result, the Argent service never starts correctly and Scheduled Tasks are not scheduled
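
To spot this condition before a restart, the backlog in the PENDING_EVENTS directory can be measured directly. The following is a minimal Python sketch, not an Argent tool: the function names and the thresholds (1000 files, 300 seconds) are assumptions chosen for illustration, with the 300 seconds loosely echoing the 5 minute figure above.

```python
import os
import time


def pending_backlog(directory):
    """Return (file_count, oldest_age_seconds) for the given directory.

    Only regular files directly inside the directory are counted.
    An empty directory yields (0, 0.0).
    """
    count = 0
    oldest_mtime = None
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            count += 1
            mtime = entry.stat().st_mtime
            if oldest_mtime is None or mtime < oldest_mtime:
                oldest_mtime = mtime
    if oldest_mtime is None:
        return 0, 0.0
    return count, max(0.0, time.time() - oldest_mtime)


def backlog_is_suspicious(count, oldest_age_seconds,
                          max_files=1000, max_age_seconds=300):
    """Flag a backlog that is large or whose oldest file is stale.

    Both thresholds are hypothetical defaults, not Argent settings.
    """
    return count > max_files or oldest_age_seconds > max_age_seconds
```

Because the files are processed FIFO, the age of the oldest file is often a better indicator than the raw count: a small but old backlog suggests processing is blocked. To use the sketch, pass the actual PENDING_EVENTS path for the product in question to pending_backlog and feed the result to backlog_is_suspicious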

Resolution

  1. Ensure ALL Argent services are up and running after a SQL Server outage
  2. Under Registry key HKLM\SOFTWARE\WOW6432NODE\ARGENT\ARGENT_CONSOLE,
    change the SQL_OUTAGE_ALLOWED_IN_MINUTE value to a larger value, e.g. 60, and
    ensure the SQL_OUTAGE_ALLOWED_IN_MINUTE value is consistent across all Argent products
  3. Under Argent Guardian Ultra Console, go to Administration/Engine Manager/Supervising Engine and configure “If Database Error Happens, Fire Internal Event” to send an Alert email when a Database error occurs
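
The consistency check in step 2 can be scripted. The sketch below is a hedged illustration in Python: it assumes each product keeps SQL_OUTAGE_ALLOWED_IN_MINUTE under its own subkey of HKLM\SOFTWARE\WOW6432NODE\ARGENT and that the value converts to an integer. Only the ARGENT_CONSOLE subkey is confirmed by this article; any other subkey name must be looked up on the actual system. The function names are hypothetical.

```python
def mismatched_timeouts(timeouts, expected_minutes):
    """Return {product: minutes} for products that differ from expected_minutes.

    `timeouts` maps a product subkey name to its configured timeout.
    """
    return {product: minutes
            for product, minutes in timeouts.items()
            if minutes != expected_minutes}


def read_timeout(product_subkey):
    """Read SQL_OUTAGE_ALLOWED_IN_MINUTE for one product (Windows only).

    Assumes the value lives under the ARGENT key named in step 2;
    winreg is imported here so the module still loads on other platforms.
    """
    import winreg
    path = "SOFTWARE\\WOW6432Node\\ARGENT\\" + product_subkey
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(
            key, "SQL_OUTAGE_ALLOWED_IN_MINUTE")
    return int(value)
```

On a Windows host, one would build the dictionary with read_timeout for each installed product subkey, starting with "ARGENT_CONSOLE", and pass it to mismatched_timeouts together with the intended value (for example 60); an empty result means the settings agree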