KBI 311124 Relator Not Running And/Or Pending Alert Has Thousands Of Files
Version
Argent AT All versions
Date
Wednesday, 3 Dec 2014
Summary
The Argent Console service stops when a SQL Server outage lasts longer than the allowed timeout period, which is 3 minutes by default
The same applies to other Argent products
However, if for example Argent Guardian Ultra has a SQL Server timeout of 60 minutes, its service remains running and everything may appear to be fine
Subsequently, the PENDING_EVENTS directory may fill (if other Argent services are still running) because Events cannot be processed, which then causes the Argent Guardian Ultra service to stall upon a service restart or maintenance recycle
Technical Background
During a SQL Server outage, the Argent Console service (and potentially other Argent services) stops once the outage exceeds the SQL Server timeout threshold set in the Windows Registry
If the Argent Console service stops while any other Argent services continue to run, the <DRIVE>:\ARGENT\{PRODUCT}\PENDING_EVENTS directory may fill
The PENDING_EVENTS directory temporarily stores, as files, Events (Alerting via the Argent Console) and Argent Predictor Metrics
These files are processed first-in-first-out (FIFO), so if the Argent Console service is stopped, the other Argent services cannot process beyond that point and the directory fills with files
Once the PENDING_EVENTS directory reaches a critical mass (depending on system performance) and takes longer than 5 minutes to process (‘Failed to see check point Event’), an internal recycle of the process is triggered
As a result, the Argent service never starts correctly and Scheduled Tasks are not scheduled
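
The backlog described above can be spotted before a restart by counting the files in a PENDING_EVENTS directory and checking the age of the oldest one. The following is a minimal Python sketch and not an Argent tool: the function names and both thresholds (1000 files, 300 seconds) are hypothetical placeholders, and the directory path must be supplied for the local installation.

```python
import os
import time


def pending_backlog(directory, now=None):
    """Return (file_count, oldest_age_seconds) for files directly in `directory`.

    Subdirectories are ignored. The age is based on each file's last
    modification time; an empty directory reports an age of 0.0.
    """
    now = time.time() if now is None else now
    count = 0
    oldest_mtime = None
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            count += 1
            mtime = entry.stat().st_mtime
            if oldest_mtime is None or mtime < oldest_mtime:
                oldest_mtime = mtime
    oldest_age = 0.0 if oldest_mtime is None else max(0.0, now - oldest_mtime)
    return count, oldest_age


def backlog_is_suspicious(count, oldest_age, max_files=1000, max_age_seconds=300):
    """Flag a backlog using assumed limits (both defaults are placeholders)."""
    return count > max_files or oldest_age > max_age_seconds
```

For example, pending_backlog can be called with the PENDING_EVENTS path of one product and its result passed to backlog_is_suspicious; suitable limits depend on system performance and should be tuned locally.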
Resolution
- Ensure ALL Argent services are up and running after a SQL Server outage
- Under Registry key HKLM\SOFTWARE\WOW6432NODE\ARGENT\ARGENT_CONSOLE, change the SQL_OUTAGE_ALLOWED_IN_MINUTE value to a larger value, e.g. 60, and ensure the SQL_OUTAGE_ALLOWED_IN_MINUTE value is consistent across all Argent products
- In the Argent Guardian Ultra Console, go to Administration/Engine Manager/Supervising Engine and configure “If Database Error Happens, Fire Internal Event” to send an Alert email when a Database error occurs
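
The consistency check in the second step can be scripted. The sketch below uses Python's standard winreg module and makes several assumptions: only the ARGENT_CONSOLE key path comes from this article, so the key paths of other Argent products must be added by hand; the value is assumed to convert cleanly to an integer; and the script is assumed to run under 64-bit Python on the Argent server. The function names are hypothetical.

```python
import sys

VALUE_NAME = "SQL_OUTAGE_ALLOWED_IN_MINUTE"

# Only this key path appears in the article. Add the keys of the other
# installed Argent products yourself (their paths are not given here).
KEY_PATHS = [r"SOFTWARE\WOW6432NODE\ARGENT\ARGENT_CONSOLE"]


def read_timeouts(key_paths):
    """Read VALUE_NAME under HKLM for each key path (Windows only).

    Returns {key_path: minutes or None}. None means the key or value
    could not be read, or the stored data was not a number.
    """
    import winreg  # imported here so the module still loads on other platforms

    timeouts = {}
    for path in key_paths:
        try:
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
                value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            timeouts[path] = int(value)
        except (OSError, ValueError, TypeError):
            timeouts[path] = None
    return timeouts


def timeouts_consistent(timeouts):
    """True only if every key was read and all keys hold the same value."""
    values = list(timeouts.values())
    return bool(values) and None not in values and len(set(values)) == 1


if __name__ == "__main__" and sys.platform == "win32":
    found = read_timeouts(KEY_PATHS)
    for path, minutes in found.items():
        print(f"{path}: {minutes}")
    print("consistent" if timeouts_consistent(found) else "NOT consistent")
```

The script only reads the Registry; changing the value is still done manually as described in the step above.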