KBI 311657 Issue Addressed: Missing Alerts In Exchange Monitoring
Version
Argent Advanced Technology 5.1A-1804-A and below
Date
Tuesday, 24 April 2018
Summary
This article describes missing alerts in Argent for Exchange monitoring
The scenario applies to the Mother/Daughter architecture
The issue is related to the Argent for Exchange service restarting every 10 minutes
The Argent for Exchange service shows as ‘Running’ in the Windows Services console
All alerts are missing from Argent for Exchange
The Argent for Exchange Supervising Engine log shows the service restarting consistently every 10 minutes
Errors such as the following are captured:
01 Apr 2018 00:10:04.222 ArgentServer DOMAIN\ServiceAccount Could not process uploaded file ‘C:\Argent\ArgentForExchange\UPLOAD\PERF_DATA_DaughterEngine_2014_07_08_22_35_04.DAT’. Leave it to next cycle
01 Apr 2018 00:10:04.222 ArgentServer DOMAIN\ServiceAccount >>>> Unhandled Exception happened in Thread 0X0 (PerfDataCache.cpp, 2424)
01 Apr 2018 00:10:04.222 ArgentServer DOMAIN\ServiceAccount Internal trace is dumped to ‘EX_SVC_LOG_EXCEPTION_TRC.TXT’
01 Apr 2018 00:10:05.236 ArgentServer DOMAIN\ServiceAccount WARNING: DAL has detected that the maximum number of ODBC sessions has been reached. (Current: 48, Preferred: 30)
01 Apr 2018 00:10:06.905 ArgentServer DOMAIN\ServiceAccount Failed to see checkpoint event of daughter process ‘ARGSOFT_EX_MAIN.EXE’. Restart the process
01 Apr 2018 00:10:06.983 ArgentServer DOMAIN\ServiceAccount STARTED service daughter process ‘ARGSOFT_EX_MAIN.EXE /PPID=1460 /RESTART’ (Assigned to Job Object)
01 Apr 2018 00:10:10.431 ArgentServer DOMAIN\ServiceAccount DISCARDED INTERNAL EVENT (Main Engine Issue): Argent for Exchange Supervising Engine (Motor) main process ended unexpectedly. It restarts at 01 Apr 2018 00:10:10
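To confirm the 10-minute restart pattern, a short script can pull the timestamps of forced restarts out of the Supervising Engine log and compute the gaps between them. The Python sketch below assumes that each forced restart is marked by the ‘Failed to see checkpoint event’ message and that every line starts with a timestamp in the format shown in the excerpt above. The function names are hypothetical, and the log file location is not covered by this article.

```python
import re
from datetime import datetime

# Assumed line prefix, taken from the excerpt: "01 Apr 2018 00:10:06.905 ..."
TIMESTAMP = re.compile(r"^(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) ")
# Assumption: this message marks each forced restart of the main process.
RESTART_MARKER = "Failed to see checkpoint event"


def restart_times(lines):
    """Return the timestamps of log lines that record a forced restart."""
    times = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and RESTART_MARKER in line:
            # %b expects English month abbreviations (default C locale).
            times.append(datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S.%f"))
    return times


def restart_intervals(lines):
    """Return the time gaps between consecutive forced restarts."""
    times = restart_times(lines)
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Passing the lines of the log file to `restart_intervals` should, for this issue, yield gaps clustered around 10 minutes.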
The Argent for Exchange UPLOAD folder contains multiple files with very old dates
For example, C:\Argent\ArgentForExchange\UPLOAD
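To check for this symptom, the UPLOAD folder can be scanned for DAT files whose last-modified time is well in the past. The Python sketch below is an illustration only: the 7-day threshold and the function name `stale_uploads` are assumptions, and it relies on file modification times rather than on the date embedded in the file name.

```python
import time
from pathlib import Path


def stale_uploads(upload_dir, max_age_days=7, now=None):
    """List DAT files in upload_dir last modified more than max_age_days ago.

    max_age_days is an assumed threshold; pick one that fits the upload cycle.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        path
        for path in Path(upload_dir).glob("*.DAT")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

For example, `stale_uploads(r"C:\Argent\ArgentForExchange\UPLOAD")` would list DAT files older than a week; a non-empty result on a healthy Mother Engine is worth investigating.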
Technical Background
In the Mother/Daughter architecture, each Daughter Engine uploads Argent Predictor data to the Mother Engine as DAT files
The Mother Engine deserializes each DAT file before saving its data into the database
In rare cases, a DAT file can be corrupted
When this happens, the Mother Engine leaks a database connection
Worse, the Mother Engine leaves the corrupted DAT file unprocessed in the UPLOAD directory
On the next cycle, it fails on the same file again and leaks another connection
As a result, outstanding connections accumulate and SQL Server slows down
When the Mother Engine becomes slow enough, the service control process restarts the service main process
However, because the bad DAT files remain in the UPLOAD directory, the same sequence repeats after every restart
The issue is addressed in Argent Advanced Technology 5.1A-1804-B
The Mother Engine now detects corrupted DAT files, logs them, and moves them out of the UPLOAD directory
As a result, errors no longer accumulate and the engine keeps running normally
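The detect, log, and move behavior described above can be illustrated with a generic process-or-quarantine loop. This Python sketch is not Argent's implementation: `parse` and `save` are hypothetical stand-ins for the engine's deserializer and database writer, the quarantine folder is an assumed name, and deleting files after a successful save is also an assumption. As a design choice in the sketch, each file is deserialized before anything is written, so a corrupted file never reaches the database step.

```python
import logging
import shutil
from pathlib import Path

log = logging.getLogger("upload_sketch")


def process_uploads(upload_dir, quarantine_dir, parse, save):
    """Process each DAT file once; move files that fail to parse to quarantine_dir.

    parse: hypothetical deserializer, raises ValueError on a corrupted file.
    save:  hypothetical database writer for the parsed records.
    """
    quarantine = Path(quarantine_dir)
    quarantine.mkdir(parents=True, exist_ok=True)
    processed, quarantined = [], []
    for path in sorted(Path(upload_dir).glob("*.DAT")):
        try:
            # Deserialize first, before any database work is started.
            records = parse(path.read_bytes())
        except ValueError as exc:
            # Corrupted file: log it and move it out so the next cycle skips it.
            log.warning("Corrupted upload %s: %s", path.name, exc)
            shutil.move(str(path), str(quarantine / path.name))
            quarantined.append(path.name)
            continue
        save(records)
        path.unlink()  # assumption: saved files are removed from UPLOAD
        processed.append(path.name)
    return processed, quarantined
```

Because a bad file leaves the UPLOAD directory on its first failure, each corrupted upload costs one logged warning instead of one failure per cycle.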
Resolution
Upgrade to Argent Advanced Technology 5.1A-1804-B
Customers who cannot upgrade immediately can apply the following workaround:
- Stop the Argent for Exchange service
- Move the UPLOAD folder to OLD_UPLOAD
- Restart the Argent for Exchange service
* Repeat on all Motors, where applicable
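The folder move in the workaround can be scripted. The Python sketch below covers only that step; stopping and restarting the service are left to the Windows Services console, because the service's internal name is not given here. The function name is hypothetical, and the sketch refuses to overwrite an existing OLD_UPLOAD folder left by an earlier run.

```python
from pathlib import Path


def move_upload_folder(product_dir):
    """Rename <product_dir>/UPLOAD to <product_dir>/OLD_UPLOAD.

    Run only while the Argent for Exchange service is stopped.
    """
    base = Path(product_dir)
    source, target = base / "UPLOAD", base / "OLD_UPLOAD"
    if not source.is_dir():
        raise FileNotFoundError(f"No UPLOAD folder under {base}")
    if target.exists():
        # Do not overwrite an earlier OLD_UPLOAD; archive or rename it first.
        raise FileExistsError(f"{target} already exists")
    return source.rename(target)
```

For example, with the service stopped, `move_upload_folder(r"C:\Argent\ArgentForExchange")` renames the folder from the example path in this article.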