KBI 311657 Issue Addressed: Missing Alerts In Exchange Monitoring

Version

Argent Advanced Technology 5.1A-1804-A and below

Date

Tuesday, 24 April 2018

Summary

This article describes a symptom of missing alerts in Exchange monitoring

The scenario applies to the Mother/Daughter architecture

The issue was found to be related to the Argent for Exchange service restarting every 10 minutes

The Argent for Exchange service is shown as ‘Running’ in the Windows Services console

All alerts are missing from Argent for Exchange

The Supervising Engine log of Argent for Exchange shows a consistent restart every 10 minutes

Errors such as the following were captured:

01 Apr 2018 00:10:04.222 ArgentServer DOMAIN\ServiceAccount Could not process uploaded file ‘C:\Argent\ArgentForExchange\UPLOAD\PERF_DATA_DaughterEngine_2014_07_08_22_35_04.DAT’. Leave it to next cycle

01 Apr 2018 00:10:04.222 ArgentServer DOMAIN\ServiceAccount >>>> Unhandled Exception happened in Thread 0X0 (PerfDataCache.cpp, 2424)

01 Apr 2018 00:10:04.222 ArgentServer DOMAIN\ServiceAccount Internal trace is dumped to ‘EX_SVC_LOG_EXCEPTION_TRC.TXT’

01 Apr 2018 00:10:05.236 ArgentServer DOMAIN\ServiceAccount WARNING: DAL has detected that the maximum number of ODBC sessions has been reached. (Current: 48, Preferred: 30)

01 Apr 2018 00:10:06.905 ArgentServer DOMAIN\ServiceAccount Failed to see checkpoint event of daughter process ‘ARGSOFT_EX_MAIN.EXE’. Restart the process

01 Apr 2018 00:10:06.983 ArgentServer DOMAIN\ServiceAccount STARTED service daughter process ‘ARGSOFT_EX_MAIN.EXE /PPID=1460 /RESTART’ (Assigned to Job Object)

01 Apr 2018 00:10:10.431 ArgentServer DOMAIN\ServiceAccount DISCARDED INTERNAL EVENT (Main Engine Issue): Argent for Exchange Supervising Engine (Motor) main process ended unexpectedly. It restarts at 01 Apr 2018 00:10:10
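
To confirm the 10-minute restart cycle from a saved copy of the log, the restart lines can be extracted and the gaps between them measured. The following is a minimal Python sketch, not an Argent tool. It assumes every log line begins with a timestamp in the format shown above, that the text "Failed to see checkpoint event" marks each forced restart, and that month names are in English. The function names are hypothetical.

```python
from datetime import datetime

# Assumption: this text marks each forced restart, as in the excerpt above.
RESTART_MARKER = "Failed to see checkpoint event"
# Length of a timestamp such as "01 Apr 2018 00:10:06.905".
TS_LEN = len("01 Apr 2018 00:10:06.905")
TS_FORMAT = "%d %b %Y %H:%M:%S.%f"  # %b assumes English month names


def restart_times(lines):
    """Return the timestamp of every restart line, in file order."""
    times = []
    for line in lines:
        line = line.strip()
        if RESTART_MARKER in line:
            times.append(datetime.strptime(line[:TS_LEN], TS_FORMAT))
    return times


def restart_intervals_minutes(lines):
    """Return the gap, in minutes, between consecutive restarts."""
    times = restart_times(lines)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
```

Passing the lines of the log file to restart_intervals_minutes should give values close to 10 when the issue described in this article is present.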

The UPLOAD folder under Argent for Exchange contains multiple files with very old dates

For example, C:\Argent\ArgentForExchange\UPLOAD
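
A quick way to check for this symptom is to list the DAT files in UPLOAD that are older than expected. The following is a minimal Python sketch under stated assumptions: the one-day threshold is an arbitrary choice (the article does not define how old is too old), file age is taken from the last-modified time, and the function name is hypothetical.

```python
import time
from pathlib import Path


def stale_dat_files(upload_dir, max_age_days=1.0, now=None):
    """Return DAT files in upload_dir older than max_age_days, oldest first.

    max_age_days is an assumed threshold, not a value from Argent.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = [
        p for p in Path(upload_dir).glob("*.DAT")
        if p.is_file() and p.stat().st_mtime < cutoff
    ]
    return sorted(stale, key=lambda p: p.stat().st_mtime)
```

For example, stale_dat_files(r"C:\Argent\ArgentForExchange\UPLOAD") returns the leftover files on the Mother Engine. An empty list suggests uploads are being processed normally.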

Technical Background

In the Mother/Daughter architecture, the Daughter Engine uploads Argent Predictor data to the Mother Engine as DAT files

The Mother Engine de-serializes each DAT file before saving its contents into the database

In some rare cases, the DAT file could be corrupted

When this happens, the Mother Engine leaks database connection resources

To make matters worse, the Mother Engine leaves the corrupted DAT file unprocessed in the UPLOAD directory

In the next loop, the Mother Engine tries the same file again and leaks more resources

As a result, SQL Server slows down as outstanding connections accumulate

When the Mother Engine becomes too slow, the service control process restarts the service main process

However, as the bad DAT files still persist in the UPLOAD directory, the same sequence repeats after every restart

The issue is addressed in Argent Advanced Technology 5.1A-1804-B

The Mother Engine now detects corrupted DAT files, logs them, and moves them out of the UPLOAD directory

As a result, the errors no longer accumulate and the Engine keeps running healthily
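
The difference between the old and the fixed behavior can be illustrated with a short Python sketch. This is not Argent's implementation. The load callable stands in for the de-serialize-and-save step and is hypothetical, as are the function name and the quarantine folder. Deleting a file after it loads successfully is also an assumption. The point is that a file that fails to load is moved aside, so it cannot fail again in the next loop.

```python
import shutil
from pathlib import Path


def process_uploads(upload_dir, quarantine_dir, load):
    """Load every DAT file; move the ones that fail out of upload_dir.

    load is a hypothetical callable that raises on a corrupted file.
    Returns (processed_names, quarantined_names).
    """
    upload_dir, quarantine_dir = Path(upload_dir), Path(quarantine_dir)
    processed, quarantined = [], []
    for dat in sorted(upload_dir.glob("*.DAT")):
        try:
            load(dat)
        except Exception as exc:
            # Log and move the bad file so the next loop does not retry it.
            quarantine_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(dat), str(quarantine_dir / dat.name))
            quarantined.append(dat.name)
            print(f"Quarantined corrupted file {dat.name}: {exc}")
        else:
            # Assumption: a successfully loaded file is removed.
            dat.unlink()
            processed.append(dat.name)
    return processed, quarantined
```

Without the move in the except branch, the same corrupted file would be picked up on every loop, which is the repeating failure described above.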

Resolution

Upgrade to Argent Advanced Technology 5.1A-1804-B

Customers who cannot upgrade immediately can apply the following workaround:

  1. Stop Argent for Exchange service
  2. Move UPLOAD to OLD_UPLOAD
  3. Restart Argent for Exchange service

* Repeat on all Motors, where applicable
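
Where the workaround has to be repeated on several machines, the three steps can be scripted. The following is a minimal Python sketch, to be run as an administrator on the Windows machine hosting the service. The article does not give the Windows service name, so service_name must be supplied by the operator and any value used is an assumption. Recreating an empty UPLOAD folder is a precaution that the article does not mention, and the function name is hypothetical. The sketch uses the standard Windows commands net stop and net start.

```python
import subprocess
from pathlib import Path


def apply_workaround(service_name, product_dir, run=subprocess.run):
    """Stop the service, move UPLOAD to OLD_UPLOAD, start the service.

    service_name is supplied by the operator (not given in the article).
    run is injectable so the steps can be exercised without Windows.
    """
    product_dir = Path(product_dir)
    upload = product_dir / "UPLOAD"
    old_upload = product_dir / "OLD_UPLOAD"
    if old_upload.exists():
        raise FileExistsError(f"{old_upload} already exists; move it away first")

    # Step 1: stop the service.
    run(["net", "stop", service_name], check=True)
    try:
        # Step 2: move UPLOAD to OLD_UPLOAD.
        upload.rename(old_upload)
        # Assumption: recreate an empty UPLOAD folder as a precaution.
        upload.mkdir()
    finally:
        # Step 3: start the service again, even if the move failed.
        run(["net", "start", service_name], check=True)
    return old_upload
```

A call would take the form apply_workaround(name, r"C:\Argent\ArgentForExchange"), where name is the service name as shown in the Windows Services console. The files in OLD_UPLOAD are kept, not deleted, so they remain available for later inspection.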