KBI 311890 Issue Addressed: Event Is Not Automatically Resolved In Argent Non-Stop Monitor Environment In Certain Situations

Version

Argent Advanced Technology 5.1A-2010-C or earlier

Date

Tuesday, 24 Nov 2020

Summary

It has been confirmed that outstanding events can fail to resolve automatically in certain Argent Non-Stop Monitor environments. Standalone configurations are not affected. The failure is intermittent and depends on system load.

The issue has been addressed in Argent AT 5.1A-2010-D and later.

Technical Background

When a Rule is broken, the Argent motor engine sends an event firing request to the Argent Console engine. Because event firing is handled asynchronously, the Argent Console engine returns a unique job number, and the Argent motor engine polls with this job number until the event firing request has been handled.

If the event is not fired, the Argent motor has no further involvement with this event request.

If the event is fired and auto resolution is configured in the Relator, the Argent motor inserts a recheck record in the database for later reference. If the Rule is no longer broken at a later time and the Argent motor finds the recheck record, the Argent motor sends an auto resolution request to the Argent Console. As a result, the outstanding event is resolved automatically.
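The recheck bookkeeping described above can be illustrated with a minimal sketch. This is a toy model, not Argent's code: the `Motor` class, its method names, and the Rule names `disk_full` and `cpu_high` are all hypothetical, and an in-memory set stands in for the database table of recheck records.

```python
from dataclasses import dataclass, field


@dataclass
class Motor:
    """Toy model of the motor-side bookkeeping (hypothetical names throughout)."""
    recheck_records: set = field(default_factory=set)
    resolution_requests: list = field(default_factory=list)

    def on_firing_result(self, rule: str, fired: bool, auto_resolve: bool) -> None:
        # A recheck record is stored only when the event was actually fired
        # and auto resolution is configured for it.
        if fired and auto_resolve:
            self.recheck_records.add(rule)

    def on_rule_ok(self, rule: str) -> None:
        # When the Rule is no longer broken, an existing recheck record
        # triggers an auto resolution request to the Console.
        if rule in self.recheck_records:
            self.recheck_records.remove(rule)
            self.resolution_requests.append(rule)


motor = Motor()
motor.on_firing_result("disk_full", fired=True, auto_resolve=True)
motor.on_firing_result("cpu_high", fired=False, auto_resolve=True)
motor.on_rule_ok("disk_full")
motor.on_rule_ok("cpu_high")
print(motor.resolution_requests)  # ['disk_full']
```

Only the Rule whose event was reported as fired ends up with a resolution request. Without a recheck record, nothing later triggers automatic resolution.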

The issue can occur while the job number is being acquired. In an Argent Non-Stop Monitor environment, the Argent motor engine must first acquire the cluster lock in order to avoid race conditions between motors. Acquiring the cluster lock is a database operation, which is slow compared to in-memory operations.

The Argent motor fires event requests concurrently. If many events have to be fired at a particular moment, an equal number of worker threads try to acquire the cluster lock before assigning new job numbers. Under this contention, the database operations can fail with "Update or delete failed". As a result, the same job number can be given to different event requests.

Suppose two event requests acquire the same job number. One request results in an event being fired, and the other does not. The Argent motor engine does not insert the recheck record in the database because it sees that the event was not fired. As a result, automatic resolution does not happen when the condition is corrected.
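The sketch below shows one way such a mix-up could play out. It is an assumption made for illustration: the article does not describe how outcomes are stored, so the `job_outcomes` table, the function names, and the job number 1001 are all hypothetical.

```python
# Hypothetical table of firing outcomes, keyed by job number.
job_outcomes = {}


def handle_request(job_number: int, fired: bool) -> None:
    # The outcome is recorded under the job number that was handed out.
    job_outcomes[job_number] = fired


def motor_should_insert_recheck(job_number: int) -> bool:
    # The motor polls by job number and trusts whatever outcome it finds.
    return job_outcomes[job_number]


# Both requests received job number 1001 after the cluster lock failed.
handle_request(1001, fired=True)   # request A: event fired
handle_request(1001, fired=False)  # request B: not fired, overwrites A

print(motor_should_insert_recheck(1001))  # False
```

In this model the motor that sent request A reads the outcome of request B, so it never inserts the recheck record for an event that is in fact outstanding.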

Argent AT 5.1A-2010-D has been enhanced to address the issue. The Argent Console engine now uses a Critical Section to ensure that only one worker thread at a time acquires the cluster lock and increments the last used job number. This largely eliminates contention on the database. When the cluster lock is correctly acquired, the uniqueness of the job number is guaranteed.
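The idea behind the fix can be sketched as follows. This is a minimal illustration rather than the product's implementation: `threading.Lock` stands in for the Critical Section, `_acquire_cluster_lock` is a hypothetical placeholder for the database-based cluster lock, and the class and function names are invented for the example.

```python
import threading


class JobNumberAllocator:
    """Serializes job number assignment within one process (illustrative only)."""

    def __init__(self, start: int = 1000) -> None:
        self._last_used = start
        self._critical_section = threading.Lock()

    def _acquire_cluster_lock(self) -> bool:
        # Placeholder for the database-based cluster lock; always succeeds here.
        return True

    def next_job_number(self) -> int:
        # Only one worker thread at a time reaches the cluster lock,
        # so threads in this process no longer compete on the database.
        with self._critical_section:
            if not self._acquire_cluster_lock():
                raise RuntimeError("cluster lock not acquired")
            self._last_used += 1
            return self._last_used


def allocate_concurrently(allocator: JobNumberAllocator, workers: int) -> list:
    results = []
    results_lock = threading.Lock()

    def worker() -> None:
        number = allocator.next_job_number()
        with results_lock:
            results.append(number)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


numbers = allocate_concurrently(JobNumberAllocator(), 50)
print(len(set(numbers)) == len(numbers))  # True
```

The in-process lock does not replace the cluster lock, which is still needed between motors. It only ensures that a single thread at a time attempts the slower database operation.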

Resolution

Upgrade to Argent AT 5.1A-2010-D or later.