KBI 311772 Issue Addressed: SLA Tasks Are Not Running At Specified Interval For Large Installation With More Than 1,000 Server/Devices

Version

Argent Advanced Technology 5.1A-1904-C and below

Date

Thursday, 22 August 2019

Summary

For large installations with more than 1,000 Server/Devices, if the system is configured to run SLA or System Down Rules at intervals of less than 2 minutes, users may notice that Relators run at increasingly long intervals over time

In other words, monitoring tasks are running slower and slower

The issue can be temporarily rectified by truncating SQL table ARGSOFT_{PRODUCT}_EXECJOBS

But as the table's row count grows again, performance will gradually degrade

The issue has been addressed in Argent AT 5.1A-1907-A

Technical Background

SQL table ARGSOFT_{PRODUCT}_EXECJOBS holds information about each execution instance of a Relator and Server/Device combination

When SLA or System Down Rules are executed for a large number of Server/Devices at high frequency, the row count can grow very quickly

The Argent AT Engine updates the SQL table whenever an instance's status changes, e.g. the status is changed from ‘Running’ to ‘Ended’ when execution has completed

When the Engine updates the status, it scans for rows of any orphaned instances of the same Relator and Server/Device combination

If the table grows very large, the scan becomes time-consuming

This is the cause of worsening performance over time
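
The effect described above can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in, not the production database, and everything except the column name JOBPCKEY is an assumption made for illustration: the table name EXECJOBS_DEMO, the STATUS column, the key format and the lookup query are hypothetical and do not reflect the actual Engine query or schema. The sketch only shows how a lookup by JOBPCKEY changes from a full table scan to an index search once an index exists

```python
import sqlite3


def query_plan(conn, key):
    """Return SQLite's query plan text for a lookup by JOBPCKEY."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM EXECJOBS_DEMO WHERE JOBPCKEY = ?",
        (key,),
    ).fetchall()
    # The human-readable plan detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")

# Hypothetical stand-in table; only the JOBPCKEY column name comes from the article.
conn.execute("CREATE TABLE EXECJOBS_DEMO (JOBPCKEY TEXT, STATUS TEXT)")

# Simulate many execution instances for 1,000 Relator and Server/Device combinations.
conn.executemany(
    "INSERT INTO EXECJOBS_DEMO VALUES (?, ?)",
    ((f"relator-{i % 1000}", "Ended") for i in range(50_000)),
)

# Without an index, every lookup has to scan the whole table.
plan_before = query_plan(conn, "relator-42")

# With an index on JOBPCKEY, the same lookup becomes an index search.
conn.execute("CREATE INDEX IX_EXECJOBS_DEMO_JOBPCKEY ON EXECJOBS_DEMO (JOBPCKEY)")
plan_after = query_plan(conn, "relator-42")

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, and other database engines report plans differently, so treat the printed text as indicative only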

Resolution

Upgrade to Argent Advanced Technology 5.1A-1907-A or above

Customers who cannot upgrade immediately can address the issue by manually adding a SQL index on column ‘JOBPCKEY’ of SQL table ARGSOFT_{PRODUCT}_EXECJOBS
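
As a rough sketch of what the manual index statement might look like, the helper below builds a generic CREATE INDEX statement for a given product code. The function name build_index_ddl, the index naming scheme and the product code DEMO are hypothetical and used only for illustration. Substitute the product code used by your installation for {PRODUCT}, and confirm the exact syntax and options with your database administrator before running anything against a production database

```python
import re


def build_index_ddl(product_code: str) -> str:
    """Build a generic CREATE INDEX statement for the EXECJOBS table.

    The index name is a hypothetical naming choice; any unique name works.
    """
    # Allow only simple identifier characters to avoid malformed SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", product_code):
        raise ValueError(f"unexpected product code: {product_code!r}")
    table = f"ARGSOFT_{product_code}_EXECJOBS"
    index = f"IX_{table}_JOBPCKEY"
    return f"CREATE INDEX {index} ON {table} (JOBPCKEY);"


# "DEMO" is a made-up product code used only to show the output shape.
print(build_index_ddl("DEMO"))
```

Building the index on a very large table can take some time, so it may be preferable to run the statement during a quiet period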
