KBI 310516 Argent AT Skipped Tasks and Performance Troubleshooting

Version

Argent AT — all versions

Date

11 Jun 2013

Summary

Alerts are not firing, or a consistently high percentage of scheduled tasks are late, skipped, or pending

Technical Background

The following symptoms may suggest that some fine-tuning is needed on the configuration of Argent on your Main Engine:

1. ‘Pending’ or ‘Retry’ in the Scheduled Monitoring Tasks screen

2. Argent Service Logs contain ‘Relator has taken too long to run…’

3. Searching the service log for ‘Skipped:’ shows very high skip rates, e.g. 98% skipped, 2% submitted

4. Alerts do not fire when a Rule is broken
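As a rough aid for checking symptoms 2 and 3, the following Python sketch counts how many lines of a service log contain the two marker strings quoted above. It assumes only that the log is a text file; the function names, the file path you pass in, and the UTF-8 encoding are illustrative assumptions, not part of any Argent tooling.

```python
from collections import Counter

# Marker strings quoted from the symptom list; nothing else about
# the log format is assumed.
MARKERS = ("Skipped:", "Relator has taken too long to run")


def count_markers(lines):
    """Return a Counter of how many lines contain each marker string."""
    counts = Counter({marker: 0 for marker in MARKERS})
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


def scan_log(path, encoding="utf-8"):
    """Scan a log file at a caller-supplied path (encoding is an assumption)."""
    with open(path, encoding=encoding, errors="replace") as handle:
        return count_markers(handle)
```

Calling scan_log with the path to a copy of the service log returns the two counts. A high count is only a prompt to read the matching lines; the skip percentages themselves still need to be read from the log.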

Skipped Tasks

It is critical that the scope of the skipped tasks is well understood:

Knowing whether the skipped tasks are spread across all components (Main Engine and Daughter Engines) or come from a single problematic machine provides a clear debugging path

To obtain this information:

Step 1: Open SQL Server Management Studio

Step 2: Save/Export the ARGSOFT_{Product}_EXECJOBS and ARGSOFT_{Product}_JOBWATERMARK tables as .CSV files

Step 3: Run the following query (e.g. for Argent Guardian Ultra):


SELECT T1.EXECUTETIME, T1.TE, T1.SUBMITTER, T1.TASKSTATUS, T2.RELATOR, T2.SERVER

FROM ARGSOFT_ARGENT_GUARDIAN_ULTRA_EXECJOBS T1

INNER JOIN ARGSOFT_ARGENT_GUARDIAN_ULTRA_JOBWATERMARK T2

ON T1.JOBPCKEY = T2.UUID

WHERE T1.TASKSTATUS IN (3,4,6)

ORDER BY T1.EXECUTETIME

A TASKSTATUS of ‘3’ means ‘Error’, ‘4’ means ‘Skipped’, ‘6’ means ‘Cancelled’. All other values can be ignored

Step 4: Save the results of the query above

Step 5: Send the information above to Argent
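Before sending the results, it can help to see at a glance whether the problem rows cluster on one machine. The Python sketch below tallies the saved query results by the SERVER column and status. It assumes the results were saved as a CSV file with a header row carrying the column names from the query in Step 3; the function names and the file encoding are illustrative assumptions.

```python
import csv
from collections import Counter

# Status meanings as given for the Step 3 query.
STATUS_NAMES = {"3": "Error", "4": "Skipped", "6": "Cancelled"}


def summarize(rows):
    """Count rows per (SERVER, status name); other status values are ignored."""
    counts = Counter()
    for row in rows:
        status = STATUS_NAMES.get(row["TASKSTATUS"].strip())
        if status is None:
            continue
        counts[(row["SERVER"].strip(), status)] += 1
    return counts


def summarize_csv(path):
    """Summarize a CSV export of the query results (header row assumed)."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return summarize(csv.DictReader(handle))
```

Calling summarize_csv on the saved file and listing the result with most_common() puts the busiest SERVER and status pairs first. If one SERVER value dominates, that points to a single problematic machine; an even spread points to a wider configuration issue.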

Resolution

1. Most Argent performance issues are Operating System-dependent (Heap size limiting allowable processes in the Task Manager, etc.). Ensure the Relator configuration is set to use ‘Dynamic Scheduling’ in the ‘When To Run’ tab for Relator definitions

2. If a customer has Daughter Engines, make sure the monitoring tasks are correctly assigned to Daughters, especially if ‘{Dynamic}’ is used for Monitoring Engine assignments in the ‘What to Run’ tab for Relators. The default Monitoring Engine in the Network Group setting MUST be correctly set, or else tasks won’t run

3. If a customer can add Non-Stop Motor Engines, plan on moving over to Non-Stop Motors to distribute the load evenly (although more Daughter Engines are also sufficient if performance is a major concern rather than high availability)

4. Use the dynamic pool instead of ‘Spawn New Monitoring Engine Process’ to reduce load (start with the most frequently run Relators)

5. Increase the size of the dynamic pool from 3 to 15 — see https://help.argent.com/#KBI_310332

6. Set the registry value ‘RUN_DOWN_RULE_THREAD_LIMIT’ to 256 to move SLA and System Down Rules into the Supervising Engine; see https://help.argent.com/#KBI_310388

Extra:

* Keep an eye on the WO folder: a large number of files in the folder may affect product performance (this is being redesigned)

* Enabling SQL Bulk Inserts improves performance, especially where a large amount of Argent Predictor data is being saved; see https://help.argent.com/knowledge-base/kbi-310359-enabling-sql-bulk-insert-in-argent-guardian-ultra-and-argent-for-compliance/kbi-310359-enabling-sql-bulk-insert-in-argent-guardian-ultra-and-argent-for-compliance/

* A large number of ‘Retry’ statuses may also indicate that the ‘Remote Registry’ service needs to be restarted on the target system