KBI 310516 Argent AT Skipped Tasks and Performance Troubleshooting
Version
Argent AT — all versions
Date
11 Jun 2013
Summary
Alerts are not firing, or a consistently high percentage of scheduled tasks are late, skipped, or pending
Technical Background
The following symptoms suggest that the Argent configuration on your Main Engine needs fine-tuning:
1. ‘Pending’ or ‘Retry’ in the Scheduled Monitoring Tasks screen
2. The Argent Service Logs contain ‘Relator has taken too long to run…’
3. The service log shows very high skip rates (search for ‘Skipped:’), e.g. 98% skipped, 2% submitted
4. Alerts do not fire when a Rule is broken
Skipped Tasks
It is critical that the scope of the skipped tasks is well understood:
Whether the skipped tasks are spread across all components (Main Engine and Daughter Engines) or come from a single problematic machine determines the debugging path
To obtain this information:
Step 1: Open SQL Server Management Studio
Step 2: Save/Export the ARGSOFT_{Product}_EXECJOBS and ARGSOFT_{Product}_JOBWATERMARK tables as .CSV files
Step 3: Run the following query (e.g. for Argent Guardian Ultra):
SELECT T1.EXECUTETIME, T1.TE, T1.SUBMITTER, T1.TASKSTATUS, T2.RELATOR, T2.SERVER
FROM ARGSOFT_ARGENT_GUARDIAN_ULTRA_EXECJOBS T1
INNER JOIN ARGSOFT_ARGENT_GUARDIAN_ULTRA_JOBWATERMARK T2 ON T1.JOBPCKEY = T2.UUID
WHERE T1.TASKSTATUS IN (3,4,6)
ORDER BY T1.EXECUTETIME
A TASKSTATUS of ‘3’ means ‘Error’, ‘4’ means ‘Skipped’, ‘6’ means ‘Cancelled’. All other values can be ignored
Step 4: Save the results of the query above
Step 5: Send the exported .CSV files and the saved query results to Argent
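Before sending the results, it can help to summarise them per server to see whether the problem tasks are concentrated on one machine. The following is a minimal Python sketch, not an Argent tool. It assumes the query results from Step 3 were saved as a CSV file with a header row containing the column names used in the query (SERVER and TASKSTATUS), and that the file is UTF-8 encoded. The function names and the file name used below are hypothetical.

```python
import csv
from collections import Counter

# Status codes as listed in this article; all other values are ignored.
STATUS_NAMES = {"3": "Error", "4": "Skipped", "6": "Cancelled"}


def summarize(rows):
    """Count problem tasks per (SERVER, status name) pair."""
    counts = Counter()
    for row in rows:
        status = STATUS_NAMES.get(str(row["TASKSTATUS"]).strip())
        if status is None:
            continue  # not Error, Skipped, or Cancelled
        counts[(str(row["SERVER"]).strip(), status)] += 1
    return counts


def summarize_csv(path):
    """Read a saved query result (assumed: CSV with a header row)."""
    # utf-8-sig tolerates a leading byte-order mark; the encoding is an assumption.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return summarize(csv.DictReader(handle))


def print_summary(counts):
    """Print the highest counts first so a single problematic machine stands out."""
    for (server, status), count in counts.most_common():
        print(f"{server:<30} {status:<10} {count}")
```

For example, print_summary(summarize_csv("results.csv")) lists each server and status with its count. If one server accounts for nearly all of the skipped tasks, the problem is likely local to that machine rather than spread across the Main Engine and Daughter Engines.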
Resolution
1. Most Argent performance issues are operating-system dependent (heap size limiting the number of allowable processes in the Task Manager, etc.). Ensure the Relator configuration is set to use ‘Dynamic Scheduling’ in the ‘When To Run’ tab for Relator definitions
2. If a customer has Daughter Engines, make sure the monitoring tasks are correctly assigned to the Daughter Engines, especially if ‘{Dynamic}’ is used for Monitoring Engine assignments in the ‘What to Run’ tab for Relators. The default Monitoring Engine in the Network Group setting MUST be set correctly; otherwise tasks will not run
3. If a customer can add Non-Stop Motor Engines, plan on moving over to Non-Stop Motors to distribute the load evenly (adding more Daughter Engines is also sufficient if performance, rather than high availability, is the main concern)
4. Use the dynamic pool instead of ‘Spawn New Monitoring Engine Process’ to reduce load (start with the most frequently run Relators)
5. Increase the size of the dynamic pool from 3 to 15; see https://help.argent.com/#KBI_310332
6. Set the registry value ‘RUN_DOWN_RULE_THREAD_LIMIT’ to 256 to move SLA and System Down Rules into the Supervising Engine; see https://help.argent.com/#KBI_310388
Extra:
* Keep an eye on the number of files in the WO folder; a large number of files in the folder may affect product performance (this is being redesigned)
* Enabling SQL Bulk Inserts improves performance, especially where a large amount of Argent Predictor data is being saved; see https://help.argent.com/knowledge-base/kbi-310359-enabling-sql-bulk-insert-in-argent-guardian-ultra-and-argent-for-compliance/kbi-310359-enabling-sql-bulk-insert-in-argent-guardian-ultra-and-argent-for-compliance/
* A large number of ‘Retry’ statuses may also indicate that the ‘Remote Registry’ service needs to be restarted on the target system
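The WO folder check above can be scripted. The following is a minimal Python sketch that counts the files directly inside a folder and compares the count with a limit. The folder path and the threshold are assumptions for illustration only; this article does not state where the WO folder is located or how many files are too many.

```python
from pathlib import Path


def count_files(folder):
    """Return the number of regular files directly inside `folder`."""
    return sum(1 for entry in Path(folder).iterdir() if entry.is_file())


def check_folder(folder, threshold):
    """Return (file count, True if the count exceeds `threshold`)."""
    count = count_files(folder)
    return count, count > threshold
```

For example, check_folder(r"C:\Argent\WO", 1000) returns the file count and whether it exceeds 1000. Both the path and the limit here are hypothetical and should be replaced with the values for your installation.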