KBI 310351 Reducing Argent AT Task Skipping Rate

Version

Argent Advanced Technology – All versions

Date

8 Mar 2013

Summary

Customers with large installations frequently run into a high task skip rate.

Technical Background

Customers typically notice two symptoms:

  1. A task runs at an interval longer than the one scheduled. This can be seen on the ‘Scheduled Monitoring Task’ screen.
  2. The 5-minute statistics in the service log show a high skip rate.

Although customers can always assign enough tasks to overload the system, they can configure the AT system efficiently to at least alleviate the situation.
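As a rough illustration of the first symptom, the sketch below estimates how many runs a task missed from the times it was observed to run. It is a hypothetical helper, not how Argent AT computes its own 5-minute statistics: the function name, the input format (run times in seconds), and the rule that a gap of about k scheduled intervals counts as k - 1 skips are all assumptions made for this example.

```python
def estimate_skips(run_times, scheduled_interval):
    """Estimate skipped runs from observed run times (both in seconds).

    Assumption for this sketch: a gap of about k scheduled intervals
    between two consecutive runs means k - 1 runs were skipped.
    Returns (skipped, skip_rate).
    """
    if scheduled_interval <= 0:
        raise ValueError("scheduled_interval must be positive")

    times = sorted(run_times)
    skipped = 0
    for earlier, later in zip(times, times[1:]):
        # Number of scheduled slots that fit in this gap, to the nearest slot.
        slots = round((later - earlier) / scheduled_interval)
        skipped += max(0, slots - 1)

    # Runs after the first one, plus the runs that should have happened.
    expected = (len(times) - 1) + skipped
    rate = skipped / expected if expected > 0 else 0.0
    return skipped, rate


if __name__ == "__main__":
    # A task scheduled every 60 seconds that missed one run between 60 and 180.
    print(estimate_skips([0, 60, 180, 240], 60))
```

The run times would have to be read manually from the ‘Scheduled Monitoring Task’ screen or from the service log; this sketch does not parse either.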

Resolution

Customers can take the following actions to reduce or completely eliminate task skipping:

    1. Reduce the product's internal looping intervals. This is done by adjusting the registry values HKLM\Software\{PRODUCT}\DEFAULT_LOOP_INTERVAL and LONG_LOOP_INTERVAL to smaller numbers.

      DEFAULT_LOOP_INTERVAL accepts values in the range (3, 30), and LONG_LOOP_INTERVAL accepts values in the range (6, 60).

      For a typical sizable installation, set them to 5 and 10 respectively.

    2. Set Shared Pool Size to 10-15 and configure RELATORs to use the dynamic pool.

      Using the dynamic pool avoids the overhead of process spawning, but it can cause interference between tasks.

      Customers should pay extra attention to Windows Performance rules. If the monitored nodes are scattered across a large network, it is safer to run Windows Performance rules in an isolated monitoring engine process. It has been reported that one down machine can cause performance checks to fail on other nodes as well when the PDH APIs are called within the same process.

    3. For AT 1301-T4 and later, set the registry value HKLM\Software\{PRODUCT}\RUN_ISOLATE_DOWN_RULE to 0.

      This allows System Down rules and SLA rules to be checked within the Supervising Engine process. The effect on performance is dramatic, because System Down rules and SLA rules account for a considerable portion of monitoring tasks.
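The registry changes in actions 1 and 3 can be scripted. The sketch below only builds `reg add` command lines for review; it does not modify the registry. Several things here are assumptions rather than facts from this KBI: that the three names are values directly under the HKLM\Software\{PRODUCT} key, that they are stored as REG_DWORD, and that the range bounds are inclusive. The function name and parameters are hypothetical. Check an existing installation before running any generated command.

```python
# Ranges quoted in action 1; treating the bounds as inclusive is an assumption.
RANGES = {
    "DEFAULT_LOOP_INTERVAL": (3, 30),
    "LONG_LOOP_INTERVAL": (6, 60),
}


def registry_commands(product_key, default_loop=5, long_loop=10,
                      set_down_rule=False):
    """Return reg.exe command lines for the settings described above.

    product_key replaces the {PRODUCT} placeholder in the registry path.
    set_down_rule adds RUN_ISOLATE_DOWN_RULE=0 (AT 1301-T4 and later only).
    """
    values = {
        "DEFAULT_LOOP_INTERVAL": default_loop,
        "LONG_LOOP_INTERVAL": long_loop,
    }
    if set_down_rule:
        values["RUN_ISOLATE_DOWN_RULE"] = 0

    commands = []
    for name, value in values.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")
        # REG_DWORD is an assumed value type, not stated in the KBI.
        commands.append(
            f'reg add "HKLM\\Software\\{product_key}" '
            f"/v {name} /t REG_DWORD /d {value} /f"
        )
    return commands


if __name__ == "__main__":
    # "{PRODUCT}" is kept as the KBI's placeholder; substitute the real key name.
    for line in registry_commands("{PRODUCT}", set_down_rule=True):
        print(line)
```

Printing the commands instead of executing them keeps the change reviewable.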