KBI 311227 Issue Addressed: Extreme High CPU Usage Due To More Than 100 Long Running Monitoring Engine Processes

Version

Argent Advanced Technology 3.1A-1504 and earlier

Date

Wednesday, 3 June 2015

Summary

Argent AT Engine may experience high CPU usage close to 100%

As a result, the Engine becomes very sluggish and hard to control

Task Manager shows that the load is caused by more than 100 Monitoring Engine processes

Each process can run for more than 10 minutes, and new processes keep accumulating

When so many Monitoring Engine processes are running, system resources can be depleted

The resulting heavy paging makes the system even busier

The Monitoring Engine processes are the isolated processes spawned by Relators using the option ‘Spawn New Monitoring Engine Process’

They are typically reading performance metrics from remote servers

Argent AT 3.1A-1504-T1 is enhanced to limit the maximum number of isolated Monitoring Engine processes running at the same time

The limit is controlled by the registry value HKLM\Software\Argent\{PRODUCT}\MAX_RUNNING_ISOLATE_ME_PROCESS

The default value is 30

Valid values range from 0 to 1000

The throttling is disabled if the value is 0
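The rules above for the registry value can be summarized in a short sketch. This is an illustration only, not the product's code: the function name effective_limit is hypothetical, and rejecting out-of-range values is an assumption, since this article does not say how the Engine treats them.

```python
from typing import Optional

DEFAULT_LIMIT = 30   # default stated in this article
MAX_LIMIT = 1000     # upper bound stated in this article


def effective_limit(raw: Optional[int]) -> Optional[int]:
    """Return the cap on concurrent isolated processes, or None if throttling is off.

    raw is the value of MAX_RUNNING_ISOLATE_ME_PROCESS, or None if it is not set.
    """
    if raw is None:
        # Value not present: fall back to the documented default.
        return DEFAULT_LIMIT
    if not 0 <= raw <= MAX_LIMIT:
        # Assumption: this sketch rejects out-of-range values.
        # The product's actual handling is not described in this article.
        raise ValueError(f"value must be between 0 and {MAX_LIMIT}, got {raw}")
    if raw == 0:
        # 0 disables the throttling.
        return None
    return raw
```

For example, effective_limit(None) gives 30, effective_limit(0) gives None (no throttling), and effective_limit(100) gives 100.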

Technical Background

It is desirable to use the option ‘Spawn New Monitoring Engine Process’ when monitoring performance counters of remote servers

(See KBI 311056 Issue Addressed: Windows Performance Rule Failed With PDH Error 0x102 When Using Dynamic Pool)

It is not uncommon for reading performance data from a remote server 10 hops away to take around 5 minutes

However, when more than 100 Monitoring Engine processes are spawned, system paging can become so severe that the processes run even slower

Argent AT Engine spawns Monitoring Engine processes according to the Relator schedule

As each process stays in memory longer, more and more processes accumulate

The vicious circle can spin out of control

With this new feature, when the limit on isolated Monitoring Engine processes is reached, further monitoring tasks are delayed until some of the running processes complete

However, because each Monitoring Engine process then runs faster without the paging overhead, the total system throughput is actually much improved
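The delaying behavior described above can be pictured with a small simulation. This is a minimal sketch, not the Engine's actual scheduler: the function simulate_throttle and its parameters are hypothetical, every process is assumed to run for the same fixed time, and delayed tasks are assumed to start in first-come, first-served order.

```python
import heapq
from typing import List, Optional


def simulate_throttle(arrivals: List[float], duration: float,
                      limit: Optional[int]) -> List[float]:
    """Return the start time of each task, in arrival order.

    arrivals: sorted times at which the schedule requests a new process.
    duration: run time of each process (assumed constant in this sketch).
    limit:    maximum concurrent processes; None or 0 means no throttling.
    """
    running: List[float] = []   # min-heap of finish times of occupied slots
    starts: List[float] = []
    for t in arrivals:
        # Free the slots of processes that have finished by time t.
        while running and running[0] <= t:
            heapq.heappop(running)
        start = t
        if limit and len(running) >= limit:
            # Limit reached: delay the task until the earliest process completes.
            start = heapq.heappop(running)
        heapq.heappush(running, start + duration)
        starts.append(start)
    return starts
```

With four requests at times 0, 0, 0 and 1, a run time of 5 and a limit of 2, simulate_throttle([0, 0, 0, 1], 5, 2) returns [0, 0, 5, 5]: the third and fourth tasks wait until the first two complete. With the limit set to None or 0, every task starts at its requested time.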

Resolution

Upgrade to Argent AT 3.1A-1504-T1 or later