KBI 311227 Issue Addressed: Extreme High CPU Usage Due To More Than 100 Long Running Monitoring Engine Processes
Version
Argent Advanced Technology 3.1A-1504 and earlier
Date
Wednesday, 3 June 2015
Summary
Argent AT Engine may experience high CPU usage close to 100%
As a result, the Engine becomes very sluggish and hard to control
Task Manager shows that the cause is more than 100 Monitoring Engine processes running at the same time
These processes can each run for more than 10 minutes, and they keep accumulating
When so many Monitoring Engine processes are running, system resources can be depleted
The resulting heavy paging makes the system even busier
These Monitoring Engine processes are the isolated processes spawned by Relators using the option ‘Spawn New Monitoring Engine Process’
They are typically reading performance metrics from remote servers
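The symptom described above can also be confirmed from a script instead of Task Manager
The following is a minimal sketch, assuming the standard Windows tasklist command is available
The image name example_me.exe is a placeholder, not the real name of the Monitoring Engine executable; substitute the name shown in Task Manager

```python
import csv
import io
import subprocess


def count_processes(tasklist_csv, image_name):
    """Count rows in `tasklist /FO CSV /NH` output whose image name matches."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    wanted = image_name.lower()
    # The first CSV column is the image name; comparison is case-insensitive.
    return sum(1 for row in rows if row and row[0].lower() == wanted)


def running_count(image_name):
    """Run tasklist (Windows only) and count matching processes."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return count_processes(result.stdout, image_name)


if __name__ == "__main__":
    # "example_me.exe" is a hypothetical placeholder image name.
    print(running_count("example_me.exe"))
```

A count that stays well above 100 over several minutes matches the problem described in this article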
Argent AT 3.1A-1504-T1 is enhanced to limit the maximum number of isolated Monitoring Engine processes running at the same time
The limit is controlled by the registry value HKLM\Software\Argent\{PRODUCT}\MAX_RUNNING_ISOLATE_ME_PROCESS
The default value is 30
Valid values range from 0 to 1000
Throttling is disabled if the value is 0
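The rules above can be checked with a short script
This is a sketch only: it assumes the setting is stored as a numeric (DWORD) value, and the product_key argument stands in for the {PRODUCT} part of the path, which this article does not spell out
How the Engine itself treats an out-of-range value is not documented here, so the sketch simply reports it as invalid

```python
DEFAULT_LIMIT = 30
MIN_LIMIT, MAX_LIMIT = 0, 1000
VALUE_NAME = "MAX_RUNNING_ISOLATE_ME_PROCESS"


def describe_limit(raw):
    """Interpret a raw registry value using the rules stated in this article."""
    if raw is None:
        return f"not set: default of {DEFAULT_LIMIT} applies"
    if not isinstance(raw, int) or not MIN_LIMIT <= raw <= MAX_LIMIT:
        return f"invalid: {raw!r} is outside {MIN_LIMIT} to {MAX_LIMIT}"
    if raw == 0:
        return "throttling disabled"
    return f"at most {raw} isolated Monitoring Engine processes"


def read_limit(product_key):
    """Read the value from HKLM (Windows only); return None if it is missing.

    product_key is an assumption: pass the subkey that replaces {PRODUCT}.
    """
    import winreg  # imported here so the rest of the file loads on any OS

    path = "Software\\Argent\\" + product_key
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        return None
```

For example, describe_limit(read_limit(product_key)) reports whether the default applies, throttling is disabled, or a custom limit is in effect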
Technical Background
It is desirable to use the option ‘Spawn New Monitoring Engine Process’ when monitoring performance counters of remote servers
It is not uncommon for reading performance data from a remote server 10 hops away to take around 5 minutes
However, when more than 100 Monitoring Engine processes are spawned, system paging can become so severe that the processes run even slower
Argent AT Engine spawns Monitoring Engine processes according to the Relator schedule
As each process takes longer to finish, more and more processes accumulate
The vicious circle can spin out of control
With this new feature, when the limit on isolated Monitoring Engine processes is exceeded, monitoring tasks are delayed until some of the running processes complete
However, because each Monitoring Engine process then runs faster, total system throughput is actually much improved
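The throttling idea can be illustrated with a small sketch
This is not the Engine's actual implementation; the class and method names are hypothetical, and the sketch assumes a new task is held back once the number of running processes equals the limit
It follows the documented rules that the limit ranges from 0 to 1000, defaults to 30, and that 0 disables throttling

```python
import threading


class SpawnThrottle:
    """Caps the number of concurrently running workers; 0 disables the cap."""

    def __init__(self, limit=30):
        if not 0 <= limit <= 1000:
            raise ValueError("limit must be between 0 and 1000")
        # With a limit of 0 there is no semaphore, so nothing is ever delayed.
        self._slots = threading.BoundedSemaphore(limit) if limit else None

    def try_start(self):
        """Return True if a worker may start now, False if it must be delayed."""
        if self._slots is None:
            return True
        return self._slots.acquire(blocking=False)

    def finished(self):
        """Call when a worker completes, freeing its slot for a delayed task."""
        if self._slots is not None:
            self._slots.release()


throttle = SpawnThrottle(limit=2)
print([throttle.try_start() for _ in range(3)])  # [True, True, False]
throttle.finished()                              # one worker completes
print(throttle.try_start())                      # True
```

The third task is delayed rather than dropped: it can start as soon as one of the running workers reports that it has finished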
Resolution
Upgrade to Argent AT 3.1A-1504-T1 or later