KBI 311401 Issue Addressed: Argent AT Engine Running On Single CPU VMware Virtual Machine Consumes Near 100% CPU
Version
Argent Advanced Technology 3.1A-1601-C or earlier
Date
Wednesday, 25 May 2016
Summary
When the Argent AT Engine is installed on a single-CPU VMware virtual machine, it can consume nearly 100% CPU, and the server becomes unresponsive
The service log repeatedly shows lines similar to the following:
The issue has been addressed in Argent AT 3.1A-1601-T8
Technical Background
On VMware, physical CPU resources are shared among VMs
A single virtual CPU is generally not very powerful
If the Argent AT Engine is under heavy load, the Argent AT scheduling Engine can fall behind
The scheduling Engine will try to loop as quickly as possible to catch up
As a result, fewer CPU cycles are available for creating Argent AT monitoring Engine processes
Late tasks then fall even further behind
This vicious cycle can spiral out of control
In the end, the CPU runs constantly at around 100% while no task actually completes successfully
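The feedback loop described above can be illustrated with a toy model. This is only a sketch with made-up numbers, not a description of how the Argent AT scheduling Engine is implemented: `simulate`, the CPU budget of 100 units per tick, the task cost, and the overhead figures are all assumptions chosen for illustration. The only idea taken from the text is that scheduler overhead grows as more tasks are late, leaving fewer CPU cycles for the tasks themselves.

```python
def simulate(arrivals, ticks, cpu_per_tick=100, task_cost=10,
             base_overhead=10, overhead_per_late_task=2):
    """Toy model: return a list of (backlog, completed) pairs, one per tick.

    All parameters are hypothetical illustration values.
    """
    backlog = 0  # tasks left over (late) from earlier ticks
    history = []
    for _ in range(ticks):
        # The scheduler loops harder as the backlog grows, so its own
        # CPU cost rises with the number of late tasks (capped at 100%).
        overhead = min(cpu_per_tick,
                       base_overhead + overhead_per_late_task * backlog)
        available = cpu_per_tick - overhead
        pending = backlog + arrivals
        # Only whole tasks can complete with the CPU that is left.
        completed = min(pending, available // task_cost)
        backlog = pending - completed
        history.append((backlog, completed))
    return history


if __name__ == "__main__":
    for arrivals in (8, 10):
        final_backlog, final_completed = simulate(arrivals, ticks=20)[-1]
        print(f"arrivals={arrivals}: backlog={final_backlog}, "
              f"completed in last tick={final_completed}")
```

In this model a load of 8 tasks per tick is absorbed indefinitely, while 10 tasks per tick leaves one task late in the first tick, and that small backlog grows until the scheduler's overhead consumes the whole CPU budget and no task completes.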
This condition might simply result from the sheer number of production Relators and monitored Nodes, but it can also be caused by an inefficient configuration
For example, the Argent AT Engine typically runs System Down Rules or SLA Rules against many servers and devices at high frequency
If the Relator uses the option ‘Spawn New Monitoring Engine Process’, many processes are constantly created and terminated
Creating a process is a very expensive operation for the CPU
When there is only a single VM CPU, the VM can easily be overwhelmed
Argent AT 3.1A-1601-T8 addresses the issue by throttling the scheduling Engine when CPU usage is high
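As a rough illustration of what throttling a scheduling loop under high CPU usage can look like, here is a minimal sketch. It is not Argent's implementation, which this article does not describe. The names `throttle_delay` and `run_scheduler`, the 90% threshold, and the delay values are all hypothetical, and `read_cpu` and `dispatch_due_tasks` are placeholder callables supplied by the caller.

```python
import time


def throttle_delay(cpu_percent, threshold=90.0, base_delay=0.05, max_delay=1.0):
    """Return how many seconds to pause before the next scheduling pass.

    Hypothetical policy: no pause below the threshold, then a pause that
    grows linearly from base_delay (at the threshold) to max_delay (at 100%).
    """
    if not 0.0 <= threshold < 100.0:
        raise ValueError("threshold must be in the range [0, 100)")
    if cpu_percent < threshold:
        return 0.0
    fraction = min(1.0, (cpu_percent - threshold) / (100.0 - threshold))
    return base_delay + fraction * (max_delay - base_delay)


def run_scheduler(passes, read_cpu, dispatch_due_tasks, sleep=time.sleep):
    """Run a fixed number of scheduling passes, backing off when CPU is high.

    read_cpu: callable returning current CPU usage in percent (assumed).
    dispatch_due_tasks: callable that starts whatever tasks are due (assumed).
    """
    for _ in range(passes):
        dispatch_due_tasks()
        delay = throttle_delay(read_cpu())
        if delay > 0:
            # Yield the CPU so that monitoring work can make progress,
            # instead of looping as fast as possible to catch up.
            sleep(delay)
```

The point of the design is that the loop gives CPU time back exactly when the machine is saturated, which breaks the cycle in which a lagging scheduler takes cycles away from the work it is trying to schedule.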
Resolution
Upgrade to Argent AT 3.1A-1601-T8 or later
Customers who cannot upgrade immediately can add a CPU to the VMware VM and review the efficiency of the Argent AT configuration
For example, make sure the Relator option ‘Use Shared Monitored Engine Process’ is used instead of ‘Spawn New Monitoring Engine Process’