KBI 311401 Issue Addressed: Argent AT Engine Running On Single CPU VMware Virtual Machine Consumes Near 100% CPU

Version

Argent Advanced Technology 3.1A-1601-C or earlier

Date

Wednesday, 25 May 2016

Summary

When the Argent AT Engine is installed on a single-CPU VMware virtual machine, it can consume nearly 100% CPU, and the server becomes unresponsive

The service log repeatedly shows lines similar to the following:

The issue has been addressed in Argent AT 3.1A-1601-T8

Technical Background

On a VMware host, physical CPU resources are shared among the VMs

A single virtual CPU is generally not very powerful

If the Argent AT Engine is under heavy load, the Argent AT scheduling Engine can lag behind

The scheduling Engine will try to loop as quickly as possible to catch up

As a result, fewer CPU cycles are available for creating Argent AT monitoring Engine processes

Late tasks then fall even further behind

This feedback loop can spiral out of control

In the end, the CPU runs constantly around 100% while no task is actually completed successfully
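The feedback loop described above can be illustrated with a toy model. This is only a sketch under invented assumptions, not Argent AT's actual scheduling algorithm: the function `simulate`, the per-tick CPU budget, and all cost figures are hypothetical and chosen purely to make the effect visible. The scheduler is assumed to spend CPU in proportion to the number of late tasks, and whatever is left over goes to completing tasks

```python
from typing import List, Tuple


def simulate(
    ticks: int,
    arrivals_per_tick: int,
    cost_per_task: int = 10,               # hypothetical CPU units to run one task
    cpu_budget: int = 100,                 # hypothetical CPU units available per tick
    scheduler_cost_per_late_task: int = 5, # hypothetical catch-up cost per late task
) -> List[Tuple[int, int]]:
    """Return (backlog, completed) after each tick of the toy model."""
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog += arrivals_per_tick
        # The scheduler loops harder the more tasks are late, up to the whole CPU.
        scheduler_cpu = min(cpu_budget, backlog * scheduler_cost_per_late_task)
        # Only the remaining CPU is available for actually running tasks.
        worker_cpu = cpu_budget - scheduler_cpu
        completed = min(backlog, worker_cpu // cost_per_task)
        backlog -= completed
        history.append((backlog, completed))
    return history


if __name__ == "__main__":
    print("light load:", simulate(6, 5))
    print("heavy load:", simulate(6, 8))
```

In this model a light load stays stable, while a slightly heavier load ends with the scheduler using the entire CPU budget and no tasks completing at all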

This condition might simply be the result of the sheer number of production Relators and monitored Nodes, but it can also be caused by an inefficient configuration

For example, the Argent AT Engine usually runs System Down Rules or SLA Rules against many servers and devices at high frequency

If the Relator uses the option ‘Spawn New Monitoring Engine Process’, a lot of processes will be created and terminated constantly

Creating a process is a very CPU-expensive operation

When the VM has only a single virtual CPU, it can easily be overwhelmed

Argent AT 3.1A-1601-T8 addresses the issue by throttling the scheduling Engine when CPU usage is high
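The general idea of throttling a scheduling loop under high CPU usage might look like the following sketch. It is not the Argent AT implementation, which this article does not describe in detail. The names `throttle_delay` and `run_scheduler`, the 90% threshold, and the linear back-off are all assumptions made for illustration. The CPU reading is supplied by the caller, so no particular monitoring library is assumed

```python
import time
from typing import Callable, List


def throttle_delay(
    cpu_percent: float,
    threshold: float = 90.0,  # hypothetical: start throttling above this CPU usage
    base_delay: float = 0.0,  # pause in seconds when CPU usage is acceptable
    max_delay: float = 1.0,   # pause in seconds when CPU usage reaches 100%
) -> float:
    """Return how long the scheduling loop should pause before its next pass."""
    if cpu_percent <= threshold:
        return base_delay
    # Scale linearly from base_delay at the threshold to max_delay at 100%.
    overload = min(cpu_percent, 100.0) - threshold
    return base_delay + (max_delay - base_delay) * overload / (100.0 - threshold)


def run_scheduler(
    read_cpu: Callable[[], float],
    dispatch_due_tasks: Callable[[], None],
    passes: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Run a fixed number of scheduling passes, backing off when the CPU is busy."""
    delays = []
    for _ in range(passes):
        dispatch_due_tasks()
        delay = throttle_delay(read_cpu())
        delays.append(delay)
        if delay > 0:
            # Yield the CPU so monitoring processes can start and finish.
            sleep(delay)
    return delays
```

Instead of looping faster as it falls behind, the scheduler in this sketch slows down as CPU usage approaches 100%, which leaves cycles for the work that clears the backlog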

Resolution

Upgrade to Argent AT 3.1A-1601-T8 or later

Customers who cannot upgrade immediately can add a virtual CPU to the VMware VM and review the efficiency of the Argent AT configuration

For example, make sure the Relator option ‘Use Shared Monitored Engine Process’ is used instead of ‘Spawn New Monitoring Engine Process’