KBI 310058 Excessive CPU Utilization or Paging in Virtual Machines
Version
All
Date
30 Apr 2008
Summary
High CPU Utilization rates seen when deploying Argent Main Engines or Daughter Engines to Virtual Machines are generally not an issue. As per the VMware Technical Network Whitepapers and VMware engineers, CPU load or idle time measured within a virtual machine is not accurate in most cases.
Technical Background
Argent products should never run on virtual machines.
The following KBI makes one point about CPU.
There are a large number of other drawbacks to using VM or Hyper-V for production work; one other drawback of virtualization is the inability to consistently tune the physical resources, such as NIC cards and disk controller cache buffering optimization.
Moreover, many virtual implementations have less-than-optimal paging and swapping.
To guarantee consistently good performance, all Argent products should run on real machines.
Customers deploying Argent Main Engines or Daughter Engines to Virtual Machines may see very high CPU Utilization rates.
This is generally not an issue.
As per the VMware Technical Network Whitepapers, time is extremely difficult to measure accurately within virtual machines.
Often anomalous readings from counters such as “CPU Utilization” or
“Pages Per Second” are seen.
First, it is important to note that the virtual machine is actually a set of processes scheduled by the host operating system (ESX), and each machine may receive a variable fraction of the host CPU or memory.
CPU load and usage measurements generally depend on extremely precise time measurements, and this variable scheduling distorts time measurement enough to have a sizeable effect on “jiffy counters” within the VM itself.
Second, as per VMware engineers, it is not clear exactly what it even means to measure CPU load and usage
within a single virtual machine. An excellent example of this is as follows:
Suppose a single virtual machine is running a process that repeatedly uses the CPU for one second and then
sleeps for another. While the process is sleeping, the guest Operating System has nothing to do other than
field timer interrupts. It spends almost all of its time in a halted state. If this were a real machine,
the process’s CPU utilization as well as load would obviously be 50%.
In a virtual machine, the virtual machine is de-scheduled whenever the guest Operating System halts, so the process does not receive any physical CPU time until the halt
state ends. The process in this case receives 100% of the physical CPU time allocated to the virtual machine, though it is still using only 50% of the actual elapsed time.
This situation is even more complex if the host ESX server is heavily loaded. In that case, the process may receive far less than 50% of a physical CPU.
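The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not a measurement tool: the function name, the variable names, and the HOST_SHARE value of 0.4 are all assumptions made for the sketch and do not come from VMware or Argent.

```python
def utilization(busy_s: float, time_base_s: float) -> float:
    """Return busy time as a percentage of the chosen time base."""
    return 100.0 * busy_s / time_base_s

# One cycle of the example workload: 1 s of CPU work, then 1 s of sleep.
BUSY_S = 1.0
SLEEP_S = 1.0

# Real machine: the time base is the full wall-clock cycle.
real_machine_pct = utilization(BUSY_S, BUSY_S + SLEEP_S)

# Virtual machine: no physical CPU time is allocated while the guest is
# halted, so the allocated time equals the busy time alone.
allocated_pct = utilization(BUSY_S, BUSY_S)

# Heavily loaded host (hypothetical figure): the VM receives only 40% of
# a physical CPU while runnable, so 1 s of work takes longer in wall time.
HOST_SHARE = 0.4
stretched_busy_s = BUSY_S / HOST_SHARE
loaded_host_pct = utilization(BUSY_S, stretched_busy_s + SLEEP_S)

print(f"Real machine, wall-clock base:     {real_machine_pct:.1f}%")
print(f"VM, allocated physical time base:  {allocated_pct:.1f}%")
print(f"VM on loaded host, wall-clock base: {loaded_host_pct:.1f}%")
```

The same one second of work yields three different percentages depending only on which time base is used as the denominator, which is why a single CPU figure read inside the guest is hard to interpret.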
Because of these issues, CPU load or idle time measured within a virtual machine is
literally meaningless in most cases.
When applications in a virtual machine seem to be using a great deal of overhead,
the ESX administrator should experiment, using trial and error, with the resource pools and other associated
resources to ensure the virtual machine guests are properly tuned.
There are a number of Rules of Thumb available on the internet to assist in trying to tune or optimize virtual environments. These vary widely in quality,
and some appear to be wrong in places.
Also review www.vmware.com
Resolution
N/A