KBI 311090 Issue Addressed: Running System Down Rule With NetRemoteTOD Option For Remote Machine Can Cause High CPU Usage On Windows Server 2012

Version

Argent Advanced Technology 3.1A-1407-A or earlier

Date

Tuesday, 21 Oct 2014

Summary

If Argent AT is installed on Windows Server 2012, running a System Down Rule with the NetRemoteTOD option against a remote machine with very high network latency can cause very high CPU usage.

The issue has been addressed in Argent AT 3.1A-1407-T6.

Technical Background

There is a timeout associated with the NetRemoteTOD API call.

If the remote machine has very high network latency (which can be confirmed with a traceroute utility such as tracert), the API call may time out before the remote machine has a chance to respond.

When many such timeouts occur, they can, for reasons not yet fully understood, trigger very high CPU usage. This has only been observed on Windows Server 2012.

Preliminary investigation shows that Windows Server 2012 handles context switching in C++/CLI code differently from previous Windows versions.

Argent AT 3.1A-1407-T6 addresses the issue by moving the NetRemoteTOD call to a separate worker thread and terminating that thread if a timeout occurs.
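The worker-thread approach can be illustrated with a minimal Python sketch of the general pattern: run a blocking call on its own thread and stop waiting once a deadline passes. This is not Argent's C++/CLI implementation. `call_with_timeout` and `fake_remote_tod` are hypothetical names, and `fake_remote_tod` only simulates a slow NetRemoteTOD-style query with `time.sleep`. One deliberate difference: Python cannot forcibly terminate a thread, so on timeout this sketch abandons the worker (a daemon thread) instead of killing it.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_timeout(func: Callable[[], T], timeout_s: float) -> T:
    """Run func on a separate worker thread and wait at most timeout_s seconds.

    Raises TimeoutError if func has not finished in time. Python cannot kill a
    thread, so a timed-out worker is abandoned rather than terminated; it is a
    daemon thread, so it will not keep the process alive.
    """
    outcome: dict = {}

    def worker() -> None:
        try:
            outcome["value"] = func()
        except Exception as exc:  # hand errors back to the calling thread
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)  # wait up to timeout_s for the worker to finish
    if thread.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def fake_remote_tod(latency_s: float) -> Callable[[], float]:
    """Hypothetical stand-in for a NetRemoteTOD-style query on a slow link."""

    def query() -> float:
        time.sleep(latency_s)  # simulate network latency
        return time.time()  # pretend this is the remote machine's clock

    return query


if __name__ == "__main__":
    # Fast node: the call finishes well inside the timeout.
    print(call_with_timeout(fake_remote_tod(0.1), timeout_s=1.0))
    # Slow node: the caller gives up after 0.5 s instead of waiting 5 s.
    try:
        call_with_timeout(fake_remote_tod(5.0), timeout_s=0.5)
    except TimeoutError as exc:
        print("node skipped:", exc)
```

The calling thread never waits longer than `timeout_s`, regardless of how slow the remote machine is. A native Windows implementation would make the actual NetRemoteTOD call inside the worker and could end the thread after the timeout rather than abandoning it.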

Resolution

Upgrade to Argent AT 3.1A-1407-T6 or later.

Customers who cannot upgrade immediately can either set a longer timeout for the affected node in License Manager, or use the Relator option ‘Spawn New Monitoring Engine Process’ to isolate the Monitoring Engine process.