KBI 310204 Batch Tasks, TCP Statistics and Turbo Mode

Version

All

Date

24 Aug 2010

Summary

By default, all Supervising Engines communicate with Monitoring Engines in “Turbo Mode”.

Turbo Mode is controlled through a registry key on the Supervising Engine of each Product:
HKEY_LOCAL_MACHINE\SOFTWARE\Argent\ArgentGuardian\TURBO_MONITORING

(0 = OFF, 1 = ON)
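As a rough illustration, the setting could be checked from a script along the following lines. This is only a sketch under assumptions that the article does not confirm: that TURBO_MONITORING is a value stored under the ArgentGuardian key, that its data is integer-like, and that the script runs with the same registry view (32-bit or 64-bit) as the product. The function names are hypothetical.

```python
import sys


def interpret_turbo_value(value):
    """Map the registry data to a boolean: 1 = ON, anything else = OFF."""
    return int(value) == 1


def read_turbo_mode(product_key=r"SOFTWARE\Argent\ArgentGuardian",
                    value_name="TURBO_MONITORING"):
    """Return True/False for Turbo Mode, or None if it cannot be read.

    Assumption: TURBO_MONITORING is a value under the product key.
    """
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, product_key) as key:
            value, _value_type = winreg.QueryValueEx(key, value_name)
    except FileNotFoundError:
        return None  # key or value not present
    return interpret_turbo_value(value)


print("Turbo Mode:", read_turbo_mode())
```

A result of None only means the value could not be read from this script; it says nothing about what the product itself is doing.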

Turbo Mode is a feature that allows the pooling of scheduled Relator tasks from the Supervising Engine to the Monitoring Engine.

This means that if 60 Relator tasks are due now, we don’t create 60 independent TCP connections to the Monitoring Engine. Instead, we pool the tasks into two batches of 30 (the default batch size) and send those batches to the Monitoring Engine.
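The pooling arithmetic can be sketched in a few lines of Python. This is an illustration only, not the product's implementation; the function name and the task names are made up for the example.

```python
def make_batches(tasks, batch_size=30):
    """Split a list of tasks into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]


# 60 hypothetical Relator task names
tasks = [f"relator_{n}" for n in range(60)]
batches = make_batches(tasks)
print(len(batches), [len(b) for b in batches])  # 2 [30, 30]
```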

This feature significantly reduces load, bandwidth usage and the potential for TCP socket issues.

Technical Background

When Turbo Mode is used, you may notice TCP Statistics printed in the Supervising Engine logs every five minutes.


04 Aug 2010 14:38:01.030 SVR1 admin ==================== TCP Statistics ========================

04 Aug 2010 14:38:01.046 SVR1 admin ==================== TCP Statistics ========================

04 Aug 2010 14:38:01.046 SVR1 admin ==================== TCP Statistics ========================

04 Aug 2010 14:38:01.046 SVR1 admin Task Submission Mode        ===> Turbo (30)

04 Aug 2010 14:38:01.046 SVR1 admin Task Monitor Mode           ===> Turbo (30)

04 Aug 2010 14:38:01.061 SVR1 admin Task Submission Loop Time   ===> Average: 0.15, Maximum: 0.31, Minimum: 0.05

04 Aug 2010 14:38:01.061 SVR1 admin Task Submission Packet (KB) ===> Average: 0.66, Maximum: 2.81, Minimum: 0.39

04 Aug 2010 14:38:01.061 SVR1 admin Task Monitor Loop Time      ===> Average: 0.12, Maximum: 0.24, Minimum: 0.03

04 Aug 2010 14:38:01.061 SVR1 admin Task Monitor Packet (KB)    ===> Average: 2.50, Maximum: 8.13, Minimum: 0.44

04 Aug 2010 14:38:01.077 SVR1 admin ==================== TCP Statistics ========================

04 Aug 2010 14:38:01.077 SVR1 admin ==================== TCP Statistics ========================

04 Aug 2010 14:38:01.077 SVR1 admin ==================== TCP Statistics ========================

These statistics are extremely useful as a debugging tool.
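If you want to track these figures over time, the statistic lines can be pulled out of the log with a small script. The sketch below assumes the line layout matches the sample above exactly; the regular expression and function name are our own and would need adjusting for any other log format.

```python
import re

# Matches the four "Average / Maximum / Minimum" lines in the sample log.
STAT_LINE = re.compile(
    r"(?P<name>Task (?:Submission|Monitor) (?:Loop Time|Packet \(KB\)))\s*===>\s*"
    r"Average:\s*(?P<avg>[\d.]+),\s*Maximum:\s*(?P<max>[\d.]+),\s*"
    r"Minimum:\s*(?P<min>[\d.]+)"
)


def parse_tcp_statistics(lines):
    """Return {statistic name: {"average", "maximum", "minimum"}} for matching lines."""
    stats = {}
    for line in lines:
        match = STAT_LINE.search(line)
        if match:
            stats[match.group("name")] = {
                "average": float(match.group("avg")),
                "maximum": float(match.group("max")),
                "minimum": float(match.group("min")),
            }
    return stats


sample = [
    "04 Aug 2010 14:38:01.046 SVR1 admin Task Monitor Mode           ===> Turbo (30)",
    "04 Aug 2010 14:38:01.061 SVR1 admin Task Submission Loop Time   ===> Average: 0.15, Maximum: 0.31, Minimum: 0.05",
    "04 Aug 2010 14:38:01.061 SVR1 admin Task Monitor Packet (KB)    ===> Average: 2.50, Maximum: 8.13, Minimum: 0.44",
]
stats = parse_tcp_statistics(sample)
print(stats["Task Submission Loop Time"]["maximum"])
```

Lines that do not carry the three figures, such as the mode lines and the separators, are simply skipped.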

Here is an explanation of what each line means — we will use a scenario where the Supervising Engine calculates that it needs to send 300 tasks to a Monitoring Engine:

Task Submission Mode

This is how many tasks to submit to the Monitoring Engine for execution in a single request batch. The default is 30.

This value is controlled on the Supervising Engine’s registry for each product, e.g. Argent Guardian:
HKEY_LOCAL_MACHINE\SOFTWARE\Argent\ArgentGuardian\TASK_SUBMIT_BATCH_SIZE

Task Monitor Mode

A Supervising Engine periodically checks the status of the tasks executed on a Monitoring Engine (this is used to update the History screen to show whether the Relator is pending, failed, completed, etc.). The task status check is conducted every 10 seconds by default.

This statistic shows how many tasks are status-checked in a single request batch. The default is 30.

This value is controlled on the Supervising Engine’s registry for each product, e.g. Argent Guardian:
HKEY_LOCAL_MACHINE\SOFTWARE\Argent\ArgentGuardian\TASK_INFO_BATCH_SIZE

Task Submission/Monitor Loop Time

We’ve established from the above that “submission” = request to execute task, and “monitor” = request to get task status info. Now we can explain what “Loop Time” means in both cases.

If we have 300 total tasks to execute, and the batch size is 30, we end up with 10 batches that run concurrently. One such set of batches is the definition of a “loop”, and the statistic is an aggregation over the past five minutes. (Remember: these statistics are written every five minutes.)

In other words, the Average time is the average number of seconds it took to complete a loop in the past five minutes.

The Maximum time is the number of seconds the slowest loop took to complete in the past five minutes.

The Minimum is the fastest time in seconds that a loop completed in the past five minutes.
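The relationship between tasks, batches and the three figures can be sketched as follows. The loop durations are invented for the example, and the function names are hypothetical; the real aggregation is done inside the Supervising Engine.

```python
import math
from statistics import mean


def batches_per_loop(total_tasks, batch_size=30):
    """Number of batches needed to cover all tasks in one loop."""
    return math.ceil(total_tasks / batch_size)


def summarize_loops(loop_seconds):
    """Aggregate the loop durations recorded during one five-minute window."""
    return {
        "average": round(mean(loop_seconds), 2),
        "maximum": max(loop_seconds),
        "minimum": min(loop_seconds),
    }


print(batches_per_loop(300))  # 10

# Made-up durations (in seconds) of four loops within one window
summary = summarize_loops([0.05, 0.10, 0.31, 0.14])
print(summary)
```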

Task Submission/Monitor Packet

Similar to above, but this shows the statistics for the size of the packet in each loop in the past five minutes.

This is useful for checking if the batch size setting is suitable for the link between the Supervising Engine and the Monitoring Engine.

If the link is remote or slow, and the packet size is, say, 2,000 KB, the packet may be too large and is easily dropped or corrupted along the route from the Supervising Engine to the remote Monitoring Engine.

In most cases, e.g. Daughter Engines, the Supervising Engine and Monitoring Engine are local to each other (installed on the same machine). In this case, the packet size does not pose a major problem unless it is unusually large, such as 10,000 KB.

Analyzing the TCP Statistics

The Average and Maximum statistics are the only ones that need to be taken into consideration.

A large number of seconds for “Task Submission Loop Time” means that the last Relator to run in the loop is being delayed.

A large number of seconds for “Task Monitor Loop Time” means that the task status checking is being delayed. This is the more severe of the two, because it delays the Supervising Engine in sending Alerts to the Argent Console.

To help alleviate this, you can change the value of the “Task Monitor” batch size from 30 to a larger value, such as 60 or 100:
HKEY_LOCAL_MACHINE\SOFTWARE\Argent\ArgentGuardian\TASK_INFO_BATCH_SIZE

Here we have a trade-off:

In general, larger batch size = faster loop.

If the batch size increases, there are fewer batches to send, which means much less TCP socket overhead (e.g. time and CPU usage) when initiating the batches.

However, by increasing the batch size, the packet size increases, which can lead to a state where no monitoring is done, as the packets are dropped, rejected, or corrupted along the way.

Therefore, a value under 100 is recommended for the “Task Monitor” batch size. Argent also does not recommend ever turning Turbo Mode off.
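To get a feel for the trade-off before changing the registry, the two sides can be estimated with a short script. This is a rough sketch only. It assumes packet size grows linearly with batch size, and it derives a per-task size from the sample log's average Task Monitor packet (2.50 KB at a batch size of 30). Neither assumption is confirmed by the product, so treat the packet figures as indicative at best.

```python
import math


def batch_tradeoff(total_tasks, batch_sizes, kb_per_task):
    """Return (batch size, batches per loop, estimated packet KB) for each size."""
    rows = []
    for size in batch_sizes:
        batches = math.ceil(total_tasks / size)
        # Assumption: packet size scales linearly with the tasks in a batch.
        est_packet_kb = round(min(size, total_tasks) * kb_per_task, 2)
        rows.append((size, batches, est_packet_kb))
    return rows


# Assumption: per-task size taken from the sample log (2.50 KB / 30 tasks).
kb_per_task = 2.50 / 30
rows = batch_tradeoff(300, [30, 60, 100], kb_per_task)
for size, batches, packet_kb in rows:
    print(f"batch size {size:>3}: {batches:>2} batches per loop, "
          f"~{packet_kb} KB per packet (estimate)")
```

For the 300-task scenario, going from 30 to 100 cuts the batches per loop from 10 to 3, while the estimated packet roughly triples. Compare the estimate against the Task Monitor Packet (KB) figures actually logged after the change.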

Resolution

N/A