KBI 312179 Numerous ‘Task Cancelled’ Messages Because Of Unreachable Nodes

Version

Argent Omega 2.2A-2404-A and later

Date

Friday, 14 June 2024

Summary

Administrators may receive email messages with the subject line ‘Task Cancelled’. The message body reads:


Task (Relator: xxxx Machine: xxxx) is cancelled because previous instance has been running too long and unresponsive


The Relator trace log shows errors such as a failure to get performance metrics or a failure to connect to the Service Control Manager (SCM).

On investigation, the node is confirmed to be offline.

For Relators that check Windows performance metrics or service status, it is best to set the Relator option ‘Do Not Apply Rules In This Relator To An Inaccessible Node’.

Technical Background

The Windows API has a very long timeout when reading performance metrics or connecting to the SCM, and this timeout cannot be adjusted through the API calls. Argent Omega has a built-in 5-minute timeout for tasks that receive no response. Because the Windows API call against an inaccessible node does not return within this limit, the task is cancelled.

Resolution

Upgrade to Argent Omega 2.2A-2404-A or later.

The Argent Omega CLI includes a new feature that turns on the option ‘Do Not Apply Rules In This Relator To An Inaccessible Node’ for a batch of Relators at once.

The CLI command is as follows:


ArgentOmegaCLI.exe /relator [/name:xxxx] [/folder] /not_test_offline_node /Argent_Password:xxxx


Notes:

  • Only Relators that do not run System Down Rules or SLA Rules are affected.
  • If the ‘/folder’ option is used, the value of the ‘/name’ option is the name of the folder in which the Relators reside.
  • Wildcards are supported in the value of the ‘/name’ option.
  • The ‘/name’ option is optional. If no name is specified, all eligible Relators are updated.

Example:


ArgentOmegaCLI.exe /relator /not_test_offline_node /Argent_Password:ArgentOmega


Running this command updates all Relators that do not run System Down Rules or SLA Rules to use the option ‘Do Not Apply Rules In This Relator To An Inaccessible Node’.
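The notes also allow the update to be limited to selected Relators. The commands below are only a sketch and are not taken from Argent documentation. ‘Windows*’ and ‘PerfChecks’ are hypothetical Relator and folder names. ‘*’ is assumed to be a supported wildcard character, because this article does not say which wildcard characters are accepted. ‘xxxx’ stands for the Argent password, as in the syntax above.


REM Hypothetical: update only Relators whose names begin with "Windows" (assumes * is a supported wildcard)
ArgentOmegaCLI.exe /relator /name:Windows* /not_test_offline_node /Argent_Password:xxxx

REM Hypothetical: update all Relators in the folder named "PerfChecks"
ArgentOmegaCLI.exe /relator /name:PerfChecks /folder /not_test_offline_node /Argent_Password:xxxx


As with the command above, Relators that run System Down Rules or SLA Rules are not changed.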