KBI 312179 Numerous ‘Task Cancelled’ Messages Because Of Unreachable Nodes
Version
Argent Omega 2.2A-2404-A and later
Date
Friday, 14 June 2024
Summary
Administrators may receive email messages with the subject line ‘Task Cancelled’. The message body reads:
Task (Relator: xxxx Machine: xxxx) is cancelled because previous instance has been running too long and unresponsive
The Relator trace log shows that the Relator either failed to get performance metrics or could not connect to the Service Control Manager (SCM).
Further checking confirms that the node is offline.
It is best to set the Relator option ‘Do Not Apply Rules In This Relator To An Inaccessible Node’ for Relators that check Windows performance metrics or service status.
Technical Background
The Windows API has a very long timeout when reading performance metrics or connecting to the SCM, and this timeout cannot be adjusted through API calls. Argent Omega has a built-in 5-minute timeout that applies if no response is received. For inaccessible nodes, this built-in timeout is what causes tasks to be cancelled.
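The effect of wrapping a long blocking call in a shorter built-in timeout can be illustrated with a minimal Python sketch. This is not Argent Omega's implementation; the function names and the short durations used here are hypothetical stand-ins for the slow Windows API call and the 5-minute limit.

```python
import concurrent.futures
import time


def run_with_timeout(check, timeout_seconds):
    """Run a blocking check in a worker thread and stop waiting after timeout_seconds.

    Returns ("completed", result) if the check answers in time,
    otherwise ("cancelled", None).
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(check)
    try:
        return ("completed", future.result(timeout=timeout_seconds))
    except concurrent.futures.TimeoutError:
        # The underlying call cannot be interrupted; the caller only stops waiting.
        return ("cancelled", None)
    finally:
        # Do not block on the worker thread that may still be stuck in the call.
        pool.shutdown(wait=False)


def reachable_node_check():
    # Hypothetical stand-in for a metrics read that answers immediately.
    return "metrics"


def unreachable_node_check():
    # Hypothetical stand-in for an API call that blocks on an offline node.
    time.sleep(0.5)
    return "metrics"


fast_outcome = run_with_timeout(reachable_node_check, timeout_seconds=1.0)
slow_outcome = run_with_timeout(unreachable_node_check, timeout_seconds=0.1)
```

In this sketch the second call is reported as cancelled because the stand-in check outlasts the limit, which mirrors how a task against an offline node ends in a ‘Task Cancelled’ message.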
Resolution
Upgrade to Argent Omega 2.2A-2404-A or later.
The Argent Omega CLI includes a new feature that turns on the option ‘Do Not Apply Rules In This Relator To An Inaccessible Node’ for a batch of Relators.
The CLI command is as follows:
ArgentOmegaCLI.exe /relator [/name:xxxx] [/folder] /not_test_offline_node /Argent_Password:xxxx
Note:
- Only Relators that run neither the System Down Rule nor the SLA Rule are affected.
- If the ‘/folder’ option is used, the name specified in the ‘/name’ option is the name of the folder in which the Relators reside.
- Wildcards are supported in the name specified in the ‘/name’ option.
- The ‘/name’ option is optional. If no name is specified, all Relators are updated.
Example:
ArgentOmegaCLI.exe /relator /not_test_offline_node /Argent_Password:ArgentOmega
Running this command updates all Relators that run neither the System Down Rule nor the SLA Rule to use the option ‘Do Not Apply Rules In This Relator To An Inaccessible Node’.
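For scripted use, the way the optional parts of the command combine can be sketched with a small Python helper that assembles the argument list. The helper name build_relator_command is hypothetical, and the Relator name pattern and folder name in the usage lines are made-up placeholders. Only the switches shown in the syntax above are used.

```python
def build_relator_command(password, name=None, folder=False,
                          exe="ArgentOmegaCLI.exe"):
    """Assemble the argument list for the batch update described above.

    name   -- optional Relator name (wildcards allowed), or a folder name
              when folder is True; omitted means all Relators
    folder -- when True, adds /folder so that name is read as a folder name
    """
    args = [exe, "/relator"]
    if name is not None:
        args.append(f"/name:{name}")
    if folder:
        args.append("/folder")
    args.append("/not_test_offline_node")
    args.append(f"/Argent_Password:{password}")
    return args


# All Relators, matching the example in this article.
all_relators = build_relator_command("ArgentOmega")

# Hypothetical wildcard pattern for Relator names.
by_pattern = build_relator_command("ArgentOmega", name="SQL*")

# Hypothetical folder name; /folder makes /name refer to the folder.
by_folder = build_relator_command("ArgentOmega", name="Production", folder=True)
```

Each list can be passed to subprocess.run on a machine where ArgentOmegaCLI.exe is available. Returning a list rather than a single string avoids quoting problems when a name contains spaces.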