KBI 310088 Concurrency Issue With Prerequisite Unix Rules

Version

Up to and including Argent Oracle Monitor 8.0A-0707-D

Date

11 Feb 2008

Summary

Unix Rules configured as Relator prerequisites can fail when the same rule is executed concurrently against the same server.

Technical Background

Configuring a Relator to use Prerequisites (“Relator Won’t Run On Node Unless All Following Rules Pass”) causes the prerequisite rule to be executed simultaneously, once for each server in the original Monitoring Groups.

If the prerequisite rule is executed against a specific node (instead of “The Same Server”), all of these rule executions run concurrently against that one server.

This configuration exposes an issue in the concurrent execution of Unix Rule scripts.

The path of a Unix script is built from the rule name and a timestamp with one-second granularity.

As a result, two executions of the same rule that start within the same second produce identically named scripts.
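The collision can be illustrated with a minimal Python sketch. The function `script_path` and its format string are assumptions made for illustration; the product's actual naming scheme includes further components (as the log below shows) and is only approximated here. The two start times are taken from the first two log entries in this article.

```python
from datetime import datetime


def script_path(rule_name: str, started: datetime) -> str:
    """Hypothetical naming scheme: rule name plus a timestamp cut to whole seconds."""
    return f"/tmp/{rule_name}_{started.strftime('%Y%m%d_%H%M%S')}"


# Two executions of the same rule, 172 ms apart but within the same second.
first_start = datetime(2008, 2, 6, 16, 9, 3, 777000)
second_start = datetime(2008, 2, 6, 16, 9, 3, 949000)

path_a = script_path("SCP_ORACLE_FILESYSTEM_U1", first_start)
path_b = script_path("SCP_ORACLE_FILESYSTEM_U1", second_start)

print(path_a)
print("same path:", path_a == path_b)
```

Because the sub-second part of the start time is discarded, both executions resolve to the same file name.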

The two simultaneous executions then conflict with one another, as the sequence of { secure-copy, execute, delete } assumes that the script is not being used by other clients.
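The following Python sketch replays the { secure-copy, execute, delete } sequence for two clients that share one script path. It is a simplified simulation, not the Monitoring Engine's code: the function `run_interleaved`, the client labels A and B, and the in-memory set standing in for the remote /tmp directory are all assumptions made for illustration.

```python
def run_interleaved(steps):
    """Replay (client, action) steps against a shared fake /tmp; return each client's outcome."""
    remote_tmp = set()                    # stands in for the remote /tmp directory
    path = "/tmp/rule_20080206_160903"    # both clients use the same script name
    outcome = {}
    for client, action in steps:
        if action == "copy":
            remote_tmp.add(path)
        elif action == "execute":
            outcome[client] = "ran" if path in remote_tmp else "not found"
        elif action == "delete":
            remote_tmp.discard(path)
    return outcome


# One client finishes before the other starts: both succeed.
serial = run_interleaved([
    ("A", "copy"), ("A", "execute"), ("A", "delete"),
    ("B", "copy"), ("B", "execute"), ("B", "delete"),
])

# The clients overlap: A deletes the shared script before B executes it.
overlapped = run_interleaved([
    ("A", "copy"), ("B", "copy"),
    ("A", "execute"), ("A", "delete"),
    ("B", "execute"), ("B", "delete"),
])

print(serial)
print(overlapped)
```

In the overlapped run, client B finds no script to execute, which corresponds to the "not found" error in the log.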

The Monitoring Engine log highlights this issue:

06 Feb 2008 16:09:03.777 W2-MON-1 argent argent argent’s Password:

06 Feb 2008 16:09:03.949 W2-MON-1 argent argent argent’s Password:

06 Feb 2008 16:09:04.167 W2-MON-1 argent argent@trsprd01:/home/argent > rm -f /tmp/foo_SCP_ORACLE_FILESYSTEM_U1

06 Feb 2008 16:09:04.386 W2-MON-1 argent argent@trsprd01:/home/argent > rm -f /tmp/foo_SCP_ORACLE_FILESYSTEM_U1

06 Feb 2008 16:09:04.386 W2-MON-1 argent argent@trsprd01:/home/argent > cat > /tmp/foo_SCP_ORACLE_FILESYSTEM_U1

06 Feb 2008 16:09:04.605 W2-MON-1 argent argent@trsprd01:/home/argent > cat > /tmp/foo_SCP_ORACLE_FILESYSTEM_U1

06 Feb 2008 16:09:05.714 W2-MON-1 argent chmod: /tmp/foo_SCP_ORACLE_FILESYSTEM_U1_REL_ORACLE_FILESYSTEM_TRSPRD01_W2_MON_1_20080206_160903: A file or directory in the path name does not exist.

06 Feb 2008 16:09:08.933 W2-MON-1 argent argent@trsprd01:/home/argent > sh -c /tmp/foo_SCP_ORACLE_FILESYSTEM_U1

06 Feb 2008 16:09:09.714 W2-MON-1 argent argent@trsprd01:/home/argent > sh -c /tmp/foo_SCP_ORACLE_FILESYSTEM_U1

06 Feb 2008 16:09:10.714 W2-MON-1 argent sh: /tmp/foo_SCP_ORACLE_FILESYSTEM_U1_REL_ORACLE_FILESYSTEM_TRSPRD01_W2_MON_1_20080206_160903: not found.

06 Feb 2008 16:09:25.058 W2-MON-1 argent argent@trsprd01:/home/argent > rm -f /tmp/foo_SCP_ORACLE_FILESYSTEM_U1

06 Feb 2008 16:09:28.058 W2-MON-1 argent argent@trsprd01:/home/argent > sh -c /tmp/foo

06 Feb 2008 16:09:28.090 W2-MON-1 argent SKIPPED. Reason: Rule ‘SCP_ORACLE_FILESYSTEM_U1’ is broken for the server trsprd01.

Resolution

An issue report has been logged and this will be resolved in a future release.

A workaround is to ensure that concurrent prerequisite rule executions are not scheduled.

One way to accomplish this is to configure a Relator for each server (rather than for each Monitoring Group) and choose a distinct schedule for each of those Relators.
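When planning the per-server Relators, distinct schedules can be worked out as staggered start offsets. The Python sketch below is only a planning aid: the function `staggered_offsets`, the 60-second gap, and the server names other than trsprd01 are hypothetical, and the resulting schedules still have to be entered in the product by hand. The gap should be longer than the time the prerequisite rule takes to run.

```python
def staggered_offsets(servers, gap_seconds=60):
    """Give each server's Relator its own start offset, in seconds past the base time."""
    return {server: index * gap_seconds for index, server in enumerate(servers)}


# trsprd02 and trsprd03 are made-up names used only for this example.
offsets = staggered_offsets(["trsprd01", "trsprd02", "trsprd03"])

for server, offset in offsets.items():
    print(f"{server}: start {offset} s after the base time")
```

Each server receives a different offset, so no two prerequisite rule executions are scheduled for the same second.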