Can MOM recognize a cluster for monitoring purposes? In other words, rather than
specifying the individual nodes in the Managed Computer Rules, can I specify
the cluster? So far, I have not been able to get MOM to recognize one, which
is problematic because monitoring the individual nodes is not the same as
monitoring the cluster.
I have several SQL clusters, amongst others. The real problem that I'm
dealing with is that they have a shared drive. If I install the Agent on
both nodes, I get false alerts that the shared drive is full from the
passive node because it can't access it.
Any thoughts?
Contributed by: Brian Wren [MSFT]
Technically, MOM is not cluster aware. It can, however, install the agent
on each individual node of the cluster and manage each node as a physical
server. The Cluster Management Pack will be applied to each node, which
will give you alerts for any events generated by the cluster.
The bigger question is whether the management pack for the software you are
running on the cluster is cluster aware. I'm assuming that you are running
SQL (which seems to be the most common clustered app). The SQL management
pack is not currently cluster aware. Most of the pack works fine on a
cluster though. The only real issue is a false positive for the SQL service
not running on a passive node. The typical fix for this is to simply disable
the rule for checking the SQL service and let the cluster warn you if the
service fails. An alert is fired any time a cluster resource fails, which
is functionally equivalent to being notified if the SQL service fails.
The false disk-full alert from the passive node is a common issue. The
performance counter returns 0 free space when the node doesn't own the
disk. I wrote a script to solve this.
This uses WMI to check the disk space and only issues an alert if the total
space of the disk is greater than 0. If the total space is 0, then it's an
obvious indication that the node doesn't own the disk.
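
The original script is not reproduced here, so the following is only a
minimal sketch of the ownership check described above. It is written in
Python purely for illustration (a MOM script would normally be VBScript or
JScript), and it assumes the total size and free space have already been
read from WMI, for example from the Size and FreeSpace properties of
Win32_LogicalDisk. The function name percent_free is hypothetical.

```python
from typing import Optional


def percent_free(total_bytes: int, free_bytes: int) -> Optional[float]:
    """Return the percentage of free space, or None if the disk is not owned.

    A total size of 0 is treated as "this node does not own the disk",
    so the caller should skip alerting instead of reporting a full disk.
    """
    if total_bytes <= 0:
        # Passive node: the shared disk is not accessible from here.
        return None
    return 100.0 * free_bytes / total_bytes
```

Returning None rather than 0 keeps "disk not owned" distinct from "disk
really is full", which is the case the performance counter cannot tell apart.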
To implement this script, do the following.
- Create a new script in MOM and paste in the text from the script.
- Specify two parameters for the script: WarningLevel and ErrorLevel.
  These represent the percentage of free disk space at which to create a
  warning or an error alert. To match the existing functionality, set the
  default value for WarningLevel to 10 and for ErrorLevel to 5.
- Disable the two threshold rules starting with "% Free Space Any Logical
  Disk" in the processing rule group Microsoft Windows 2000 Operating
  System\Windows 2000 - All Computers\Threshold Performance Counters for
  Windows 2000\.
- In this same processing rule group, create a timed rule to call the new
  script every 30 minutes. Make sure you set the parameters to the warning
  and error levels.
- In this same processing rule group, create two new event rules as follows:

  Provider Name: Script-generated Data
  Criteria - from source: <Name of your new script>
  Criteria - with event id: 1001
  Generate Alert - alert severity: Warning

  Provider Name: Script-generated Data
  Criteria - from source: <Name of your new script>
  Criteria - with event id: 1002
  Generate Alert - alert severity: Error
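
To show how the two parameters line up with the two event rules, here is a
rough Python sketch of the threshold decision. It is an illustration only,
not the original script: the function name event_for_disk is hypothetical,
and the use of strict "less than" comparisons is an assumption. Only the
event IDs (1001 for Warning, 1002 for Error) and the defaults of 10 and 5
come from the steps above.

```python
from typing import Optional

WARNING_EVENT_ID = 1001  # matched by the rule with alert severity Warning
ERROR_EVENT_ID = 1002    # matched by the rule with alert severity Error


def event_for_disk(pct_free: Optional[float],
                   warning_level: float = 10.0,
                   error_level: float = 5.0) -> Optional[int]:
    """Pick the event ID to raise for a disk, or None for no event.

    pct_free is the percentage of free space, or None when the node does
    not own the disk (total size reported as 0).
    """
    if pct_free is None:
        return None  # passive node: never alert on a disk we do not own
    if pct_free < error_level:      # assumption: strict comparison
        return ERROR_EVENT_ID
    if pct_free < warning_level:    # assumption: strict comparison
        return WARNING_EVENT_ID
    return None
```

The error check comes first so that a disk below both levels raises only
the Error event rather than both.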
I think those directions are correct. I just typed them up, so please let
me know if I've made a mistake or if you need more details.