I have a quick question regarding how MOM handles HeartbeatFailureAnalysis.
We had a development SQL server that was accidentally shut down by a
developer. It was shut down gracefully. We didn't receive any notification
that this server was down and didn't find out until a couple of hours later.
Is this expected behaviour for MOM? Does it recognize that this server was
gracefully shut down?
Contributed by: Brian Wren [MSFT]
The only way that MOM knows that a managed server is dead is by the server
missing a heartbeat. The consolidator does not actively attempt connections
to the server to determine its availability. As a result, an alert is not
immediately issued when a server goes down.
The consolidator periodically checks for any servers that have missed a
heartbeat. It then gives each of them a period in which to heartbeat
successfully before it performs its analysis. If this period expires without
the server initiating a successful heartbeat, the HeartbeatFailureAnalysis
script is executed to determine the cause: the server may be down, the
OnePoint service may have hung, and so on. Based on that analysis, an
appropriate alert is raised.
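To make the check-then-wait sequence concrete, here is a minimal Python sketch of it as a per-server state machine. It is a hypothetical model, not MOM's actual consolidator code: `ManagedServer`, `sweep`, `heartbeat_interval` and `wait_period` are names invented for illustration, time is a plain number, and a server is assumed to have missed its heartbeat once nothing has arrived for longer than `heartbeat_interval`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManagedServer:
    """Hypothetical per-server state kept by a simplified consolidator model."""
    name: str
    last_heartbeat: float                 # time of the last successful heartbeat
    missed_since: Optional[float] = None  # when a pass first noticed the missed heartbeat
    analysed: bool = False                # True once the failure analysis has been triggered


def sweep(servers: List[ManagedServer], now: float,
          heartbeat_interval: float, wait_period: float) -> List[str]:
    """One periodic consolidator pass over all managed servers.

    Returns the names of servers for which the failure-analysis step
    (HeartbeatFailureAnalysis in MOM) would be started on this pass.
    """
    to_analyse = []
    for server in servers:
        if now - server.last_heartbeat <= heartbeat_interval:
            # Heartbeat is current (or has resumed): clear any pending failure state.
            server.missed_since = None
            server.analysed = False
            continue
        if server.missed_since is None:
            # First pass to notice the miss: start the wait period, no alert yet.
            server.missed_since = now
        elif not server.analysed and now - server.missed_since >= wait_period:
            # Wait period expired without a new heartbeat: trigger the analysis once.
            server.analysed = True
            to_analyse.append(server.name)
    return to_analyse


# Example: last heartbeat at time 0, passes every 300, assumed heartbeat interval 300.
servers = [ManagedServer("devsql01", last_heartbeat=0)]
for t in range(0, 3000, 300):
    due = sweep(servers, t, heartbeat_interval=300, wait_period=720)
    if due:
        print(t, due)  # prints: 1500 ['devsql01']
```

The model shows why nothing happens at the moment the server stops: the first pass that notices the miss only starts the wait period, and the analysis (and therefore any alert) comes at least one wait period later.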
The frequency of checking for missed heartbeats and the wait period before
the analysis script is executed can both be configured in Global Settings.
They are the top two values on the Heartbeat Checking tab under Global
Settings | Consolidator, with defaults of 300 and 720 respectively.
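Under a simplified model of these two settings, the longest gap between a server's last heartbeat and the start of HeartbeatFailureAnalysis can be estimated. The Python sketch below assumes (none of this is stated above) that the agent heartbeats every `heartbeat_interval`, that a miss is only noticed on a consolidator pass, and that the analysis runs on the first pass at least one wait period after the miss was noticed. The function name and the agent heartbeat interval are hypothetical, and all three values must be in the same units; the units of the Global Settings values are not given here.

```python
import math


def worst_case_analysis_delay(heartbeat_interval: float,
                              check_interval: float,
                              wait_period: float) -> float:
    """Upper bound on the time from a server's last heartbeat until the
    failure analysis starts, under a simplified (hypothetical) model:

    * the server counts as overdue once more than heartbeat_interval has
      passed since its last heartbeat;
    * the consolidator checks every check_interval and only notices the
      miss during a check;
    * the analysis runs on the first check at least wait_period after the
      miss was noticed.
    """
    # Worst case: a check lands exactly when the heartbeat falls due, while the
    # server is not yet overdue, so the miss is only seen one full interval later.
    noticed_at = heartbeat_interval + check_interval
    # After that, whole check intervals must elapse until wait_period is covered.
    checks_to_wait = math.ceil(wait_period / check_interval)
    return noticed_at + checks_to_wait * check_interval


# Defaults of 300 and 720, with an assumed agent heartbeat interval of 300.
print(worst_case_analysis_delay(300, 300, 720))  # 1500
```

Because the wait period is only evaluated on a check, it is effectively rounded up to a whole number of check intervals in this model (720 becomes 900 with a 300 check interval).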
The Processing Rules that generate these alerts can be found in the
Microsoft Operations Manager | Consolidator Processing Rule Group. Their
names start with "Agent Heartbeat Failed - ". If you want notifications to
be sent from the generated alerts, modify the Alert tab of each of these
rules so that it generates an alert of severity Error instead of the
default, Warning.
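Why the default Warning alerts produced no notification can be pictured as a severity threshold. The following Python sketch is illustrative only: `Severity` is a simplified, hypothetical subset of MOM's alert severities, `should_notify` is an invented helper, and the Error threshold is an assumption inferred from the advice above rather than a documented MOM default.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Simplified, hypothetical ordering of a few MOM alert severities."""
    INFORMATION = 1
    WARNING = 2
    ERROR = 3


def should_notify(alert_severity: Severity,
                  threshold: Severity = Severity.ERROR) -> bool:
    """Return True if an alert is severe enough to trigger a notification.

    The ERROR threshold is an assumption matching the situation described
    above, where Warning alerts produced no notification.
    """
    return alert_severity >= threshold


# Default "Agent Heartbeat Failed - " rule output: Warning, so no notification.
print(should_notify(Severity.WARNING))  # False
# After changing the rule's Alert tab to Error: a notification is sent.
print(should_notify(Severity.ERROR))    # True
```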