How Does MOM Handle HeartbeatFailureAnalysis?
 
I have a quick question about how MOM handles HeartbeatFailureAnalysis. A developer accidentally shut down one of our development SQL servers. It was shut down gracefully, but we didn't receive any notification that the server was down, and we didn't find out until a couple of hours later. Is this common behaviour for MOM? Does it recognize that this server was gracefully shut down?

Contributed By: Brian Wren [MSFT]
The only way that MOM knows that a managed server is dead is by the server missing a heartbeat. The consolidator does not actively attempt connections to the server to determine its availability. As a result, an alert is not immediately issued when a server goes down.
The consolidator periodically checks for any servers that have missed a heartbeat. It then gives each of those servers a grace period in which to heartbeat successfully before it performs its analysis. If this period expires without the server initiating a successful heartbeat, the script HeartbeatFailureAnalysis is executed to determine the cause: the server may be dead, the OnePoint service may have hung, and so on. Based on that analysis, an appropriate alert is raised.

The frequency of checking for missed heartbeats and the wait period before the analysis script is executed can be configured in Global Settings. These are the top two values (defaults of 300 and 720) on the Heartbeat Checking tab found under Global Settings | Consolidator.
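The check-then-wait sequence described above can be sketched as a small simulation. This is a minimal, hypothetical model, not MOM's actual implementation: the class and function names, the server name `devsql01`, the agent heartbeat interval of 60, and the assumption that the two Global Settings values (300 and 720) are in seconds are all illustrative assumptions. The `analyze` callback stands in for the HeartbeatFailureAnalysis script.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for the two Global Settings values (defaults 300 and 720).
# ASSUMPTION: both are treated as seconds in this sketch.
CHECK_INTERVAL = 300  # how often the consolidator looks for missed heartbeats
WAIT_PERIOD = 720     # grace period before the failure analysis runs


@dataclass
class ManagedServer:
    name: str
    last_heartbeat: float                 # time of the last successful heartbeat
    missed_since: Optional[float] = None  # when a missed heartbeat was first noticed
    analyzed: bool = False                # analysis already run for this outage


def check_heartbeats(
    servers: List[ManagedServer],
    now: float,
    heartbeat_interval: float,
    analyze: Callable[[ManagedServer], str],
) -> Dict[str, str]:
    """One periodic pass of a hypothetical consolidator check.

    Returns {server name: alert text} for servers whose grace period
    expired during this pass."""
    alerts: Dict[str, str] = {}
    for s in servers:
        overdue = now - s.last_heartbeat > heartbeat_interval
        if not overdue:
            # The server heartbeated in time: clear any pending miss.
            s.missed_since = None
            s.analyzed = False
            continue
        if s.missed_since is None:
            # First pass that notices the miss: start the grace period.
            s.missed_since = now
        elif not s.analyzed and now - s.missed_since >= WAIT_PERIOD:
            # Grace period expired with no heartbeat: run the analysis once.
            alerts[s.name] = analyze(s)
            s.analyzed = True
    return alerts


def first_alert_time(last_heartbeat: float, heartbeat_interval: float,
                     horizon: float = 86400) -> Optional[float]:
    """Run passes every CHECK_INTERVAL for a server that never heartbeats again
    and return the time of the first alert (None if none within horizon)."""
    server = ManagedServer("devsql01", last_heartbeat)  # hypothetical server name
    t = float(CHECK_INTERVAL)
    while t <= horizon:
        if check_heartbeats([server], t, heartbeat_interval,
                            lambda s: "Agent Heartbeat Failed"):
            return t
        t += CHECK_INTERVAL
    return None


print(first_alert_time(last_heartbeat=0, heartbeat_interval=60))
```

Under these assumptions the sketch raises its alert 1200 seconds after the last heartbeat: the miss is noticed on the first pass at 300, and the 720-second grace period has elapsed by the pass at 1200. The real delay depends on the actual units and the agent heartbeat interval configured in MOM.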

The Processing Rules that generate these alerts can be found in the Microsoft Operations Manager | Consolidator Processing Rule Group. Their names start with "Agent Heartbeat Failed - ". If you want notifications to be sent for the generated alerts, modify the alert tab of each of these rules so that it generates an alert of severity Error instead of the default, Warning.
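Why the default rules stay silent can be illustrated with a severity threshold. This is a hypothetical sketch, not MOM's notification engine: it assumes, as the advice above implies for this setup, that notifications fire only for alerts of severity Error. The `Severity` enum (with only the two severities named here), `NOTIFY_THRESHOLD`, `heartbeat_alert`, and the rule name suffix are all illustrative assumptions.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical ordering; only the two severities named in this FAQ are modeled.
    WARNING = 1
    ERROR = 2


# ASSUMPTION: notifications are configured to fire only at Error or above.
NOTIFY_THRESHOLD = Severity.ERROR


def should_notify(severity: Severity) -> bool:
    """Return True if an alert of this severity would trigger a notification."""
    return severity >= NOTIFY_THRESHOLD


def heartbeat_alert(rule_name: str, severity: Severity = Severity.WARNING) -> dict:
    """Build an alert record as a hypothetical heartbeat rule might.

    The default severity mirrors the rules' default of Warning."""
    return {"rule": rule_name, "severity": severity, "notify": should_notify(severity)}


# Hypothetical rule name; real rule names only share the "Agent Heartbeat Failed - " prefix.
rule = "Agent Heartbeat Failed - example"
print(heartbeat_alert(rule)["notify"])                  # default Warning
print(heartbeat_alert(rule, Severity.ERROR)["notify"])  # after changing the rule to Error
```

With the default Warning severity the alert is raised but no notification goes out; raising the rule's severity to Error crosses the assumed threshold.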
 

© FAQShop.com 2003 - 2008
