| I have MOM
configured to notify (email, pager) me when IIS is not responding. But if it
comes back on its own, MOM does not send an email saying it is back up. Is
there a way to configure this?
Contributed
By: Brian Wren [MSFT]
If you are seeing that IIS is down by using HTTPPing, then you certainly
don't want to be paged every time it is successful. You probably only want
to be notified if the ping is successful when the previous ping was
unsuccessful. That means that IIS was down but is now back up. You will need
to do some script modification for this. If the ping is unsuccessful, you
would update some state variable.
Each time HTTPPing is successful, it checks this state variable to see
if the previous ping failed. If that is the case, it clears the variable and
issues an alert.
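
The logic described above can be sketched roughly as follows. This is only an illustration in Python, not MOM's scripting API: the `process_ping` function, the `previous_ping_failed` key, and the plain dictionary standing in for MOM's persistent state variable are all hypothetical names chosen for the example. In a real response script the state would have to survive between runs of the script.

```python
def process_ping(state, ping_succeeded):
    """Return True when a "back up" alert should be issued.

    `state` is a hypothetical stand-in for a persistent state variable;
    here it is just a dictionary that the caller keeps between pings.
    """
    if not ping_succeeded:
        # Remember the failure so the next successful ping can detect recovery.
        state["previous_ping_failed"] = True
        return False

    if state.get("previous_ping_failed", False):
        # Previous ping failed and this one succeeded: the resource is back up.
        # Clear the variable so later successful pings stay silent.
        state["previous_ping_failed"] = False
        return True

    # Healthy before and healthy now: nothing to report.
    return False


# Simulated sequence of pings: up, down, down, up, up.
state = {}
results = [process_ping(state, ok) for ok in [True, False, False, True, True]]
```

In this sketch only the fourth ping, the first success after a failure, asks for a recovery alert. The failure alert itself is assumed to be raised by whatever rule already pages you when IIS stops responding.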
Actually, no matter how you are detecting the status of IIS, the basic
concept is going to be the same. MOM (like all other management software)
works by detecting events generated by a resource or proactively collecting
the status of a resource. Most resources don't generate an event when they
start successfully. Even if they do, we don't want a storm of happy alerts
every time a server and all of its resources start. We also don't want to generate
an alert every time we see that a resource is healthy. The answer is to
record that a resource is unhealthy so that the next time we see that it is
healthy, we know to generate an alert based on its change of status.
The typical method for doing this is state variables.
This is a common request, and it's a lot more difficult than it may appear.
I'm not a huge fan of this type of alert. The fact that we had an error
means that someone should be looking at the server. They will see that the
resource is healthy again and should then be diagnosing what caused the
problem. A notification that the problem is gone shouldn't keep someone from
doing some diagnosis. The one place where there is definitely an argument
for this is after-hours support. A notification at 2:00 AM reporting that a
resource is healthy after its 1:30 failure means the after-hours operator
can go back to sleep and worry about the issue in the morning.
|