Ureader.com  
Microsoft software help and Community
date: Thu, 31 Jul 2008 00:21:41 +0200, group: microsoft.public.opsmgr.sp1


Alerts and state change events storm - MSFT help needed
Hi folks,

I'm having a nasty issue with monitors. This is the behavior I can observe:

-          Every time an agent is "reset", all of its monitors reset their state as well: each one records a state change event and, if the monitor is set to auto-close alerts, a closed alert too. This happens regardless of the monitor's previous state.

-          This is especially evident on very crowded SQL Servers with tens of databases

-          This is especially evident on MSCS clusters

The agent "reset" is triggered by:

-          Any healthservice restart

-          Any new MP downloaded to the agent, so even a new override triggers a "reset" of the monitors

-          Any resume from maintenance mode

 

This has the following drawbacks:

-          During tuning, a huge number of state change events and "closed alerts" are recorded. By huge I mean about 1 million rows per day of state change events and tens of thousands of alerts per day, in an environment with 300 agents deployed and about 2000 SQL databases. I mention databases because the database monitors are the top alert-generating monitors. Obviously these numbers refer to a period of peak tuning and override creation.

-          If the operators have changed a monitor-generated alert (e.g. filled in a ticket ID property, or changed the resolution state to signal an escalation), the changes are lost after the "reset": the alert is closed and then reopened by the reset.

-          During these alert and state storms my dual-processor, dual-threaded SQL Server box runs at 100% CPU (on all 4 threads) and stays there for several minutes. During this time the operator consoles are unusable. The SQL box is sized according to the performance and scalability guide.
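
To make the behavior I'm describing concrete, here is a toy model in Python. It is not OpsMgr code and is only a sketch of what I observe: the `Monitor` and `Alert` classes, the `reset_agent` function and the ticket value are all made up for illustration, and the model assumes exactly one state change row per monitor per reset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Alert:
    """Hypothetical stand-in for a monitor-generated alert."""
    monitor: str
    ticket_id: Optional[str] = None      # custom field filled in by an operator
    resolution_state: str = "New"


@dataclass
class Monitor:
    """Hypothetical stand-in for a unit monitor on one agent."""
    name: str
    healthy: bool
    auto_close: bool = True
    alert: Optional[Alert] = None


def reset_agent(monitors: List[Monitor]) -> Tuple[int, List[Alert]]:
    """Model one agent "reset" as observed.

    Returns the number of state change rows written and the alerts closed.
    """
    state_change_rows = 0
    closed_alerts: List[Alert] = []
    for m in monitors:
        # Every monitor records a state change, whatever its previous state.
        state_change_rows += 1
        # An auto-closing monitor also closes its open alert...
        if m.auto_close and m.alert is not None:
            closed_alerts.append(m.alert)
            m.alert = None
            # ...and a still-unhealthy monitor raises a brand new one,
            # without the operator's edits.
            if not m.healthy:
                m.alert = Alert(monitor=m.name)
    return state_change_rows, closed_alerts


# Ten database monitors, one of them unhealthy with an alert under escalation.
monitors = [Monitor(f"db{i}", healthy=True) for i in range(9)]
escalated = Alert("db9", ticket_id="TICKET-1", resolution_state="Escalated")
monitors.append(Monitor("db9", healthy=False, alert=escalated))

rows, closed = reset_agent(monitors)
print(rows, len(closed), monitors[-1].alert)
```

In this model a single reset writes one state change row for each of the ten monitors even though nothing actually changed, and the reopened alert on db9 no longer carries the ticket id or the escalated resolution state.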

 

Now, is this by design or is it an issue specific to my environment? If it is the latter I'll open a PSS incident, but if it is the former this is *very bad design*! If so, please fix it ASAP.

 

Regards

Daniele
date: Thu, 31 Jul 2008 00:21:41 +0200, author: Daniele Grandini
