Ureader.com  
Microsoft software help and Community
date: Tue, 29 Jan 2008 07:24:42 -0500,    group: microsoft.public.exchange.clustering


when did the cluster fail over?   
We have an Exchange 2003 SP2 2 node cluster.
The Exchange resource group (containing all services + disk resources) 
failed over to its peer node - we never noticed it.

I'm looking through the cluster log file in C:\windows\cluster for when & 
why it failed.

Is there a smoking gun in this logfile, a specific entry which indicates 
when or why it failed over?

Thanks!
Mike Bonvie
date: Tue, 29 Jan 2008 07:24:42 -0500   author:   Mike Bonvie

Re: when did the cluster fail over?   
"Mike Bonvie"  wrote in message
news:uRhNuHnYIHA.484@TK2MSFTNGP06.phx.gbl...
> We have an Exchange 2003 SP2 2 node cluster.
> The Exchange resource group (containing all services + disk resources)
> failed over to its peer node - we never noticed it.

Time to start researching monitoring solutions!
MOM (Microsoft Operations Manager) should obviously be on your list to
evaluate.

>
> I'm looking through the cluster log file in C:\windows\cluster for when &
> why it failed.

The cluster log has a maximum size and overwrites its oldest entries over
time. The entries for the actual failover may already have been overwritten.

>
> Is there a smoking gun in this logfile, a specific entry which indicates
> when or why it failed over.

Look at the System event log for Event ID 1069 (the generic event for a
resource failing) or Event ID 1200 (attempting to bring a group online).
When you see an Event ID 1200 on nodeB, the group probably went offline on
nodeA first.
That should give you a clue as to when this happened; look at the events
leading up to that time for more clues as to why it happened.
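
As a rough illustration of that search, here is a minimal Python sketch. It
assumes the System event log has already been exported into a list of
records; the field names (`time`, `id`, `node`) and the sample records are
made up for the example, not taken from a real log.

```python
from datetime import datetime, timedelta

def events_before_group_online(events, window_minutes=10):
    """For each Event ID 1200, collect the events logged shortly before it."""
    window = timedelta(minutes=window_minutes)
    ordered = sorted(events, key=lambda e: e["time"])
    result = []
    for marker in ordered:
        if marker["id"] != 1200:
            continue
        # Everything inside the window leading up to the 1200 event.
        lead_up = [e for e in ordered
                   if marker["time"] - window <= e["time"] < marker["time"]]
        result.append((marker, lead_up))
    return result

# Hypothetical records standing in for an exported System event log.
sample_events = [
    {"time": datetime(2008, 1, 28, 8, 30), "id": 1201, "node": "nodeA"},
    {"time": datetime(2008, 1, 28, 9, 2), "id": 1069, "node": "nodeA"},
    {"time": datetime(2008, 1, 28, 9, 3), "id": 1200, "node": "nodeB"},
]

for marker, lead_up in events_before_group_online(sample_events):
    print(marker["time"], marker["node"], [e["id"] for e in lead_up])
```

The ten-minute window is an arbitrary starting point; widen it if nothing
useful shows up.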

Rgds,
Edwin.
date: Tue, 29 Jan 2008 13:12:54 -0000   author:   Edwin vMierlo [MVP]

Re: when did the cluster fail over?   
FYI, you could also set up a Generic Script resource in your cluster to
notify you when it has failed over. See
http://support.microsoft.com/kb/260527 for an example of how to do this.

I agree with Edwin that the System event log is where you should look
first. Look for the 1069 errors and/or the informational events from ClusSvc
(1200, 1201, 1203, 1204, 1205); these are posted on 2003 servers when a group
is moved. The 1069 is the key error, as it will tell you which resource is
failing.
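
A small Python sketch of that triage, assuming the event log has been
exported into records with hypothetical `source`, `id` and `text` fields
(the sample records and resource name are invented):

```python
RESOURCE_FAILURE_ID = 1069
GROUP_MOVE_IDS = {1200, 1201, 1203, 1204, 1205}

def split_cluster_events(events):
    """Separate ClusSvc records into resource failures and group-move events."""
    failures, moves = [], []
    for event in events:
        if event["source"].lower() != "clussvc":
            continue  # ignore records from other event sources
        if event["id"] == RESOURCE_FAILURE_ID:
            failures.append(event)
        elif event["id"] in GROUP_MOVE_IDS:
            moves.append(event)
    return failures, moves

# Hypothetical records; the message text is a placeholder, not real log output.
sample_records = [
    {"source": "ClusSvc", "id": 1069, "text": "resource X failed (placeholder)"},
    {"source": "ClusSvc", "id": 1200, "text": "placeholder"},
    {"source": "OtherSource", "id": 1069, "text": "placeholder"},
    {"source": "ClusSvc", "id": 1201, "text": "placeholder"},
]

failures, moves = split_cluster_events(sample_records)
print(len(failures), "failure(s);", len(moves), "group-move event(s)")
```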

Once you've identified which resource is failing, you can then scan the
cluster.log on that node and see if your log contains the error code
explaining why the resource failed. The cluster.log timestamps are in GMT,
not local system time, so make sure you adjust accordingly. The cluster.log
is also a circular log, so it's possible that the log has wrapped,
depending on how long ago this failure occurred.
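
To make the GMT adjustment concrete, here is a minimal Python sketch. It
assumes the cluster.log timestamp looks like `2008/01/28-14:03:22.123`; check
that against your own log before relying on it. The sample line is invented,
and the fixed -05:00 offset is only an example taken from the date header of
the original post.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout inside a cluster.log line: YYYY/MM/DD-HH:MM:SS.mmm
STAMP = re.compile(r"(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})")

def log_time_to_local(line, local_tz):
    """Return the line's GMT timestamp converted to local_tz, or None."""
    match = STAMP.search(line)
    if match is None:
        return None
    gmt = datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S.%f")
    return gmt.replace(tzinfo=timezone.utc).astimezone(local_tz)

# Example offset only; use the time zone of the server you are investigating.
EASTERN = timezone(timedelta(hours=-5))
sample_line = "00000a5c.00000b10::2008/01/28-14:03:22.123 ERR  (made-up message)"
print(log_time_to_local(sample_line, EASTERN))
```

With the converted time in hand, you can line the cluster.log entries up
against the System event log, which is shown in local time.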

Regards,
John

Visit my blog: http://msmvps.com/blogs/jtoner

date: Tue, 29 Jan 2008 09:41:32 -0500   author:   John Toner [MVP]
