date: Tue, 29 Jan 2008 07:24:42 -0500
group: microsoft.public.exchange.clustering
Re: when did the cluster fail over?
FYI, you could also set up a generic script resource in your cluster to
notify you when it has failed over. See
http://support.microsoft.com/kb/260527 for an example of how to do this.
I agree with Edwin that the System event log is the first place to look.
Look for the 1069 errors and/or the informational events from clussvc
(1200, 1201, 1203, 1204, 1205...); these are posted on 2003 servers when a
group is moved. The 1069 is the key error, as it tells you which resource is
failing.
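As a rough illustration of that first pass, here is a minimal Python sketch that filters an exported copy of the System event log down to the event IDs mentioned above. It assumes the log has been saved as CSV text with a column holding the numeric event ID; the column name `EventID` and the function name `failover_events` are hypothetical, so adjust them to match whatever your export actually produces.

```python
import csv
import io

# Event IDs named in this thread: 1069 (resource failure) plus the
# informational clussvc events posted when a group is moved.
FAILOVER_EVENT_IDS = {1069, 1200, 1201, 1203, 1204, 1205}


def failover_events(csv_text, id_column="EventID"):
    """Return the rows whose event ID is one of FAILOVER_EVENT_IDS.

    csv_text  -- the exported event log as CSV text (assumed format)
    id_column -- name of the column holding the event ID (assumption)
    Rows are returned in the order they appear in the export.
    """
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw_id = (row.get(id_column) or "").strip()
        # Skip rows with a missing or non-numeric ID instead of failing.
        if raw_id.isdigit() and int(raw_id) in FAILOVER_EVENT_IDS:
            matches.append(row)
    return matches
```

The timestamp on the earliest 1069 in the result is a reasonable starting point for deciding which part of the cluster.log to read.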
Once you've identified which resource is failing, you can scan the
cluster.log on that node to see whether it contains the error code
explaining why the resource failed. The cluster.log is written in GMT, not
local system time, so make sure you adjust accordingly. The cluster.log
is also a circular log, so it's possible that it has wrapped, depending on
how long ago the failure occurred.
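To make the GMT adjustment concrete, here is a small Python sketch that pulls the timestamp out of a cluster.log line and converts it to local time so it can be lined up with the System event log. It assumes each line carries a stamp of the form `YYYY/MM/DD-HH:MM:SS.mmm`; check that against your own log before relying on it. The function name `cluster_log_time` and the sample line in the usage note are illustrative only.

```python
import re
from datetime import datetime, timezone

# Assumed timestamp layout inside a cluster.log line: 2008/01/29-12:24:42.123
STAMP = re.compile(r"(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2})\.(\d{3})")


def cluster_log_time(line, tz=None):
    """Return the line's timestamp converted from GMT to another zone.

    tz=None converts to the local time zone of the machine running the
    script. Returns None if the line has no recognisable timestamp.
    """
    match = STAMP.search(line)
    if match is None:
        return None
    gmt = datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S").replace(
        microsecond=int(match.group(2)) * 1000,  # milliseconds to microseconds
        tzinfo=timezone.utc,                      # cluster.log is in GMT
    )
    return gmt.astimezone(tz)
```

For example, `cluster_log_time("00000a3c.00000b14::2008/01/29-12:24:42.123 ERR sample")` gives the local equivalent of 12:24:42 GMT on whatever machine runs it. Pass an explicit `tz` if you are reading the log on a machine in a different time zone from the cluster node.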
Regards,
John
Visit my blog: http://msmvps.com/blogs/jtoner
"Edwin vMierlo [MVP]" wrote in
message news:ewxypinYIHA.2000@TK2MSFTNGP05.phx.gbl...
>
> "Mike Bonvie" wrote in message
> news:uRhNuHnYIHA.484@TK2MSFTNGP06.phx.gbl...
> > We have an Exchange 2003 SP2 2 node cluster.
> > The Exchange resource group (containing all services + disk resources)
> > failed over to its peer node - we never noticed it.
>
> Time to start researching monitoring solutions !
> MOM obviously should be on your list to evaluate.
>
> >
> > I'm looking through the cluster log file in C:\windows\cluster for when
> > & why it failed.
>
> The cluster log has a maximum size and is overwritten over time. It could
> be that it has already overwritten the actual failover.
>
> >
> > Is there a smoking gun in this logfile, a specific entry which indicates
> > when or why it failed over?
>
> Look at the System event log for Event ID 1069 (the generic event for a
> resource failing) or Event ID 1200 (attempting to bring a group online).
> When you see an Event ID 1200 for nodeB, the group probably went offline
> on nodeA first.
> That might give you a clue as to when this happened; look at the events
> leading up to that time for more clues on why it happened.
>
> Rgds,
> Edwin.
>
>
date: Tue, 29 Jan 2008 09:41:32 -0500
author: John Toner [MVP]