I have a two-node cluster (active/active). When I test failover with the Cluster Administrator tool, the SQL Server resource takes 15-20 seconds to recover. Is it possible to reduce the failover time for the SQL Server resource? How? Thanks
-- Pasquale
15 to 20 seconds is quite good for a cluster failover. I doubt you will be able to improve on it much.
-- Andrew J. Kelly, SQL MVP, Solid Quality Mentors
Keep in mind what has to happen when a cluster fails over or a group is moved to the other node. The cluster has to recognize the failure or the move request. It then has to send shutdown signals to SQL Server and the other resources and wait for all of them to respond. If a resource doesn't respond, the cluster has to wait for the request to time out before killing the resource. Once all the resources are offline, the cluster has to send start signals to the resources on the other node and again wait for a response. The resources most likely have to start in a particular order, so each resource must start and respond before the next one can be sent a start signal. All of this signaling takes time; 15 to 20 seconds is actually a pretty good response. I suspect you were testing the failover, so this 15 to 20 seconds isn't based on an actual failure, where timeouts will most likely be encountered and the response will be much slower as a result.

This is what clusters do: they don't guarantee that you won't have a service interruption, just that the interruption will be shorter than if you had to respond manually. Highly reliable and highly available are not the same.
-- Tim Walsh
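The sequence described above can be sketched as a simple timing model. This is only an illustration of the reasoning in the post, not of how the cluster service is actually implemented: the resource names, the per-resource timings, the detection delay, and the 30-second stop timeout are all made-up values, and `failover_seconds` is a hypothetical helper.

```python
# Hypothetical resources and timings, chosen only for illustration.
# Each entry is (name, seconds to stop, seconds to start), in start-up order.
RESOURCES = [
    ("Disk", 1.0, 2.0),
    ("IP Address", 0.5, 1.0),
    ("Network Name", 0.5, 2.0),
    ("SQL Server", 3.0, 8.0),
]


def failover_seconds(resources, detect=1.0, hung=(), stop_timeout=30.0):
    """Estimate the outage: detect, stop every resource, then start in order."""
    total = detect
    # Resources are taken offline in reverse order; a resource that does not
    # respond costs the full timeout before it is killed.
    for name, stop, _start in reversed(resources):
        total += stop_timeout if name in hung else stop
    # Resources come online one at a time, each waiting for the previous one.
    for _name, _stop, start in resources:
        total += start
    return total


planned = failover_seconds(RESOURCES)
unplanned = failover_seconds(RESOURCES, detect=5.0, hung={"SQL Server"})
print(f"planned move: {planned:.1f}s")
print(f"failure with a hung resource: {unplanned:.1f}s")
```

With these assumed numbers the planned move lands in the 15-20 second range, while a single resource that has to time out pushes the total well beyond it, which is the point made above about tests versus real failures.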
I tested the failover by moving a resource group from one node to the other. The time I measured refers to recovery of the SQL Server resource, not of the entire group. However, is there a way to reduce the failover time (during a real event or not) to avoid losing transactions? Thanks
-- Pasquale
Reducing the failover time would result in lost transactions. A failover event is much like a restart of the SQL Server service. SQL Server must recover each user database by rolling forward committed transactions and rolling back uncommitted ones. There are some multi-tier architecture techniques that can isolate the front-end web service databases from the actual back-end transactional ones, but those require significant application changes to implement.
-- Geoff N. Hiten, Principal SQL Infrastructure Consultant, Microsoft SQL Server MVP
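The roll-forward/roll-back step mentioned above can be illustrated with a toy log replay. This is a deliberately simplified sketch, not SQL Server's actual recovery algorithm (which works on a write-ahead log in analysis, redo, and undo phases); the record layout, the `recover` function, and the sample transactions are all assumptions made for the example.

```python
def recover(log):
    """Toy crash recovery over a list of log records.

    Each record is ("write", txn, key, value) or ("commit", txn).
    Returns the recovered data and the set of rolled-back transactions.
    """
    # A transaction counts as committed only if its commit record reached the log.
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    data = {}
    rolled_back = set()
    for rec in log:
        if rec[0] != "write":
            continue
        _kind, txn, key, value = rec
        if txn in committed:
            data[key] = value        # roll forward: reapply committed work
        else:
            rolled_back.add(txn)     # roll back: uncommitted work is discarded
    return data, rolled_back


log = [
    ("write", "T1", "balance_a", 90),
    ("write", "T1", "balance_b", 110),
    ("commit", "T1"),
    ("write", "T2", "balance_a", 40),  # T2 was still open when the node failed
]
data, rolled_back = recover(log)
print(data)
print(rolled_back)
```

In this sketch T1 survives in full and T2 disappears, no matter how quickly or slowly `recover` runs.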
"Geoff N. Hiten" wrote in message news:OkSe5TQDJHA.1628@TK2MSFTNGP02.phx.gbl... > Reducing the failover time would result in lost transactions. A failover > event is much like a restart of the SQL Service. SQL Server must recover > each user database by rolling forward committed transactions and rolling > back uncommitted ones. There are some multi-tier architecture techniques > that can isolate the front end web service databases from the actual > back-end transactional ones, but those require significant application > changes.to implement. > absolutely, And, although mentioned before, I believe a 15-20 second is not all that bad.
Is there a direct relationship between failover time and transaction loss? Is it not possible to reduce the failover time and still save the transactions? Thanks
-- Pasquale
You cannot "save" a transaction that is not committed. By definition, it is incomplete and should be rolled back. Any committed transactions are rolled forward and are not lost. SQL Server 2005 and later make the database available after the roll-forward step as a means of reducing failover and startup time.
-- Geoff N. Hiten, Principal SQL Infrastructure Consultant, Microsoft SQL Server MVP
Let me clarify my question. You said that "Reducing the failover time would result in lost transactions ...". Why? Why would making SQL Server recovery faster result in lost transactions? Thanks
-- Pasquale
He means that, regardless of the recovery time, you will always lose any transactions that are open at the time the active node fails. The recovery time only affects how long you cannot run any transactions; only the ones OPEN at the moment of failure will be lost.
-- Andrew J. Kelly, SQL MVP, Solid Quality Mentors
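The distinction drawn in the last reply can be made concrete with a small classification sketch. Everything here is hypothetical: the `failover_outcome` helper, the transaction names, and the timings are invented purely to show that changing the recovery time changes how long clients are blocked, not which transactions are lost.

```python
def failover_outcome(txns, crash_at, recovery_seconds):
    """Classify transactions given as (name, begin, commit) times in seconds."""
    back_online = crash_at + recovery_seconds
    outcome = {"kept": [], "lost": [], "blocked": []}
    for name, begin, commit in txns:
        if commit <= crash_at:
            outcome["kept"].append(name)      # committed before the failure
        elif begin < crash_at:
            outcome["lost"].append(name)      # open at the failure: rolled back
        elif begin < back_online:
            outcome["blocked"].append(name)   # arrived during the outage: retry
        # Transactions that begin after recovery are unaffected.
    return outcome


txns = [("T1", 0, 5), ("T2", 8, 12), ("T3", 12, 13), ("T4", 25, 26)]
fast = failover_outcome(txns, crash_at=10, recovery_seconds=5)
slow = failover_outcome(txns, crash_at=10, recovery_seconds=20)
print("fast:", fast)
print("slow:", slow)
```

In both runs the lost list is the same; only the blocked list grows when recovery takes longer.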