
WMI event sequences for database mirroring

In my last post, I covered how WMI can help us monitor mirroring in SQL Server 2005. Before I post the code, let me explain a few more things.
 
Originally, the code was supposed to be really simple:
 
1 - Connect to both principal and mirror servers.
2 - Listen in for relevant events.
3 - Fire events when an automatic or manual failover occurs.
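A minimal Python sketch of that original plan, for illustration only. It assumes the WMI subscription on each server has already been translated into (server, event_name) pairs that use the event names from this post; `naive_monitor` and the `notify` callback are hypothetical names, not the code this series will present.

```python
from typing import Callable, Iterable, Tuple

# The two events the original plan reacted to.
FAILOVER_EVENTS = frozenset({"AutomaticFailover", "ManualFailover"})


def naive_monitor(events: Iterable[Tuple[str, str]],
                  notify: Callable[[str, str], None]) -> None:
    """Step 3 of the plan: call notify(server, event_name) on every failover."""
    for server, name in events:
        if name in FAILOVER_EVENTS:
            notify(server, name)


if __name__ == "__main__":
    calls = []
    naive_monitor(
        [("ServerA", "PrincipalConnectionLost"),
         ("ServerA", "AutomaticFailover"),
         ("ServerA", "PrincipalExposed")],
        lambda server, name: calls.append((server, name)),
    )
    print(calls)  # [('ServerA', 'AutomaticFailover')]
```

Reacting to the raw events like this also fires on every flap, which is exactly the problem discussed further down.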
 
Conceptually, this is all we need to do, but the WMI events generated for mirroring are not that simple.
 
In the following event sequences, there are two servers:
 
Server A – Originally configured as the mirror
Server B – Originally configured as the principal
 
These sequences assume a witness server is present; we’ll explain how the events differ without a witness in another post.
 
Note that all these events are generated per instance and per database. The database is the “unit” on which mirroring operates. You already knew that, but it complicates things, since you have to monitor each database separately. This is why the code I’ll present later is heavily encapsulated.
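One way to keep the per-database bookkeeping contained is a small registry that merges the events from both partners under the database name. This is a hypothetical sketch (`MirroringRegistry` is not from the original code), and it assumes the principal and mirror databases share the same name, as mirroring requires.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class MirroringRegistry:
    """Keeps a separate, ordered event history for each mirrored database.

    Each instance raises its own events for each database, so the events
    from both partners are merged under the database name.
    """

    def __init__(self) -> None:
        self._history: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def record(self, server: str, database: str, event_name: str) -> None:
        self._history[database].append((server, event_name))

    def history(self, database: str) -> List[Tuple[str, str]]:
        # .get() avoids creating an empty entry for unknown databases.
        return list(self._history.get(database, []))

    def databases(self) -> List[str]:
        return sorted(self._history)
```

Each database's history can then be handed to its own sequence-tracking logic without interference from the other databases on the same instances.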
 
Automatic Failover
The following events are raised when an automatic failover occurs. Note that Server B generates no events; that makes sense, since its database is the one that went offline.
 
Also note that each event reflects the current role of the server raising it. Here, Server A loses its connection to the principal (Server B) and becomes the principal after the AutomaticFailover event. It then raises PrincipalExposed, which amounts to saying, “I am the principal now, and I am running exposed without a mirror.”
 

<<Server A>> – PrincipalConnectionLost
<<Server A>> – AutomaticFailover
<<Server A>> – PrincipalExposed
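Because each event reflects the role of the server raising it, the current principal can be inferred from the most recent event that only a principal would raise. The sketch below assumes the event-to-role mapping implied by the sequences in this post; `PRINCIPAL_EVENTS` and `current_principal` are hypothetical names.

```python
from typing import Iterable, Optional, Tuple

# Events whose sender is acting as the principal when it raises them
# (inferred from the sequences in this post; extend as needed).
PRINCIPAL_EVENTS = frozenset({
    "AutomaticFailover",
    "PrincipalExposed",
    "SynchronizingPrincipal",
    "SynchronizedPrincipalWitness",
})


def current_principal(events: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the server that most recently reported itself as principal.

    `events` is the ordered list of (server, event_name) pairs for one
    database; returns None if no principal-only event has been seen.
    """
    principal = None
    for server, name in events:
        if name in PRINCIPAL_EVENTS:
            principal = server
    return principal
```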

Manual Failover
A manual failover differs in that both the principal and the mirror are still online. It is, in essence, a role reversal.
 
This makes the series of events a bit more complicated. Here, the principal (Server B) raises the ManualFailover event; it is, after all, the one announcing that it needs to fail over. The mirror (Server A) then becomes the principal and starts the synchronization process, picking up the log that remains to be applied. The old principal, now the mirror, does the same from its side.
 
The key events come next: the new principal advertises that it has synchronized and connected to the witness, and after that, the new mirror also advertises that it has synchronized with the witness.
 

<<Server B>> – ManualFailover
<<Server A>> – SynchronizingPrincipal
<<Server B>> – SynchronizingMirror
<<Server A>> – SynchronizedPrincipalWitness
<<Server B>> – SynchronizedMirrorWitness
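A hedged sketch of detecting that this sequence has finished: after the most recent ManualFailover, wait until one server has raised SynchronizedPrincipalWitness and a different server has raised SynchronizedMirrorWitness. The function name and the (server, event_name) input format are assumptions for illustration.

```python
from typing import Sequence, Tuple


def manual_failover_complete(events: Sequence[Tuple[str, str]]) -> bool:
    """True once the latest ManualFailover has been followed by both
    'synchronized with witness' events, raised by two different servers."""
    try:
        # Only consider events after the most recent ManualFailover.
        start = max(i for i, (_, name) in enumerate(events)
                    if name == "ManualFailover")
    except ValueError:  # max() of an empty sequence: no ManualFailover yet
        return False
    principal = mirror = None
    for server, name in events[start + 1:]:
        if name == "SynchronizedPrincipalWitness":
            principal = server
        elif name == "SynchronizedMirrorWitness":
            mirror = server
    return principal is not None and mirror is not None and principal != mirror
```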

How should we react to these events?
For the application I am currently working on, it is very important to avoid reacting unnecessarily to a failover event. Sometimes a single communication failure between the witness and either server triggers a failover, and then another failover follows once communication is re-established or lost again!

The process that switches over our database connections takes between 2 and 5 minutes, depending on how many databases failed over. So if we have 3 or 4 failovers in rapid succession, we only really care about the last one, and ONLY after enough time has passed that we can say the situation is stable. Of course, if the events happen every 5 minutes, we’ll still be switching over constantly.

All that to say, we need some kind of wait timer between the last synchronization event (i.e., PrincipalExposed for an automatic failover or SynchronizedPrincipalWitness for a manual one) and the actual notification to our user program. That part complicates things further.
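The wait timer can be expressed as a debouncer: each “settled” event (PrincipalExposed or SynchronizedPrincipalWitness) restarts the timer, events that signal a failover in progress cancel it, and the user program is notified only once the quiet period has elapsed. This is a minimal single-database sketch with hypothetical names; timestamps are passed in explicitly so it can be driven by any clock, and which events count as “in progress” is an assumption based on the sequences above.

```python
from typing import Optional

# Events after which a failover looks finished.
SETTLED_EVENTS = frozenset({"PrincipalExposed", "SynchronizedPrincipalWitness"})
# Events that mean a failover is (still) in progress; an assumption.
IN_PROGRESS_EVENTS = frozenset({
    "PrincipalConnectionLost", "AutomaticFailover", "ManualFailover",
    "SynchronizingPrincipal", "SynchronizingMirror",
})


class FailoverDebouncer:
    """Notify only after `quiet_seconds` with no new failover activity."""

    def __init__(self, quiet_seconds: float) -> None:
        self.quiet_seconds = quiet_seconds
        self._settled_at: Optional[float] = None
        self._principal: Optional[str] = None

    def on_event(self, server: str, name: str, now: float) -> None:
        if name in SETTLED_EVENTS:
            # Looks finished: remember the new principal, restart the timer.
            self._settled_at = now
            self._principal = server
        elif name in IN_PROGRESS_EVENTS:
            # Another failover has started: cancel any pending notification.
            self._settled_at = None
        # Anything else (e.g. SynchronizedMirrorWitness) leaves the timer alone.

    def poll(self, now: float) -> Optional[str]:
        """Return the new principal once, after the quiet period has elapsed."""
        if self._settled_at is None or now - self._settled_at < self.quiet_seconds:
            return None
        self._settled_at = None
        return self._principal
```

In a real monitor, one debouncer per database would be polled periodically (or from a timer thread), with `quiet_seconds` chosen to exceed the flapping interval you actually observe.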

Additional considerations
But wait, there’s more: our monitor receives other events as well. In particular, during testing we sometimes received a series of NoQuorum messages during the failover process; for now, we are ignoring those.
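Ignoring those messages can be as simple as filtering them out before they reach the failover logic. This sketch keeps the ignored set in one place so it is easy to revisit later; `IGNORED_EVENTS` and `relevant_events` are hypothetical names, and events are again assumed to be (server, event_name) pairs.

```python
from typing import Iterable, Iterator, Tuple

# Events currently dropped before they reach the failover logic.
IGNORED_EVENTS = frozenset({"NoQuorum"})


def relevant_events(events: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield only the (server, event_name) pairs the monitor acts on."""
    for server, name in events:
        if name not in IGNORED_EVENTS:
            yield server, name
```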

In situations where there is no witness, the events don’t change significantly; we’ll cover that later.
