Server 2012 AlwaysOn - high availability solutions overview

EMC MULTISITE DISASTER RECOVERY FOR MICROSOFT SQL SERVER 2012

EMC VNX5700, EMC FAST Cache, SQL Server AlwaysOn Availability Groups

  • Mission-critical high availability
  • Automated storage optimization
Solution homepage: https://community.emc.com/docs/DOC-14614
SQL Server high availability solutions overview

SQL Server offers administrators several options to configure high availability for both servers and databases. These high availability configurations have until now included:

 

  • Database mirroringmaintains a single standby database, or mirror database, of a production database that is referred to as a principal database. This solution increases database availability by supporting almost instantaneous failover and is primarily a software solution.
  • Log shippingmaintains one or more warm standby databases, or secondary databases, for a single production database that is referred to as the primary database.
  • Replicationa primary server, referred to as the Publisher, distributes data to one or more secondary servers, or Subscribers. Replication enables real-time availability and scalability across these servers. Replication also supports filtering to provide only a subset of data to subscribers, and also allows for partitioned updates. Subscribers are online and available for reporting and other functions without query recovery. Three types of replication are available: snapshot, transactional, and merge, with transactional replication providing the lowest latency and the one most often used for high availability.

 

SQL Server 2012 introduces two high availability configurations as part of SQL Server AlwaysOn, which provides availability at either the application database or instance level:

  • AlwaysOn Failover Clustering
  • AlwaysOn Availability Groups
AlwaysOn Failover Clustering

A single SQL Server instance is installed across multiple Windows Server Failover Cluster (WSFC) nodes. WSFC functionality provides high-availability at the instance level, by presenting a Failover Cluster Instance (FCI) to the network as a single computer accessible through the cluster’s virtual name.

 

This configuration is an enhancement to the SQL Server FCI functionality available in previous versions of SQL Server. In SQL Server 2008 R2 and previous versions, SQL Server iterated through all IP addresses in the failover cluster resource group and attempted to bind to all during startup, with failed bindings causing SQL Server startup fails. Therefore, in SQL Server 2008 R2 and previous versions, stretch VLANs were used to enable SQL Server multisite failover clustering.

 

Many customers are reluctant to deploy stretch VLANs due to considerations of security, cost, complexity, or incompatibility with a corporate standard. This has been a key constraint in limiting the deployment of SQL Server multisite and multi-subnet clustering.

 

With SQL Server 2012, improvements have been made to the multisite and specifically multi-subnet failover clustering implementation. Two major enhancements made to support multi-subnet clustering are:

  • Cluster Setup supportBoth AddNode, for integrated installation, and CompleteFailoverCluster, for advanced installation, can intelligently detect a multi-subnet environment, and automatically set the IP address resource dependency to OR.
  • SQL Server Engine supportTo bring the SQL Server resource online, the SQL Server Engine startup logic skips binding to any IP address that is not in an online state, as shown in Figure 1.

ClusterPropertiesDependencies.png

Figure 1. Cluster Properties—Dependencies

 

AlwaysOn Availability Groups

AlwaysOn Availability Groups support a failover environment for a specific set of user databases, known as availability databases, that fail over together. Like AlwaysOn Failover Clustering, AlwaysOn Availability Groups require the SQL Server instances to be configured on WSFC nodes, but with the instances remaining and being presented to the network as separate computers. Availability groups support a set of primary databases and one to four sets of corresponding secondary databases. An availability group fails over at the level of an availability replica and, optionally, secondary databases can be made available for read-only access and some backup operations.

Availability replicas and roles

Availability groups consist of a set of two or more failover partners referred to as availability replicas. Each availability replica is hosted on a separate instance of SQL Server which in turn resides on separate nodes of a WSFC cluster. Each of the SQL Server instances is either a SQL Server FCI, as shown in Figure 2, or a stand-alone instance with AlwaysOn Availability Groups enabled, as shown in Figure 3.

 

SQLServerFCI.png

Figure 2 Microsoft SQL Server failover cluster instance (FCI)

 

SQLServerAG.png

Figure 3 Microsoft SQL Server AlwaysOn Availability Group


Each availability replica hosts a copy of the availability databases in the availability group. Each availability replica is assigned an initial role as either the primary role or the secondary role:

  • Primary replicaholds the primary role and there can be only one. A primary replica hosts the read/write databases which are known as the primary databases.
  • Secondary replicaof which there can be up to four, each holding the secondary role, with secondary replicas each hosting read-only databases.
Secondary replicas

Each secondary replica has the ability to be configured so that it can transition to become the primary replica during failover.

After the initial availability group has been created, any additional database to be added to the availability group must be an online, read/write database on the server that hosts the primary replica. Adding a new database to an availability group keeps the database online to clients, but no secondary copy exists until a backup of the data and transaction log has been restored to the secondary replica host (using RESTORE WITH NORECOVERY, which omits the undo phase of the restore process, preserving uncommitted transactions. This allows for restoring additional backups to roll the database further forward in time).

Any new secondary database is marked as being in the RESTORING state until it has finished joining the availability group. Once the secondary database has joined the availability group its state is changed to ONLINE. The availability group then initiates data synchronization between the secondary database and its corresponding primary database.

During synchronization, the primary replica sends the transaction log records from the primary database to the secondary replica which in turn writes the transaction log records to disk, referred to as hardening the log, and applies the log records to the secondary database.

During times such as a failover, the role of the availability replica may become indeterminate. In this case, the databases within the availability group will be marked as being in a NOT SYNCHRONIZING state and their role will be set to RESOLVING until the role of the replica is resolved. If the availability replica is resolved to the primary role, the databases become the primary databases; if resolved to the secondary role, the databases become secondary databases.

Readable secondary replicas

Secondary replicas can be configured so that while in the secondary role they accept read-only client connection to local databases. These secondary databases are referred to as readable secondary replicas.

Note    These secondary databases are not set to read-only. Unlike a read-only database, which is static, a secondary replica, secondary database is dynamic, continuously changing as the corresponding primary database changes are applied to it.

Backup preferences

Secondary replicas also support performing copy-only backups of a full database, file, and filegroups, and log backups. During availability group creation you can specify a number of settings to control the selection of the replica used for backups.

Backup preference settings:

  • Prefer Secondary Automated backups for this availability group should occur on a secondary replica. If no secondary replica is available, backups are performed on the primary replica.
  • Secondary only All automated backups for this availability group must occur on a secondary replica.
  • Primary All automated backups for this availability group must occur on the current primary replica.
  • Any Replica Backups can occur on any replica in the availability group.

 

With these preferences you can also set the backup priority (lowest=1, highest=100) and also the ability to exclude particular replicas.

 

Availability modes

Each availability group has an availability mode setting. This determines whether the primary replica has to wait for a transaction to be committed on a database until the corresponding secondary replica has written the transaction log to disk (hardening the log).

AlwaysOn Availability Groups support two availability modes:

  • Asynchronous-commit mode In asynchronous-commit mode, the primary replica commits a transaction without acknowledgment that an asynchronous-commit replica has hardened the log. Asynchronous-commit mode minimizes transaction latency, allowing the secondary databases to lag behind the primary; this make data loss possible.
  • Synchronous-commit mode In synchronous-commit mode, the primary replica waits for acknowledgment that a synchronous-commit secondary has hardened the log before committing a transaction. Synchronous-commit mode increases transaction latency but protects against data loss; this means that as long as the secondary databases are in a synchronized state with the primary database, committed transactions are fully protected.

 

Failover modes

With SQL Server 2012 Availability Groups, failover involves a target secondary replica transitioning to the primary replica and taking over the primary role. Three forms of failover exist. Two of these—automatic failover and planned manual failover—require the replicas to be in synchronous-commit mode but have the potential for no data loss. The third option—forced failover—risks data loss, but is the only option available with asynchronous-commit mode.

  • Automatic failover (without data loss) Automatic failover requires that the primary replica and target secondary replica are running in synchronous-commit mode and the secondary replica is in a synchronized state. Also have WSFC quorum configured and meet the conditions specified by the availability groups flexible failover policy.

Failover occurs in response to a failure of the primary replica. A secondary replica is transitioned to the primary role, and becomes the primary replica. After the original primary replica is recovered and becomes available it is transitioned to the secondary role. It will not be available to become the primary replica until it is once again in a synchronized state.

 

Note    SQL Server Failover Cluster Instances (FCI) do not support AlwaysOn automatic failover; an availability replica hosted by an FCI can only be configured for manual failover. If automatic failover is required the SQL Server instances should be configured as part of a WSFC.

 

  • Planned manual failover (without data loss) Manual failover requires that the primary replica and target secondary replica are running in synchronous-commit mode and the secondary replica is in a synchronized state.

 

Failover occurs after a manual failover command is issued causing the target secondary replica to be transitioned to the primary role, and become the primary replica. The original primary replica now becoming the secondary replica.

  • Forced manual failover (possible data loss) Forced manual failover is the only option available when the primary replica and secondary replica are running in asynchronous-commit mode. Failover occurs after a manual failover command is issued causing the target secondary replica to be transitioned to the primary role, and become the primary replica.

 

Note   If primary and secondary replicas are running in synchronous-commit mode and the secondary replica is not in a synchronized state, then forced manual failover is the only option available. As forced manual failover requires manual initiation it is also a kind of manual failover.

Failovers are not caused by database issues such as a database becoming suspect due to a loss of a data file, deletion of a database, or corruption of a transaction log.