Microsoft Clustering with VMware vSphere Design Guide

Update: 28/01/2016 – Post updated to reflect vSphere 6.x enhancements.

Microsoft Cluster Services (MSCS) has been around since the days of Windows NT4, providing high availability to tier 1 applications such as MS Exchange and MS SQL Server. With the release of Windows Server 2008, Microsoft Cluster Services was renamed Windows Server Failover Clustering (WSFC) and received several updates.
This post focuses on the design choices when implementing MSCS/WSFC with VMware vSphere, proposing alternatives along the way. It is not intended as a step-by-step install guide for MSCS/WSFC.

vSphere 5, 6.x Integration – MSCS/WSFC Design Guide

Update: What's new in vSphere 6

  • vMotion is supported for clusters of virtual machines across physical hosts (CAB deployments).
  • ESXi 6.0 supports PSP_RR (Round Robin path selection) for Windows Server 2008 SP2 and later releases of MS Windows Server.
  • MSCS / WSFC is supported with VMware Virtual SAN (VSAN) version 6.1.
  • WSFC is supported on Windows deployments of the VMware vCenter Server.
  • Be sure to review the WSFC VMware KB page for updates on supported configurations.

Requirements:

  • Shared virtual disks must be thick provisioned, eager zeroed.
  • Update: vMotion support is ESXi 6.x only, for clusters of virtual machines across physical hosts (CAB) with pass-through RDMs, and you must use VM hardware version 11.
    • VMware recommends increasing the cluster heartbeat 'SameSubnetThreshold' value to 10 (a PowerShell sketch applying this and the disk timeout follows the storage table below). Additional info can be found on the MS Failover Clustering and NLB Team Blog.
    • The vMotion network must be a 10Gbps Ethernet link; a 1Gbps Ethernet link for vMotion of WSFC virtual machines is not supported.
  • Synchronise time with a PDCe/NTP source (disable host-based time synchronisation via VMware Tools).
  • WSFC/MSCS requires a private or heartbeat network for cluster communication.
  • Modify the Windows registry disk timeout value to 60 seconds or more (HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue).
  • Guest operating system and SCSI adapter support requirements:
    Operating System                SCSI Adapter
    Windows 2003 SP1 or higher      LSI Logic Parallel
    Windows 2008 SP2 or higher      LSI Logic SAS
    Windows 2008 R2 SP1 and higher  LSI Logic SAS
    Windows 2012 and above          LSI Logic SAS
  • A shared storage device (quorum disk) presented to all hosts in the environment that might run the MSCS/WSFC virtual machines.
  • Quorum/Shared Storage Requirements:

    Storage                    Cluster in a Box (CIB)  Cluster Across Boxes (CAB)  Physical and Virtual Machine
    Virtual disks (VMDKs)      Yes (recommended)       No                          No
    Physical mode RDM          No                      Yes (recommended)           Yes
    Virtual mode RDM           Yes                     No                          No
    In-guest iSCSI             Yes                     Yes                         Yes
    In-guest SMB 3.0           Yes (Server 2012 only)  Yes (Server 2012 only)      Yes (Server 2012 only)
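
The in-guest settings called out above (the disk timeout and the cluster heartbeat threshold) can be applied with a short PowerShell sketch. This is a minimal example run inside a cluster node, assuming the FailoverClusters module is available; treat the commands as a sketch and validate the values against your Windows and cluster versions.

    # Run inside a clustered Windows guest.
    # Disk timeout: 60 seconds or more
    Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Disk" `
                     -Name "TimeOutValue" -Value 60 -Type DWord

    # Cluster heartbeat tolerance: raise SameSubnetThreshold to 10 missed heartbeats
    Import-Module FailoverClusters
    (Get-Cluster).SameSubnetThreshold = 10

    # Verify both settings
    Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\Disk" | Select-Object TimeOutValue
    Get-Cluster | Select-Object Name, SameSubnetThreshold, SameSubnetDelay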

Limitations:

  • Windows 2000 VMs are no longer supported from vSphere 4.1 onwards; Windows 2003 SP2, 2008 R2, 2012 and 2012 R2 are supported.
  • Clusters of up to five nodes are possible from vSphere 5.1 onwards (only two nodes in vSphere 5.0).
  • You must use at least VM hardware version 7. Update: for vMotion support in ESXi 6.x, use VM hardware version 11.
  • Shared disks need to be thick provisioned, eager zeroed.
  • Only Fibre Channel SANs are supported; iSCSI, Fibre Channel over Ethernet (FCoE), and NFS shared storage are not. Update: in vSphere 5.5 and 6.x the iSCSI and FCoE limitations have been lifted – restrictions apply, see VMware KB 2052238.
  • There is no support for vMotion/Storage vMotion; any attempt to vMotion a clustered VM will fail and may result in a node failover. Technically, vMotion is possible when using an iSCSI initiator inside the guest VM to connect the shared disks. Update: vMotion in CAB deployments is supported in ESXi 6.x – see the requirements above.
  • NPIV and Round Robin multipathing are not recommended when using vSphere native multipathing. Third-party multipathing using Round Robin may be supported – check with your storage vendor. Note: in vSphere 5.5 PSP_RR is supported with restrictions. Update: PSP_RR is also supported with ESXi 6.x.
  • WSFC/MSCS not supported with vSphere FT.
  • Increasing the size of shared disks and hot-adding CPU or memory are not supported.
  • Memory overcommitment is not recommended, as it can be disruptive to the clustering mechanisms (optionally set VM memory reservations).
  • Paravirtualised SCSI controllers are not currently supported (this may be lifted – check the VMware compatibility guides).
  • Suspending or resuming the VM state is not supported.

Use Cases:

  • Invariably it depends on the application: it needs to be cluster-aware, and not all applications support Microsoft clustering.
  • Microsoft Exchange Server
    • As an alternative to MSCS, see Exchange 2010 Database Availability Groups (DAGs), which superseded Exchange 2007's Cluster Continuous Replication (CCR) and Standby Continuous Replication (SCR).
  • Microsoft SQL Server
    • As an alternative to MSCS/WSFC, use SQL Server AlwaysOn Availability Groups.
  • Web, DHCP, file and print services

Implementation Options:

Before we look at the various implementation options, it is worth covering the basic requirements of an MSCS cluster. A typical clustering setup includes the following (a short PowerShell check of these basics follows the list):

  1. Drives that are shared between the clustered nodes, i.e. a shared drive accessible to all nodes in the cluster. This shared drive is also known as the quorum disk.
  2. A private heartbeat network that the nodes can use for node-to-node communication.
  3. A public network so that the virtual machines can communicate with the rest of the network.
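
As a quick way to inspect these basics on an existing Windows cluster, the following sketch uses the FailoverClusters PowerShell module (Server 2008 R2 and later); the output will of course depend on your cluster configuration.

    # Run on a cluster node with the FailoverClusters module installed
    Import-Module FailoverClusters

    Get-ClusterNode                               # nodes in the cluster
    Get-ClusterQuorum                             # quorum configuration / witness disk
    Get-ClusterNetwork | Format-Table Name, Role  # Role distinguishes cluster-only (heartbeat) from client-facing networks
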
  • Cluster in a Box (CIB): the clustered virtual machines run on the same ESXi host. The shared disks or quorum (either local or remote) are shared between the virtual machines. CIB can be used in test or development scenarios; this solution provides no protection in the event of hardware failure. You can use VMDKs (SCSI bus sharing set to virtual) or Raw Device Mappings (RDMs); RDMs are beneficial if you decide to migrate one of the VMs to another host.

[Figure: Cluster in a Box (CIB) topology]

  • Cluster Across Boxes (CAB): MSCS is deployed to two VMs running on two different ESXi hosts. This protects against both software and hardware failures. Physical mode RDMs are the recommended disk choice. Shared storage/quorum should be located on a Fibre Channel SAN or accessed via an in-guest iSCSI initiator (be aware of the impact of the latter).

[Figure: Cluster Across Boxes (CAB) topology]

  • Virtual Machine + Physical: [VM N + 1 physical] clusters allow one MSCS cluster node to run natively on a physical server while the other runs as a virtual machine. This mode can be used to migrate from a physical two-node deployment to a virtualised environment. Physical mode RDMs are the recommended disk option here. Shared storage/quorum should be located on a Fibre Channel SAN or accessed via an in-guest iSCSI initiator (again, be aware of the impact of using in-guest iSCSI).

[Figure: Physical and Virtual Machine cluster topology]

SCSI/Disk Configuration Parameters:

  • SCSI Controller Settings:
    • Disk type: an option when you add a new disk. You have the choice of VMDK, virtual RDM (virtual compatibility mode), or physical RDM (physical compatibility mode).
    • SCSI bus sharing setting: none, virtual sharing policy, or physical sharing policy.
      • The SCSI bus-sharing setting needs to be edited after VM creation.
  • SCSI bus sharing values:
    • None: used for disks that aren't shared in the cluster (between VMs), such as the VMs' boot drives, temp drives and page files.
    • Virtual: use this value for Cluster in a Box (CIB) deployments.
    • Physical: recommended for Cluster Across Boxes and physical and virtual deployments (CAB, and virtual + physical).
  • Raw Device Mappings (RDMs) are used for the quorum/shared drives (see the requirements table for deployment use cases).
  • RDM options:
    • Virtual RDM mode (non-pass through mode)
      • Here the hardware characteristics of the LUN are hidden from the virtual machine; the VMkernel only sends read/write I/O to the LUN.
    • Physical RDM mode (pass-through mode)
      • The VMkernel passes all SCSI commands to the LUN, with the exception of the REPORT LUNS command, so the VMkernel can isolate the LUN to the virtual machine. This mode is useful for SAN management agents, array-based snapshots, FC SAN backup, and other SCSI target-based tools.
  • ESXi 5.x and 6.x use a different technique to determine whether Raw Device Mapping (RDM) LUNs are used for MSCS cluster devices, introducing a configuration flag to mark each device participating in an MSCS cluster as "perennially reserved". For ESXi hosts hosting passive MSCS nodes with RDM LUNs, use the esxcli command to mark the device as perennially reserved: esxcli storage core device setconfig -d <naa.id> --perennially-reserved=true. See KB 1016106 for more information (a PowerCLI sketch covering this and the RDM/SCSI settings above follows).
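
To tie the disk and SCSI settings above together, here is a minimal PowerCLI sketch that adds a shared LUN as a physical mode RDM on a new LSI Logic SAS controller with physical bus sharing, then marks the LUN as perennially reserved (the equivalent of the esxcli command above). The VM name, host name and naa identifier are placeholders, and the Get-EsxCli V2 argument names are assumptions to verify against your PowerCLI version.

    # Assumes an existing PowerCLI session (Connect-VIServer); names/IDs below are placeholders.
    $naa   = "naa.xxxxxxxxxxxxxxxx"                 # shared LUN presented to all relevant hosts
    $node1 = Get-VM -Name "WSFC-Node1"

    # Add the shared LUN as a physical (pass-through) mode RDM to the first node
    $rdm = New-HardDisk -VM $node1 -DiskType RawPhysical -DeviceName "/vmfs/devices/disks/$naa"

    # Place the RDM on a new LSI Logic SAS controller with physical SCSI bus sharing (CAB / physical + virtual)
    New-ScsiController -HardDisk $rdm -Type VirtualLsiLogicSAS -BusSharingMode Physical

    # Repeat New-HardDisk/New-ScsiController on the second node against the same LUN.

    # Mark the LUN as perennially reserved on each ESXi host that may run a passive node
    # (equivalent to: esxcli storage core device setconfig -d <naa.id> --perennially-reserved=true)
    $esxcli = Get-EsxCli -VMHost (Get-VMHost -Name "esxi01.lab.local") -V2
    $arguments = $esxcli.storage.core.device.setconfig.CreateArgs()
    $arguments.device = $naa
    $arguments.perenniallyreserved = $true
    $esxcli.storage.core.device.setconfig.Invoke($arguments)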

Design Impact:

  • MSCS requires a private or heartbeat network for cluster communication.
    • This means adding a second virtual NIC to the VM, which will be used for heartbeat communication between the virtual machines. This potentially means separate virtual machine port groups (a separate VLAN per port group is recommended for L2 segmentation).
  • If using in-guest iSCSI initiators, be aware that SCSI encapsulation is performed over the virtual machine network. The recommendation is to separate this traffic from regular VM traffic (ensure you identify and dedicate bandwidth accordingly).
  • If you currently overcommit on resources, identify the root cause and address any issues (CPU/memory); setting reservations is highly recommended in such scenarios. If you intend to use reservations, look at reservations using resource pools versus reservations at the VM level (inappropriate use of reservations or badly architected resource pools can make the problem worse).
  • Factor in the application workload requirements; often providing high availability is only part of the solution. Account for the application workload (CPU/memory/network/IOPS/bandwidth) in order to meet KPIs or SLAs.
    • If you intend to reserve memory or CPU, factor in the impact on the HA slot size calculation. For example, if you use the 'host failures the cluster can tolerate' admission control policy, larger slot sizes impact consolidation ratios.
  • You cannot use RDMs on local storage (see limitations), shared storage is required. Virtual storage appliances on local disks using iSCSI can be looped back to support this (not recommended for tier 1 applications).
  • Account for HA/DRS
    • Using vSphere DRS does not mean high availability is assured; DRS is used to balance workloads across the HA cluster. Note: for MSCS deployments, DRS cannot vMotion clustered VMs (vMotion is supported in vSphere 6.x, allowing for DRS automation), but DRS can make recommendations for VM placement at power-on.
    • DRS anti-affinity rules should be created prior to powering on the clustered VMs using raw device mappings.
    • To make sure that HA/DRS clustering functions don't interfere with MSCS, you need to apply the following settings:
      • VMware recommends setting the clustered VMs' individual DRS automation level to partially automated (for VM placement only). You should use 'must run' rules here, and also set the advanced DRS option ForceAffinePoweron to 1 to enforce strict affinity rule compliance.
  • For CIB deployments create VM-to-VM affinity rules to keep them together.
  • For CAB deployments create VM-to-VM anti-affinity rules to keep them apart. These should be ‘Must Run’ rules as there is no point in having two nodes on the same ESXi host.
  • Physical N+1 VMs don’t need any special affinity rules as one of the nodes is virtual and the other physical, unless you have a specific requirement to create such rules.
  • Important: vSphere DRS also needs additional host-to-VM rule groups, because HA doesn't consider the VM-to-VM rules when restarting VMs in the event of hardware failure.
    • For CIB deployments, VMs must be in the same VM DRS group, which must be assigned to a host DRS group containing two hosts using a 'must run on hosts in group' rule.
    • For CAB deployments, VMs must be in different VM DRS groups, and those must be assigned to different host DRS groups using 'must run on hosts in group' rules (a PowerCLI sketch of these rules follows).
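
The DRS configuration described above for a CAB deployment can be scripted with PowerCLI. The sketch below is a minimal example under stated assumptions: the cluster, VM and host names are placeholders, and the group/rule cmdlets (New-DrsClusterGroup, New-DrsVMHostRule) require a recent PowerCLI release, so verify the parameter sets against your version.

    # Assumes an existing PowerCLI session; placeholder cluster/VM/host names.
    $cluster = Get-Cluster -Name "Prod-Cluster"
    $node1   = Get-VM -Name "WSFC-Node1"
    $node2   = Get-VM -Name "WSFC-Node2"

    # Per-VM DRS automation level: partially automated (initial placement only)
    $node1, $node2 | Set-VM -DrsAutomationLevel PartiallyAutomated -Confirm:$false

    # CAB: VM-to-VM anti-affinity rule so the nodes never share an ESXi host
    New-DrsRule -Cluster $cluster -Name "WSFC-CAB-AntiAffinity" -KeepTogether $false -VM $node1, $node2

    # Host-to-VM rule groups, because HA does not honour VM-to-VM rules on restart:
    # each node is pinned ('must run') to a different group of hosts
    $vmGrp1   = New-DrsClusterGroup -Cluster $cluster -Name "WSFC-Node1-VMs" -VM $node1
    $vmGrp2   = New-DrsClusterGroup -Cluster $cluster -Name "WSFC-Node2-VMs" -VM $node2
    $hostGrpA = New-DrsClusterGroup -Cluster $cluster -Name "WSFC-Hosts-A" -VMHost (Get-VMHost -Name "esxi01.lab.local")
    $hostGrpB = New-DrsClusterGroup -Cluster $cluster -Name "WSFC-Hosts-B" -VMHost (Get-VMHost -Name "esxi02.lab.local")
    New-DrsVMHostRule -Name "WSFC-Node1-MustRun" -VMGroup $vmGrp1 -VMHostGroup $hostGrpA -Type MustRunOn
    New-DrsVMHostRule -Name "WSFC-Node2-MustRun" -VMGroup $vmGrp2 -VMHostGroup $hostGrpB -Type MustRunOn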

Recoverability:

  • If you are using physical mode RDMs you will not be able to take advantage of VM-level snapshots that leverage the vSphere APIs for Data Protection (VADP). In scenarios using physical mode RDMs you may want to investigate SAN-level snapshots (with VSS integration) or in-guest backup agents (a PowerCLI sketch for identifying such VMs follows this list).
    • SAN snapshots using physical RDMs allow you to take advantage of array-based snapshot technologies, if your SAN supports them.
      • In-guest backup agents can be used, but be aware that you may need to create a separate virtual machine port group for backups, unless you want backups to be transported over your VM network! It is also worth noting that this doesn't provide you with a backup of the virtual machine configuration information (the .vmx or any of the vmware.log information) in the event you need to restore a VM.
    • Account for the impact of MSCS at your disaster recovery site: do you plan on a full MSCS implementation (CAB/physical N+1)?
    • What type of infrastructure will be available to support the workload at your recovery site? Is your recovery site cold/hot or a regional data centre used by other parts of the business?
    • Have you accounted for the clustered application itself? What steps need to be taken to ensure the application is accessible to users/customers?
    • Adhere to the RTO/RPO – MTD requirements for the application.
    • Lastly, it goes without saying: make sure your disaster recovery documentation is up to date.
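
As a starting point for backup planning, the following PowerCLI sketch lists VMs with physical (pass-through) mode RDMs, i.e. the disks that fall outside VADP/VM-level snapshot backups. Treat it as a minimal example against a connected vCenter session.

    # Find physical mode RDMs across the inventory (these need SAN-level or in-guest protection)
    Get-VM |
        Get-HardDisk -DiskType RawPhysical |
        Select-Object Parent, Name, ScsiCanonicalName, CapacityGB |
        Format-Table -AutoSize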

Benefits:

  • If architected correctly with VMware vSphere, a virtual implementation can meet the demands for tier 1 applications.
  • Can reduce hardware and software costs by virtualising current WSFC deployments.
  • Leveraging DRS for VM placement:
    • Use VM affinity and anti-affinity rules to help avoid clustered VMs ending up on the same host during power-on operations or host failures.
    • VM-to-Host rules (DRS groups) can be used to locate specific VMs on particular hosts – ideal for blade servers, where the VMs can be housed on hosts from different blade chassis or racks.

Drawbacks:

  • When using RDMs, remember that each presented RDM consumes one LUN ID, of which the maximum is 256 per ESXi host (a quick PowerCLI check follows this list).
  • Taking into account the limitations stated above, in my opinion the biggest drawback is the high operational overhead involved in looking after virtual WSFC clusters. WSFC solutions can be complex to manage. Keep the solution simple! Think about the operational cost of supporting the environment whilst meeting the availability requirements.
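
To keep an eye on the LUN ID budget mentioned above, a quick PowerCLI sketch (a rough count only; it includes local disks as well as SAN LUNs):

    # Count the disk LUNs each ESXi host currently sees, versus the 256 LUN IDs available per host
    Get-VMHost | ForEach-Object {
        [pscustomobject]@{
            Host     = $_.Name
            DiskLuns = (Get-ScsiLun -VmHost $_ -LunType disk | Measure-Object).Count
        }
    } | Format-Table -AutoSize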

Conclusion:

If you are looking at virtualising your MS clusters, vSphere is a great choice with many features to support your decision (HA, DRS anti-affinity rules). However, before making any decisions, an assessment of the design implications should be performed in order to identify how they affect availability, manageability, performance, recoverability and security. Migrating from physical to virtual instances of MSCS/WSFC may also offer a reduction in hardware and software costs (depending on your licensing model). It is also worth looking at other solutions that could be implemented, for example the native high availability features of VMware vSphere (VM and application monitoring, HA, Fault Tolerance); these can provide a very good alternative to MS clustering solutions.

In the end, the decision to use WSFC comes down to the workload availability requirements defined by the customer or business owner; these requirements will ultimately drive your strategy.
