VCAP-DCD

VMware vSphere Storage Design Considerations

In my previous post I looked at the calculations required to determine the minimum number of hosts needed to satisfy the compute design. This was achieved through a current-state analysis, identifying average peak CPU and memory consumption.

A summary of the tools can be found here: VMware vSphere Compute Design … The same tools can be used to determine the VM/physical server I/O profile, and the capacity and throughput requirements needed to design and scale an appropriate storage solution.

Getting your storage design right is crucial. A poorly designed SAN can negatively impact the entire vSphere infrastructure. Storage, like networking and the compute layer, is a cornerstone area that requires careful planning and investment. Failures here may impact project delivery, budget and performance, damaging user and stakeholder experience.

This post will look at some of the principles around VMware storage design in general.

Key Decision Points & Considerations

  • Plan for failure, a good storage design should take into account the impact of failure, for example:
    • Site failure (DR): your SAN array may support block-level replication; if you don’t have this capability (due to cost or features), look at the network/host-level replication offered in vSphere 5.1 or other replication tools. Disaster recovery is not just about ensuring you can back up and restore data, it’s about ensuring business continuity.
    • Identify replication bandwidth requirements and the rate of change (this determines whether you can perform synchronous or asynchronous replication).
    • Failure of individual components (review this end to end): fabric interconnects, switches, storage processors, drive shelves, host HBAs, power, etc. The key point here is to find ways of mitigating these risks from an infrastructure point of view.
  • Size and plan according to workload peaks (example factors: backups, month-end reporting)
  • Array availability requirements (n+1, n+2, etc.): at a minimum your solution should withstand the failure of at least one node (n+1). However, be aware of the impact when a storage processor is down for maintenance; during maintenance periods the availability requirements might not be satisfied.
  • Scale the design for current and future IOPS and capacity requirements: total storage capacity is the sum of all current storage usage plus projected growth, while IOPS determines the performance the array needs to support the workloads.
  • Do you plan to use advanced technologies such as deduplication, sub-LUN tiering or caching?
    • How will this impact the design? Observe SIOC and array vendor best practices regarding the use of sub-LUN tiering.
  • Number and speed of drives needed (FC/SAS, SATA/NL-SAS, SSD); this has an impact on performance, capacity, availability and budget.
Drive Type    IOPS per drive
SSD (SLC)     6,000+
SSD (MLC)     1,000+
15K RPM       175-200
10K RPM       125-150
7.2K RPM      50-75
  • Storage Protocol Choices – (FC/FCoE, iSCSI, NFS), the decision is driven by throughput and existing requirements and constraints.
  • Whether service processors will run in an Active-Active, Active-Passive configuration
    • This impacts host path selection policies, whether I/O requests can be balanced across all available paths.
    • Impacts performance: I/O is balanced on a per-LUN basis only; having additional ‘active’ controllers to service requests can improve performance in conjunction with multipathing policies.
  • Check array support for the VMware VAAI primitives (VAAI, VAAI-NFS, VASA and by extension Storage I/O control).
    • This offers performance improvements (hardware offloading – hardware assisted copy, locking, block zeroing).
  • Will you thin provision at the LUN or VM level?
    • Thin provisioning has its benefits, but it increases the management overhead. A common use case is environments that require ‘x’ amount of space but don’t use all of the space allocated.
    • On VAAI-capable arrays, an out-of-space condition causes the affected VMs to be stunned. VMs can be resumed if VMFS datastore space is increased or reclaimed; alternatively, if VM swap files are stored on the same datastore, power off non-critical VMs (virtual machine swap files are stored in the base VM folder by default, although this can be changed in certain cases, e.g. to reduce replication bandwidth). Powering off a VM removes its .vswp file (the .vswp file equals the memory granted to the VM less any reservations).
    • The most common cause of out-of-space conditions is poor or non-existent capacity monitoring. They can also be caused by snapshots that have grown out of control.
    • Thin on thin is not recommended, due to the operational overhead required to monitor both the VMFS datastores and the backing LUNs.
  • Set appropriate queue depth values on HBA adapters (use with caution) and follow vendor recommendations. Observe the impact on consolidation ratios, specifically the number of VMs per VMFS datastore; setting queue depths too high can have a negative impact on performance.
  • For business critical applications you may want to limit virtual machine disk files to one or two virtual disks per VMFS datastore.
    • Observe the ESXi LUN Maximums (currently 256)
    • In situations where you have multiple VM virtual disks per VMFS datastore, you may want to use Storage I/O Control (requires Enterprise Plus licensing). SIOC is triggered during periods of contention; VMs on the datastore are allocated I/O queue slots relative to their share values, which ensures that high-priority VMs receive greater throughput than lower-priority ones.
  • Quantify RAID requirement based on availability, capacity & performance requirements (IMO scope for throughput/IOPs first capacity second)
    • Caveat: There is little or no use case for RAID 0.
  • I/O size has a direct effect on IOPS: the larger the I/O size, the fewer IOPS the drive can deliver.
  • I/O size (KB) multiplied by IOPS gives the throughput requirement; the larger the I/O size, the greater the impact on IOPS.
    • A high number of IOPS might come from a small I/O size (low throughput), whereas a larger I/O size might equate to a lower number of IOPS but a higher amount of throughput. Understanding throughput requirements is crucial, as this may dictate protocol and bandwidth requirements (1Gb iSCSI, 10Gb iSCSI, FC, etc.).
  • Ensure that host HBA cards are installed in PCIe slots with the same number of lanes; a lane is composed of two differential signalling pairs, one for receiving data and the other for transmitting. It is not recommended to place one card in an x4 slot and another in an x16 slot.
  • Design choices need to be validated against the requirements and constraints, and you need to understand the impact those decisions have on the design. For example, suppose your analysis has determined that iSCSI is a suitable protocol choice. Be aware of the impact on network components; a common strategy is to map this design choice against the infrastructure qualities (availability, manageability, performance, recoverability and security). Do you intend to use software initiators, dependent hardware initiators or independent hardware initiators? Each of these decisions impacts your design. For instance, if you intend to use independent hardware initiators, how does this affect iSCSI security? Do you have enough PCIe slots available in your hosts? Do you plan to use separate iSCSI switches or existing network switches? Do the existing switches support payload sizes above 1500 bytes (jumbo frames)? Do you have enough ports? How will you secure the storage network (e.g. with L2 non-routed VLANs)? Will the switches be redundant? Is there available rack space and power?
  • Finally, document everything!

Resource Allocation

  • How will the resources, capacity, drive class characteristics (IOPs) be distributed amongst all the workloads?
    • VM-to-Datastore allocation, Application/Infrastructure life cycles – (Production, Test, Dev).
    • See use cases for SIOC: Link
  • Prioritise critical applications on faster class of drives offering better performance / higher availability.
  • It is generally accepted practice to distribute intensive workloads across datastores; for example, grouping several SQL servers on the same datastore can lead to contention and impact performance.
    • Use SDRS – SDRS can load balance I/O among datastores within a datastore cluster.
  • Adhere to customer/business PCI-DSS compliance requirements (for example: logically separate datastores/storage domains). VCDX133 – Rene Van Den Bedem has written a great post on how compliance requirements map to vSphere design decisions: Link.
  • VM/Application availability requirements, ie MS Clustering (do you plan to use RDM’s, if so physical or virtual operating mode?)
    • Beware of the impact of each mode (see my blog post on MS Clustering Design Guidelines).
  • Create a single VMFS partition per LUN.
    • Creating multiple VMFS partitions per LUN increases SCSI reservations (impacting VM and virtual disk performance). Every additional partition per LUN increases the chance of metadata locks, which all adds up to increased latency.
  • Factors that determine optimal datastore size:
    • Max tolerable downtime (MTD), RPO-RTO, DR requirements.
  • How will restores be performed?
    • Will you be using disk or tape to perform VM restores?
      • What is the performance of your restore device? Understanding this impacts your RTO and maximum tolerable downtime.
      • Tape drive transfer rates at 2:1 compression: LTO-2 = 173GB/hr, LTO-3 = 432GB/hr, LTO-4 = 846GB/hr, LTO-5 = 1TB/hr, LTO-6 = 1.44TB/hr.
  • Calculating VM storage consumption = (VM disk(s) size + ~100MB of log files) + (.vswp size – memory reservation) + 25% growth (see the sketch below).
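For reference, here is a minimal Python sketch of the per-VM storage consumption formula above (the function name and the 40GB disk / 4GB RAM example inputs are purely illustrative):

import math

def vm_storage_mb(disk_mb, vm_ram_mb, ram_reservation_mb=0, log_mb=100, growth=0.25):
    """Estimate per-VM datastore consumption in MB: virtual disks + ~100MB of
    log files + the .vswp file (granted RAM minus any memory reservation),
    plus a growth allowance (25% assumed here)."""
    vswp_mb = max(vm_ram_mb - ram_reservation_mb, 0)
    return math.ceil((disk_mb + log_mb + vswp_mb) * (1 + growth))

# Example: 40GB of virtual disks, 4GB RAM, no memory reservation.
print(vm_storage_mb(40 * 1024, 4096))   # 56445 MB, roughly 55GB per VM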

Storage Protocol Decisions

iSCSI, NFS, FC, FCoE – have a look at Cormac Hogan’s Storage Protocol Comparison. Link

vSphere VAAI Storage Primitives – here to help!

  • Provides hardware offload capabilities
    • Full Copy: the host no longer has to read the data and write it back; the array performs the copy internally. This significantly improves Storage vMotion, VM cloning and template deployment.
      • Reduces unnecessary I/O on switches and front-end ports.
  • Block Zeroing (Write Same) = faster disk creation times (use case: eager-zeroed thick virtual disks).
    • This also reduces the time it takes to create FT enabled VMs.
    • Recommended for high performance workloads.
  • Hardware Assisted Locking (ATS) – excessive SCSI reservations by a host can cause performance degradation on other hosts that are accessing the same VMFS datastore.
    • ATS improves scalability and access efficiency by avoiding SCSI reservation issues.
  • In addition, SCSI T10 UNMAP can reclaim dead space by informing the storage array when previously used blocks are no longer needed.

Workload I/O Profiles

  • Differing I/O profiles can impact storage design. For example, take a requirement of 20,000 IOPS on RAID 5 with 15K FC/SAS drives (175 IOPS each):
    • A Read-heavy workload, 90/10 (reads vs writes) could be satisfied with 149 drives.
    • A balanced workload 50/50 (read vs writes) would require 286 drives.
    • A write-heavy workload 10/90 (reads vs writes) would require 423 drives.
  • Remember I/O size correlates to throughput
    • Throughput = Functional Workload IOPS x I/O size; using an I/O size of 8KB:
    • Functional workload IOPS (2,000 x 8KB ≈ 16MB/s) x number of workloads on the host (the VM-to-host consolidation ratio) = per-host throughput requirement.
      • To convert MB/s to Megabits per second (iSCSI/NFS) multiply by 8.
      • 16MB/s x 8 = 128 Mbps (Note: for iSCSI/NFS, a single 1GbE network adapter at full duplex provides around 800Mbps of usable throughput, so in this scenario the per-workload requirement is satisfied by a single adapter – see the sketch below).
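As a quick sanity check, the throughput conversion above can be expressed as a small Python sketch (decimal KB/MB are used to keep the numbers in line with the example; the function names are illustrative only):

def throughput_mb_s(iops, io_size_kb):
    """Workload throughput in MB/s: IOPS x I/O size (decimal units)."""
    return iops * io_size_kb / 1000.0

def mb_s_to_mbps(mb_per_s):
    """Convert MB/s to megabits per second for iSCSI/NFS link sizing."""
    return mb_per_s * 8

per_vm = throughput_mb_s(2000, 8)     # 16 MB/s for one 2,000 IOPS / 8KB workload
print(mb_s_to_mbps(per_vm))           # 128 Mbit/s

# Ten such workloads sharing one host uplink would need ~1,280 Mbit/s,
# which exceeds a single 1GbE adapter.
print(mb_s_to_mbps(per_vm * 10))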

Calculating the required number IOPs to satisfy the workload requirements

Use active or passive monitoring tools such as VMware Capacity Planner (available to VMware partners only). If you are not a VMware partner, check whether your VAR (reseller) can help. There are also third-party tools available, such as PlateSpin Recon, Perfmon, Quest, etc., which provide ways of capturing I/O statistics.

Key points for assessment:
1. Determine the average peak IOPs per workload (VMware Capacity Planner, Windows Perfmon, iostat).
2. Determine the I/O profile – reads versus writes. Check the array or Perfmon/iostat across the number of workloads.
3. Determine throughput requirements, Read Mbps versus Write Mbps. Reads (KB/s) + Writes (KB/s) = Total maximum throughput.
4. Determine RAID type based on availability, capacity & performance requirements (as mentioned before scope for performance first).

As an example, the following values will be used to run through a couple of sums:

Number of workloads 100
Average Peak IOPs 59
% Read 74
% Write 26
RAID – Write Penalty 2

Formula:
IO Profile = (Total Unit Workload IOPS × % READ) + ((Total Unit Workload IOPS × % WRITE) × RAID Penalty)

(59 x 74%) + ((59 x 26%) x 2)
(43.66) + ((15.34) x 2)
43.66 + 30.68 = 74.34 (IOPS required)
Rounded up to 75 IOPS

Therefore, 75 IOPs per VM x Number of VMs you want to virtualise = Total IOPs required

75 IOPs x 100 VMs = 7500 IOPs (note: you may want to add 25% growth depending on customer requirements)
75 IOPs x 125 VMs = 9375 IOPs (with 25% VM growth) – This is the amount of IOPs the Storage Array needs to support.
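The same calculation can be captured in a short Python sketch (function names are illustrative; the 25% uplift is the optional growth factor mentioned above):

import math

def backend_iops_per_vm(workload_iops, read_pct, write_pct, raid_penalty):
    """IO profile: (IOPS x %read) + ((IOPS x %write) x RAID write penalty)."""
    return (workload_iops * read_pct) + (workload_iops * write_pct * raid_penalty)

def array_iops_required(vm_count, workload_iops, read_pct, write_pct,
                        raid_penalty, growth=0.25):
    """Total array IOPS: per-VM result rounded up, multiplied by VM count and growth."""
    per_vm = math.ceil(backend_iops_per_vm(workload_iops, read_pct, write_pct, raid_penalty))
    return math.ceil(per_vm * vm_count * (1 + growth))

# The example above: 59 IOPS per workload, 74/26 read-write mix, write penalty of 2.
print(backend_iops_per_vm(59, 0.74, 0.26, 2))       # ~74.3 -> 75 IOPS per VM
print(array_iops_required(100, 59, 0.74, 0.26, 2))  # 9375 IOPS with 25% growth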

Calculate the number of drives needed to satisfy the IO requirements
Example values:

IOPs Required 20,000
Read rate KBps 270,000
Write rate KBps 80,000
Total throughput KBps 350,000

Determining read/write percentages

Total throughput = Reads + Writes
270,000 KBps + 80,000 KBps = 350,000 KBps
Total Read% = 270,000 / 350,000 = 77.14
Total Write% = 80,000 / 350,000 = 22.86

Using the derived read/write values we can determine the amount of drives needed to support required workloads.

Number of drives required = ((Total Read IOPs + (Total Write IOPS x RAID Penalty)) / Disk Speed IOPs)

Total IOPS Required = 20,000
Read : 77.14% of 20,000 IOPS = 15428
Write: 22.86% of 20,000 IOPS = 4572

RAID-5, write penalty of 4
Total Number of disks required = ((15428 + (4572 x 4)) / 175) = 193 Disks

RAID-1, write penalty of 2
Total Number of disks required = ((15428 + (4572 x 2)) / 175) = 141 Disks
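Putting the drive-count formula into a minimal Python helper (again, the names are illustrative; 175 IOPS per 15K drive is assumed, as in the worked example):

import math

def drives_required(total_iops, read_fraction, raid_penalty, drive_iops=175):
    """Spindle count: (read IOPS + (write IOPS x RAID write penalty)) / per-drive IOPS."""
    read_iops = total_iops * read_fraction
    write_iops = total_iops * (1 - read_fraction)
    return math.ceil((read_iops + write_iops * raid_penalty) / drive_iops)

# The worked example: 20,000 IOPS at roughly 77% read / 23% write.
print(drives_required(20000, 0.7714, 4))   # RAID-5 (penalty 4) -> 193 drives
print(drives_required(20000, 0.7714, 2))   # RAID-1/10 (penalty 2) -> 141 drives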

I hope you found the information useful. If you have any questions, or feel some of the information needs clarification, let me know.

Recommended Reading

VMware vSphere Storage Guide ESXi 5.5

 

VMware vSphere Compute Design Considerations

VMware vSphere Compute Design

Being able to gather and analyse requirements, then size the compute design based on those requirements is one of the key functions of any virtualisation project.
In terms of evaluating compute requirements, you should use the information gathered from your requirements analysis phase to factor in workload, availability and estimated growth.

The following factors should also be considered when sizing your hosts: host performance, budget, datacentre capacity (rack space, power, cooling, etc.), infrastructure (network connectivity, storage connectivity, etc.), security and compliance (HIPAA, SOX, PCI-DSS, etc.), and a scale-up versus scale-out deployment strategy.

Here are a few well known tools, which can assist you in collecting performance and capacity information in order to calculate the workload requirements:

  • VMware Capacity Planner: A hosted service available to VMware partners. VMware Capacity Planner collects resource utilisation data and compares it against industry values.
  • Platespin Recon from NetIQ: Workload analysis with remote data collection.
  • Cirba: Utilisation analysis, with normalization leveraging industry standards from SPEC.org.
  • Windows: Performance Monitor, disk manager
  • Linux: df, iostat, top, mpstat (use sar to output information to a datafile).

Let’s assume that through your investigations you have identified 200 workloads which could be virtualised or migrated to a vSphere environment. As mentioned before, your compute design should account for any future growth (let’s assume 25% growth over 5 years).

Workload Requirement Analysis

Workload Estimations Values
Average CPU count 2
Average CPU speed per processor (MHz) 2,000 MHz
Average CPU utilisation (percentage) 8%
Average RAM per server (MB) 4096MB
Average RAM utilisation (Percentage) 72%

Calculating the Average Peak CPU Utilisation:
Average CPU speed per processor (MHz) × Average CPU count = Average CPU per server (MHz)
2,000 MHz × 2 = 4,000 MHz

Average CPU per server(MHz) × average CPU utilisation (percentage) = Workload average peak CPU utilisation (MHz)
4,000MHz × 8.00% = 320MHz

Number of workloads × Workload average peak CPU utilisation (MHz) = Total Peak CPU utilisation (MHz)
200 × 320MHz = 64,000MHz

Therefore we have determined that 64,000MHz will be required to support the 200 workloads.

Accounting for Growth:
Total CPU (MHz) required for the virtual infrastructure x forecasted growth = Total CPU (MHz) required
64,000MHz x 25% growth = 16000MHz + 64,000MHz = 80,000 MHz

Calculating the required number of ESXi hosts to satisfy the design requirements
Two 6-core CPU at 2,000MHz per Core
2 Sockets x 6 Cores x 2000 MHz = 24,000 MHz per  ESXi host @ 100% utilisation

Running any solution at 100% utilisation is never recommended; let’s assume we would like the system operating at 70% utilisation. It’s worth pointing out that resource utilisation should be continuously monitored, as it’s rare that workloads remain static.

Total Host CPU in MHz x Expected CPU utilisation = Total CPU Utilisation per ESXi host
24,000 MHz x 70% = 16,800 MHz

Total CPU (MHz) required / Total CPU Utilisation per ESXi host = Number of hosts needed
80,000 MHz / 16,800 MHz = 5 (rounded up)

Availability
Here you have a few decisions to make. Let’s assume that at a minimum the design requires n+1 availability; this means that in the event of a host failure there is still enough compute capacity to support the workloads. However, you may not be able to satisfy the availability requirements during periods of host maintenance, so you may want to use n+2 availability.

Number of hosts needed + availability requirements (n+1) = 6 ESXi hosts.

vCPU-to-Core Ratio
This identifies the ratio of assigned vCPUs to the number of logical CPUs present. For CPU-intensive workloads use a lower vCPU-to-core ratio; this may impact your VM-to-host consolidation ratios, so bear it in mind.

Calculating the Average Peak RAM Utilisation:
Average RAM per server (MB) × Average RAM utilisation (Percentage) = Average peak RAM utilisation (MB)
4,096MB × 72.00% = 2949.12MB

Number of workloads × Average peak RAM utilisation (MB) = Total peak RAM utilisation (MB)
200 × 2949.12MB = 589,824MB

Accounting for Growth
Total peak RAM utilisation (MB) x forecasted growth = Total RAM required
589,824MB x 25% growth = 147,456MB + 589,824MB = 737,280MB

Note: This does not account for VMware TPS savings, which is typically around 25%.
Note: This does not include memory overhead requirements. This is memory needed by the VMkernel to support the workload. This is typically dependent on the amount of RAM/vCPU’s assigned to the virtual machine at power on. The following link gives you some idea of the amount of overhead required for each VM:VMware Memory Overhead

Calculating the required number of hosts to satisfy the design requirements
For this calculation we will assume the host has 256GB of RAM installed operating at 70% utilisation.

RAM installed per host x Total host RAM utilisation = Total RAM (MB) available
262,144 MB x 70% = 183,500.8MB

Total RAM required (MB) / Total RAM (MB) available per host = Number of hosts needed
737,280MB / 183,500.8MB = 5 (Rounded up)

Number of hosts needed + availability requirements (n+1) = 6 ESXi hosts.

We have therefore established that we require six ESXi hosts to satisfy the compute workload, annual growth and availability requirements.
In a situation where the CPU or RAM calculation produces a higher number of hosts, always choose the higher value in order to satisfy both the CPU and RAM requirements (a quick sketch of the combined calculation follows below).
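To tie the CPU and RAM calculations together, here is a minimal Python sketch of the host sizing above (the function name and defaults are illustrative; it applies growth, a target utilisation ceiling and an n+x spare):

import math

def hosts_required(total_mhz, total_ram_mb, host_mhz, host_ram_mb,
                   target_utilisation=0.70, growth=0.25, ha_spare=1):
    """Hosts needed to satisfy both the CPU and RAM requirements."""
    cpu_hosts = math.ceil(total_mhz * (1 + growth) / (host_mhz * target_utilisation))
    ram_hosts = math.ceil(total_ram_mb * (1 + growth) / (host_ram_mb * target_utilisation))
    # Take the larger of the two, then add the HA spare(s) for n+1, n+2, etc.
    return max(cpu_hosts, ram_hosts) + ha_spare

# The example above: 200 workloads at 320MHz / 2,949.12MB each,
# hosts with 2 x 6 cores at 2,000MHz and 256GB of RAM.
print(hosts_required(200 * 320, 200 * 2949.12, 24000, 256 * 1024))   # 6 hosts (5 + 1)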

Microsoft Clustering with VMware vSphere Design Guide

Update: 28/01/2016 – Post updated to reflect vSphere 6.x enhancements.

Microsoft Cluster Services (MSCS) has been around since the days of Windows NT4, providing high availability to tier 1 applications such as MS Exchange and MS SQL Server. With the release of Server 2008, Microsoft Cluster Services was renamed Windows Server Failover Clustering (WSFC), with several updates.
This post will focus on the design choices when implementing MSCS/WSFC with VMware vSphere, proposing alternatives along the way. It is not intended as a ‘step-by-step install guide‘ for MSCS/WSFC.

vSphere 5, 6.x Integration – MSCS/WSFC Design Guide

Update: Whats new vSphere 6

  • vMotion supported for cluster of virtual machines across physical hosts (CAB deployment).
  • ESXi 6.0 supports PSP_RR for Windows Server 2008 SP2 and above releases of MS Windows Server.
  • MSCS / WSFC is supported with VMware Virtual SAN (VSAN) version 6.1.
  • WSFC supported on Windows deployments of the VMware vCenter Server.
  • Be sure to review the WSFC VMware KB page for updates on supported configurations.

Requirements:

  • Virtual disk formats should be thick provisioned eager zeroed.
  • Update: vMotion is supported in ESXi 6.x only, and for a cluster of virtual machines across physical hosts (CAB) with pass-through RDMs you must use VM hardware version 11.
    • VMware recommends updating the heart-beat timeout ‘SameSubnetThreshold’ registry value to 10. Additional info can be found on MS Failover Clustering and NLB Team Blog
    • The vMotion network must be a 10Gbps Ethernet link. 1Gbps Ethernet link for vMotion of WSFC virtual machines is not supported.
  • Synchronise time with a PDCe/NTP server (disable host based time synchronisation using VMware tools).
  • WSFC/MSCS requires a private or heartbeat network for cluster communication.
  • Modify the windows registry disk timeout value to 60 seconds or more.(HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue.)
  • Guest operating system and SCSI adapter support requirements:
    Operating System                  SCSI Adapter
    Windows 2003 SP1 or higher        LSI Logic Parallel
    Windows 2008 SP2 or higher        LSI Logic SAS
    Windows 2008 R2 SP1 and higher    LSI Logic SAS
    Windows 2012 and above            LSI Logic SAS
  • A shared storage drive (quorum) that is presented to all hosts in the environment that might host the MSCS/WSFC-virtual machines.
  • Quorum/Shared Storage Requirements:

    Storage option                  Cluster in a Box (CIB)    Cluster Across Boxes (CAB)    Physical and Virtual Machine
    Virtual disks (VMDKs)           Yes (recommended)         No                            No
    Physical mode RDM               No                        Yes (recommended)             Yes
    Virtual mode RDM                Yes                       No                            No
    In-guest iSCSI                  Yes                       Yes                           Yes
    In-guest SMB 3.0                Yes (Server 2012 only)    Yes (Server 2012 only)        Yes (Server 2012 only)

Limitations:

  • Windows 2000 VMs are no longer supported from vSphere 4.1 onwards; Windows 2003 SP2, 2008 R2, 2012 and 2012 R2 are supported.
  • Five node clusters are possible (only two nodes in vSphere 5.0).
  • You must use at least VM hardware version 7. Update: For vMotion support in ESXi 6.x use VM hardware version 11.
  • Shared disks need to be thick provisioned eager zeroed.
  • Only Fibre Channel SANs are supported; iSCSI, Fibre Channel over Ethernet (FCoE) and NFS shared storage are not. Update: in vSphere 5.5 and 6.x the iSCSI and FCoE limitations have been lifted – restrictions apply, see VMware KB 2052238.
  • There is no support for vMotion/Storage vMotion; any attempt to vMotion a VM will fail and may result in a node failing over. Technically, vMotion is possible when using an iSCSI initiator inside the guest VM to connect to the shared disk. Update: vMotion in a CAB deployment is supported in ESXi 6.x, see requirements.
  • NPIV and Round Robin multipathing are not recommended when using vSphere native multipathing. Third-party multipathing using round robin may be supported, but check with your storage vendor. Note: in vSphere 5.5 PSP_RR is supported with restrictions. Update: PSP_RR is also now supported with ESXi 6.x.
  • WSFC/MSCS not supported with vSphere FT.
  • Increasing the size of the shared disks and hot-adding CPU or memory are not supported.
  • Memory overcommitment is not recommended, overcommitment can be disruptive to the clustering mechanisms (optionally set VM, memory reservations).
  • Paravirtualised SCSI controllers are not currently supported (this may change – check the VMware compatibility guides).
  • Pausing or resuming the VM state is not supported.

Use Cases:

  • Invariably it depends on the application; the application needs to be cluster-aware, and not all applications support Microsoft Cluster Services.
  • Microsoft Exchange Server
    • See MSCS alternative with Exchange 2010 Database Availability Groups (Cluster Continuous Replication – CCR & Standby Continuous Replication – SCR).
  • Microsoft SQL Server
    • Alternative to MSCS/WSFC use SQL always on availability groups.
  • Web, DHCP, file and print services

Implementation Options:

Before we look at the various implementation options it may be worth covering some of the basic requirements of an MSCS cluster, a typical clustering setup includes the following:

  1. Drives that are shared between clustered nodes, this is a shared drive which is accessible to all nodes in the cluster. This ‘shared’ drive is also known as the quorum disk.
  2. A private heartbeat network that the nodes can use for node-to-node communication.
  3. A public network so the virtual machines can communicate with the network.
  • Cluster in a Box (CIB): the clustered virtual machines run on the same ESXi host. The shared disks or quorum (either local or remote) are shared between the virtual machines. CIB can be used in test or development scenarios; this solution provides no protection in the event of hardware failure. You can use VMDKs (SCSI bus sharing set to virtual mode) or Raw Device Mappings (RDMs); RDMs are beneficial if you decide to migrate one of the VMs to another host.

[Diagram: Cluster in a Box (CIB)]

  • Cluster Across Boxes (CAB): MSCS is deployed to two VMs which run on two different ESXi hosts. This protects against both software and hardware failures. Physical RDMs are the recommended disk choice. Shared storage/quorum should be located on a Fibre Channel SAN or presented via an in-guest iSCSI initiator (be aware of the impact of the latter).

[Diagram: Cluster Across Boxes (CAB)]

  • Virtual Machine + Physical: [VM N + 1 physical] clusters allow one MSCS cluster node to run natively on a physical server while the other runs as a virtual machine. This mode can be used to migrate from a physical two-node deployment to a virtualised environment. Physical RDMs are the recommended disk option here. Shared storage/quorum should be located on a Fibre Channel SAN or presented via an in-guest iSCSI initiator (again, be aware of the impact of using in-guest iSCSI).

[Diagram: Virtual Machine + Physical]

SCSI/Disk Configuration Parameters:

  • SCSI Controller Settings:
    • Disk types: an option when you add a new disk. You have the choice of VMDK, virtual RDM (virtual compatibility mode) or physical RDM (physical compatibility mode).
    • SCSI bus-sharing setting: virtual sharing policy or physical sharing policy or none.
      • The SCSI bus-sharing setting needs to be edited after VM creation.
  • SCSI bus sharing values:
    • None: used for disks that aren’t shared in the cluster (between VMs), such as the VMs’ boot drives, temp drives and page files.
    • Virtual: Use this value for cluster in box deployments (CIB).
    • Physical: Recommended for cluster across box or physical and virtual deployments (CAB, & virtual + physical).
  • Raw Device Mapping (RDM) used by quorum drive (see requirements for deployment use cases).
  • RDM options:
    • Virtual RDM mode (non-pass through mode)
      • Here the hardware characteristics of the LUN are hidden from the virtual machine; the VMkernel only sends read/write I/O to the LUN.
    • Physical RDM mode (pass-through mode)
      • The VMkernel passes all SCSI commands to the LUN, with the exception of the REPORT_LUNS command, so that the VMkernel can isolate the LUN to the virtual machine. This mode is useful for SAN management agents, SAN snapshots, FC SAN backups and other SCSI target-based tools.
  • ESXi 5.x and 6.x use a different technique to determine whether Raw Device Mapped (RDM) LUNs are used for MSCS cluster devices, introducing a configuration flag that marks each device participating in an MSCS cluster as “perennially reserved”. For ESXi hosts hosting passive MSCS nodes with RDM LUNs, use the esxcli command to mark the device as perennially reserved: esxcli storage core device setconfig -d <naa.id> --perennially-reserved=true. See KB 1016106 for more information.

Design Impact:

  • MSCS requires a private or heartbeat network for cluster communication.
    • This means adding a second virtual NIC to the VM, which will be used for heartbeat communication between the virtual machines. This potentially means separate virtual machine port groups (it is recommended to use a separate VLAN per port group for L2 segmentation).
  • If using in-guest iSCSI initiators, be aware that SCSI encapsulation is performed over virtual machine network. The recommendation is to separate this traffic from regular VM traffic (ensure you identify and dedicate bandwidth accordingly).
  • If you currently overcommit on resources, identify the root cause and address any issues (CPU/Memory), setting reservations is highly recommended in such scenarios. If you intend to use reservations, have a look at reservations using resource pools versus reservations at a VM level (Inappropriate use of reservations or badly architected resource pools can make the problem worse).
  • Factor in the application workload requirements; providing high availability is often only part of the solution. Account for the application workload (CPU, memory, network, IOPS, bandwidth) in order to meet KPIs and SLAs.
    • If you intend to reserve memory or CPU, factor in the impact on the HA slot-size calculation, for example if you use admission control with ‘host failures the cluster can tolerate’. Larger slot sizes impact consolidation ratios.
  • You cannot use RDMs on local storage (see limitations), shared storage is required. Virtual storage appliances on local disks using iSCSI can be looped back to support this (not recommended for tier 1 applications).
  • Account for HA/DRS
    • Using vSphere DRS does not mean high availability is assured; DRS is used to balance workloads across the HA cluster. Note: for MSCS deployments, DRS cannot vMotion clustered VMs (vMotion is supported in vSphere 6.x, allowing for DRS automation), but DRS can make placement recommendations at VM power-on.
    • DRS anti-affinity rules should be created prior to powering on the clustered VM’s using raw device mappings.
    • To make sure that HA/DRS clustering functions don’t interfere with MSCS, you need to apply the following settings:
      • VMware recommends setting the clustered VMs’ individual DRS automation level to partially automated (for VM placement only). You should use ‘Must Run’ rules here; also set the advanced DRS setting ForceAffinityPoweron to 0.
  • For CIB deployments create VM-to-VM affinity rules to keep them together.
  • For CAB deployments create VM-to-VM anti-affinity rules to keep them apart. These should be ‘Must Run’ rules, as there is no point in having both nodes on the same ESXi host.
  • Physical N+1 VMs don’t need any special affinity rules as one of the nodes is virtual and the other physical, unless you have a specific requirement to create such rules.
  • Important: vSphere DRS also needs additional ‘Host-to-VM’ rule groups, because HA doesn’t consider ‘VM-to-VM’ rules when restarting VMs in the event of a hardware failure.
    • For CIB deployments, VM’s must be in the same VM DRS group, which must be assigned to a host DRS group containing two hosts using a ‘Must Run’ on hosts in group rule.
    • For CAB deployments VM’s must be in different VM DRS groups. The VM’s must be assigned to different host DRS groups using a ‘Must Run’ on hosts in group rule.

Recoverability:

  • If you are using physical mode RDMs you will not be able to take advantage of VM-level snapshots, which the vSphere APIs for Data Protection (VADP) rely on. In scenarios using physical mode RDMs you may want to investigate SAN-level snapshots (with VSS integration) or in-guest backup agents.
    • SAN snapshots using physical RDM’s allow you to take advantage of array based snapshot technologies if your SAN supports it.
      • In-guest backup agents can be used, but be aware that you may need to create a separate virtual machine port group for backups, unless you want backup traffic transported over your VM network! It is also worth noting that this doesn’t provide a backup of the virtual machine configuration (the .vmx file or any of the vmware.log information) in the event you need to restore a VM.
    • Account for the impact of MSCS at your disaster recovery site: do you plan on a full MSCS implementation (CAB / physical N+1)?
    • What type of infrastructure will be available to support the workload at your recovery site? Is your recovery site cold/hot or a regional data centre used by other parts of the business?
    • Have you accounted for the clustered application itself? What steps need to be taken to ensure the application is accessible to users/customers?
    • Adhere to the RTO/RPO – MTD requirements for the application.
    • Lastly it goes without saying make sure your disaster recovery documentation is up to date.

Benefits:

  • If architected correctly with VMware vSphere, a virtual implementation can meet the demands for tier 1 applications.
  • Can reduce hardware and software costs by virtualising current WSFC deployments.
  • Leveraging DRS for VM placement:
    • Use VM affinity and anti-affinity rules to help avoid clustered VMs landing on the same host during power-on operations or host failures.
    • VM-to-Host rules (DRS groups) can be used to locate specific VMs on particular hosts; ideal for blade servers, where the VMs can be housed on hosts from different blade chassis or racks.

Drawbacks:

  • When using RDMs, remember that each presented RDM consumes one LUN ID, and the maximum is 256 LUNs per ESXi host.
  • Taking into account the limitations stated above, in my opinion the biggest drawback is the high operational overhead involved in looking after virtual WSFC clusters. WSFC solutions can be complex to manage. Keep the solution simple, and think about the operational cost to support the environment whilst meeting the availability requirements.

Conclusion:

If you are looking at virtualising your MS clusters, vSphere is a great choice with many features to support your decision (HA, DRS anti-affinity rules). However, before making any decisions, an assessment of the design implications should be performed in order to identify how they affect availability, manageability, performance, recoverability and security. Migrating from physical to virtual instances of MSCS/WSFC may also offer a reduction in hardware and software costs (depending on your licensing model). It is also worth looking at other solutions which could be implemented, for example the native high availability features of VMware vSphere (VM and application monitoring, HA, Fault Tolerance); these can provide a very good alternative to MS clustering solutions.

In the end the decision to use WSFC essentially comes down to the workload availability requirements defined by the customer or business owner, this will ultimately drive the decision behind your strategy.

Reference Documents:

 

VCAP5-DCD Certification

The VMware Certified Advanced Professional 5 – Data Center Design (VCAP5-DCD) certification is designed for IT architects who design and integrate VMware solutions in multi-site, large enterprise, virtualised environments.

The VCAP-DCD focuses on a deep understanding of the design principles and methodologies behind datacentre virtualisation. This certification relies on your ability to understand and decipher functional and non-functional requirements, risks, assumptions and constraints. Before undertaking it, you should have a thorough understanding of the following areas: availability and security, storage and network design, disaster recovery (and by extension business impact analysis and business continuity), dependency mapping, automation and service management.

Key in this process is understanding, how design decisions relating to availability, manageability, performance, recoverability and security impact the design.

I have put together some of the vSphere 5 best practices referenced in the exam blueprint; I hope you find the information helpful if you are considering taking this exam. vSphere 5 Design Best Practise Guide.pdf

In preparation for this exam, I found the following books very useful along with the documents provided in the exam blueprint.

  • VMware vSphere 5.1 Clustering Deepdive – Duncan Epping & Frank Denneman
  • VMware vSphere Design 2nd Edition – Scott Lowe & Forbes Guthrie
  • Managing and Optimising VMware vSphere Deployments – Sean Crookston & Harley Stagner
  • Virtualising Microsoft Business Critical Applications on VMware vSphere – Matt Liebowitz & Alex Fontana
  • VCAP-DCD Official Cert Guide – Paul McSharry
  • ITIL v3 Handbook – UK Office of Government & Commerce

Here is the VCAP-DCD certification requirements road map:

VCAP-DCD Certification

As mentioned before, the blueprint is key! Here is a URL export of all the tools/resources the blueprint targets; this should save time trawling through the PDF.

Section 1 – Create a vSphere Conceptual Design
Objective 1.1 – Gather and analyze business requirements
VMware Virtualization Case Studies
Five Steps to Determine When to Virtualize Your Servers
Functional vs. Non-Functional Requirements
Conceptual, Logical, Physical:  It is Simple

Objective 1.2 – Gather and analyze application requirements
VMware Cost-Per-Application Calculator
VMware Virtualizing Oracle Kit
VMware Virtualizing Exchange Kit
VMware Virtualizing SQL Kit
VMware Virtualizing SAP Kit
VMware Virtualizing Enterprise Java Kit
Business and Financial Benefits of Virtualization: Customer Benchmarking Study

Objective 1.3 – Determine Risks, Constraints, and Assumptions
Developing Your Virtualization Strategy and Deployment Plan

Section 2 – Create a vSphere Logical Design from an Existing Conceptual Design
Objective 2.1 – Map Business Requirements to the Logical Design
Conceptual, Logical, Physical:  It is Simple
VMware vSphere  Basics Guide
What’s New in VMware vSphere 5
Functional vs. Non-Functional Requirements
ITIL v3 Introduction and Overview

Objective 2.2 – Map Service Dependencies
Datacenter Operational Excellence Through Automated Application Discovery & Dependency Mapping

Objective 2.3 – Build Availability Requirements into the Logical Design
Improving Business Continuity with VMware Virtualization Solution Brief
VMware High Availability Deployment Best Practices
vSphere Availability Guide

Objective 2.4 – Build Manageability Requirements into the Logical Design
Optimizing Your VMware Environment
Four Keys to Managing Your VMware Environment
Operational Readiness Assessment
Operational Readiness Assessment Tool

Objective 2.5 – Build Performance Requirements into the Logical Design
Proven Practice: Implementing ITIL v3 Capacity Management in a VMware environment
vSphere Monitoring and Performance Guide

Objective 2.6 – Build Recoverability Requirements into the Logical Design
VMware vCenter™  Site Recovery Manager Evaluation Guide
A Practical Guide to Business Continuity and Disaster Recovery with VMware Infrastructure
Mastering Disaster Recovery: Business Continuity and Disaster Recovery Whitepaper
Designing Backup Solutions for VMware vSphere

Objective 2.7 – Build Security Requirements into the Logical Design
vSphere Security Guide
Developing Your Virtualization Strategy and Deployment Plan
Achieving Compliance in a Virtualized Environment
Infrastructure Security:  Getting to the Bottom of Compliance in the Cloud
Securing the Cloud

Section 3 – Create a vSphere Physical Design from an Existing Logical Design
Objective 3.1 – Transition from a Logical Design to a vSphere 5 Physical Design
Conceptual, Logical, Physical:  It is Simple
vSphere Server and Host Management Guide
vSphere Virtual Machine Administration Guide

Objective 3.2 – Create a vSphere 5 Physical Network Design from an Existing Logical Design
vSphere Server and Host Management Guide
vSphere Installation and Setup Guide
vMotion Architecture, Performance and Best Practices in VMware vSphere 5
VMware vSphere™: Deployment Methods for the VMware® vNetwork Distributed Switch
vNetwork Distributed Switch: Migration and Configuration
Guidelines for Implementing VMware vSphere with the Cisco Nexus 1000V Virtual Switch
VMware® Network I/O Control: Architecture, Performance and Best Practices

Objective 3.3 – Create a vSphere 5 Physical Storage Design from an Existing Logical Design
Fibre Channel SAN Configuration Guide
iSCSI SAN Configuration Guide
vSphere Installation and Setup Guide
Performance Implications of Storage I/O Control–Enabled NFS Datastores in VMware vSphere® 5.0
Managing Performance Variance of Applications Using Storage I/O Control
VMware Virtual Machine File System: Technical Overview and Best Practices

Objective 3.4 – Determine Appropriate Compute Resources for a vSphere 5 Physical Design
vSphere Server and Host Management Guide
vSphere Installation and Setup Guide
vSphere Resource Management Guide

Objective 3.5 – Determine Virtual Machine Configuration for a vSphere 5 Physical Design
vSphere Server and Host Management Guide
Virtual Machine Administration Guide
Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs
Virtualizing a Windows Active Directory Domain Infrastructure
Guest Operating System Installation Guide

Objective 3.6 – Determine Data Center Management Options for a vSphere 5 Physical Design
vSphere Monitoring and Performance Guide
vCenter Server and Host Management Guide
VMware vCenter Update Manager 5.0 Performance and Best Practices

Section 4 – Implementation Planning
Objective 4.1 – Create and Execute a Validation Plan
vSphere Server and Host Management Guide
Validation Test Plan

Objective 4.2 – Create an Implementation Plan
vSphere Server and Host Management Guide
Operational Test Requirement Cases

Objective 4.3 – Create an Installation Guide
vSphere Server and Host Management Guide