In my previous post I looked at the calculations required to determine the minimum number of hosts needed to satisfy the compute design. This was achieved through an assessment of the current state analysis, identifying average peak CPU and memory consumption.
A summary of the tools can be found here: VMware vSphere Compute Design … The same tools can be used to determine the VM/Physical server I/O profile, capacity and throughput requirements we need to design and scale an appropriate storage solution.
Getting your storage design right is crucial. A poorly designed SAN can negatively impact the vSphere infrastructure. Storage, like networking and the compute layer, is a cornerstone area that requires careful planning and investment. Failures here may impact project delivery, budget and performance, damaging user and stakeholder experience.
This post will look at some of the principles around VMware storage design in general.
Key Decision Points & Considerations
- Plan for failure. A good storage design should take into account the impact of failure, for example:
- Site failure (DR): your SAN array may support block-level replication. If you don’t have this capability (due to cost or features), look at the network/host-level replication offered in vSphere 5.1 or other replication tools. Disaster recovery is not just about ensuring you can back up and restore data; it’s about ensuring business continuity.
- Identify replication bandwidth requirements: what is the rate of change? (This determines whether you can perform synchronous or asynchronous replication.)
- Failure of individual components (review this end to end): fabric interconnects, switches, storage processors, drive shelves, host HBAs, power etc. The key point here is to find ways of mitigating any risks from an infrastructure point of view.
- Size and plan according to workload peaks (example factors: backups, month-end reporting)
- Array availability requirements (n+1, n+2 etc.): at minimum your solution should withstand the failure of at least one node (n+1). Be aware of the impact if a storage processor is down for maintenance, because availability requirements might not be satisfied during that window.
- Scale the design for current and future IOPs and capacity requirements. Total storage capacity is the sum of all current storage usage plus projected growth; IOPs determine the performance the array must deliver to support the workloads.
- Do you plan to use advanced technologies such as deduplication, sub-LUN tiering or caching?
- How will this impact the design? Observe SIOC and array vendor best practices regarding the use of sub-LUN tiering.
- Number and speed of drives needed (FC/SAS, SATA/NL, SSD). This has an impact on performance, capacity, availability, budget etc.
- Storage Protocol Choices – (FC/FCoE, iSCSI, NFS), the decision is driven by throughput and existing requirements and constraints.
- Whether storage processors will run in an Active-Active or Active-Passive configuration.
- This impacts host path selection policies and whether I/O requests can be balanced across all available paths.
- It also impacts performance: I/O is balanced on a per-LUN basis only, so having additional ‘Active’ controllers to service requests can improve performance in conjunction with multi-pathing policies.
- Check array support for the VMware storage APIs (VAAI, VAAI-NFS, VASA and, by extension, Storage I/O Control).
- This offers performance improvements (hardware offloading – hardware assisted copy, locking, block zeroing).
- Will you thin provision at the LUN or VM level?
- Thin provisioning has its benefits, but increases the management overhead. The common use case is environments that require ‘x’ amount of space but don’t use all of the space allocated.
- On VAAI-supported arrays, an out-of-space condition causes VMs to stun. VMs can be resumed once VMFS datastore space is increased or reclaimed. Alternatively, if VM swap files are stored on the same datastore, power off non-critical VMs: powering off a VM removes its .vswp file (the .vswp file equals the memory granted to the VM less any reservation). Virtual machine swap files are stored in the base VM folder by default; this can be changed in certain instances, e.g. to reduce replication bandwidth.
- Out-of-space conditions are most commonly caused by poor or non-existent capacity monitoring. They can also be caused by snapshots that have grown out of control.
- Thin on thin is not recommended, due to the operational overhead required to monitor both VMFS datastores and backing LUNs.
- Set appropriate queue depth values on HBAs (use with caution) and follow vendor recommendations. Observe the impact on consolidation ratios, specifically the number of VMs in a VMFS datastore. Setting queue depths too high can have a negative impact on performance.
- For business critical applications you may want to limit virtual machine disk files to one or two virtual disks per VMFS datastore.
- Observe the ESXi LUN Maximums (currently 256)
- In situations where you have multiple VM virtual disks per VMFS datastore, you may want to use Storage I/O Control (requires Enterprise Plus licensing). SIOC is triggered during periods of contention: each VM on the datastore is allocated I/O queue slots in proportion to its share value, which ensures that high-priority VMs receive greater throughput than lower-priority ones.
- Quantify the RAID requirement based on availability, capacity & performance requirements (IMO scope for throughput/IOPs first, capacity second).
- Caveat: There is little or no use case for RAID 0.
- I/O size can have an adverse effect on IOPs: the larger the I/O size, the fewer IOPs the drive can deliver.
- I/O size (KB) multiplied by IOPs = throughput requirement (KB/s).
- A higher number of IOPs might be due to a small I/O size (low throughput), whereas a larger I/O size might equate to a lower number of IOPs but a higher amount of throughput. Understanding throughput requirements is crucial, as this may dictate protocol & bandwidth requirements (iSCSI 1Gb, iSCSI 10Gb, FC etc.).
- Ensure that host HBA cards are installed in PCIe slots with the same number of lanes. A lane is composed of two differential signalling pairs, one for receiving data and the other for transmitting. Placing one card in an x4 slot and another in an x16 slot is not recommended.
- Design choices need to be validated against the requirements and constraints, and you need to understand the impact those decisions have on the design. For example, suppose your analysis has determined that iSCSI is a suitable protocol choice. Be aware of the impact on network components; a common strategy is to map this design choice against the infrastructure qualities (availability, manageability, performance, recoverability and security). Do you intend to use software initiators, dependent hardware initiators or independent hardware initiators? Each of these decisions impacts your design. For example, if you intend to use independent hardware initiators, how does this impact iSCSI security? Do you have enough PCIe slots available in your hosts? Do you plan to use separate iSCSI switches or existing network switches? Do the existing switches support payload sizes above 1500 bytes? Do you have enough ports? How will you secure the storage network (e.g. with L2 non-routed VLANs)? Will the switches be redundant? Is there available rack space, power etc.?
- Finally, document everything!
- How will the resources, capacity, drive class characteristics (IOPs) be distributed amongst all the workloads?
- VM-to-Datastore allocation, Application/Infrastructure life cycles – (Production, Test, Dev).
- See use cases for SIOC: Link
- Prioritise critical applications on a faster class of drives offering better performance / higher availability.
- It’s generally accepted practice to distribute intensive workloads across datastores; for example, grouping several SQL servers on the same datastore can lead to contention and impact performance.
- Use SDRS: SDRS can load balance I/O among datastores within a datastore cluster.
- Adhere to customer/business PCI-DSS compliance requirements (for example: logically separate datastores/storage domains). Rene Van Den Bedem (VCDX133) has written a great post on how compliance requirements map to vSphere design decisions: Link.
- VM/Application availability requirements, e.g. MS Clustering (do you plan to use RDMs and, if so, in physical or virtual operating mode?)
- Beware of the impact of each mode (see my blog post on MS Clustering Design Guidelines).
- Create a single VMFS partition per LUN.
- Creating multiple VMFS partitions per LUN increases SCSI reservations (impacting VM & virtual disk performance). For every partition created per LUN you increase the chance of metadata locks – this all adds up to increased latency.
- Factors that determine optimal datastore size:
- Max tolerable downtime (MTD), RPO-RTO, DR requirements.
- How will restores be performed?
- Will you be using disk or tape to perform VM restores?
- What is the performance of your restore device? Understanding this impacts your RTO & maximum tolerable downtime.
- Tape drive transfer rates at 2:1 compression: LTO2 = 173GB/hr, LTO3 = 432GB/hr, LTO4 = 846GB/hr, LTO5 = 1TB/hr, LTO6 = 1.44TB/hr
- Calculating VM storage consumption = (VM Disk(s) Size + 100MB Log files) + (.vswp size, i.e. configured memory - reservation) + (25% Growth).
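The consumption formula and the tape rates above lend themselves to a quick sanity check when sizing a datastore against an RTO. The following is a minimal sketch only: the function names, the sample VM (60 GB disk, 8 GB memory, 2 GB reservation) and the 20-VM datastore are hypothetical, growth is assumed to apply to the subtotal of disks, logs and swap, and 1 TB is taken as 1,000 GB.

```python
# Tape transfer rates in GB/hr at 2:1 compression, as quoted in the list above.
# Assumption: 1 TB is taken as 1,000 GB.
LTO_GB_PER_HOUR = {"LTO2": 173, "LTO3": 432, "LTO4": 846, "LTO5": 1000, "LTO6": 1440}


def vm_storage_gb(disk_gb, memory_gb, reservation_gb=0.0, growth=0.25, log_gb=0.1):
    """Estimate per-VM datastore consumption in GB.

    Assumption: growth is applied to the subtotal of disks, logs and swap.
    """
    # The .vswp file is the configured memory less any reservation.
    swap_gb = max(memory_gb - reservation_gb, 0.0)
    subtotal = disk_gb + log_gb + swap_gb
    return subtotal * (1 + growth)


def restore_hours(data_gb, tape="LTO5"):
    """Hours to stream data_gb from tape at the quoted rate (ignores seek and verify time)."""
    return data_gb / LTO_GB_PER_HOUR[tape]


per_vm = vm_storage_gb(disk_gb=60, memory_gb=8, reservation_gb=2)
datastore_gb = per_vm * 20  # hypothetical: 20 identical VMs per datastore
print(f"Per VM: {per_vm:.1f} GB, datastore: {datastore_gb:.1f} GB")
print(f"Restore from LTO4: {restore_hours(datastore_gb, 'LTO4'):.1f} hours")
```

If the restore time comes out longer than the agreed RTO, that is a hint to use smaller datastores or a faster restore device.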
Storage Protocol Decisions
iSCSI, NFS, FC, FCoE: have a look at Cormac Hogan’s Storage Protocol Comparisons. Link
vSphere VAAI Storage Primitives – here to help!
- Provides hardware offload capabilities
- Full Copy (hosts don’t need to read everything they write). This significantly improves Storage vMotion, VM cloning and template creation.
- Reduces unnecessary I/O on switches and front-end ports.
- Block Zeroing (Write Same) = faster disk creation times (use case: eager-zeroed thick virtual disks).
- This also reduces the time it takes to create FT enabled VMs.
- Recommended for high performance workloads.
- Hardware Assisted Locking (ATS, Atomic Test & Set): excessive SCSI reservations by a host can cause performance degradation on other hosts that are accessing the same VMFS datastore.
- ATS improves scalability and access efficiency by avoiding SCSI reservation issues.
- In addition, SCSI/T10 UNMAP can reclaim dead space by informing the storage array when previously used blocks are no longer needed.
Workload I/O Profiles
- Differing I/O profiles can impact storage design. For example, take a requirement of 20,000 IOPs on RAID 5 (write penalty of 4) with 15K FC/SAS drives (approximately 175 IOPS each):
- A Read-heavy workload, 90/10 (reads vs writes) could be satisfied with 149 drives.
- A balanced workload 50/50 (read vs writes) would require 286 drives.
- A write-heavy workload 10/90 (reads vs writes) would require 423 drives.
- Remember I/O size correlates to throughput.
- Throughput = Functional Workload IOPs x I/O Size. Using an I/O size of 8 KB:
- Functional Workload IOPs (2,000) x 8 KB = 16,000 KB/s, or roughly 16 MB/s. Multiply this by the number of workloads on the host (the VM:Host consolidation ratio) to get the host throughput requirement.
- To convert MB/s to megabits per second (iSCSI/NFS), multiply by 8.
- 16 MB/s x 8 = 128 Mbps (Note: for iSCSI/NFS, a single network card at full duplex can provide around 800 Mbps, so in this scenario the workload requirement is satisfied, but only on a single adapter).
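The throughput conversion above can be sketched in a few lines. This is an illustration rather than a sizing tool: the function names are made up for the example, 1 MB is taken as 1,000 KB, and the 800 Mbps usable figure is the one quoted above, assumed here to describe a single 1Gb adapter.

```python
def throughput_mb_per_s(iops, io_size_kb):
    """Throughput in MB/s, taking 1 MB = 1,000 KB."""
    return iops * io_size_kb / 1000


def to_megabits(mb_per_s):
    """Convert megabytes per second to megabits per second."""
    return mb_per_s * 8


# Assumption: ~800 Mbps usable on a single adapter, the figure quoted above.
USABLE_NIC_MBPS = 800

mb_s = throughput_mb_per_s(iops=2000, io_size_kb=8)
mbps = to_megabits(mb_s)
workloads_per_nic = int(USABLE_NIC_MBPS // mbps)
print(f"{mb_s:.0f} MB/s = {mbps:.0f} Mbps per workload")
print(f"Workloads of this size per adapter: {workloads_per_nic}")
```

Changing `io_size_kb` shows how quickly a larger I/O size eats into the available bandwidth for the same IOPs figure.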
Calculating the required number of IOPs to satisfy the workload requirements
Use active or passive monitoring tools such as VMware Capacity Planner (available to VMware partners only). If you are not a VMware partner, check with your VAR (reseller) to see if they can help. There are also third-party tools available, such as PlateSpin PowerRecon, Perfmon and Quest, which provide ways of capturing I/O statistics.
Key points for assessment:
1. Determine the average peak IOPs per workload (VMware Capacity Planner, Windows Perfmon, iostat).
2. Determine the I/O profile: reads versus writes. Check the array, or Perfmon/iostat for each workload.
3. Determine throughput requirements, read KB/s versus write KB/s. Reads (KB/s) + Writes (KB/s) = Total maximum throughput.
4. Determine RAID type based on availability, capacity & performance requirements (as mentioned before scope for performance first).
As an example, the following values will be used to run through a couple of sums:
| Number of workloads | Average Peak IOPs | RAID – Write Penalty |
|---|---|---|
| 100 | 59 | 2 |
IO Profile = (Total Unit Workload IOPS × % READ) + ((Total Unit Workload IOPS × % WRITE) × RAID Penalty)
(59 x 74%) + ((59 x 26%) x 2)
(43.66) + ((15.34) x 2)
43.66 + 30.68 = 74.34 (IOPs Required)
Rounded up to 75 IOPs
Therefore, 75 IOPs per VM x Number of VMs you want to virtualise = Total IOPs required
75 IOPs x 100 VMs = 7500 IOPs (note: you may want to add 25% growth depending on customer requirements)
75 IOPs x 125 VMs = 9375 IOPs (with 25% VM growth). This is the amount of IOPs the storage array needs to support.
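The per-VM sum and the array total can be expressed as a small helper. This is a sketch under stated assumptions: the function names are illustrative, the read/write split is taken as 74/26 so that the two fractions sum to 100%, and growth is applied to the VM count as in the 100 to 125 VM step above.

```python
import math


def backend_iops(frontend_iops, read_frac, write_frac, raid_penalty):
    """Apply the RAID write penalty to the write share of a workload's IOPs."""
    if abs(read_frac + write_frac - 1.0) > 1e-9:
        raise ValueError("read and write fractions must sum to 1")
    reads = frontend_iops * read_frac
    writes = frontend_iops * write_frac * raid_penalty
    return reads + writes


def array_iops(per_vm_iops, vm_count, growth=0.25):
    """Total IOPs for vm_count VMs plus a growth allowance on the VM count."""
    # round() guards against floating-point noise before rounding up.
    grown_vms = math.ceil(round(vm_count * (1 + growth), 6))
    return per_vm_iops * grown_vms


# Illustrative inputs: 59 average peak IOPs, assumed 74/26 split, RAID penalty of 2.
per_vm = math.ceil(backend_iops(59, 0.74, 0.26, raid_penalty=2))
print("Per VM:", per_vm)
print("100 VMs, no growth:", array_iops(per_vm, 100, growth=0))
print("100 VMs, 25% growth:", array_iops(per_vm, 100))
```

The guard on the fractions is deliberate: a read/write split that does not sum to 100% silently inflates or deflates the result.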
Calculate the number of drives needed to satisfy the IO requirements
| Read rate KBps | Write rate KBps |
|---|---|
| 270,000 | 80,000 |
Determining read/write percentages
Total throughput = Reads + Writes
270,000 KBps + 80,000 KBps = 350,000 KBps
Total Read% = 270,000 / 350,000 = 77.14%
Total Write% = 80,000 / 350,000 = 22.86%
Using the derived read/write percentages we can determine the number of drives needed to support the required workloads.
Number of drives required = ((Total Read IOPs + (Total Write IOPs x RAID Penalty)) / Disk Speed IOPs)
Total IOPS Required = 20,000
Disk Speed IOPs = 175
Read: 77.14% of 20,000 IOPS = 15,428
Write: 22.86% of 20,000 IOPS = 4,572
RAID-5, write penalty of 4
Total number of disks required = ((15,428 + (4,572 x 4)) / 175) = 193 disks
RAID-1, write penalty of 2
Total number of disks required = ((15,428 + (4,572 x 2)) / 175) = 141 disks
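The drive calculation above can be sketched as follows, deriving the read/write split from the measured throughput and then applying the RAID write penalty. The function names are illustrative, and 175 IOPS per drive is the figure used in the sums above.

```python
import math


def read_write_split(read_kbps, write_kbps):
    """Return (read fraction, write fraction) of total throughput."""
    total = read_kbps + write_kbps
    return read_kbps / total, write_kbps / total


def disks_required(total_iops, read_frac, write_frac, raid_penalty, disk_iops):
    """Drives needed once the RAID write penalty is applied to the writes."""
    backend = total_iops * read_frac + total_iops * write_frac * raid_penalty
    return math.ceil(backend / disk_iops)


read_frac, write_frac = read_write_split(270_000, 80_000)
for raid, penalty in (("RAID-5", 4), ("RAID-1", 2)):
    disks = disks_required(20_000, read_frac, write_frac, penalty, disk_iops=175)
    print(f"{raid}: {disks} disks")
```

The same function reproduces the workload profile figures quoted earlier: at 175 IOPS per drive on RAID 5, a 90/10, 50/50 and 10/90 read/write mix of 20,000 IOPs comes out at 149, 286 and 423 drives respectively.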
I hope you found the information useful. If you have any questions, or feel some of the information needs clarification, let me know.
VMware vSphere Storage Guide ESXi 5.5