TOGAF study resources

The Open Group Architecture Framework (TOGAF) provides processes and procedures for the design, implementation and governance of enterprise architecture. The framework can be used as a set of building blocks to develop an enterprise architecture for use within an organisation.

TOGAF is developed and maintained by members of The Open Group, working within the Architecture Forum. I plan to use elements of the framework to supplement and rationalise my design theory, which I hope will aid me with my VCDX defence.

An introduction to TOGAF version 9.1 – Guidelines, Architecture Development Method (ADM), content framework & reference models.
TOGAF 9 Components 
TOGAF 9 Template Deliverables, Set 1
TOGAF 9 Template Artifacts and Deliverables, Set 2
TOGAF Publications, webinars and whitepapers
IT Architecture and Design Patterns/Assets: An Initial Assessment
TOGAF 9 Certification Program
TOGAF Benefits
Enterprise Architecture Standards

TOGAF 9 Foundation Study Guide

Additional Content:
The Open Group IT4IT™ Reference Architecture, Version 1.3
Service Orientated Reference Architecture
Study Information Landing Page
Hitchhikers guide to TOGAF exam preparation

Infrastructure Design & Project Framework

Successfully planning, designing and implementing a virtualisation project can be a very rewarding experience. Whether you are working alone or in a team, you may find the task initially daunting, be unsure of where to start, or lack an appropriate framework to work from. Hopefully this information will support you if you have been given the task, or if you have already completed a virtualisation project but want to identify ways to make the next upgrade or implementation more efficient.

Infrastructure design is a deep subject with many facets and interlinking dependencies between design choices. The four pillars – compute (see my compute design post), storage (see my storage design post), networking and management – can be very complex to integrate successfully when considering all the options. A great deal of emphasis should be placed on understanding design decisions, as poor planning can lead to additional costs, the project not meeting organisational goals and, ultimately, a failure to deliver. Through each part of the design process it is important that you validate design decisions against the requirements identified through the information gathering process.

Furthermore, design decisions should be continually evaluated against infrastructure qualities such as availability, manageability, performance, recoverability and security.

Project Framework

Use the following key points/stages to plan and build your project:

1. Information Gathering
2. Current State, Future State and Gap Analysis
3. Conceptual, Logical & Physical Design Process
4. Migration and Implementation Planning
5. Functional Testing / Quality Assurance
6. Continuous Improvement
7. Monitoring Performance, Availability and Capacity

1. Information Gathering: Information should be gathered from stakeholders / C-level executives, application owners and subject matter experts to define and identify:

  • Project scope / project boundaries, for example: upgrade the VMware vSphere infrastructure at the organisation's central European offices only.
  • Project goals: what does the organisation want to achieve? For example, reduce the physical server footprint by 25% before the end of the financial year.
  • Service Level Agreements (SLA), Service Level Objectives (SLO), Recovery Time Objectives (RTO), Recovery Point Objectives (RPO) and Maximum Tolerable Downtime (MTD).
  • Key Performance Indicators (KPI), relating to application response times.
  • Any requirements, both functional and non-functional, e.g. regulatory compliance – HIPAA, SOX, PCI etc. Understand the impact on the design of meeting HIPAA compliance (a US standard, but acknowledged under EU-ISO/IEC 13335-1:2004 information protection guidelines), which states that data in communication must be encrypted (HTTPS, SSL, IPsec, SSH). A functional requirement specifies something the design must do, for example support 5,000 virtual machines, whereas a non-functional requirement specifies how the system should behave, for example: workloads deemed business critical must not be subject to resource starvation (CPU, memory, network, disk) and must be protected using appropriate mechanisms.
  • Constraints: limit design choices based on data consolidated from the information gathering exercise. An example could be that you need to use the organisation's existing NFS storage solution. A functional requirement may be that the intended workload you need to virtualise is MS Exchange. Currently, virtualising MS Exchange on NFS is not supported – if the customer had a requirement to virtualise MS Exchange but only had an NFS-based storage solution, the proposal would lead to an unsupported configuration. Replacing the storage solution may not be feasible and may be out of scope for financial reasons.
  • Risks: put simply, a risk is defined by the probability of a threat, the vulnerability of an asset to that threat, and the impact it would have if it occurred. Risks throughout the project must therefore be recorded and mitigated, regardless of which aspect of the project they apply to. An example risk: a project aimed at a datacentre that doesn't have enough capacity to meet the anticipated infrastructure requirements. The datacentre facilities team is working on adding additional power but, due to planning issues, may not be able to meet the deadlines set by the customer. This risk would therefore need to be documented and mitigated to minimise or remove the chance of it occurring.
  • Assumptions: design features identified or classified without validation. For example: in a multi-site proposal, assuming the bandwidth available for datastore replication is sufficient to support the stated recovery time objectives. If the site link has existing responsibilities, how will the additional replication traffic affect existing operations? During the design phase you may identify further assumptions, each of which must be documented and validated before proceeding.

2. Current State, Future State and Gap Analysis:

  • Identifying the current state can be done by auditing the existing infrastructure, obtaining infrastructure diagrams and system documentation, and holding workshops with SMEs and application owners.
  • A future state analysis is performed after the current state analysis and typically outlines where the organisation will be at the end of the project's lifecycle.
  • A gap analysis outlines how the project will move from the current state to the future state and more importantly, what is needed by the organization to get there.

3. Conceptual, Logical & Physical Design Process:

  • A conceptual design identifies how the solution is intended to achieve its goals either through text, graphical block diagrams or both.
  • A logical design must focus on the relationships between the infrastructure components – typically it does not contain vendor names or physical details such as the amount of storage or compute capacity available.
  • A physical design provides a detailed description of the solutions implemented to achieve the project goals, for example how the intended host design mitigates against a single point of failure.
  • Get stakeholder approval on design decisions before moving to the implementation phase. Throughout the design process you should continually evaluate design decisions against the goal requirements and the infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security).
    • Availability: Typically concerned with uptime, calculated as a percentage based on the organisation's service level agreements (SLA). The key point is mitigating single points of failure across all components; your aim is to build resiliency into your design. Availability is calculated as a percentage or 'nines' value: [Availability % = ((minutes in a year – average annual downtime in minutes) / minutes in a year) × 100].
    • Manageability: Concerned with the operating expenditure of the proposed solution or object. How easily can the solution be scaled, managed, implemented, upgraded and patched?
    • Performance: How will the system deliver the required performance metrics? Typically aligned to the organisation's KPIs and focused on workload requirements: response times, latency etc.
    • Recoverability: RTO/RPO/MTD. Recovery Time Objective: the time frame associated with service recovery. Recovery Point Objective: how much data loss is acceptable? Maximum Tolerable Downtime: a value derived from the business which defines the total amount of time that a business process can be disrupted without causing unacceptable consequences.
    • Security: Compliance and access control. How best can you protect the asset or workload from intruders or DoS attacks? More importantly, what are the consequences/risks of your design decisions?
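The availability formula above can be sketched in Python; the downtime figures used below are illustrative, not taken from any particular SLA:

```python
# Availability % = ((minutes in a year - average annual downtime in minutes)
#                   / minutes in a year) * 100
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (non-leap year)

def availability_pct(annual_downtime_minutes: float) -> float:
    """Availability as a percentage, given total downtime per year."""
    return (MINUTES_PER_YEAR - annual_downtime_minutes) / MINUTES_PER_YEAR * 100

def max_downtime_minutes(availability: float) -> float:
    """Invert the formula: the annual downtime budget for a given 'nines' target."""
    return MINUTES_PER_YEAR * (1 - availability / 100)

# "Three nines" (99.9%) allows roughly 525 minutes (~8.7 hours) per year:
print(round(max_downtime_minutes(99.9), 1))
# ~52.5 minutes of downtime per year corresponds to "four nines":
print(round(availability_pct(52.56), 2))
```

Working backwards from an SLA percentage like this gives you the downtime budget you have to design within.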

4. Migration and Implementation Planning:

  • Identify low-risk virtualisation targets and migrate these first. This helps achieve early ROI, builds confidence and assists other operational aspects of future workload migrations.
  • Work with application owners to create milestones and migration schedules.
  • Arrange downtime outside of peak operating hours, and ensure you have up-to-date, fully documented rollback and recovery procedures.
  • Do not simply accept and adopt best practices; understand why they are required and their impact on the design.

Additional Guidelines: 

  • Create service dependency mappings: These are used to identify the impact of something unexpected and how best to protect the workload in the event of disaster. DNS for example plays an important role in any infrastructure – if this was provided through MS Active Directory in an all virtualised environment, what impact would the failure of this have on your applications, end users, external customers? How can you best mitigate the risks of this failing?
  • Plan for performance then capacity: If you base your design decisions on capacity you may find that as the infrastructure grows you start experiencing performance related issues.  This is primarily attributed to poor storage design, having insufficient drives to meet the combined workload I/O requirements.
  • Analyse workload performance and include capacity planning percentage to account for growth.
  • What are the types of workloads to be virtualised – Oracle, SQL, Java etc.? Ensure you understand and follow best practices for virtualised environments, reviewing and challenging where appropriate. Oracle, for example, has very strict guidelines on what it deems as cluster boundaries, which can impact your Oracle licensing agreement.
  • Don’t assume something cannot be virtualised due to an assumed issue.
  • Benchmarking applications before they are virtualised can be valuable in determining a configuration issue post virtualisation.
  • When virtualising new applications check with the application vendor regarding any virtualisation recommendations. Be mindful of oversubscribing resources to workloads that won’t necessarily benefit from it. “Right sizing” virtual machines is an important part of your virtualisation project. This can be challenging as application vendors set specific requirements around CPU and memory.
    • For existing applications be aware of oversized virtual machines and adjust resources based on actual usage.
  • What mechanisms will you use to guarantee predictable levels of performance during periods of contention? See vSphere NIOC and SIOC.
  • VARs/partners may be able to provide the necessary tools to assess current workloads; examples include VMware Capacity Planner (can capture performance information for Windows/Linux), iostat, Windows Perfmon, vscsiStats, vRealize Operations Manager…

5. Functional Testing / Quality Assurance :

This is a very important part of your design, as it allows you to validate your configuration decisions and ensure that aspects of the design are implemented as documented. This stage is also used to ensure the design meets both functional and non-functional requirements. Essentially, the process maps expected outcomes against actual results.

  • Functional Testing is concerned with exercising core component function. For example, can the VM/Workload run on the proposed infrastructure.
  • Non-functional testing is concerned with exercising application functionality using a combination of invalid inputs, unexpected operating conditions and other “out-of-bound” scenarios. This testing is designed to evaluate the readiness of a system according to several criteria not covered by functional testing. Test examples include: vSphere HA, FT, vMotion, performance, security…

6. Continuous Improvement:

The ITIL framework is aimed at maximising the ability of IT to provide services that are cost effective and meet the expectations and requirements of the organisation and customers. This is therefore supported by streamlining service delivery and supporting processes by developing and documenting repeatable procedures. The ITIL Framework CSI (Continual Service Improvement) provides a simple seven-step process to follow.

Stage 1: Define what you should measure
Stage 2: Define what you currently measure
Stage 3: Gather the data
Stage 4: Processing of the data
Stage 5: Analysis of the data
Stage 6: Presentation of the information
Stage 7: Implementation of corrective action

  • Workloads rarely remain static. The virtualised environment will need constant assessment to ensure service levels are met and KPIs are being achieved. You may have to adjust memory and CPU as application requirements increase or decrease. Monitoring is an important part in the process and can help you identify areas which need attention. Use built-in alarms to identify latency in storage and vCPU ready times, which can be easily set to alert you to an issue.
  • Establish a patching procedure  (Host, vApps, VMs, Appliances, 3rd party extensible devices).
  • Use vSphere Update Manager to upgrade hosts, VMware Tools and virtual appliances. This goes deeper than just the hypervisor – ensure storage devices, switches, HBAs and firmware are kept up to date and in line with vendor guidelines.
  • Support proactive performance adjustments and tuning, analyse issues : determine the root cause, plan corrective action, remediate then re-assess.
  • Document troubleshooting procedures.
  • Use automation to reduce operational overheads.
  • Maintain a database of configuration items (these are components that make up the infrastructure), their status, lifecycle, support plan, relationships and which department assumes responsibility for them when something goes wrong.

7. Monitoring Performance Availability and Capacity:

  • Ensure the optimal and cost-effective use of the IT infrastructure to meet current and future business needs. Match resources to workloads that require a specific level of service. Locate business-critical workloads on datastores backed by tier 1 replicated volumes, on infrastructure that mitigates single points of failure.
  • Make use of built-in tools for infrastructure monitoring and have a process for managing / monitoring service levels.
  • Monitor not only the virtual machines but the underlying infrastructure, using built-in tools already mentioned above, to monitor latency.
  • Performance and capacity reports should include, hosts / clusters, datastores and resource pools.
  • Monitor and report on usage trends at all levels, compute, storage and networking.
  • Scripts for monitoring environment health (see Alan Renouf’s vCheck script).
  • A comprehensive capacity plan uses information gathered from day-to-day tuning of VMware performance, current demand, modeling and application sizing (future demand).

Additional Service Management Tasks:

  • Integrate the virtual infrastructure into your configuration and change management procedures.
  • Ensure staff are trained to support the infrastructure – investment here is key in ensuring a) staff are not frustrated supporting an environment they don’t understand and b) the business gets the most out of their investment.
  • Develop and schedule maintenance plans to ensure the environment can be updated and is running optimally.
  • Plan and perform daily, weekly and monthly maintenance tasks. For example: search for unconsolidated snapshots; review VMFS volumes for space in use and available capacity (anything less than 10% available space should be reviewed); check logical drive space on hosts; check whether any temporary VMs can be turned off or deleted. Monthly maintenance tasks: create a capacity report for the environment and distribute it to IT and management; update your VM templates; review the VMware website for patches, vulnerabilities and bug fixes.

Reference Documentation:

Conceptual, Logical, Physical: It Is Simple, by John A. Zachman
Leveraging ITIL to Manage Your Virtual Environment, by Laurent Mandorla, Fredrik Hallgårde, BearingPoint, Inc.
Performance Best Practices for VMware vSphere
ITIL v3 Framework, Service Management Guide
Control Objectives for Information and Related Technology (COBIT) framework by ISACA
Oracle Databases on VMware vSphere Best Practices Guide
VMware vSphere Monitoring Performance Guide

VMware vSphere Storage Design Considerations

In my previous post I looked at the calculations required to determine the minimum number of hosts needed to satisfy the compute design. This was achieved through an assessment of the current state analysis, identifying average peak CPU and memory consumption.

A summary of the tools can be found here: VMware vSphere Compute Design … The same tools can be used to determine the VM/Physical server I/O profile, capacity and throughput requirements we need to design and scale an appropriate storage solution.

Getting your storage design right is crucial: a poorly designed SAN can negatively impact the entire vSphere infrastructure. Storage – like networking and the compute layer – is a cornerstone area that requires careful planning and investment. Failures here may impact project delivery, budget and performance, damaging user and stakeholder experience.

This post will look at some of the principles around VMware storage design in general.

Key Decision Points & Considerations

  • Plan for failure, a good storage design should take into account the impact of failure, for example:
    • Site failure (DR): your SAN array may support block-level replication; if you don't have this capability (due to cost or features), look at the network/host-level replication offered in vSphere 5.1 or other replication tools. Disaster recovery is not just about ensuring you can back up and restore data – it's about ensuring business continuity.
    • Identify replication bandwidth requirements / what is the rate of change? (This impacts whether you can perform synchronous or asynchronous replication.)
    • Failure of individual components (review this end to end): fabric interconnects, switches, storage processors, drive shelves, host HBAs, power etc. The key point here is to find ways of mitigating any risks from an infrastructure point of view.
  • Size and plan according to workload peaks (example factors: backups, month-end reporting)
  • Array availability requirements: n+1, n+2 etc. At a minimum your solution should withstand the failure of at least one node (n+1); however, be aware of the impact if a storage processor is down for maintenance, as availability requirements might not be satisfied during that period.
  • Scale the design for current and future IOPs and capacity requirements, total storage capacity is the sum of all current storage usage plus projected growth, IOPs provides the performance the array needs to support the workloads.
  • Do you plan to use advanced technologies such as – deduplication, sub-lun tiering, caching?
    • How will this impact the design, observe SIOC & array vendor best practises regarding the use of sub-lun tiering.
  • Number and speed of drives needed (FC/SAS, SATA/NL, SSD), this has an impact on performance, capacity, availability and budget etc..
Drive Type    IOPs per drive
SSD (SLC)     6,000+
SSD (MLC)     1,000+
15K RPM       175–200
10K RPM       125–150
7.2K RPM      50–75
  • Storage Protocol Choices – (FC/FCoE, iSCSI, NFS), the decision is driven by throughput and existing requirements and constraints.
  • Whether service processors will run in an Active-Active, Active-Passive configuration
    • This impacts host path selection policies, whether I/O requests can be balanced across all available paths.
    • Impacts performance, I/O is balanced on a per LUN basis only – having additional ‘Active’ controllers to service requests can improve performance in conjunction with multi-pathing policies..
  • Check array support for the VMware VAAI primitives (VAAI, VAAI-NFS, VASA and by extension Storage I/O control).
    • This offers performance improvements (hardware offloading – hardware assisted copy, locking, block zeroing).
  • Will you thin provision at the LUN or VM level?
    • Thin provisioning has its benefits but increases the management overhead. A common use case is environments that require ‘x’ amount of space but don’t use all the space allocated.
    • Out-of-space conditions on VAAI-supported arrays cause VMs to be stunned. VMs can be resumed if VMFS datastore space is increased or reclaimed; alternatively, if VM swap files are stored on the same datastore, power off non-critical VMs (virtual machine swap files are stored in the base VM folder by default, although this can be changed in certain instances, e.g. to reduce replication bandwidth). Powering off a VM removes its .vswp file (the .vswp file equals the memory granted to the VM less any reservations).
    • Out-of-space conditions are commonly attributed to poor or non-existent capacity monitoring, and can also be caused by snapshots that have grown out of control.
    • Thin on thin is not recommended, due to the operational overhead required to monitor both VMFS datastores and backing LUNs.
  • Set appropriate queue depth values on HBA adapters (use with caution), follow vendor recommendations. Observe impact to consolidation ratios specifically the number of VMs in a VMFS datastore. Setting queue depths too high can have a negative impact on performance.
  • For business critical applications you may want to limit virtual machine disk files to one or two virtual disks per VMFS datastore.
    • Observe the ESXi LUN Maximums (currently 256)
    • In situations where you have multiple VM virtual disks per VMFS datastore, you may want to use Storage I/O Control (requires Enterprise Plus licensing). SIOC is triggered during periods of contention: VMs on the datastore are allocated I/O queue slots relative to their share values, ensuring that high-priority VMs receive greater throughput than lower-priority ones.
  • Quantify RAID requirement based on availability, capacity & performance requirements (IMO scope for throughput/IOPs first capacity second)
    • Caveat: There is little or no use case for RAID 0.
  • I/O size can have an adverse effect on IOPs: the larger the I/O size, the fewer IOPs a drive can deliver.
  • I/O size (KB) multiplied by IOPs gives the throughput requirement.
    • A high number of IOPs might be due to a small I/O size (low throughput), whereas a larger I/O size might equate to fewer IOPs but a higher amount of throughput. Understanding throughput requirements is crucial, as this may dictate protocol and bandwidth requirements (iSCSI 1Gb / iSCSI 10Gb / FC etc.).
  • Ensure that host HBA cards use PCIe slots with the same number of lanes; a lane is composed of two differential signalling pairs, one for receiving data and the other for transmitting. It is not recommended to place one card in an x4 slot and another in an x16 slot.
  • Design choices need to be validated against the requirements and constraints, and you should understand the impact those decisions have on the design. For example, suppose your analysis has determined that iSCSI is a suitable protocol choice. Be aware of the impact on network components – a common strategy is to map this design choice against the infrastructure qualities (availability, manageability, performance, recoverability and security). Do you intend to use software initiators, dependent hardware initiators or independent hardware initiators? Each of these decisions impacts your design: if you intend to use independent hardware initiators, how does this affect iSCSI security? Do you have enough PCIe ports available in your hosts? Do you plan to use separate iSCSI switches or existing network switches? Do the existing switches support payload sizes above 1500 bytes? Do you have enough ports? How will you secure the storage network (e.g. with L2 non-routed VLANs)? Will the switches be redundant? Is there available rack space and power?
  • Finally, document everything!

Resource Allocation

  • How will the resources, capacity, drive class characteristics (IOPs) be distributed amongst all the workloads?
    • VM-to-Datastore allocation, Application/Infrastructure life cycles – (Production, Test, Dev).
    • See use cases for SIOC: Link
  • Prioritise critical applications on faster class of drives offering better performance / higher availability.
  • It is generally good practice to distribute intensive workloads across datastores; for example, grouping several SQL servers on the same datastore can lead to contention and impact performance.
    • Use SDRS – it can load-balance I/O among datastores within a datastore cluster.
  • Adhere to customer/business PCI-DSS compliance requirements (for example: logically separate datastores/storage domains). VCDX133 – Rene Van Den Bedem: has written a great post on how compliance requirements map to vSphere design decisions: Link.
  • VM/application availability requirements, e.g. MS Clustering (do you plan to use RDMs, and if so, physical or virtual compatibility mode?)
    • Beware of the impact of each mode (see my blog post on MS Clustering Design Guidelines).
  • Create a single VMFS partition per LUN.
    • Creating multiple VMFS partitions per LUN increases SCSI reservations (impacting VM & virtual disk performance). For every partition created per LUN you increase the chance of metadata locks – this all adds up to increased latency.
  • Factors that determine optimal datastore size:
    • Max tolerable downtime (MTD), RPO-RTO, DR requirements.
  • How will restores be performed?
    • Will you be using disk or tape to perform VM restores?
      • What is the performance of your restore device? Understanding this impacts your RTO and maximum tolerable downtime.
      • Tape drive transfer rates at 2:1 compression: LTO-2 = 173GB/hr, LTO-3 = 432GB/hr, LTO-4 = 846GB/hr, LTO-5 = 1TB/hr, LTO-6 = 1.44TB/hr.
  • Calculating VM storage consumption = (VM Disk(s) Size + 100MB Log files) + (.VSWP size – Reservations) + (25% Growth).
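The per-VM storage consumption formula above can be expressed as a short Python sketch (sizes in GB; the example VM sizing is a hypothetical illustration, and the 25% growth factor is the one stated above):

```python
def vm_storage_gb(disk_gb: float, mem_gb: float, mem_reservation_gb: float = 0.0,
                  log_gb: float = 0.1, growth: float = 0.25) -> float:
    """Estimated datastore consumption for one VM.

    .vswp size = memory granted to the VM minus any memory reservation.
    """
    vswp_gb = mem_gb - mem_reservation_gb
    base = disk_gb + log_gb + vswp_gb  # disks + ~100MB logs + swap file
    return base * (1 + growth)        # add the growth allowance

# Hypothetical VM: 60 GB of virtual disk, 8 GB RAM, 4 GB memory reservation.
# 60 + 0.1 + 4 = 64.1 GB, plus 25% growth:
print(round(vm_storage_gb(60, 8, 4), 1))
```

Multiplying this out per VM (or per template class) gives the total datastore capacity to request from the storage team.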

Storage Protocol Decisions

iSCSI, NFS, FC, FCoE – have a look at Cormac Hogan's Storage Protocol Comparisons. Link

vSphere VAAI Storage Primitives – here to help!

  • Provides hardware offload capabilities
    • Full Copy – the host no longer has to read and write the data itself; the array performs the copy internally. This significantly improves Storage vMotion, VM cloning and template creation.
      • Reduces unnecessary I/O on switches and front-end ports.
  • Block Zeroing (Write Same) = faster disk creation times (use case: eager-zeroed thick virtual disks).
    • This also reduces the time it takes to create FT enabled VMs.
    • Recommended for high performance workloads.
  • Hardware Assisted Locking (ATS – Atomic Test & Set) – excessive SCSI reservations by a host can cause performance degradation on other hosts that are accessing the same VMFS datastore.
    • ATS improves scalability and access efficiency by avoiding SCSI reservation issues.
  • In addition, SCSI T10 UNMAP can reclaim dead space by informing the storage array when previously used blocks are no longer needed.

Workload I/O Profiles

  • Differing I/O profiles can impact storage design. For example, using a requirement of 20,000 IOPs on RAID 5 with 15K FC/SAS drives (approximately 175 IOPs each):
    • A Read-heavy workload, 90/10 (reads vs writes) could be satisfied with 149 drives.
    • A balanced workload 50/50 (read vs writes) would require 286 drives.
    • A write-heavy workload 10/90 (reads vs writes) would require 423 drives.
  • Remember I/O size correlates to throughput
    • Throughput = functional workload IOPs × I/O size; using an I/O size of 8K:
    • Functional workload IOPs (2,000 × 8K = 16MB/s) × number of workloads on the host gives the throughput needed for a given VM:host consolidation ratio.
      • To convert MB/s to megabits per second (iSCSI/NFS), multiply by 8.
      • 16MB/s × 8 = 128 Mbps (note: for iSCSI/NFS, a single network card at full duplex can provide around 800 Mbps, so in this scenario the workload requirement is satisfied on a single adapter).
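The drive counts quoted for the three I/O profiles above can be reproduced with a short sketch, assuming 20,000 front-end IOPs, a RAID 5 write penalty of 4, and roughly 175 IOPs per 15K drive:

```python
import math

def drives_required(total_iops: int, read_pct: float, raid_penalty: int,
                    drive_iops: int = 175) -> int:
    """Back-end IOPs = reads + (writes * RAID write penalty);
    divide by the per-drive IOPs rating and round up to whole drives."""
    reads = total_iops * read_pct
    writes = total_iops * (1 - read_pct)
    backend = reads + writes * raid_penalty
    return math.ceil(backend / drive_iops)

for label, read_pct in [("90/10", 0.90), ("50/50", 0.50), ("10/90", 0.10)]:
    print(label, drives_required(20_000, read_pct, raid_penalty=4))
# 90/10 -> 149, 50/50 -> 286, 10/90 -> 423 drives
```

Notice how heavily the write ratio dominates the spindle count once the RAID write penalty is applied.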

Calculating the required number of IOPs to satisfy the workload requirements

Use active or passive monitoring tools such as VMware Capacity Planner (available to VMware partners only). If you are not a VMware partner, check whether your VAR (reseller) can help. There are also third-party tools available, such as PlateSpin PowerRecon, Perfmon, Quest tools etc., which provide ways of capturing I/O statistics.

Key points for assessment:
1. Determine the average peak IOPs per workload (VMware Capacity Planner, Windows Perfmon, iostat).
2. Determine the I/O profile– Reads versus Writes. Check the array or Perfmon/IOStat x number of workloads.
3. Determine throughput requirements, Read Mbps versus Write Mbps. Reads (KB/s) + Writes (KB/s) = Total maximum throughput.
4. Determine RAID type based on availability, capacity & performance requirements (as mentioned before scope for performance first).

As an example the following values will be used to run through a couple of sums –

Number of workloads 100
Average Peak IOPs 59
% Read 74
% Write 26
RAID – Write Penalty 2

IO Profile = (Total Unit Workload IOPS × % READ) + ((Total Unit Workload IOPS × % WRITE) × RAID Penalty)

(59 x 74%) + ((59 x 26%) x 2)
(43.66) + ((15.34) x 2)
43.66 + 30.68 = 74.34 (IOPs required)
Rounded up to 75 IOPs

Therefore, 75 IOPs per VM x Number of VMs you want to virtualise = Total IOPs required

75 IOPs x 100 VMs = 7500 IOPs (note: you may want to add 25% growth depending on customer requirements)
75 IOPs x 125 VMs = 9375 IOPs (with 25% VM growth) – This is the amount of IOPs the Storage Array needs to support.
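The IO profile calculation can be expressed as a small Python sketch, using the example values from the table above (59 average peak IOPs, 74% read, 26% write, RAID write penalty of 2):

```python
import math

def required_iops(workload_iops: float, read_pct: float, write_pct: float,
                  raid_penalty: int) -> float:
    """IO Profile = (IOPS * %read) + ((IOPS * %write) * RAID write penalty)."""
    return workload_iops * read_pct + (workload_iops * write_pct) * raid_penalty

per_vm = required_iops(59, 0.74, 0.26, raid_penalty=2)
print(round(per_vm, 2))          # 43.66 + 30.68 = 74.34, rounded up to 75
total = math.ceil(per_vm) * 125  # 100 VMs plus 25% growth
print(total)                     # 9375 IOPs the array needs to support
```

Swapping in a RAID 5 penalty of 4 (or your own read/write split) shows immediately how sensitive the array requirement is to these inputs.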

Calculate the number of drives needed to satisfy the IO requirements
Example values:

IOPs required: 20,000
Read rate: 270,000 KBps
Write rate: 80,000 KBps
Total throughput: 350,000 KBps

Determining read/write percentages

Total throughput = Reads + Writes
270,000 KBps + 80,000 KBps = 350,000 KBps
Total Read% = 270,000 / 350,000 = 77.14
Total Write% = 80,000 / 350,000 = 22.85

Using the derived read/write values we can determine the number of drives needed to support the required workloads.

Number of drives required = ((Total Read IOPs + (Total Write IOPS x RAID Penalty)) / Disk Speed IOPs)

Total IOPS Required = 20,000
Read : 77.14% of 20,000 IOPS = 15428
Write: 22.85% of 20,000 IOPS = 4570

RAID-5, write penalty of 4
Total Number of disks required = ((15428 + (4570 x 4)) /175) = 193 Disks

RAID-1, write penalty of 2
Total Number of disks required = ((15428 + (4570 x 2))/175) = 141 Disks
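Putting the worked example together, a sketch that derives the read/write split from the throughput figures and then sizes the drive count for each RAID level (again assuming roughly 175 IOPs per 15K drive):

```python
import math

def drive_count(total_iops: float, read_kbps: float, write_kbps: float,
                raid_penalty: int, drive_iops: int = 175) -> int:
    """Split total IOPs by the read/write throughput ratio, apply the RAID
    write penalty to the write portion, then round up to whole drives."""
    total_kbps = read_kbps + write_kbps
    read_iops = total_iops * read_kbps / total_kbps
    write_iops = total_iops * write_kbps / total_kbps
    backend = read_iops + write_iops * raid_penalty
    return math.ceil(backend / drive_iops)

# RAID 5 (write penalty 4) versus RAID 1 (write penalty 2):
print(drive_count(20_000, 270_000, 80_000, raid_penalty=4))  # -> 193
print(drive_count(20_000, 270_000, 80_000, raid_penalty=2))  # -> 141
```

The roughly 50-drive difference between the two RAID levels is the availability/capacity trade-off you are pricing when you "scope for performance first".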

I hope you found the information useful – Any questions or if you feel some of the information needs clarification let me know.

Recommended Reading

VMware vSphere Storage Guide ESXi 5.5


VCAP5-DCD Certification

The VMware Certified Advanced Professional 5 – Data Center Design (VCAP5-DCD) certification is designed for IT architects who design and integrate VMware solutions in multi-site, large enterprise, virtualised environments.

The VCAP-DCD focuses on a deep understanding of the design principles and methodologies behind datacentre virtualisation. This certification relies on your ability to understand and decipher functional and non-functional requirements, risks, assumptions and constraints. Before undertaking this, you should have a thorough understanding of the following areas: availability and security, storage/network design, and disaster recovery – and by extension performing business impact analysis/business continuity, dependency mapping, automation and service management.

Key in this process is understanding how design decisions relating to availability, manageability, performance, recoverability and security impact the design.

I have put together some of the vSphere 5 best practices referenced in the exam blueprint; I hope you find the information helpful if you are considering taking this exam. vSphere 5 Design Best Practise Guide.pdf

In preparation for this exam, I found the following books very useful along with the documents provided in the exam blueprint.

  • VMware vSphere 5.1 Clustering Deepdive – Duncan Epping & Frank Denneman
  • VMware vSphere Design 2nd Edition – Scott Lowe & Forbes Guthrie
  • Managing and Optimising VMware vSphere Deployments – Sean Crookston & Harley Stagner
  • Virtualising Microsoft Business Critical Applications on VMware vSphere – Matt Liebowitz & Alex Fontana
  • VCAP-DCD Official Cert Guide – Paul McSharry
  • ITIL v3 Handbook – UK Office of Government & Commerce

Here is the VCAP-DCD certification requirements road map:

VCAP-DCD Certification

As mentioned before, the blueprint is key! Here is a URL export of all the tools/resources the blueprint targets, which should save time trawling through the PDF.

Section 1 – Create a vSphere Conceptual Design
Objective 1.1 – Gather and analyze business requirements
VMware Virtualization Case Studies
Five Steps to Determine When to Virtualize Your Servers
Functional vs. Non-Functional Requirements
Conceptual, Logical, Physical:  It is Simple

Objective 1.2 – Gather and analyze application requirements
VMware Cost-Per-Application Calculator
VMware Virtualizing Oracle Kit
VMware Virtualizing Exchange Kit
VMware Virtualizing SQL Kit
VMware Virtualizing SAP Kit
VMware Virtualizing Enterprise Java Kit
Business and Financial Benefits of Virtualization: Customer Benchmarking Study

Objective 1.3 – Determine Risks, Constraints, and Assumptions
Developing Your Virtualization Strategy and Deployment Plan

Section 2 – Create a vSphere Logical Design from an Existing Conceptual Design
Objective 2.1 – Map Business Requirements to the Logical Design
Conceptual, Logical, Physical:  It is Simple
VMware vSphere  Basics Guide
What’s New in VMware vSphere 5
Functional vs. Non-Functional Requirements
ITIL v3 Introduction and Overview

Objective 2.2 – Map Service Dependencies
Datacenter Operational Excellence Through Automated Application Discovery & Dependency Mapping

Objective 2.3 – Build Availability Requirements into the Logical Design
Improving Business Continuity with VMware Virtualization Solution Brief
VMware High Availability Deployment Best Practices
vSphere Availability Guide

Objective 2.4 – Build Manageability Requirements into the Logical Design
Optimizing Your VMware Environment
Four Keys to Managing Your VMware Environment
Operational Readiness Assessment
Operational Readiness Assessment Tool

Objective 2.5 – Build Performance Requirements into the Logical Design
Proven Practice: Implementing ITIL v3 Capacity Management in a VMware environment
vSphere Monitoring and Performance Guide

Objective 2.6 – Build Recoverability Requirements into the Logical Design
VMware vCenter™ Site Recovery Manager Evaluation Guide
A Practical Guide to Business Continuity and Disaster Recovery with VMware Infrastructure
Mastering Disaster Recovery: Business Continuity and Disaster Recovery Whitepaper
Designing Backup Solutions for VMware vSphere

Objective 2.7 – Build Security Requirements into the Logical Design
vSphere Security Guide
Developing Your Virtualization Strategy and Deployment Plan
Achieving Compliance in a Virtualized Environment
Infrastructure Security:  Getting to the Bottom of Compliance in the Cloud
Securing the Cloud

Section 3 – Create a vSphere Physical Design from an Existing Logical Design
Objective 3.1 – Transition from a Logical Design to a vSphere 5 Physical Design
Conceptual, Logical, Physical:  It is Simple
vSphere Server and Host Management Guide
vSphere Virtual Machine Administration Guide

Objective 3.2 – Create a vSphere 5 Physical Network Design from an Existing Logical Design
vSphere Server and Host Management Guide
vSphere Installation and Setup Guide
vMotion Architecture, Performance and Best Practices in VMware vSphere 5
VMware vSphere™: Deployment Methods for the VMware® vNetwork Distributed Switch
vNetwork Distributed Switch: Migration and Configuration
Guidelines for Implementing VMware vSphere with the Cisco Nexus 1000V Virtual Switch
VMware® Network I/O Control: Architecture, Performance and Best Practices

Objective 3.3 – Create a vSphere 5 Physical Storage Design from an Existing Logical Design
Fibre Channel SAN Configuration Guide
iSCSI SAN Configuration Guide
vSphere Installation and Setup Guide
Performance Implications of Storage I/O Control–Enabled NFS Datastores in VMware vSphere® 5.0
Managing Performance Variance of Applications Using Storage I/O Control
VMware Virtual Machine File System: Technical Overview and Best Practices

Objective 3.4 – Determine Appropriate Compute Resources for a vSphere 5 Physical Design
vSphere Server and Host Management Guide
vSphere Installation and Setup Guide
vSphere Resource Management Guide

Objective 3.5 – Determine Virtual Machine Configuration for a vSphere 5 Physical Design
vSphere Server and Host Management Guide
Virtual Machine Administration Guide
Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs
Virtualizing a Windows Active Directory Domain Infrastructure
Guest Operating System Installation Guide

Objective 3.6 – Determine Data Center Management Options for a vSphere 5 Physical Design
vSphere Monitoring and Performance Guide
vCenter Server and Host Management Guide
VMware vCenter Update Manager 5.0 Performance and Best Practices

Section 4 – Implementation Planning
Objective 4.1 – Create and Execute a Validation Plan
vSphere Server and Host Management Guide
Validation Test Plan

Objective 4.2 – Create an Implementation Plan
vSphere Server and Host Management Guide
Operational Test Requirement Cases

Objective 4.3 – Create an Installation Guide
vSphere Server and Host Management Guide