The ITIL 4 capacity and performance management practice

The purpose of the capacity and performance management practice is to ensure that services achieve agreed and expected performance, satisfying current and future demand in a cost-effective way.

When business services run on a cloud such as AWS, Azure, or Google Cloud, the practice applies to the capacity and performance of the individual cloud services used to run an application and provide a service – not to the cloud service provider (CSP) as a whole.

Capacity and performance management activities thus include:

  • Researching and monitoring current service performance
  • Capacity and performance modeling
  • Capacity requirements analysis
  • Demand forecasting and resource planning
  • Performance improvement planning.

The great news for IT service managers using ITIL 4 is that one of the five essential cloud characteristics – elasticity – makes capacity and performance management considerably easier than it is on non-cloud systems.

How capacity management works in the cloud

Cloud elasticity and the illusion of infinite resources change the approach to capacity and performance management. Cloud reduces, but doesn’t eliminate, the burden of upfront modeling, and it fundamentally moves an organization from making “upfront, multi-year capital expenditure big bets” to a “just-in-time, pay for what you use” model. Capacity modeling likewise moves from “design for the theoretical future peak” to “design for actual use.”
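To make the difference between the two design approaches concrete, here is a minimal sketch that compares the cost of sizing for the peak with the cost of paying only for what each hour needs. The demand figures, the unit capacity, and the hourly price are all made-up assumptions for illustration, not real cloud pricing.

```python
import math


def peak_provisioned_cost(hourly_demand, unit_capacity, unit_hourly_price):
    """Cost of sizing once for the busiest hour and running that all the time."""
    units_for_peak = math.ceil(max(hourly_demand) / unit_capacity)
    return units_for_peak * unit_hourly_price * len(hourly_demand)


def pay_per_use_cost(hourly_demand, unit_capacity, unit_hourly_price):
    """Cost of running only the units each hour's demand actually needs."""
    units_each_hour = [math.ceil(d / unit_capacity) for d in hourly_demand]
    return sum(units_each_hour) * unit_hourly_price


# Hypothetical demand (requests per second) for six consecutive hours.
demand = [100, 100, 100, 900, 100, 100]

peak_cost = peak_provisioned_cost(demand, unit_capacity=100, unit_hourly_price=0.5)
elastic_cost = pay_per_use_cost(demand, unit_capacity=100, unit_hourly_price=0.5)

print(f"sized for peak: {peak_cost:.2f}")     # 9 units x 6 hours x 0.50 = 27.00
print(f"pay per use:    {elastic_cost:.2f}")  # 14 unit-hours x 0.50 = 7.00
```

The gap between the two numbers grows with how "spiky" demand is; for flat demand the two approaches cost about the same.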

AWS, for example, includes Performance Efficiency as one of the pillars of its popular Well-Architected Framework.

Cloud transforms capacity and performance management in the following ways: 

  1. It democratizes advanced technologies – what was once the preserve of big-budget IT organizations is now available to one-person startups. An example is Amazon DynamoDB, a serverless database: its users don’t have to know about servers or database administration, and they pay only for the transactions they process.
  2. You can scale in minutes – if you have a sudden demand peak, you can easily add capacity and then turn it off when demand drops. But beware of the limits enforced by the cloud service provider.
  3. You can get out of the infrastructure game – use cloud managed services instead of the non-cloud practice of “rolling your own.” Move towards modern “serverless” technology that more closely aligns your costs to transactions and reduces capacity planning effort.
  4. You can experiment often – test new features and the load-bearing capabilities by duplicating whole environments, testing them, then turning them off. It’s a vast improvement for modeling.
  5. You get “mechanical sympathy” – systems that gracefully handle hardware failures and avoid hidden performance bottlenecks – and you are able to use the right tool for the job. Leading clouds have multiple managed database types available that serve different needs.
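The "scale in minutes, but beware of limits" point can be sketched as a simple scaling rule that is clamped by a provider quota. This is an illustrative rule of my own, not any cloud provider's actual autoscaling algorithm; the function name, the utilization percentages, and the quota value are all hypothetical.

```python
def desired_units(current_units, observed_pct, target_pct, min_units, quota_limit):
    """Return (units_to_run, quota_hit) for a simple utilization-tracking rule.

    The rule scales the fleet so that utilization moves back toward the target,
    then clamps the answer between a floor and the provider's quota.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    raw = (current_units * observed_pct + target_pct - 1) // target_pct
    capped = max(min_units, min(raw, quota_limit))
    return capped, raw > quota_limit


# Demand peak: 4 units at 90% utilization, aiming for 60%.
print(desired_units(4, 90, 60, min_units=2, quota_limit=20))   # (6, False)

# Bigger peak: the rule wants 19 units but the (hypothetical) quota is 16.
print(desired_units(10, 95, 50, min_units=2, quota_limit=16))  # (16, True)

# Demand drops: scale back in, but never below the floor.
print(desired_units(4, 10, 60, min_units=2, quota_limit=20))   # (2, False)
```

The second value in the result is the useful one for capacity managers: when it is true, elasticity has run out and a quota increase needs to be requested before the next peak.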

As in all things cloud, the key to success is adjusting your organization’s thinking – for instance, how to apply ITIL 4 – to exploit cloud characteristics. This means avoiding the application of non-cloud practices to cloud.

As an example, reduce the time, effort, and money spent on upfront theoretical capacity modeling. Much of it is no longer needed because you’ll pay for what you use, not what you think you’ll need. Capacity and performance management in the cloud is less about predicting constraints and more about estimating future cost budgets and identifying optimizations.

Cloud dos and don’ts for capacity and performance management

To exploit the cloud’s inherent elasticity in improving your capacity and performance management practices, please consider the following dos and don’ts:

Do:

  • Manage less infrastructure. Choose the right compute model – break down server instances into smaller, scale-out units spread across availability zones. Consider containers for even more elasticity. Also consider serverless, though this requires deeper rearchitecting of the application into “functions-as-a-service.”
  • Manage less storage. Use the cloud’s automated storage policies to shift hot and cold data around to match costs to access.
  • Manage fewer databases. Use higher-order managed cloud services like Amazon Relational Database Service (RDS) instead of “rolling your own” by installing, maintaining, and managing databases on self-managed compute instances. The more you manage yourself, the more capacity and performance management you have to do.
  • Make cloud consumption visible. Assign names or tags to resources so that each one can be traced to an owner. Show costs and make people accountable.
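As a rough illustration of making consumption visible, the sketch below totals costs per owner from a list of billing line items. The record layout (a "cost" field and a "tags" dictionary) and the "owner" tag key are assumptions made for this example, not the export format of any particular cloud provider.

```python
from collections import defaultdict


def cost_by_owner(line_items, tag_key="owner"):
    """Total cost per owner tag; resources without the tag are grouped together."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "UNTAGGED")
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical billing line items.
items = [
    {"resource": "web-1", "cost": 12.5, "tags": {"owner": "storefront"}},
    {"resource": "web-2", "cost": 7.5, "tags": {"owner": "storefront"}},
    {"resource": "etl-db", "cost": 30.0, "tags": {"owner": "analytics"}},
    {"resource": "mystery-vm", "cost": 3.0, "tags": {}},
]

for owner, total in sorted(cost_by_owner(items).items()):
    print(f"{owner:<12} {total:>8.2f}")
```

The "UNTAGGED" bucket is deliberately kept in the report: unowned spend is usually the first thing worth chasing.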

Don’t:

  • Leave things turned on that aren’t used. In the cloud, the focus of capacity and performance management moves toward cost optimization. Instead of “What performance can we get for our capacity and our set budget?” the question becomes “How do we keep downward pressure on costs by matching capacity to demand just-in-time?” Turning things off is one technique.
  • Over-believe and over-commit to upfront total cost of ownership (TCO) models such as those built with tools like the AWS Simple Monthly Calculator. Only use them for best-guess estimates.
  • Miss out on the elasticity that can get you out of a bad capacity and performance situation. If your application can’t scale out or leverage elastic services, then it will be as brittle as it was off the cloud.
  • Make capacity and performance management the responsibility of one team. Everyone who uses the cloud is responsible for using it responsibly. If they can’t, enforce constraints with tools like AWS Service Catalog.
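For the first "don’t", a minimal sketch of how unused resources might be spotted: flag anything whose CPU utilization never rose above a small threshold during the observation window. The sample data, the resource names, and the 5% threshold are hypothetical, and in practice you would feed this from your monitoring tool and look at more than CPU.

```python
def find_idle(cpu_samples_by_resource, threshold_pct=5.0):
    """Return the names of resources whose CPU never reached the threshold."""
    idle = []
    for name, samples in cpu_samples_by_resource.items():
        # Resources with no samples are skipped rather than assumed idle.
        if samples and max(samples) < threshold_pct:
            idle.append(name)
    return sorted(idle)


# Hypothetical CPU utilization samples (percent) over the last week.
samples = {
    "build-agent": [1.0, 2.5, 0.8],
    "web-1": [35.0, 60.2],
    "old-demo": [0.2, 0.1],
    "just-launched": [],
}

print(find_idle(samples))  # ['build-agent', 'old-demo']
```

A list like this is a prompt for a conversation with the resource owner, not an automatic shutdown order.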

Capacity and performance management is also tied to the Cost Optimization pillar of the AWS Well-Architected Framework, which describes demand-based, buffer-based, and time-based approaches to aligning capacity with costs.
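Here is a rough sketch of how I read those three approaches, with one small function for each. The function names, parameters, and numbers are my own assumptions for illustration and are not taken from the AWS documentation.

```python
import math


def demand_based(requests_per_sec, capacity_per_unit):
    """Run as many units as the currently measured demand requires."""
    return max(1, math.ceil(requests_per_sec / capacity_per_unit))


def buffer_based(queue_depth, items_per_unit_per_min, drain_target_min):
    """Let a queue absorb the peak; size workers to drain it within a target time."""
    return max(1, math.ceil(queue_depth / (items_per_unit_per_min * drain_target_min)))


def time_based(hour, schedule, default_units):
    """Follow a known pattern, e.g. more capacity during business hours."""
    for start, end, units in schedule:
        if start <= hour < end:
            return units
    return default_units


print(demand_based(450, capacity_per_unit=100))                                # 5
print(buffer_based(1200, items_per_unit_per_min=20, drain_target_min=10))      # 6
print(time_based(10, schedule=[(8, 18, 6)], default_units=1))                  # 6
print(time_based(22, schedule=[(8, 18, 6)], default_units=1))                  # 1
```

The three are not mutually exclusive: a scheduled baseline with demand-based scaling on top is a common combination.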

Applying your organization’s ITIL 4 capacity and performance management practice to your organization’s use of the cloud should be a successful experience – but it requires a shift in thinking and application. 

The cloud gives the illusion of infinite capacity, although per-account limits still apply. As a result, performance management is less about “hitting the ceiling” and more about balancing costs against performance.

So that’s my view on ITIL 4 Foundation’s guidance on the capacity and performance management practice in a cloud context – what would you add? Please let me know in the comments.