Kubernetes adoption has skyrocketed during the last few years. No matter which statistics you look at, you’re bound to see high rates of adoption. Sysdig’s 2019 Container Usage Report states that 77 percent of containers are now run on Kubernetes. That figure doesn’t include Red Hat OpenShift and Rancher, which are also built on Kubernetes — they take the figure up to 89 percent.
So what’s the secret? Why do organizations, big and small, embrace Kubernetes so willingly? Kubernetes comes with many benefits out of the box: agility, cost-efficiency, reliability, ease of deployment, and, most important of all, scalability. Best of all, you can automate most of the scaling tasks, and scaling your Kubernetes workloads correctly also yields considerable cost benefits.
Kubernetes provides various options for autoscaling containerized applications, at both the application layer and the infrastructure layer. Let’s get right into it.
Kubernetes supports two types of pod autoscaling, horizontal and vertical, to scale the resources available to your containers:
Horizontal autoscaling is performed by the Horizontal Pod Autoscaler (HPA), which is implemented as a Kubernetes API resource and a controller. The HPA automatically scales the number of pods in a replication controller or deployment based on observed CPU utilization, or on application-specific criteria using custom metrics support.
The number of pods directly affects your cloud cost because the number of nodes you require for your application depends on it. We’ll look at how scaling affects cloud spending later in this article.
The controller periodically assesses the metrics associated with it and adjusts the number of replicas in the replication controller or deployment to match the target specified by the administrator. The default period is 15 seconds, but it can be changed via the controller manager’s --horizontal-pod-autoscaler-sync-period flag.
The HPA is implemented as a control loop: during each period, the controller manager compares the observed metrics against the targets defined in the HorizontalPodAutoscaler definition and decides on the resulting action. Metrics are obtained from the resource metrics API for pod-related metrics like CPU utilization, or from the custom metrics API for everything else.
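The comparison the control loop makes can be sketched with the replica-count formula from the Kubernetes documentation (the function name below is ours, purely for illustration):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Core HPA scaling rule from the Kubernetes docs:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods averaging 90% CPU against a 60% target -> scale up to 6
print(desired_replicas(4, 90, 60))  # 6
# 6 pods averaging 20% CPU against a 60% target -> scale down to 2
print(desired_replicas(6, 20, 60))  # 2
```

If the ratio of observed to target metric is close enough to 1, the real HPA leaves the replica count unchanged to avoid churn.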
The HPA usually waits 3 minutes after scaling up to allow metrics to stabilize. It also waits 5 minutes after scaling down in order to avoid autoscaler thrashing, that is, unnecessarily scaling resources up and down due to frequent fluctuations in metrics.
Vertical autoscaling is another way of scaling your applications based on usage and other criteria. The Vertical Pod Autoscaler (VPA) is relatively new, and scales the amount of resources, such as CPU and memory, available to existing pods. This is done on a periodic basis, with a default period of 10 seconds.
VPA mainly caters to stateful services and Out of Memory (OOM) events. However, it can also be used with stateless services for specialized tasks like auto-correcting the initial resource allocations of your pods. We’ll also be looking at the pricing impact of not using the right-sized nodes, as it can be considerable with larger applications.
One thing to remember is that the VPA requires pods to be restarted for resource scaling to take effect. That said, it isn’t something you need to worry about, because the VPA respects the Pod Disruption Budget (PDB) and ensures that the minimum number of required pods remains available.
VPA allows you to define upper and lower limits for resource scaling so that the resource limits of your nodes are never exceeded. The VPA Recommender feature is probably the most useful out of all the benefits offered by VPA. It analyzes historical resource usage and other metrics, and provides recommendations regarding optimum resource values.
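As an illustrative sketch of those bounds (the helper below is hypothetical, though minAllowed and maxAllowed are the actual bound fields in a VPA object's resourcePolicy), clamping a CPU recommendation to the configured limits looks like this:

```python
def clamp_recommendation(recommended_millicores: int,
                         min_allowed: int,
                         max_allowed: int) -> int:
    """Illustrative only: the VPA clamps its recommendation to the
    minAllowed/maxAllowed bounds set in the VPA object's resourcePolicy."""
    return max(min_allowed, min(recommended_millicores, max_allowed))

# Recommender suggests 1800m CPU, but the policy caps it at 1000m
print(clamp_recommendation(1800, min_allowed=250, max_allowed=1000))  # 1000
# A very low recommendation is raised to the 250m floor
print(clamp_recommendation(100, min_allowed=250, max_allowed=1000))   # 250
```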
Pod autoscaling can take you a long way in ensuring high performance and cost efficiency. However, in some instances, apps with extensive usage may require more nodes. You may look at increasing the number of clusters or nodes to resolve such situations. There are many benefits to maintaining your applications in a single cluster. So it makes sense to scale the number of nodes within your cluster.
The Cluster Autoscaler (CA) automatically manages the number of nodes within a cluster based on pending pods. Pods are kept in the pending state when there is a resource shortage within the cluster. The CA assesses the number of nodes required and, as of Kubernetes version 1.18, automatically interfaces with many cloud providers, including AWS, Azure, and Google. Some services, like AWS, offer comprehensive CA features to optimize cost benefits.
Once new nodes are made available by the provider, the Kubernetes scheduler allocates the pending pods to them. The process is then repeated if there are any more pods in the pending state.
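As a greatly simplified sketch of that decision (the real CA simulates the scheduler and bin-packs pods onto candidate node shapes, so this helper is purely illustrative), the core question is how many nodes would cover the pending resource requests:

```python
import math

def extra_nodes_needed(pending_pod_cpu_requests: list, node_cpu_millicores: int) -> int:
    """Illustrative only: estimate how many additional nodes are needed to
    satisfy the CPU requests of pending pods. The real Cluster Autoscaler
    performs a full scheduling simulation rather than simple division."""
    total_requested = sum(pending_pod_cpu_requests)
    return math.ceil(total_requested / node_cpu_millicores)

# Five pending pods requesting 500m CPU each, on 2-vCPU (2000m) nodes
print(extra_nodes_needed([500] * 5, 2000))  # 2
```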
The CA identifies pending pods within 30 seconds. On the other hand, once more nodes are added, it waits 10 minutes after they are no longer needed before scaling down. Use cluster autoscaling wisely: turning it on for too many pods can considerably reduce your flexibility to scale down.
We looked at three separate options for autoscaling. However, it is important to note that you can’t simply turn all three of them on and expect everything to work perfectly. So let’s look at some of the best practices when it comes to autoscaling.
In addition to the autoscaling benefits offered natively by Kubernetes, you can also use vendor-specific features to increase savings. AWS is one of the most recommended providers of managed Kubernetes. AWS Elastic Kubernetes Service (EKS) supports all the autoscaling methods discussed in this article.
A case study recently published by AWS explains how to achieve up to 80 percent cost savings through autoscaling, adjusting node resource limits, switching to reserved instances for up to 70 percent of nodes, and scheduling off-peak downscaling. Since a typical large-scale application requiring 100 CPU cores and 400GB of memory can cost about $45,000 per year (with m4.large EC2 instances billed at 10 cents per hour and cluster management at 20 cents per cluster, per hour), the ability to save even half of this, let alone 80 percent, is a significant advantage.
Google Kubernetes Engine (GKE) offers slightly lower pricing than AWS, with typical computing instances like n1-standard-2 offered at 9.5 cents per hour. Google will also charge for cluster management, effective June 1, at 20 cents per hour, per zonal cluster. This brings the yearly cost on GKE to about $43,000.
Azure Kubernetes Service (AKS) pricing does not differ much. It offers savings by not charging for cluster management, but charges a higher rate for worker nodes, at 11 cents per hour. This brings the yearly total to about $48,000.
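These yearly figures can be reproduced with quick arithmetic, using the per-hour rates quoted above and assuming the same 50-node footprint (100 cores and 400GB spread over two-vCPU, 8GB nodes) for all three providers:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

# 100 CPU cores / 400GB of memory fits on 50 two-vCPU, 8GB nodes (e.g. m4.large).
NODES = 50

def yearly_cost(node_rate_per_hour: float, mgmt_rate_per_hour: float) -> float:
    """Yearly cost = (worker nodes + cluster management) * hours per year."""
    return (NODES * node_rate_per_hour + mgmt_rate_per_hour) * HOURS_PER_YEAR

print(f"EKS: ${yearly_cost(0.10, 0.20):,.0f}")   # ~$45,552, i.e. about $45k
print(f"GKE: ${yearly_cost(0.095, 0.20):,.0f}")  # ~$43,362, i.e. about $43k
print(f"AKS: ${yearly_cost(0.11, 0.00):,.0f}")   # ~$48,180, i.e. about $48k
```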
While the top three managed Kubernetes providers offer quite competitive rates, AWS EKS offers the most benefits through its many scaling-related features. Even though it is not the cheapest of the lot, it commands 77 percent of the Kubernetes market, largely as a result of the benefits it offers via scaling.
We looked at three solutions for autoscaling and the best practices to optimize their use. These autoscaling options can work for you irrespective of the size of your application and Kubernetes infrastructure. But using them correctly will definitely take you from Startup to Superstar!
Ready to take advantage of autoscaling? Contact us today.