Horizontal Pod Autoscaling in Kubernetes

How Does Horizontal Scaling Work In Kubernetes?

June 06, 2022 05:00 PM

Horizontal Pod autoscaling

Autoscaling is one of the most prominent features of a Kubernetes cluster. Properly configured, it saves administrators time, avoids performance bottlenecks, and prevents wasted spending. With autoscaling, the cluster adds capacity when demand for a service grows and removes it when demand falls.

One way Kubernetes supports autoscaling is Horizontal Pod Autoscaling (HPA). HPA helps applications scale out to meet increasing demand and scale in when the resources are no longer required. This kind of autoscaling does not apply to objects that cannot be scaled.

In this post, we'll go deep into Horizontal Pod Autoscaling in Kubernetes. We'll explain what HPA is and what it does, then walk through how to set it up. Before we do that, let's look at what exactly Kubernetes is. Without further delay, let's get going!

What exactly is Kubernetes?

Kubernetes is an open-source container-management tool that automates container deployment, scaling, and load balancing. It schedules, runs, and manages isolated containers on physical, virtual, and cloud machines.

Kubernetes Horizontal Pod Autoscaling (HPA)

Kubernetes Horizontal Pod Autoscaling automatically adjusts the number of pods in a replication controller, deployment, or replica set according to the observed CPU usage.

Scaling pods automatically in response to observed CPU utilization is known as horizontal pod autoscaling. It is only possible for scalable objects, such as deployments, replication controllers, and replica sets. HPA is implemented as a Kubernetes Application Programming Interface (API) resource and a controller.

The controller periodically adjusts the number of replicas in a deployment or replication controller so that the observed average CPU usage matches the target set by the user.

What exactly is the Horizontal Pod Autoscaler?

In simpler terms, HPA works in a "check, update, check again" loop. The steps within this loop are:

1. The Horizontal Pod Autoscaler continuously queries the metrics server for resource usage.

2. HPA calculates the required number of replicas based on the resource usage data collected.

3. HPA then decides whether the application needs to be scaled to reach that number of replicas.

4. Following that, HPA updates the desired number of replicas on the target.

5. Since HPA monitors continuously, the process repeats from Step 1.
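
The loop above can be sketched in a few lines of Python. This is only an illustration, not the controller's code: the function names are made up for this example, the usage samples are hard-coded instead of coming from a metrics server, and the replica calculation uses the ratio-based rule covered in the algorithm details later in this article.

```python
import math
from typing import List, Sequence


def desired_replicas(current: int, observed: float, target: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Step 2: replica count implied by observed/target, kept within the bounds."""
    wanted = math.ceil(current * observed / target)
    return max(min_replicas, min(max_replicas, wanted))


def run_loop(samples: Sequence[float], replicas: int, target: float,
             min_replicas: int, max_replicas: int) -> List[int]:
    """Feed one usage sample per iteration and record the replica count after each."""
    history = []
    for observed in samples:                      # Step 1: check the metric
        wanted = desired_replicas(replicas, observed, target,
                                  min_replicas, max_replicas)
        if wanted != replicas:                    # Step 3: decide whether to scale
            replicas = wanted                     # Step 4: update the desired count
        history.append(replicas)                  # Step 5: go around again
    return history


# Hypothetical average CPU readings (percent) with a 20% target, 1 to 10 replicas.
history = run_loop([40.0, 10.0, 20.0], replicas=1, target=20.0,
                   min_replicas=1, max_replicas=10)
print(history)
```

In a real cluster each new sample would itself depend on the replica count chosen in the previous iteration; the fixed samples here only show the shape of the loop.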

Configuring Horizontal Pod Autoscaling

First, create a simple deployment. The commands below assume it is named deploy.

Now, create the autoscaler:

  • kubectl autoscale deployment deploy --cpu-percent=20 --min=1 --max=10

Let's look at the HPA entries:

  • kubectl get hpa

In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of scaling the workload to match demand.

Horizontal scaling means that the response to increased load is to deploy more Pods. This differs from vertical scaling (the Vertical Pod Autoscaler, VPA), which in Kubernetes means assigning more resources (for instance, CPU or memory) to the Pods that are already running for the workload.

If the load decreases and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.

Horizontal pod autoscaling does not apply to objects that cannot be scaled (for instance, a DaemonSet).

The HorizontalPodAutoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behaviour of the controller. The horizontal pod autoscaling controller, which runs within the Kubernetes control plane, periodically adjusts the desired scale of its target (for instance, a Deployment) to match observed metrics such as average CPU utilization, average memory utilization, or any other custom metric you specify.

A walkthrough example of horizontal pod autoscaling is available in the Kubernetes documentation.
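
The kubectl autoscale command shown earlier can also be written declaratively. As a rough sketch, the Python below builds what should be an equivalent autoscaling/v2 manifest and prints it as JSON. The helper name hpa_manifest is invented for this example, and the manifest assumes the target is an apps/v1 Deployment named deploy, as in the command above.

```python
import json


def hpa_manifest(deployment: str, cpu_percent: int,
                 min_replicas: int, max_replicas: int) -> dict:
    """Build an autoscaling/v2 HorizontalPodAutoscaler for a Deployment (illustrative helper)."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        # Assumption: the HPA is named after the Deployment it scales.
        "metadata": {"name": deployment},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_percent},
                },
            }],
        },
    }


manifest = hpa_manifest("deploy", cpu_percent=20, min_replicas=1, max_replicas=10)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be saved to a file and passed to kubectl apply -f.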

How does a HorizontalPodAutoscaler work?

Horizontal Pod Autoscaler controls the scale of a Deployment and its ReplicaSet.

Kubernetes implements horizontal pod autoscaling as a control loop that runs intermittently (it is not a continuous process). The interval is set by the --horizontal-pod-autoscaler-sync-period parameter to the kube-controller-manager (the default interval is 15 seconds).

Once during each period, the controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition.

The controller manager finds the target resource defined by the scaleTargetRef, then selects the pods based on the target resource's .spec.selector labels, and obtains the metrics from either the resource metrics API (for per-pod resource metrics) or the custom metrics API (for all other metrics).

For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. If a target utilization value is set, the controller calculates the utilization as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods and produces a ratio used to scale the number of desired replicas.

Note that if some of a Pod's containers do not have the relevant resource request set, CPU utilization for the Pod is not defined and the autoscaler will not take any action for that metric. See the algorithm details section below for more about how the autoscaling algorithm works.
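
To make the utilization calculation concrete, here is a minimal sketch that follows the description above literally: usage as a percentage of the containers' requests in each Pod, then the mean across Pods, with no result when a request is missing. The function names and the sample numbers are hypothetical, and the real controller may aggregate the values differently.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# One Pod = (per-container CPU usage, per-container CPU request), in millicores.
PodSample = Tuple[Sequence[float], Sequence[Optional[float]]]


def pod_utilization(usage: Sequence[float],
                    requests: Sequence[Optional[float]]) -> Optional[float]:
    """Usage as a percentage of requests; None if any container has no request."""
    if any(r is None or r <= 0 for r in requests):
        return None
    return 100.0 * sum(usage) / sum(requests)


def average_utilization(pods: List[PodSample]) -> Optional[float]:
    """Mean utilization across Pods; None means the metric is undefined."""
    per_pod = [pod_utilization(usage, requests) for usage, requests in pods]
    if not per_pod or any(u is None for u in per_pod):
        return None
    return mean(per_pod)


# Two hypothetical Pods: 200m used of 400m requested, and 300m used of 400m.
pods = [([150.0, 50.0], [200.0, 200.0]), ([300.0], [400.0])]
print(average_utilization(pods))

# A container without a CPU request leaves the utilization undefined.
print(average_utilization([([150.0, 50.0], [200.0, None])]))
```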

For per-pod custom metrics, the controller functions similarly to per-pod resource metrics, except that it works with raw values rather than utilization values.

For object metrics and external metrics, a single metric is fetched, which describes the object in question. This metric is compared to the target value to produce a ratio as above. In the autoscaling/v2 API version, this value can optionally be divided by the number of Pods before the comparison is made.
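
As a small illustration of the object and external metric case, the sketch below compares a single metric value with its target and optionally divides it by the number of Pods first. The function name and the request-rate figures are assumptions made up for this example.

```python
from typing import Optional


def object_metric_ratio(metric_value: float, target_value: float,
                        pod_count: Optional[int] = None) -> float:
    """Ratio of the fetched metric to its target.

    When pod_count is given, the metric is first divided by the number of
    Pods, which corresponds to the optional per-Pod averaging described above.
    """
    current = metric_value if pod_count is None else metric_value / pod_count
    return current / target_value


# Hypothetical: 1000 requests/s on one object against a target of 500 in total.
print(object_metric_ratio(1000.0, 500.0))

# Hypothetical: 1200 requests/s shared by 4 Pods against a target of 100 per Pod.
print(object_metric_ratio(1200.0, 100.0, pod_count=4))
```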

The common use for HorizontalPodAutoscaler is to configure it to fetch metrics from aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, or external.metrics.k8s.io). The metrics.k8s.io API is usually provided by an add-on named Metrics Server, which needs to be installed separately. For more information about resource metrics, see Metrics Server. Support for metrics APIs explains the stability guarantees and support status for these APIs.

The HorizontalPodAutoscaler controller accesses the corresponding workload resources that support scaling (such as Deployments and StatefulSets). Each of these resources has a subresource named scale, an interface that lets you dynamically set the number of replicas and examine their current state. For general information about subresources in the Kubernetes API, see Kubernetes API Concepts.

Details about the algorithm

At the most basic level, the HorizontalPodAutoscaler controller operates on the ratio between the current metric value and the desired metric value:

desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)]

  • For example, if the current metric value is 200m and the desired value is 100m, the number of replicas will be doubled, since 200.0 / 100.0 = 2.0. If the current value is instead 50m, the number of replicas will be halved, since 50.0 / 100.0 = 0.5. The control plane skips any scaling action if the ratio is sufficiently close to 1.0 (within a globally-configurable tolerance, 0.1 by default).
  • When a targetAverageValue or targetAverageUtilization is specified, the current metric value is computed by taking the average of the given metric across all Pods in the HorizontalPodAutoscaler's scale target.
  • Before checking the tolerance and deciding on the final values, the control plane also considers whether any metrics are missing and how many Pods are ready. All Pods with a deletion timestamp set (objects with a deletion timestamp are in the process of being shut down or removed) are ignored, and all failed Pods are discarded.
  • If a particular Pod is missing metrics, it is set aside for later; Pods with missing metrics are used to adjust the final scaling amount.
  • When scaling on CPU, if a pod has yet to become ready (it is still initializing, or possibly is unhealthy), or the most recent metric point for the pod was before it became ready, that pod is set aside as well.
  • Due to technical constraints, the HorizontalPodAutoscaler controller cannot exactly determine the first time a pod becomes ready when deciding whether to set aside certain CPU metrics. Instead, it considers a Pod "not yet ready" if it is unready and transitioned to ready within a short, configurable window of time since it started. This value is configured with the --horizontal-pod-autoscaler-initial-readiness-delay flag, and its default is 30 seconds. Once a pod has become ready, it considers any transition to ready to be the first if it occurred within a longer, configurable time since it started. This value is configured with the --horizontal-pod-autoscaler-cpu-initialization-period flag, and its default is 5 minutes.
  • The base scale ratio is then calculated using the remaining pods that were not set aside or discarded above.
  • If there were any missing metrics, the control plane recomputes the average more conservatively, assuming those pods were consuming 100% of the desired value in case of a scale down, and 0% in case of a scale up. This dampens the magnitude of any potential scale.
  • Furthermore, if any not-yet-ready pods were present, and the workload would have scaled up without factoring in missing metrics or not-yet-ready pods, the controller conservatively assumes that the not-yet-ready pods are consuming 0% of the desired metric, further dampening the magnitude of a scale up.
  • After factoring in the not-yet-ready pods and missing metrics, the controller recalculates the usage ratio. If the new ratio reverses the scale direction, or is within the tolerance, the controller doesn't take any scaling action. In other cases, the new ratio is used to decide any change to the number of Pods.
  • Note that the original value for the average utilization is reported back via the HorizontalPodAutoscaler status, without factoring in the not-yet-ready pods or missing metrics, even when the new usage ratio is used.
  • If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done for each metric, and then the largest of the desired replica counts is chosen. If any of these metrics cannot be converted into a desired replica count (e.g. due to an error fetching the metrics from the metrics APIs) and a scale down is suggested by the metrics which can be fetched, scaling is skipped. This means that the HPA is still capable of scaling up if one or more metrics give a desired replica count greater than the current value.
  • Finally, right before HPA scales the target, the scale recommendation is recorded. The controller considers all recommendations within a configurable window and chooses the highest recommendation from within that window. This value can be configured using the --horizontal-pod-autoscaler-downscale-stabilization flag, which defaults to 5 minutes. This means that scaledowns occur gradually, smoothing out the impact of rapidly fluctuating metric values.
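
The rules in this section can be tried out with a simplified Python sketch. It covers only three of them: the ratio with its tolerance, the conservative average when some Pods are missing metrics, and taking the largest proposal across several metrics. All function names are invented for this illustration, readiness handling is left out, and the real controller is more involved than this.

```python
import math
from typing import Optional, Sequence, Tuple

TOLERANCE = 0.1  # default tolerance mentioned above


def replicas_for_metric(current_replicas: int, current_value: float,
                        desired_value: float, tolerance: float = TOLERANCE) -> int:
    """Scale by current/desired unless the ratio is within the tolerance of 1.0."""
    ratio = current_value / desired_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def conservative_average(values: Sequence[float], missing_pods: int,
                         desired_value: float) -> float:
    """Average over Pods with metrics (values must be non-empty), padded for missing Pods.

    Missing Pods count as 0 when the data suggests a scale up and as the full
    desired value when it suggests a scale down, which dampens the change.
    """
    observed = sum(values) / len(values)
    if missing_pods == 0:
        return observed
    assumed = 0.0 if observed > desired_value else desired_value
    return (sum(values) + assumed * missing_pods) / (len(values) + missing_pods)


def replicas_for_metrics(current_replicas: int,
                         metrics: Sequence[Optional[Tuple[float, float]]]) -> int:
    """Largest proposal across (current, desired) pairs; None marks a failed fetch."""
    usable = [m for m in metrics if m is not None]
    if not usable:
        return current_replicas
    best = max(replicas_for_metric(current_replicas, cur, want)
               for cur, want in usable)
    # A scale down is skipped when any metric could not be fetched.
    if len(usable) < len(metrics) and best < current_replicas:
        return current_replicas
    return best


print(replicas_for_metric(4, 200.0, 100.0))                  # ratio 2.0
print(replicas_for_metric(4, 50.0, 100.0))                   # ratio 0.5
print(replicas_for_metric(4, 105.0, 100.0))                  # within tolerance
print(conservative_average([150.0, 150.0], 1, 100.0))        # padded with one 0
print(replicas_for_metrics(4, [(50.0, 100.0), None]))        # scale down skipped
```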

API Object

The Horizontal Pod Autoscaler is an API resource in the Kubernetes autoscaling API group. The current stable version can be found in the autoscaling/v2 API version, which includes support for scaling on memory and custom metrics. The new fields introduced in autoscaling/v2 are preserved as annotations when working with autoscaling/v1.

When you create a HorizontalPodAutoscaler API object, make sure the name specified is a valid DNS subdomain name. More details about the API object can be found at HorizontalPodAutoscaler Object.
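
A name can be checked before the object is created. The sketch below assumes the usual Kubernetes rules for DNS subdomain names: at most 253 characters, only lowercase alphanumerics, '-' or '.', and an alphanumeric at the start and end. The function name is hypothetical.

```python
import re

# One dot-separated label: lowercase alphanumerics with '-' allowed in the middle.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
_SUBDOMAIN = re.compile(rf"{_LABEL}(\.{_LABEL})*")


def is_dns_subdomain(name: str) -> bool:
    """Return True if name looks like a valid DNS subdomain name."""
    return len(name) <= 253 and _SUBDOMAIN.fullmatch(name) is not None


for candidate in ["deploy", "my-app.hpa", "Deploy", "-deploy", "deploy_hpa"]:
    print(candidate, is_dns_subdomain(candidate))
```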

Stability of the workload scale

When managing the scale of a group of replicas using the HorizontalPodAutoscaler, the number of replicas may keep fluctuating frequently because of the dynamic nature of the metrics evaluated. This is sometimes referred to as thrashing, or flapping. It is similar to the concept of hysteresis in cybernetics.
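
One way to picture how flapping is dampened is the downscale stabilization window described in the algorithm details: every recommendation is recorded, and the highest one inside the window wins. The class below is a toy model of that idea with invented names and timestamps given in seconds; it is not the controller's implementation.

```python
from collections import deque
from typing import Deque, Tuple


class DownscaleStabilizer:
    """Keep recent recommendations and return the highest one in the window."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window = window_seconds
        self._recent: Deque[Tuple[float, int]] = deque()  # (timestamp, replicas)

    def recommend(self, now: float, replicas: int) -> int:
        self._recent.append((now, replicas))
        # Forget recommendations that have aged out of the window.
        while self._recent and self._recent[0][0] < now - self.window:
            self._recent.popleft()
        return max(r for _, r in self._recent)


stabilizer = DownscaleStabilizer(window_seconds=300.0)
applied = [
    stabilizer.recommend(0.0, 10),
    stabilizer.recommend(60.0, 4),    # lower, but 10 is still in the window
    stabilizer.recommend(120.0, 6),
    stabilizer.recommend(301.0, 4),   # the 10 has expired, 6 is now the highest
    stabilizer.recommend(421.0, 4),   # only 4s remain in the window
]
print(applied)
```

A scale up takes effect immediately in this model, because a new higher recommendation is always the maximum of the window; only reductions are delayed.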

Autoscaling during a rolling update

Kubernetes lets you perform a rolling update on a Deployment. In that case, the Deployment manages the underlying ReplicaSets for you. When you configure autoscaling for a Deployment, you bind a HorizontalPodAutoscaler to a single Deployment. The HorizontalPodAutoscaler manages the replicas field of the Deployment. The deployment controller is responsible for setting the replicas of the underlying ReplicaSets so that they add up to a suitable number during the rollout and also afterwards.

If you perform a rolling update of a StatefulSet that has an autoscaled number of replicas, the StatefulSet directly manages its set of Pods (there is no intermediate resource similar to ReplicaSet).

Support for resource metrics

Any HPA target can be scaled based on the resource usage of the pods in the scaling target. When defining the pod specification, resource requests such as cpu and memory should be specified. These requests are used to determine the resource utilization, which the HPA controller uses to scale the target up or down. To use resource utilization-based scaling, specify a resource metric source with a target average utilization, for example 60.

With this metric, the HPA controller will keep the average utilization of the pods in the scaling target at 60 per cent. Utilization is the ratio between the current resource usage and the requested resources of the pod. See the algorithm details above for more on how utilization is calculated and averaged.


If you change the name of a container that a HorizontalPodAutoscaler is tracking, you can make that change in a specific order to ensure scaling remains available and effective while the change is being applied. Before you update the resource that defines the container (such as a Deployment), you should update the associated HPA to track both the new and old container names.

This way, the HPA is able to calculate a scaling recommendation throughout the update process. Once you have rolled out the new container name to the workload resource, tidy up by removing the old container name from the HPA specification.
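
The rename order can be illustrated with plain Python dictionaries standing in for the metrics list of an HPA spec. This sketch assumes the HPA tracks the container through a ContainerResource metric in autoscaling/v2; the container names app and web and the helper functions are hypothetical.

```python
import copy
from typing import List


def container_cpu_metric(container: str, average_utilization: int) -> dict:
    """A ContainerResource metric entry targeting CPU utilization of one container."""
    return {
        "type": "ContainerResource",
        "containerResource": {
            "name": "cpu",
            "container": container,
            "target": {"type": "Utilization",
                       "averageUtilization": average_utilization},
        },
    }


def _tracks(metric: dict, container: str) -> bool:
    return (metric.get("type") == "ContainerResource"
            and metric["containerResource"]["container"] == container)


def track_both(metrics: List[dict], old: str, new: str) -> List[dict]:
    """Before the rollout: keep the old entries and add copies for the new name."""
    result = copy.deepcopy(metrics)
    for metric in metrics:
        if _tracks(metric, old):
            clone = copy.deepcopy(metric)
            clone["containerResource"]["container"] = new
            result.append(clone)
    return result


def drop_old(metrics: List[dict], old: str) -> List[dict]:
    """After the rollout: remove entries that still refer to the old name."""
    return [copy.deepcopy(m) for m in metrics if not _tracks(m, old)]


before = [container_cpu_metric("app", 60)]
during = track_both(before, old="app", new="web")   # apply before renaming the container
after = drop_old(during, old="app")                 # apply once the rollout has finished

print([m["containerResource"]["container"] for m in during])
print([m["containerResource"]["container"] for m in after])
```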

Final thoughts

We hope this article has helped you understand how Kubernetes Horizontal Pod Autoscaling works and how to set it up. HPA lets you scale your applications based on a variety of metrics. By dynamically adjusting the number of pods, HPA lets you run your applications in a highly efficient and cost-effective way.

If you need help operating Horizontal Pod Autoscaling or want to learn more about the concept, contact an established and trusted software development firm. Its experts and developers can guide you through the process and give you a deeper understanding of the idea.
