Autoscaling is one of the most prominent features of Kubernetes. Properly configured, it saves
administrators time, avoids performance bottlenecks, and prevents wasted spending. With
autoscaling, the cluster adds capacity when demand for a service grows and removes capacity
when demand falls.
One way Kubernetes supports autoscaling is Horizontal Pod Autoscaling (HPA). HPA helps
applications scale out to meet increasing demand and scale in when resources are no longer
required. This kind of autoscaling does not apply to objects that cannot be scaled.
In this post, we'll go deep into horizontal pod autoscaling in Kubernetes. We'll explain what
HPA is and what it does, then walk through how to set it up. Before we do that, let's look at
what exactly Kubernetes is. Without further delay, let's get going!
What exactly is Kubernetes?
Kubernetes is an open-source container-management tool that automates container deployment,
scaling, and load balancing. It schedules, runs, and manages isolated containers on physical,
virtual, and cloud machines.
Kubernetes Horizontal Pod Autoscaling (HPA)
Horizontal Pod Autoscaling automatically adjusts the number of pods in a replication
controller, deployment, or replica set according to the observed CPU utilization of the
resource. Scaling is only possible for scalable objects, such as deployments, replication
controllers, and replica sets. HPA is implemented as a Kubernetes Application Programming
Interface (API) resource and a controller.
The controller changes the number of replicas in a deployment or replication controller so
that the observed average CPU usage matches the target that has been set.
What exactly is the Horizontal Pod Autoscaler?
In simpler terms, HPA works in a "check, update, check again" loop. This is how the steps
within this loop operate:
1. The Horizontal Pod Autoscaler continuously polls the metrics server for resource usage.
2. HPA calculates the required number of replicas from the resource usage data collected.
3. HPA then decides whether to scale the application up or down to the required number of replicas.
4. HPA changes the desired number of replicas accordingly.
5. Because HPA monitors continuously, the process repeats from Step 1.
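The loop above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `reconcile_once`, the sample metric values, and the target of 50 are all made up for the example, and the real controller also applies a tolerance and readiness checks that are left out here.

```python
import math

def reconcile_once(current_replicas, read_metric, target, min_replicas, max_replicas):
    """One pass of the check-update loop; returns the new desired replica count."""
    current_value = read_metric()                   # step 1: query the metric
    ratio = current_value / target                  # how far we are from the target
    desired = math.ceil(current_replicas * ratio)   # step 2: replicas needed
    # steps 3-4: keep the result inside the configured bounds
    return max(min_replicas, min(max_replicas, desired))

# Hypothetical average CPU readings (percent) observed on three successive passes.
samples = iter([90.0, 45.0, 10.0])

replicas = 2
history = []
for _ in range(3):                                  # step 5: repeat
    replicas = reconcile_once(replicas, lambda: next(samples),
                              target=50.0, min_replicas=1, max_replicas=10)
    history.append(replicas)

print(history)
```

In this sketch the replica count grows while usage is above the target and shrinks again once usage drops well below it.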
Configuring Horizontal Pod Autoscaling
Let's create a simple deployment:
Now, create the autoscaler:
kubectl autoscale deployment deploy --cpu-percent=20 --min=1 --max=10
Let's look at the HPA entries:
kubectl get hpa
- In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as
a Deployment or StatefulSet), with the aim of scaling the workload to match demand.
- Horizontal scaling means that the response to increased load is to deploy more pods. This
differs from vertical scaling (the domain of the Vertical Pod Autoscaler, VPA), which in
Kubernetes means assigning more resources (for instance, CPU or memory) to the pods that are
already running for the workload.
- If the load decreases and the number of Pods is above the configured minimum, the
HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or
other similar resource) to scale back down.
- Horizontal pod autoscaling has no effect on objects that cannot be scaled (for example, a
DaemonSet).
- The HorizontalPodAutoscaler is implemented as a Kubernetes API resource and a controller.
The resource determines the behavior of the controller. The horizontal pod autoscaling
controller, which runs within the Kubernetes control plane, periodically adjusts the desired
scale of its target (for instance, a Deployment) to match observed metrics such as average
CPU utilization, average memory utilization, or any other custom metric you specify.
- The Kubernetes documentation includes a walkthrough example of horizontal pod autoscaling.
How does a HorizontalPodAutoscaler work?
The Horizontal Pod Autoscaler controls the scale of a Deployment and its ReplicaSet.
Kubernetes implements horizontal pod autoscaling as a control loop that runs intermittently
(it is not a continuous process). The interval is set by the
--horizontal-pod-autoscaler-sync-period parameter to the kube-controller-manager (the
default interval is 15 seconds).
Once during each period, the controller manager queries the resource utilization against
the metrics specified in each HorizontalPodAutoscaler definition.
The controller manager finds the target resource defined by the scaleTargetRef, then
selects the pods based on the target resource's spec.selector labels. It then obtains the
metrics from either the resource metrics API (for per-pod resource metrics) or the custom
metrics API (for all other metrics).
For per-pod resource metrics (like CPU), the controller fetches the metrics from the
resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. If a target
utilization value is set, the controller calculates the utilization as a percentage of the
equivalent resource request on the containers in each Pod. If a target raw value is set,
the raw metric values are used directly. The controller then takes the mean of the
utilization or the raw value (depending on the type of target specified) across all
targeted Pods, and produces a ratio used to scale the number of desired replicas.
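As a rough illustration of the calculation just described, the sketch below computes each Pod's CPU utilization as a percentage of its containers' requests, averages it across Pods, and derives the scaling ratio. The pod data, the target of 50 percent, and the function names are hypothetical; the real controller's aggregation may differ in detail.

```python
from statistics import mean

def pod_utilization(usage_millicores, request_millicores):
    """Utilization of one Pod: total usage as a percentage of total requests."""
    return 100.0 * sum(usage_millicores) / sum(request_millicores)

def average_utilization(pods):
    """Mean utilization across all targeted Pods."""
    return mean(pod_utilization(p["usage"], p["requests"]) for p in pods)

# Hypothetical per-container CPU usage and requests, in millicores.
pods = [
    {"usage": [150, 50], "requests": [200, 200]},   # 200 / 400 -> 50%
    {"usage": [300],     "requests": [400]},        # 300 / 400 -> 75%
]

avg = average_utilization(pods)    # 62.5
scale_ratio = avg / 50.0           # 1.25 against a 50% target
print(avg, scale_ratio)
```

A ratio above 1.0 suggests adding replicas, and a ratio below 1.0 suggests removing some.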
Note that if any of a Pod's containers does not have the relevant resource request set, CPU
utilization for the Pod is not defined and the autoscaler will not take any action for that
metric. See the "Details about the algorithm" section below for more on how the autoscaling
algorithm works.
For per-pod custom metrics, the controller works similarly to per-pod resource metrics,
except that it works with raw values rather than utilization values.
For object metrics and external metrics, a single metric is fetched, which describes the
object in question. This metric is compared to the target value to produce a ratio as
above. In the autoscaling/v2 API version, this value can optionally be divided by the
number of Pods before the comparison is made.
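A minimal sketch of that comparison, assuming a single metric value such as a queue length. The function name and the numbers are illustrative only.

```python
def object_metric_ratio(metric_value, target_value, pod_count=None):
    """Ratio of an object/external metric to its target.

    If pod_count is given, the metric is first divided by the number of Pods,
    which corresponds to targeting an average value per Pod.
    """
    if pod_count:
        metric_value = metric_value / pod_count
    return metric_value / target_value

# Hypothetical queue with 300 pending messages.
print(object_metric_ratio(300, 100))               # whole-object target of 100
print(object_metric_ratio(300, 30, pod_count=5))   # per-Pod target of 30 across 5 Pods
```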
The common use for HorizontalPodAutoscaler is to configure it to fetch metrics from
aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, or external.metrics.k8s.io). The
metrics.k8s.io API is usually provided by an add-on named Metrics Server, which needs to be
installed separately. For more information about resource metrics, see Metrics Server.
The "Support for metrics APIs" section of the Kubernetes documentation explains the
stability guarantees and support status for these APIs.
The HorizontalPodAutoscaler controller accesses the corresponding workload resources that
support scaling (such as Deployments and StatefulSets). Each of these resources has a
subresource named scale, an interface that lets you dynamically set the number of replicas
and examine their current state. For general information about subresources in the
Kubernetes API, see Kubernetes API Concepts.
Details about the algorithm
From the most basic perspective, the HorizontalPodAutoscaler controller operates on the
ratio between the desired metric value and the current metric value:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
- For example, if the current metric value is 200m and the desired value is 100m, the number
of replicas will be doubled, since 200.0 / 100.0 == 2.0. If the current value is instead
50m, the number of replicas will be halved, since 50.0 / 100.0 == 0.5. The control plane
skips any scaling action if the ratio is sufficiently close to 1.0 (within a
globally-configurable tolerance, 0.1 by default).
- When a targetAverageValue or targetAverageUtilization is specified, the current metric
value is computed by taking the average of the given metric across all Pods in the
HorizontalPodAutoscaler's scale target.
- Before checking the tolerance and deciding on the final values, the control plane also
considers whether any metrics are missing and how many Pods are ready. All Pods with a
deletion timestamp set (objects with a deletion timestamp are in the process of being shut
down or removed) are ignored, and all failed Pods are discarded.
- If a particular Pod is missing metrics, it is set aside for later; Pods with missing
metrics are used to adjust the final scaling amount.
- When scaling on CPU, if a pod is not yet ready (it is still initializing, or possibly
unhealthy) or the most recent metric point for the pod was from before it became ready,
that pod is set aside as well.
- Due to technical constraints, the HorizontalPodAutoscaler controller cannot exactly
determine the first time a pod becomes ready when deciding whether to set aside certain CPU
metrics. Instead, it considers a Pod "not yet ready" if it is unready and transitioned to
ready within a short, configurable window of time since it started. This value is
configured with the --horizontal-pod-autoscaler-initial-readiness-delay flag, and its
default is 30 seconds. Once a pod has become ready, it considers any transition to ready to
be the first if it occurred within a longer, configurable time since it started. This value
is configured with the --horizontal-pod-autoscaler-cpu-initialization-period flag, and its
default is 5 minutes.
- The base scale ratio is then calculated using the remaining pods that were not set aside
or discarded above.
- If there were any missing metrics, the control plane recomputes the average more
conservatively, assuming those pods were consuming 100% of the desired value in case of a
scale-down and 0% in case of a scale-up. This reduces the magnitude of any potential
scaling action.
- Additionally, if any not-yet-ready pods were found and the workload would have scaled up
without factoring in missing metrics or not-yet-ready pods, the controller conservatively
assumes that the not-yet-ready pods are consuming 0% of the desired metric, further
reducing the magnitude of a scale-up.
- After factoring in the not-yet-ready pods and missing metrics, the controller recalculates
the usage ratio. If the new ratio reverses the scale direction or falls within the
tolerance, the controller takes no scaling action. In other cases, the new ratio is used to
decide any change to the number of Pods.
- Note that the original value for the average utilization is reported back via the
HorizontalPodAutoscaler status, without factoring in the not-yet-ready pods or missing
metrics, even when the new usage ratio is used.
- If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done
for each metric, and then the largest of the desired replica counts is chosen. If any of
these metrics cannot be converted into a desired replica count (e.g. because of an error
fetching the metrics from the metrics APIs) and a scale-down is suggested by the metrics
that can be fetched, scaling is skipped. This means that the HPA is still capable of
scaling up if one or more metrics give a desired replica count greater than the current
value.
- Finally, right before HPA scales the target, the scale recommendation is recorded. The
controller considers all recommendations within a configurable window and chooses the
highest recommendation from within that window. This value can be configured using the
--horizontal-pod-autoscaler-downscale-stabilization flag, which defaults to 5 minutes. This
means that scale-downs occur gradually, smoothing out the impact of rapidly fluctuating
metric values.
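The core of the algorithm described in this section can be sketched as follows. This is a simplified illustration, not the controller's actual code: the function names are made up, the tolerance is fixed at the default of 0.1, and readiness handling is reduced to a count of Pods with missing metrics.

```python
import math

def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Apply the ratio formula, skipping changes when the ratio is close to 1.0."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

def conservative_average(values, missing, target_value):
    """Recompute the average when some Pods have no metrics.

    Missing Pods count as 0 when the raw average points to a scale-up, and as
    100% of the target when it points to a scale-down, which dampens the change.
    """
    raw = sum(values) / len(values)
    if missing == 0:
        return raw
    fill = 0.0 if raw > target_value else target_value
    return (sum(values) + fill * missing) / (len(values) + missing)

def desired_for_metrics(current_replicas, metrics):
    """With several (current, target) metric pairs, the largest proposal wins."""
    return max(desired_replicas(current_replicas, cur, tgt) for cur, tgt in metrics)

print(desired_replicas(4, 200.0, 100.0))   # ratio 2.0 -> 8
print(desired_replicas(4, 50.0, 100.0))    # ratio 0.5 -> 2
print(desired_replicas(4, 105.0, 100.0))   # within tolerance -> 4
print(desired_for_metrics(4, [(200.0, 100.0), (50.0, 100.0)]))   # 8
```

Taking the maximum across metrics reflects the preference for having too many replicas over too few when metrics disagree.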
API Object
- The Horizontal Pod Autoscaler is an API resource in the Kubernetes autoscaling API group.
The current stable version is the autoscaling/v2 API version, which includes support for
scaling on memory and custom metrics. The new fields introduced in autoscaling/v2 are
preserved as annotations when working with autoscaling/v1.
- When you create a HorizontalPodAutoscaler API object, make sure the name specified is a
valid DNS subdomain name. More details about the API object can be found at
Stability of the workload scale
- When managing the scale of a group of replicas using the HorizontalPodAutoscaler, the
number of replicas may keep fluctuating frequently due to the dynamic nature of the
metrics evaluated. This is sometimes referred to as thrashing, or flapping. It's similar to
the concept of hysteresis in cybernetics.
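One way to picture how the downscale stabilization window from the algorithm details damps flapping is the sketch below. It keeps recent recommendations and always acts on the highest one still inside the window. The class name `Stabilizer` and the timestamps are hypothetical; the 300-second default mirrors the 5-minute flag default mentioned earlier.

```python
from collections import deque

class Stabilizer:
    """Remembers recent recommendations and returns the highest in the window."""

    def __init__(self, window_seconds=300.0):
        self.window = window_seconds
        self.history = deque()          # (timestamp, recommended replicas)

    def recommend(self, now, desired):
        self.history.append((now, desired))
        # Drop recommendations that are older than the window.
        while self.history and self.history[0][0] < now - self.window:
            self.history.popleft()
        return max(replicas for _, replicas in self.history)

s = Stabilizer(window_seconds=300.0)
print(s.recommend(0.0, 10))     # 10
print(s.recommend(60.0, 4))     # still 10: the earlier peak is inside the window
print(s.recommend(400.0, 4))    # 4: the peak has aged out, so the scale-down proceeds
```

Because the maximum is used, a higher recommendation takes effect immediately, while a lower one only takes effect once no higher recommendation remains in the window.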
Autoscaling during a rolling update
- Kubernetes lets you perform a rolling update on a Deployment. In that case, the Deployment
manages the underlying ReplicaSets for you. When you configure autoscaling for a
Deployment, you bind a HorizontalPodAutoscaler to a single Deployment. The
HorizontalPodAutoscaler manages the replicas field of the Deployment. The deployment
controller is responsible for setting the replicas of the underlying ReplicaSets so that
they add up to a suitable number during the rollout and afterwards.
- If you perform a rolling update of a StatefulSet that has an autoscaled number of
replicas, the StatefulSet directly manages its set of Pods (there is no intermediate
resource similar to ReplicaSet).
Support for resource metrics
- Any HPA target can be scaled based on the resource usage of the pods in the scaling
target. When defining the pod specification, resource requests such as CPU and memory
should be specified. These are used to determine the resource utilization, which the HPA
controller uses to scale the target up or down. To use resource utilization-based scaling,
specify a metric source with a target average utilization, for example 60 percent. With
this metric, the HPA controller keeps the average utilization of the pods in the scaling
target at 60 percent. Utilization is the ratio between the current usage of a resource and
the requested resources of the pod. See "Details about the algorithm" for more details on
how utilization is calculated and averaged.
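A metric source of this kind might look like the following, shown here as a small Python script that builds an autoscaling/v2 manifest and prints it as JSON. The HPA name `deploy-hpa` is an assumption made for the example, and `deploy` is the Deployment name used in the kubectl command earlier in this post.

```python
import json

def build_hpa_manifest(name, target_deployment, min_replicas, max_replicas,
                       cpu_utilization):
    """Return an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    # Resource metric source targeting average CPU utilization.
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_utilization,
                        },
                    },
                }
            ],
        },
    }

# Hypothetical names; 60 is the target average utilization in percent.
manifest = build_hpa_manifest("deploy-hpa", "deploy", 1, 10, 60)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON as well as YAML manifests.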
If you change the name of a container that a HorizontalPodAutoscaler is tracking, you can
make that change in a specific order to ensure scaling remains available and effective
while the change is being applied. Before you update the resource that defines the
container (such as a Deployment), you should update the associated HPA to track both the
new and old container names.
This way, the HPA can calculate a scaling recommendation throughout the update process.
Once you have rolled out the new container name to the workload resource, tidy up by
removing the old container name from the HPA specification.
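The ordering above can be illustrated with the metrics list of the HPA at each stage. This sketch assumes container-level metric sources of type ContainerResource, whose availability depends on your Kubernetes version; the container names `app-old` and `app-new` are hypothetical.

```python
def container_cpu_metrics(container_names, average_utilization=60):
    """Build one ContainerResource CPU metric source per tracked container name."""
    return [
        {
            "type": "ContainerResource",
            "containerResource": {
                "name": "cpu",
                "container": container,
                "target": {
                    "type": "Utilization",
                    "averageUtilization": average_utilization,
                },
            },
        }
        for container in container_names
    ]

before = container_cpu_metrics(["app-old"])              # original HPA spec
during = container_cpu_metrics(["app-old", "app-new"])   # applied before the rollout
after = container_cpu_metrics(["app-new"])               # cleanup once rollout is done

print([m["containerResource"]["container"] for m in during])
```

Tracking both names during the rollout means at least one of the metric sources resolves for every Pod, whichever version of the Pod template it was created from.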
We hope this article has helped you understand how Kubernetes Horizontal Pod Autoscaling
works and how it is set up. HPA lets you scale your applications according to various
metrics. By dynamically adjusting the number of pods, HPA lets you run your applications
in a highly efficient and cost-effective way.
If you need assistance with Horizontal Pod Autoscaling or want to know more about the
concept, contact an established and trusted software development firm. Experts and
developers can help you navigate the process and give you a deeper understanding of the
idea.