The Distributed Resource Scheduler (DRS) is an essential component of almost any virtualized environment; the rare exceptions are small, lightly loaded infrastructures.
The main goal of DRS is to balance the load on hosts located inside a computing cluster so that virtual machines (VMs) and applications deployed on them always receive resources in the required volume and work with maximum efficiency. At the same time, the number of physical servers involved remains minimal.
What Is DRS, And What Is Cloud Balancing For?
The term DRS originates with VMware: it comes from the VMware DRS utility, which is designed to balance a cluster of virtual machines. A balanced cluster is one in which virtual machines are distributed across hosts so that resource consumption is even, and there is no situation in which, for example, one host is 99% utilized and another only 30%.
When a new virtual machine is added to the cluster, it is usually placed automatically on the most suitable host, based on its resource requirements and the state of the cluster as a whole. But from time to time, the workloads of virtual machines change dramatically, which can lead to imbalances in resource allocation and degraded overall performance. Some hosts may run out of resources while others sit idle.
The task of DRS is to detect such an imbalance in time and, depending on the chosen level of automation, either make recommendations for transferring the VMs that caused the imbalance to less loaded hosts or perform this migration automatically.
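The detect-then-recommend-or-migrate behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not VMware's algorithm: the function names, the use of the standard deviation of host utilization as the imbalance measure, and the `threshold` value are all hypothetical.

```python
from statistics import pstdev

def utilization(placement, capacity):
    """Fraction of each host's capacity used by the VMs placed on it."""
    return {h: sum(vms.values()) / capacity[h] for h, vms in placement.items()}

def rebalance_step(placement, capacity, threshold=0.15, automatic=False):
    """Return one (vm, source, target) move, or None if no move is needed.

    placement: {host: {vm_name: demand}}, capacity: {host: total}.
    threshold is an assumed tolerance on the spread of host utilization.
    """
    util = utilization(placement, capacity)
    if pstdev(list(util.values())) <= threshold:
        return None  # the cluster is balanced enough
    source = max(util, key=util.get)  # most loaded host
    target = min(util, key=util.get)  # least loaded host

    def gap_after(vm):
        # Utilization gap between the two hosts if this VM were moved
        demand = placement[source][vm]
        s = (sum(placement[source].values()) - demand) / capacity[source]
        t = (sum(placement[target].values()) + demand) / capacity[target]
        return abs(s - t)

    vm = min(placement[source], key=gap_after, default=None)
    if vm is None or gap_after(vm) >= util[source] - util[target]:
        return None  # no single move would improve the balance
    if automatic:
        # Fully automated mode: apply the migration instead of only suggesting it
        placement[target][vm] = placement[source].pop(vm)
    return (vm, source, target)
```

With `automatic=False` the function only returns a recommendation and leaves the placement untouched; with `automatic=True` it also applies the move. A real scheduler would additionally check that the target has enough free capacity and weigh the cost of the migration itself.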
To find the best host for each VM, the scheduler usually runs an algorithm that starts automatically and considers the current resource consumption in the cluster (memory, processor time, and so on), as well as the resource requirements of the VM itself.
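As a rough illustration of such a placement decision, the sketch below picks, among the hosts that can fit the VM, the one that would end up least loaded. The `Host` class, the `pick_host` function, and the "lowest resulting utilization" rule are assumptions made for this example; real schedulers use richer metrics and policies.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    cpu_total: float   # e.g. cores
    mem_total: float   # e.g. GiB
    cpu_used: float = 0.0
    mem_used: float = 0.0

def pick_host(hosts, vm_cpu, vm_mem):
    """Return the host that fits the VM and ends up least loaded, or None."""
    def load_after(h):
        # The tighter of the two resources determines how loaded the host is
        return max((h.cpu_used + vm_cpu) / h.cpu_total,
                   (h.mem_used + vm_mem) / h.mem_total)

    candidates = [h for h in hosts if load_after(h) <= 1.0]
    return min(candidates, key=load_after, default=None)

hosts = [
    Host("a", cpu_total=16, mem_total=64, cpu_used=12, mem_used=32),
    Host("b", cpu_total=16, mem_total=64, cpu_used=4, mem_used=40),
    Host("c", cpu_total=8, mem_total=32, cpu_used=7, mem_used=8),
]
best = pick_host(hosts, vm_cpu=2, vm_mem=8)
print(best.name if best else "no host fits")
```

Here host "c" is excluded because the VM's CPU requirement does not fit, and "b" wins over "a" because its resulting load is lower.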
What Is The Use Of DRS?
The fact is that, in the absence of automated control, the distribution of VMs between servers can be highly ineffective. Here are just a few of the potential risks:
Bin Packing Problem
There is a risk of “uneven” filling of servers with virtual machines and an unjustified increase in the used capacity.
It is not possible to predict the load profile within a virtual machine in advance. Therefore, when choosing the hypervisor on which to create a virtual machine, only its declared parameters are used: the number of cores, the type of disk subsystem, and the individual characteristics of the virtual machine's flavor.
During the subsequent operation of the virtual machine, it may turn out that its actual load profile is more (or, conversely, less) demanding on the processor and memory. Without DRS, this is hard to detect in time, which leads to suboptimal resource use.
In practice, the problem is further complicated because the physical servers in use do not always have the same capacity. It is recommended to build clusters from homogeneous equipment, since this greatly simplifies resource allocation. But in real life, the configurations of servers purchased at different times can vary greatly: in the number of cores and processor power, the amount of memory, disks, and so on.
In our example, the capacities of the servers will not all be 1.4, but 1.7, 1.9, 1.5, and so on. In addition, a server's resources should never be fully utilized: there should always be some “reserve.” For example, in our model, servers with a capacity of 1 resource would not be suitable. All this further complicates manual planning.
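To make the effect of unequal capacities and a mandatory reserve concrete, here is a small sketch using first-fit decreasing, a standard bin-packing heuristic. It is chosen only for illustration and is not claimed to be what any particular DRS uses. The server capacities 1.7, 1.9, and 1.5 come from the text; the VM sizes and the 10% reserve are made-up values.

```python
def first_fit_decreasing(vm_sizes, capacities, reserve=0.1):
    """Place VMs, largest first, on the first server that still has room.

    vm_sizes: {vm_name: size}; capacities: list of server capacities.
    reserve: assumed fraction of each server that must stay free.
    Returns ({vm_name: server_index or None}, [used capacity per server]).
    """
    usable = [c * (1 - reserve) for c in capacities]
    used = [0.0] * len(capacities)
    placement = {}
    for vm, size in sorted(vm_sizes.items(), key=lambda kv: kv[1], reverse=True):
        for i, limit in enumerate(usable):
            if used[i] + size <= limit:
                used[i] += size
                placement[vm] = i
                break
        else:
            placement[vm] = None  # no server can take this VM
    return placement, used

vms = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5, "f": 0.4}
placement, used = first_fit_decreasing(vms, capacities=[1.7, 1.9, 1.5])
print(placement)
```

Even with six VMs and three servers, the result depends on each server's individual capacity and on the reserve, which is why doing this by hand quickly becomes impractical.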
Late Transfer To New Servers In Case Of Exhaustion Of Hypervisor Resources
The first type of risk does not significantly affect cloud customers; it only leads to the provider using its capacity ineffectively. The second, by contrast, can dramatically slow down user applications.
On the hypervisor, resources are usually provisioned to virtual machines with oversubscription: several virtual cores belonging to different virtual machines are mapped onto one physical processor core. Therefore, if one of the machines exhausts its allocated resources, this affects all VMs that share the same core.
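The arithmetic behind this can be shown with a toy model. The sketch below assumes that, when total CPU demand exceeds the number of physical cores, every VM is scaled down proportionally; real hypervisor schedulers use shares, reservations, and limits, so the proportional rule and both function names are simplifying assumptions.

```python
def oversubscription_ratio(vcpus_per_vm, physical_cores):
    """Ratio of virtual cores promised to physical cores available."""
    return sum(vcpus_per_vm) / physical_cores

def effective_cpu(demands, physical_cores):
    """CPU each VM actually gets, assuming proportional sharing under contention.

    demands: {vm_name: cores the VM is currently trying to use}.
    """
    total = sum(demands.values())
    if total <= physical_cores:
        return dict(demands)  # no contention: every VM gets what it asks for
    scale = physical_cores / total
    return {vm: d * scale for vm, d in demands.items()}

# 16 vCPUs promised on 8 physical cores: a 2:1 oversubscription
print(oversubscription_ratio([4, 4, 8], physical_cores=8))

# Quiet period: everyone is satisfied
print(effective_cpu({"a": 2.0, "b": 2.0, "c": 2.0}, physical_cores=8))

# All VMs become busy at once: each one is slowed down
print(effective_cpu({"a": 6.0, "b": 6.0, "c": 4.0}, physical_cores=8))
```

In the last call the total demand is twice the physical capacity, so in this model every VM receives only half of what it asks for, including those whose own load did not change much.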
The use of the DRS mechanism eliminates the described risks and brings the provider (and, therefore, its customers) the following benefits:
- maintaining maximum responsiveness of client applications,
- even distribution of the load taking into account the current metrics from the VM,
- increasing resource utilization and minimizing equipment downtime,
- protection against a common client error, in which more resources are requested for a VM than are needed in practice.