Cluster Failover Is A Separate Issue
Kubernetes makes it easier to provide application failover, but you also need to implement cluster failover.
A Kubernetes cluster has a master node that manages the cluster directly and stores its configuration, metadata, and the statuses of Kubernetes objects. A fault-tolerant cluster needs three such master nodes, replicated and kept separate from the worker nodes that run the workloads. Each one is a separate server or virtual machine that business applications cannot use. These machines have to be connected and maintained separately, or rented in the cloud.
This creates a challenge for small businesses: where two servers used to be enough for all applications, Kubernetes requires three additional servers just for fault tolerance.
A Kubernetes cluster also has a useful built-in self-healing mechanism. If one of the nodes fails, all processes that were running on it are automatically restarted on other cluster nodes. For this to work, the remaining nodes need a reserve of resources, and that reserve must be kept free. Otherwise, the applications will have nowhere to move when problems occur.
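The condition above can be sketched as a simple capacity check. This is only an illustration, not how the Kubernetes scheduler works: the `Node` class, the `can_absorb_failure` function, and the CPU figures are hypothetical, and the check compares totals only, ignoring memory, pod sizes, and placement constraints.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_cpu: float  # total CPU cores on the node (assumed figure)
    used_cpu: float      # cores already taken by running workloads


def can_absorb_failure(nodes: list[Node], failed_name: str) -> bool:
    """Return True if the surviving nodes have enough free CPU in total
    to take over everything that was running on the failed node."""
    failed = [n for n in nodes if n.name == failed_name]
    if not failed:
        raise ValueError(f"unknown node: {failed_name}")
    survivors = [n for n in nodes if n.name != failed_name]
    free = sum(n.capacity_cpu - n.used_cpu for n in survivors)
    return free >= failed[0].used_cpu


# Three 8-core nodes, each with 6 cores in use: only 4 cores are free
# on the survivors, so the 6 cores of workload cannot be moved.
cluster = [Node("node-a", 8, 6), Node("node-b", 8, 6), Node("node-c", 8, 6)]
print(can_absorb_failure(cluster, "node-a"))  # False
```

With the same nodes loaded to 4 cores each, the check passes, because the two survivors together have 8 free cores.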
The reserve depends on how many nodes are likely to fail at the same time in your setup:
- If you have one rack of servers in one data centre, then most likely no more than one node on one server will fail at a time, for example, due to OS errors. So you need a reserve the size of one node. Of course, the whole rack may fail, but protecting against that requires redundancy outside of Kubernetes.
- If you have several racks of servers, you can lose an entire rack, for example, when a switch problem makes all the servers in it unavailable. This means you need a reserve equal to the number of servers in one rack.
- If you have several data centres, you need a reserve the size of a whole data centre so that applications keep working if one of them fails.
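The three cases above can be written down as a small helper. This is a hedged sketch under simplifying assumptions: the `reserve_in_nodes` function, the topology layout, and the domain names `"node"`, `"rack"`, and `"datacentre"` are made up for illustration, all nodes are assumed to be the same size, and the largest rack or data centre is taken as the worst case.

```python
from __future__ import annotations


def reserve_in_nodes(topology: dict[str, dict[str, int]], failure_domain: str) -> int:
    """Return how many nodes' worth of capacity to keep free.

    topology maps data centre name -> rack name -> number of nodes.
    failure_domain is the largest unit expected to fail at once.
    """
    rack_sizes = [count for racks in topology.values() for count in racks.values()]
    dc_sizes = [sum(racks.values()) for racks in topology.values()]
    if failure_domain == "node":
        return 1
    if failure_domain == "rack":
        return max(rack_sizes)
    if failure_domain == "datacentre":
        return max(dc_sizes)
    raise ValueError(f"unknown failure domain: {failure_domain}")


# Hypothetical layout: two racks of 4 nodes in dc1, one rack of 6 in dc2.
topology = {"dc1": {"rack1": 4, "rack2": 4}, "dc2": {"rack1": 6}}
for domain in ("node", "rack", "datacentre"):
    print(domain, reserve_in_nodes(topology, domain))
```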
In simple terms, it looks like this: if there are ten nodes in a cluster and you want to survive the loss of one node without problems, you need to keep 10% of the resources in reserve. If applications should keep working even after losing 50% of the cluster, every node needs a margin of 50%.
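The same arithmetic as a minimal sketch, assuming identical nodes and workloads that can be rescheduled freely. The function names `required_reserve` and `max_safe_utilization` are illustrative and not part of any Kubernetes tooling.

```python
def required_reserve(total_nodes: int, failed_nodes: int) -> float:
    """Fraction of cluster capacity to keep free so the survivors
    can take over the load of the failed nodes."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= failed_nodes < total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes - 1")
    return failed_nodes / total_nodes


def max_safe_utilization(total_nodes: int, failed_nodes: int) -> float:
    """Highest average load per node that still survives the failure."""
    return 1.0 - required_reserve(total_nodes, failed_nodes)


print(required_reserve(10, 1))      # 0.1, the 10% reserve for one node out of ten
print(required_reserve(10, 5))      # 0.5, the 50% margin for losing half the cluster
print(max_safe_utilization(10, 1))  # 0.9
```

The check behind the first figure: ten nodes at 90% load carry nine nodes' worth of work, which is exactly what the nine survivors can hold.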
For this reason, self-hosted Kubernetes can, in most cases, be run successfully only by large companies that can assign employees to maintain the cluster and do not need to economize on resources.
In addition, deploying a cluster yourself is not quick. If you need to launch a cluster at short notice for a project or a test environment, self-hosting will not work: deployment takes anywhere from several hours to several weeks, and you should be prepared for that. For comparison, in the cloud you can launch a KaaS cluster in 10 minutes and use it immediately, but only because the provider’s specialists have already done the infrastructure work.