The Benefits Of Self-Hosting Kubernetes

April 5, 2023

I work in platform engineering, so when I decided to start self-hosting, choosing Kubernetes was (almost) a no-brainer. No other orchestration platform comes close to it in popularity; it’s the industry standard.

However, like most things that are extremely popular, Kubernetes does have its detractors. Some of the more common complaints are that (i) its design is just inherently over-complex and (ii) it solves problems that most people simply don’t have (this is normally a reference to scaling). Both are things that might sound concerning to a self-hoster.

I’m perhaps not the best person to give a true accounting of its inherent complexity - I’ve worked with Kubernetes on and off since 2015, and have been working as a platform engineer for much of that time. Because of that familiarity, I didn’t even try to do things in the simplest way possible - I built my cluster from scratch, leaving more room for tinkering, trying new networking solutions, and so on. A lot of what I learn is directly relevant to $DAYJOB, so I could justify some investment in it.

But it’s the second point I want to talk about here. Certainly, Kubernetes traces its lineage back to an operating model that was very successful at very high scale. But that doesn’t mean that it doesn’t offer benefits for other users too.

I want to share my observations about what Kubernetes does for me even in my very smallest of small installations, and how that translates into practical benefits.

Running Software on a Group of Computers

Initially, one of the big selling points of Kubernetes was its ability to manage many nodes and bin-pack workloads, so that the resources of all nodes are used efficiently whilst maintaining workload reliability. This is why people call Kubernetes an OS for the data centre, and these features remain a major selling point for large clusters of servers.

However, I don’t have a large cluster. I have three Raspberry Pis. Good bin-packing is still a benefit, but it is feasible for me to manage this manually.

There is another benefit of running software on a group of computers that’s more relevant to me: it allows you to get started with almost nothing and add hardware as required. I didn’t have to go out and buy an expensive server - I started with two Raspberry Pis, began running workloads on them, and a few months later, when they were at capacity, bought a third.

Moving workloads between nodes is (generally) trivial, so a new node can start running workloads with very little effort. It’s also easy to remove a node from the cluster, to rebuild or just generally tinker with it, with minimal impact on the running apps. This is great for uptime, but it’s also less work for me because I don’t have to worry about service reallocation or reinitialization when taking nodes in and out of service.
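To make the second point concrete, here’s a minimal sketch (the app name and label are placeholders) of the kind of resource that makes taking a node out of service safe: a PodDisruptionBudget tells Kubernetes to keep at least one replica running while pods are evicted from a node under maintenance.

```yaml
# Minimal sketch: keep at least one replica of a hypothetical
# app labelled "app: myapp" running while a node is drained.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp
```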

These capabilities have been particularly helpful to me, because one of my motivations for self-hosting was to get better at monitoring. Overprovisioning can be a pragmatic way of achieving reliability, but I wanted to start with a minimal amount of hardware and add workloads until I could see things going wrong. In this way, I’ve been able to learn how to identify problems via monitoring and come up with fixes - drawing on the ability to incrementally add new hardware to the cluster and move workloads around as required.

(Predominantly) Well-Behaved Workloads

As Kubernetes is built on containerization, we can also rely on some nice properties of the containerized workload processes. Perhaps the most important is that workloads are generally fairly predictable - I rarely run into software failing with unexpected errors caused by the configuration of my system. The only recurrent issues have come from the fact that Raspberry Pis run on the ARM64 architecture, for which container images are sometimes not published, so the experience is not quite as smooth as on more common platforms. Even then, it is usually fairly simple to rebuild the image for the correct architecture. Despite all the evangelism that has been done in favour of container-based platforms, I think the improvements in APIs for software packaging & deployment are still an under-rated benefit of these platforms as compared to previous technologies.

Another nice property of containerization is that we can rely on processes to share resources nicely when scheduled on a node alongside multiple other processes. These capabilities are widely adopted too - most (but not all) third-party software packages come with reasonable resource defaults set, so they really do just get the resources they need. In the cases where they don’t, Kubernetes generally does an OK job of applying sensible fallback defaults.
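As an illustration (the names and numbers below are placeholders, not recommendations), a well-behaved package declares requests and limits on its containers, and a LimitRange is one way a cluster can codify fallback defaults for containers that omit them:

```yaml
# What a well-behaved package typically declares: requests guide
# scheduling decisions; limits cap actual usage on the node.
apiVersion: v1
kind: Pod
metadata:
  name: example-app            # hypothetical workload
spec:
  containers:
    - name: app
      image: example/app:1.0   # placeholder image
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 512Mi
---
# A LimitRange fills in requests/limits for containers in its
# namespace that don't set their own.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 250m
        memory: 256Mi
```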

Package Installation & Management

On a single machine, I can easily install packages via a package manager. Mostly we use these package managers to install tooling invoked from the CLI, but we use them to install services too.

“Package management” exists on Kubernetes too. However, the type of packages you install on Kubernetes is somewhat different. We don’t use Kubernetes as we do a workstation, so the packages are typically services of various kinds - often distributed, and heavily customisable.

We don’t have an Ansible playbook applying all the desired configuration settings (including packages to install) to various files spread across the server’s filesystem. Instead, Kubernetes packages are composed of a set of YAML resources representing the desired state of the various components and configuration that make up the package. Each resource is of a specific type and therefore has an associated schema of permitted fields. To install the package, we just send that set of YAML files to the Kubernetes API server, which converges the actual state of the cluster towards the desired state.
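As a minimal sketch of what one of these resources looks like (the names and image are hypothetical), here is a Deployment declaring that two replicas of an app should exist; applying it with kubectl apply -f hands this desired state to the API server:

```yaml
# Desired state: two replicas of a hypothetical app. The API
# server stores this; controllers make reality match it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: ghcr.io/example/myapp:1.2.3   # placeholder image
          ports:
            - containerPort: 8080
```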

The strict structure of the Kubernetes YAML used for packages lends itself well to programmatic modification and templating, and quite a few tools have been developed to manage this process.

Tools like kustomize, jsonnet, ytt and helm can all help in different scenarios. Although it’s not my favourite tool in general, helm makes it particularly easy to get started deploying things. In any case, the payoff of this declarative YAML deployment method is that it is easy to update packages, even when they have been customized. Automated processes can fetch updated packages and apply a consistent set of customizations, before sending the resulting Kubernetes YAML to the Kubernetes API, which manages the rollout to the new desired state.
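To give a flavour of this (the upstream URL and names below are made up), a kustomize configuration can pull in an upstream package and layer local customizations over it, without ever editing the upstream files:

```yaml
# kustomization.yaml: reference upstream manifests, then apply
# local customizations on top (URL and names are hypothetical).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/example/myapp/deploy?ref=v1.2.3
namespace: apps
images:
  - name: ghcr.io/example/myapp
    newTag: 1.2.4        # bump the image without touching upstream
commonLabels:
  env: homelab
```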

The details of how to do this deserve a post of their own. But the upshot is that, with very little effort, I am able to keep packages installed on my cluster patched and up-to-date.

Without the structure and affordances that Kubernetes offers, it would be very difficult to capture all the configuration involved with a package, and the process of updating multiple packages would be more complicated and error-prone.

Unified Control of Apps & Infra

The YAML resource-based approach is not just limited to applications. In Kubernetes, there’s an API for every type of configuration you have, including third-party add-ons, in the form of custom resources. The controller pattern in Kubernetes has been tremendously successful in enabling these types of extensions. It’s truly very handy to have a single API with which to manage and query (almost) everything, covering resources such as:

  • Storage volumes
  • Running processes (containers)
  • Process configuration & secrets
  • Cronjobs
  • CI/CD Pipelines & Data Pipelines
  • Load balancers / virtual servers
  • Certificates
  • Firewall rules
  • Roles for user access

Using projects like Crossplane, some people like to take this even further, and control things in arbitrary external systems via the Kubernetes control plane. This can be taken too far, but there is certainly a great benefit in having “one API to rule them all” for all apps and cluster-adjacent infrastructure.
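As a small example of what this looks like in practice (the issuer and hostname below are assumptions), with the cert-manager add-on installed, a TLS certificate is requested with the same declarative YAML as any application:

```yaml
# A cert-manager custom resource: certificates managed through
# the same API as everything else (issuer/hostname are assumed).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: myapp-tls
spec:
  secretName: myapp-tls
  dnsNames:
    - myapp.example.home     # placeholder hostname
  issuerRef:
    name: homelab-issuer     # hypothetical ClusterIssuer
    kind: ClusterIssuer
```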

Ecosystem

There are a number of tools that you can easily install on Kubernetes, which hook into the information the platform exposes and offer out-of-the-box functionality for many things, including:

  • Monitoring and Alerting (e.g. with kube-prometheus; see the sketch after this list)
  • Logging (e.g. with fluent-bit)
  • Threat Detection (e.g. with falco)
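For instance, once kube-prometheus is installed, getting a new app’s metrics scraped takes one small resource (the labels and port name here are assumptions about the app):

```yaml
# A kube-prometheus ServiceMonitor: tell Prometheus to scrape
# any Service labelled app=myapp (label/port are assumptions).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
```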

Even for general-purpose, non-platform tools, the likelihood is that whatever app you want to run on Kubernetes, you will be able to find a pre-existing package you can install. In my case, this has included apps like:

  • Postgres
  • Jupyter Notebook Server
  • Pi-hole (DNS server)
  • TiddlyWiki

A lot of the projects in the ecosystem are cutting-edge in their domain, and their integrations tend to be comparatively well-supported, good quality and feature-rich. Given my goal of getting better at monitoring, I’ve particularly benefitted from the tooling available in that domain.

A large and active community supports the ecosystem, with all the benefits that brings, but there are technical reasons for its strength too - tools benefit from close integration with the underlying platform, and from an easy install via Kubernetes package management.

Conclusion

Kubernetes is a lot to take on when you just want to run an app, even on self-hosted infrastructure. The benefits of a container orchestrator are many, and you can get a good share of them - particularly those of containerization - with an orchestrator that’s easier to set up, like Docker Swarm or (perhaps?) Nomad.

However, you would miss out on the customization the Kubernetes platform allows through its opinionated but extensible APIs, and on the many packages that have sprung up in its ecosystem. You might start off faster with a simpler orchestrator (or with no orchestrator at all), but you would lose some of the things that make Kubernetes a smooth, fully-featured modern platform for apps.