Kubernetes operators, platform building, and IaC

License changes at HashiCorp and the sudden sale to IBM last year culminated in the OpenTofu fork that has just reached version 1.10.

But disquiet about the license status or cost of Terraform isn’t the only reason some platform engineering teams have been looking for an alternative to traditional Infrastructure as Code (IaC) approaches.

Powerful as Terraform is, it was never designed to run as a service. And if you want to use Kubernetes as a universal control plane, some suggest, why not use it for everything on your platform, from applications down to the infrastructure, including local and cloud resources?

Hand in hand

The traditional approach is to combine IaC for handling common infrastructure resources (networks, traffic ingress, centralised logging pipelines, security tools and so on) with operators.

The latter encode expertise and best practices for installing and maintaining complex applications, and can be used for handling backup, disaster recovery and failover, automatic restarts, zero-downtime rolling upgrades, resizing, scaling based on load (even for resources that don’t have that built in) and so on.

Operators put that knowledge right into the Kubernetes control plane, where you can take advantage of drift detection and reconciliation as well as RBAC, especially if you’re using Argo or Flux. “You can speak to your database in the Kubernetes native language,” explains Michal Nosek, an enterprise architect at Percona (which uses operators to install multiple databases in its Everest data platform).

“Traditional IaC tools like Terraform and Pulumi are excellent for provisioning infrastructure, but they typically stop at the initial deployment phase,” Mirantis VP Randy Bias points out. “Operators go further; they continuously observe the actual state of resources and reconcile them with the desired state.”
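The observe-and-reconcile pattern Bias describes can be sketched in a few lines of Python. This is a toy illustration only, not code from any real operator: the `State` fields, the `reconcile` function and the action strings are all hypothetical, and a real operator would watch the Kubernetes API rather than loop over an in-memory object.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical, simplified view of a database cluster."""
    replicas: int
    version: str


def reconcile(desired: State, actual: State) -> list[str]:
    """Compare actual with desired and take one step toward closing the gap.

    Returns the actions taken in this pass (an empty list means no drift).
    """
    actions = []
    if actual.version != desired.version:
        actions.append(f"upgrade {actual.version} -> {desired.version}")
        actual.version = desired.version
    if actual.replicas < desired.replicas:
        actions.append("add replica")
        actual.replicas += 1
    elif actual.replicas > desired.replicas:
        actions.append("remove replica")
        actual.replicas -= 1
    return actions


def run_until_converged(desired: State, actual: State, max_rounds: int = 10) -> list[str]:
    """Call reconcile repeatedly until a pass finds nothing left to do."""
    log = []
    for _ in range(max_rounds):
        actions = reconcile(desired, actual)
        if not actions:
            break
        log.extend(actions)
    return log


if __name__ == "__main__":
    desired = State(replicas=3, version="8.0")
    actual = State(replicas=1, version="5.7")
    for action in run_until_converged(desired, actual):
        print(action)
```

The point of the sketch is that the loop never finishes in the way a one-off provisioning run does: if something later changes the actual state, the next pass notices the drift and corrects it.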

In theory, that turns installing even a complex application like a database into a single configuration step, Nosek says. “It doesn't matter how complex a database cluster you want, how many replicas, how much high availability you need, and what types of complex backups you need.” The operator handles the complexity. “You don't need to figure out all those details on your own: how to manage database passwords and secrets and integrated monitoring solutions, how to deploy replicas in what order, what versions and so forth.”

Operators also fill a gap in Terraform’s declarative paradigm. “I think that’s sometimes why people hate Terraform: because they try to describe how to make things happen and that really becomes problematic.”

Beyond databases

Helm charts remain the best way to install simpler applications, but operators are popular for stateful workloads like databases, which haven’t always fit the Kubernetes paradigm (although newer features like volume expansion through the Kubernetes storage interface simplify things).

Operators are ideal for complex application lifecycle management, Bias points out: “especially for services that involve state, dependencies, or multi-step orchestration”.

The backing store for cluster data in Kubernetes, etcd, is infrastructure that looks a lot like a complex application.

“If you add a replica to etcd, there's a protocol that needs to be executed to add it to the quorum; you need to add it as a read-only replica first to sync the state, and then, once it is synced, you can add it to the quorum but you have to have the right number in the quorum so you need the right number of replicas,” explains Brian Grant, ConfigHub CTO and the original Kubernetes API design lead. “There’s a complex protocol for ensuring the correct behaviour of the etcd cluster and just installing it with a Helm chart is not going to do that for you.”
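The ordering Grant describes (join as a read-only replica, sync, then join the quorum) is the kind of logic an operator encodes. The Python below is a toy model under stated assumptions, not etcd's API: the `ToyCluster` class and its method names are hypothetical, and the only borrowed facts are that a non-voting member is commonly called a learner and that a majority quorum of n voters is n // 2 + 1.

```python
class ToyCluster:
    """Hypothetical in-memory model of voting members and read-only learners."""

    def __init__(self, voters):
        self.voters = set(voters)
        self.learners = {}  # learner name -> True once it has synced state

    def quorum(self) -> int:
        """Majority of voting members; learners do not count."""
        return len(self.voters) // 2 + 1

    def add_learner(self, name: str) -> None:
        """Step 1: join as a read-only replica that cannot vote."""
        self.learners[name] = False

    def mark_synced(self, name: str) -> None:
        """Step 2: record that the learner has caught up with the cluster state."""
        if name not in self.learners:
            raise KeyError(f"{name} is not a learner")
        self.learners[name] = True

    def promote(self, name: str) -> None:
        """Step 3: move a synced learner into the voting set."""
        if name not in self.learners:
            raise KeyError(f"{name} is not a learner")
        if not self.learners[name]:
            raise RuntimeError(f"{name} has not synced yet")
        del self.learners[name]
        self.voters.add(name)


if __name__ == "__main__":
    cluster = ToyCluster(["a", "b", "c"])
    cluster.add_learner("d")
    cluster.mark_synced("d")
    cluster.promote("d")
    print(sorted(cluster.voters), "quorum:", cluster.quorum())
```

In this model, adding a learner leaves the quorum size unchanged, which is why the new replica can catch up without putting the existing cluster at risk; only promotion changes the number of votes required.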
