Kubernetes security hates us
Closing out three years building a PaaS on Kubernetes.
Inspired by the post Text Rendering Hates You, I thought I’d use my recent departure[1] as the tech lead of a project building a PaaS on Kubernetes to dive into the icky bits of Kubernetes security and make some propositions about how to fix them.
The views expressed here are my own and don't reflect those of my employer.
It may surprise you that Kubernetes – the thing modeled after Google’s Borg, killer of Docker Swarm, and hyped as the “cloud operating system” – is actually fiendishly difficult to secure. It really boils down to three core properties of the way Kubernetes was designed:
- Everything is declarative
- Secrets are a built-in type
- Namespaces are the unit of isolation
Unfortunately, these are all core tenets of the value that Kubernetes provides so we have to live with them for better or worse. Let’s dive into each and see why they’re problematic.
Everything is declarative
As a developer, this part is actually really delightful. You can describe the infrastructure you want and controllers constantly reconcile drift. A lot of the time, if an app was misbehaving, I could go grab a coffee and by the time I was back a controller had coaxed it back to working order.
Because the system is declarative rather than imperative, it’s eventually consistent and doesn’t represent an exact state:
Transactions across objects are not required: the API represents a desired state, not an exact state.
However, temporal ordering and consistency are really important to the correctness of security authorization. So important that there are dedicated systems to validate them.
In the declarative world, if you remove a permission from a role, it’ll eventually propagate to aggregate roles rather than doing so atomically. If a RoleBinding refers to objects or resources that don’t exist, it’ll be silently accepted rather than rejected, and will be a no-op until something comes along that matches (or a malicious actor creates a matching resource).
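To make the dangling-reference problem concrete, here is a toy sketch in Python. It is not the Kubernetes API: `ToyCluster`, `bind`, and `allowed` are names I made up, and the model only assumes what is described above, namely that a binding is stored without checking that its role exists.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """A deliberately tiny stand-in for RBAC state; not the real Kubernetes API."""
    roles: dict = field(default_factory=dict)      # role name -> set of (verb, resource)
    bindings: list = field(default_factory=list)   # (subject, role name) pairs

    def bind(self, subject: str, role_name: str) -> None:
        # Like a RoleBinding, the reference is stored without checking
        # that the role it names exists.
        self.bindings.append((subject, role_name))

    def allowed(self, subject: str, verb: str, resource: str) -> bool:
        for bound_subject, role_name in self.bindings:
            if bound_subject != subject:
                continue
            # A dangling reference grants nothing, but it is not an error either.
            if (verb, resource) in self.roles.get(role_name, set()):
                return True
        return False


cluster = ToyCluster()
cluster.bind("alice", "secret-reader")  # accepted even though the role is missing
before = cluster.allowed("alice", "get", "secrets")

# Later, anyone able to create roles can "activate" the dormant binding.
cluster.roles["secret-reader"] = {("get", "secrets")}
after = cluster.allowed("alice", "get", "secrets")

print(before, after)
```

The binding never changed between the two checks. What changed is an object it pointed at, which is exactly the kind of thing that is hard to audit.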
Secrets are a built-in type
This is another thing that is great at first glance. Managing secrets is a pain in the rear, so having a built-in type that’s probably, but not guaranteed to be, encrypted at rest on each platform is great.
Because they’re built in, everyone uses them… leading to a situation where every user and controller needs to be granted access to secrets. But it gets worse: kubelets (the things that run pods, which contain arbitrary code) use their privileges to read secrets and inject them into running containers. This means anyone who can declare an environment variable on a Pod, Deployment, StatefulSet, etc. can read any secret in the namespace (emphasis mine):
A user who can create a Pod that uses a secret can also see the value of that secret. Even if the API server policy does not allow that user to read the Secret, the user could run a Pod which exposes the secret.
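As a rough illustration of the quote above, this Python sketch builds a Pod manifest that references a Secret through `env[].valueFrom.secretKeyRef` and echoes it. The Pod name `secret-echo`, the variable `LEAKED`, the `busybox` image, and the Secret `db-credentials` with key `password` are all hypothetical. It also assumes the user has some way to see the container's output, for example by reading the Pod's logs.

```python
import json


def pod_exposing_secret(secret_name: str, key: str, namespace: str = "default") -> dict:
    """Build a Pod manifest whose only job is to print one key of a Secret."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "secret-echo", "namespace": namespace},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "echo",
                    "image": "busybox",
                    # The value ends up in the container's output.
                    "command": ["sh", "-c", 'echo "$LEAKED"'],
                    "env": [
                        {
                            "name": "LEAKED",
                            # The reference is resolved on the user's behalf
                            # when the container starts.
                            "valueFrom": {
                                "secretKeyRef": {"name": secret_name, "key": key}
                            },
                        }
                    ],
                }
            ],
        },
    }


# Hypothetical Secret name and key, for illustration only.
manifest = pod_exposing_secret("db-credentials", "password")
print(json.dumps(manifest, indent=2))
```

Nothing in this manifest requires `get` on Secrets. Permission to create the Pod is enough.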
This might not be terrible, but secrets are necessary to configure private keys for TLS, establish authentication between the K8s API and policy enforcement webhooks, and store service account bearer tokens. And those are just some of the built-in features! Most interesting operators that can be installed, like Istio, Knative Serving, or Tekton, also get access to read secrets, can create Pods, or are meant to enable users to (eventually) run arbitrary code because Kubernetes is a workload scheduler.
So, we have a type called Secret that’s not even guaranteed to be encrypted, can be read by anything or anyone that can schedule a Pod (or a thing that becomes a Pod), and is a concept shared by both the trusted infrastructure and untrusted workloads running on it.
Namespaces are used for isolation
Namespaces are another handy tool that Kubernetes provides out of the box to enable isolation. The docs describe them this way:
Kubernetes supports multiple virtual clusters backed by the same physical cluster. These virtual clusters are called namespaces.
From a security perspective, virtual clusters are a fantastic idea. Like containers, you could safely delegate privileges, virtualize resources, and set limits. If these virtual clusters worked like a real cluster, they could then be subdivided with their own namespaces creating something akin to the really powerful capability-based security.
Instead, Kubernetes has a strict two-level system. Each type of resource has to be either cluster-scoped or namespace-scoped. Namespaces can’t contain cluster-scoped resources (and vice versa), so it’s common for types from one level to refer to types in the other, e.g. RoleBindings referencing ClusterRoles, Volumes referencing Secrets, PersistentVolumeClaims referencing PersistentVolumes, etc. This shatters the namespace boundary as a unit of isolation. Instead of trusting Kubernetes to keep a lid on things, you have to trust Kubernetes as well as every controller that has permission to bridge cluster and namespace scopes. And because of the issue we saw earlier with secrets and pods, we also have to trust anything that can create a Pod in the controller’s Namespace or can read a Secret.
It’s much more difficult to validate that a web of interconnected things is secure than it is to validate a tree where each nested scope can’t have more permissions than its parent.
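Here is a small Python sketch of why the tree is easier to check. It assumes a hypothetical nested `Scope` type (Kubernetes namespaces do not nest like this) and made-up permission names. With a tree, the whole audit is one recursive subset check.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """A hypothetical nested scope; Kubernetes namespaces do not nest like this."""
    name: str
    permissions: frozenset
    children: list = field(default_factory=list)


def violations(scope: Scope) -> list:
    """Return the names of scopes holding permissions their parent lacks."""
    found = []
    for child in scope.children:
        if not child.permissions <= scope.permissions:
            found.append(child.name)
        found.extend(violations(child))
    return found


root = Scope("cluster", frozenset({"read-secrets", "create-pods", "bind-roles"}), [
    Scope("team-a", frozenset({"read-secrets", "create-pods"}), [
        Scope("team-a-dev", frozenset({"create-pods"})),
        Scope("team-a-ci", frozenset({"create-pods", "bind-roles"})),  # exceeds team-a
    ]),
    Scope("team-b", frozenset({"create-pods"})),
])

print(violations(root))  # ['team-a-ci']
```

There is no equally simple check for a web of cross-scope references, because any controller that bridges scopes becomes part of the answer.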
Takeaway
The moral of the story is that everything on a cluster should be trusted to the same degree. If you lock down your network around the cluster, ensure only trusted images are run, use an external authentication plugin, and put high-security workloads on their own cluster, things will probably be fine.
If you are running controllers/operators from third parties, you need to make sure they know their stuff because it’s really easy to mess up.
Proposals
Long term, it would be great to see virtual clusters take off to enable hard multitenancy.
In the short term, it would be great to see RBAC support assigning permission by secret type so controllers could limit which types of secrets they could consume.
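As a sketch of what that could look like, the Python below filters Secrets by their `type` field. To be clear, RBAC has no such rule today: `readable_secrets` and the `allowed_types` parameter are hypothetical, as are the Secret names. The `type` values themselves (`kubernetes.io/tls`, `kubernetes.io/service-account-token`, and the default `Opaque`) are real.

```python
def readable_secrets(secrets: list, allowed_types: set) -> list:
    """Return names of the Secrets a controller could read under a type-scoped rule."""
    return [
        s["metadata"]["name"]
        for s in secrets
        # A Secret with no explicit type is treated as "Opaque".
        if s.get("type", "Opaque") in allowed_types
    ]


# Hypothetical Secrets living in one namespace.
secrets = [
    {"metadata": {"name": "ingress-cert"}, "type": "kubernetes.io/tls"},
    {"metadata": {"name": "builder-token"}, "type": "kubernetes.io/service-account-token"},
    {"metadata": {"name": "db-credentials"}},
]

# A TLS-terminating controller would only ever need certificates.
print(readable_secrets(secrets, {"kubernetes.io/tls"}))  # ['ingress-cert']
```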
A Kubernetes equivalent of hard links, for objects you want to distribute but need a single source of truth for, would solve some of the issues of duplication. It could also open up interesting projects to make inheritance easier, removing much of the need to break the namespace barrier.
Finally, adding a first-class way to mask the fields users could read and write from objects (similar to database views) would reduce the need for new controllers and abstraction layers without breaking existing workflows. Existing solutions like the Open Policy Agent could help here, but aren’t as friendly as a built-in filter because of the extreme level of control they need over your cluster to operate.
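To give a feel for the view idea, here is a minimal Python sketch of the read side only. `masked_view` and its dotted-path syntax are my own invention, not an existing Kubernetes feature, and the Deployment-like object is made up.

```python
import copy


def masked_view(obj: dict, visible_paths: list) -> dict:
    """Return a copy of obj containing only the dotted paths in visible_paths."""
    view = {}
    for path in visible_paths:
        keys = path.split(".")
        value = obj
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                break  # path absent in this object; expose nothing for it
            value = value[key]
        else:
            # The whole path resolved, so copy the value into the view.
            target = view
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = copy.deepcopy(value)
    return view


# Hypothetical object shaped loosely like a Deployment.
deployment = {
    "metadata": {"name": "web", "namespace": "team-a"},
    "spec": {
        "replicas": 3,
        "template": {"spec": {"serviceAccountName": "web-sa"}},
    },
}

view = masked_view(deployment, ["metadata.name", "spec.replicas"])
print(view)  # {'metadata': {'name': 'web'}, 'spec': {'replicas': 3}}
```

A real implementation would also need the write side, merging a user's edits back into the full object without touching the hidden fields.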
[1] I’m now working at Google Domains.