Non-obvious uses for container registries

I read a post the other day about non-obvious Docker uses and got to thinking about the ways I’ve seen the Docker/OCI spec and registries used, as well as some of the ways I’ve thought about using them to solve problems.

So, what can we do with a registry – especially one we can program?

Store data. Registries generally treat image data as opaque, and there’s no requirement that images be executable, so they can be used to store chunks of immutable data. Helm is doing this under the OCI Artifacts specification as a way to deploy manifests. I’ve also used it for source code in Kf.
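As a sketch of what that looks like at the spec level, here’s a minimal function that wraps an arbitrary blob in an OCI image manifest so a standard registry will store it. The media types are from the OCI image spec; the `artifactType` value and the blob contents are illustrative:

```python
import hashlib
import json

def make_artifact_manifest(data: bytes, artifact_type: str) -> dict:
    """Wrap an arbitrary blob in an OCI image manifest for registry storage."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        # Hints to consumers how to interpret the payload; the registry doesn't care.
        "artifactType": artifact_type,
        # The "empty" config descriptor used by artifacts that have no runnable config.
        "config": {
            "mediaType": "application/vnd.oci.empty.v1+json",
            "digest": "sha256:" + hashlib.sha256(b"{}").hexdigest(),
            "size": 2,
        },
        # The data itself rides along as a single content-addressed layer.
        "layers": [{
            "mediaType": "application/octet-stream",
            "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
            "size": len(data),
        }],
    }

manifest = make_artifact_manifest(b"apiVersion: v1\nkind: ConfigMap\n",
                                  "application/vnd.example.manifest")
print(json.dumps(manifest, indent=2))
```

Push the blob and the manifest’s JSON bytes to a registry and any OCI client can fetch the data back by digest.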

Dynamically patching base images. If you distribute lots of applications on top of a common base image, you can build a registry that dynamically swaps out the base image at pull time. The short-lived Cloud Foundry project Eirini did this, which allowed them to “hot patch” applications. At one time the Buildpacks project was also looking at doing this.
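Because image layers are just an ordered list of content-addressed descriptors, the swap itself can be sketched as list surgery on the manifest. This is a simplification – a real registry would also have to rewrite the image config’s `rootfs.diff_ids` and history so the digests line up:

```python
def swap_base_layers(app_layers: list, old_base: list, new_base: list) -> list:
    """Replace an app image's base layers with a patched base at pull time.

    Layers are ordered digests, base first. Raises if the app wasn't
    actually built on the base we expected to swap out.
    """
    n = len(old_base)
    if app_layers[:n] != old_base:
        raise ValueError("image was not built on the expected base")
    return list(new_base) + list(app_layers[n:])

app = ["sha256:base1", "sha256:base2", "sha256:app1"]
patched = swap_base_layers(app,
                           ["sha256:base1", "sha256:base2"],
                           ["sha256:base1b", "sha256:base2b"])
print(patched)  # ['sha256:base1b', 'sha256:base2b', 'sha256:app1']
```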

Adding application configuration. You could make a pass-through registry that layers application configuration onto an image while it’s being pulled. Configuration could include environment variables, root certificates, platform information, and polyfills for the environment the container runs in, such as APM tooling. This would allow a slightly more platform-agnostic (e.g. Cloud Foundry, K8s, Docker Swarm, Mesos) approach to running images than configuring everything in a platform-specific way.
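A configuration layer is just a tarball appended to the layer list. As a sketch, here’s a function that packs environment variables into a tar layer at a hypothetical path (`etc/app/env`) and produces the descriptor the pass-through registry would append to the manifest:

```python
import hashlib
import io
import tarfile

def env_layer(env: dict) -> tuple:
    """Build a tar layer carrying etc/app/env plus its OCI layer descriptor."""
    data = "".join(f"{k}={v}\n" for k, v in sorted(env.items())).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("etc/app/env")  # path is illustrative
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    layer = buf.getvalue()
    descriptor = {
        "mediaType": "application/vnd.oci.image.layer.v1.tar",
        "digest": "sha256:" + hashlib.sha256(layer).hexdigest(),
        "size": len(layer),
    }
    return layer, descriptor

layer, descriptor = env_layer({"DATABASE_URL": "postgres://db", "LOG_LEVEL": "info"})
# The pass-through registry would serve `layer` as a blob and
# append `descriptor` to the pulled manifest's "layers" list.
```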

Start worker processes. Similar to dynamic application configuration, you could run a leader process that hosts an image registry and schedules follower processes that point back to it. The leader could also layer the chunks of work into the images themselves so the worker nodes could run with minimal permissions.

Establish identity. If you layer a unique client certificate into each pulled image – one that encodes what the image is, the machine that’s allowed to run it, the identity pulling it, the environment it’s running in, and an expiration date – then you could use it as a solid foundation for “process”-level identity and access control.
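A real system would mint an X.509 client certificate per pull, but the shape of the idea fits in a few lines. As a stand-in sketch, here’s an HMAC-signed claims document (all field names hypothetical) that the registry would layer into the image alongside the pull:

```python
import hashlib
import hmac
import json
import time

def mint_identity(secret: bytes, image: str, machine: str,
                  puller: str, env: str, ttl_s: int) -> dict:
    """Issue a short-lived, signed identity document for one pull.

    A production system would use a client certificate; HMAC here
    keeps the sketch within the standard library.
    """
    claims = {
        "image": image,            # what it is
        "machine": machine,        # where it's allowed to run
        "pulled_by": puller,       # who pulled it
        "environment": env,        # where it's running
        "expires_at": int(time.time()) + ttl_s,
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    claims["signature"] = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return claims

doc = mint_identity(b"registry-signing-key", "example.com/app@sha256:abc",
                    "node-17", "alice", "prod", ttl_s=3600)
```

Services receiving a request can verify the signature against the registry’s key and reject expired or out-of-environment callers.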

Merge images on the fly for purpose-built containers. Similar to adding new layers as mentioned above, you could merge entire images on the fly. This would be useful if there were an N x M x … grid of options you wanted to create. For example, distributing different versions of a trained ML model next to different versions of the runtime on top of a minimal or full base image.
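At the manifest level the merge is just stacking layer lists, so the N x M x … grid never has to be built ahead of time – each cell is composed on pull. A sketch, with the same caveat as base swapping (the image config would need rewriting too):

```python
def merge_images(base: list, *addons: list) -> list:
    """Compose a purpose-built image by stacking addon layer lists on a base.

    Order matters: base first, then each addon in turn, so later
    layers win on file conflicts.
    """
    merged = list(base)
    for layers in addons:
        merged.extend(layers)
    return merged

# Hypothetical digests: one cell of a runtime-version x model-version grid.
runtime_v2 = ["sha256:runtime2"]
model_v7 = ["sha256:model7"]
print(merge_images(["sha256:minimal-os"], runtime_v2, model_v7))
# ['sha256:minimal-os', 'sha256:runtime2', 'sha256:model7']
```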

Dynamically tag or name images. If you publish images that others consume, it could be useful to have registries that roll out tags to client populations gradually, similar to A/B testing. This could prevent the instant global rollout of a bad tag – but could be a hassle to debug.
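The registry can do this by hashing the client into a stable bucket at tag-resolution time, so the same client always lands on the same side of the rollout. A minimal sketch (function and parameter names are illustrative):

```python
import hashlib

def resolve_tag(client_id: str, tag: str, canary_digest: str,
                stable_digest: str, canary_percent: int) -> str:
    """Deterministically bucket a client so repeated pulls resolve the same way.

    Hashing client_id together with the tag keeps bucket assignments
    independent across different tags.
    """
    bucket = int(hashlib.sha256(f"{client_id}/{tag}".encode()).hexdigest(), 16) % 100
    return canary_digest if bucket < canary_percent else stable_digest

# Roll the new build of :latest out to ~10% of clients.
digest = resolve_tag("node-42", "latest",
                     canary_digest="sha256:new-build",
                     stable_digest="sha256:known-good",
                     canary_percent=10)
```

Ramping `canary_percent` from 0 to 100 converts an instant global rollout into a gradual one – and the determinism is what makes the “hassle to debug” at least tractable.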

Dynamically build images. You could transform data in one space (e.g. a Git repository) to another (OCI image URI) using a tool like Buildpacks to convert source code into images on the fly.

All together now

If we layer these pieces together we get an interesting approach to something that kind of looks like a PaaS, except it isn’t tied to any particular platform.

Here’s a rough example of what this could look like. It won’t pass muster for a production system, but it should give a general idea of the capabilities.

Start with converting code to images. This URI builds a Git repository and caches the result for 90 days, both to reduce load and to ensure stale images aren’t deployed for security reasons:

example.com/build/rebuildAfter=90d/<REPO>/<SHA|TAG>
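This URI scheme is hypothetical, but a sketch of how the build registry might parse it and decide when a cached image has gone stale:

```python
import re
from datetime import datetime, timedelta, timezone

def parse_build_uri(path: str):
    """Split /build/rebuildAfter=<N>d/<REPO>/<REF> into max age, repo, and ref."""
    m = re.fullmatch(r"/build/rebuildAfter=(\d+)d/(.+)/([^/]+)", path)
    if not m:
        raise ValueError(f"not a build URI: {path}")
    days, repo, ref = m.groups()
    return timedelta(days=int(days)), repo, ref

def is_stale(built_at: datetime, max_age: timedelta, now: datetime) -> bool:
    """Stale cache entries are rebuilt so old base layers don't keep shipping."""
    return now - built_at >= max_age

max_age, repo, ref = parse_build_uri(
    "/build/rebuildAfter=90d/github.com/example/app/main")
print(repo, ref, max_age)  # github.com/example/app main 90 days, 0:00:00
```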

Next in our chain is the OS patch level, which switches out the base images and supports three tags: stable, current, and next. This allows application developers to run smoke tests against their apps prior to newer OS patch rollouts and gives them a grace period to fix obsolete images:

example.com/patch/<REPO>/<SHA|TAG>:<OS_PATCH>

Up to this point the images have been theoretical – only defined at pull time – but now we’ll materialize them to produce a monotonically increasing stream of rollouts. Note that the tag is relative: a tag of 1 means one release before the latest.

example.com/rollouts/<BUILDNAME>:n-<RELEASE_DELTA>
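Resolving a relative tag against the rollout stream is a one-liner over the ordered list of materialized releases. A sketch of the convention described above, where n-0 is the latest and n-1 is one before it:

```python
def resolve_relative_tag(rollouts: list, tag: str) -> str:
    """Resolve n-<delta> against an ordered (oldest-first) rollout list."""
    if not tag.startswith("n-"):
        raise ValueError(f"not a relative tag: {tag}")
    delta = int(tag[2:])
    return rollouts[-(delta + 1)]

rollouts = ["build-101", "build-102", "build-103"]  # hypothetical build names
print(resolve_relative_tag(rollouts, "n-1"))  # build-102
```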

To distribute these out to different environments, we create a registry per data center. Each registry sets environment variables and adds an environment-based identity so the app can communicate with other services in its environment. Release deltas are set by administrators to allow gradual rollouts:

<LOCATION>.<ENVIRONMENT>.example.com/app/<BUILDNAME>

From here it’s up to some other system to actually get these onto the platform. The platform itself could be told to pull updates to workloads once a day, or some other platform-level configuration system could do it.

Finally, for easy debugging, we’ll allow developers to pull an image with debugging tools, a root user account, and sshd with the developer’s public key layered in. The app’s identity is pinned to the development environment to prevent abuse:

ad-hoc.example.com/<REPO>/<SHA|TAG>:<OS_PATCH>

Closing thoughts

Are all of these ideas good? Probably not. But there is something neat about applying the principles of functional programming to a storage tool that’s ubiquitous, cheap, and mostly standardized.

And, because the system is composable, you can always swap out what you like.