Idea: WASM for Library Sandboxing

Toward a better path than 'do nothing' when you can't resolve a CVE.

This is an idea that’s been bumbling around my head for the past couple of years. It’s something I would like to spend some time building, but I don't have the appetite to do the whole thing, so I’m going to throw it out into the world to see if it resonates with anyone else.

Hi, I’m Joseph. I’m an application developer, and I’m tired of dealing with CVEs.

Modern applications, like the ones I write, depend on hundreds or thousands of different software libraries. This is inevitable. I don’t have time to roll the clock back to 1970 and rewrite everything I depend on from scratch – I need to deliver something this quarter.

The libraries I use are made by other people. These people need to deliver their own things so they also depend on more libraries. And so on.

It’s a little-known fact that programmers sometimes make mistakes, and some of those mistakes are security problems.

When someone discovers a security issue, they should work with the developer who wrote the library to patch it and file a CVE which explains the issue. When the issue is fixed, all the people who use the library will carefully read the CVE to understand how it impacts their library and propagate the fix. The fix will bubble all the way up the chain of dependencies to my application, and all will be right in the world.

At least in theory.

The problem with patching

Once a CVE is published, there’s a race between people who will try to exploit it and people who need to fix their code to prevent that from happening. Historically, people have been so bad at patching that savvy customers now demand timelines, e.g. an application must be patched no more than 60 days after a CVE is published.

This might work if there were a few thousand CVEs per year like in 2003, but in 2023 there were over 22,000. It can be overwhelming to appropriately review every CVE, make a patch, and properly test it.

Properly handling a CVE is tricky partly due to an inherent misalignment of incentives between library owners, users, and reporters.

  • CVE reporters are often looking for prestige or pay, so they have an incentive to come up with worst-case scenarios and don’t always deeply understand the libraries they’re probing.
  • Library owners have an incentive to dismiss CVEs because many are invalid or straight up bogus based on their understanding of the library. They are also pressured by some users to update dependencies to keep compliance and pressured by others to never make breaking changes. There’s usually very little incentive for them to backport fixes to older versions so they advise users to upgrade to new versions of the library which include unrelated changes.
  • Application developers are incentivized to fix CVEs that the company contractually committed to, yet are disincentivized from upgrading libraries because doing so introduces risk that might cause outages and cost money.

The easiest way to deal with these CVEs is to stay on top of patching, but that’s expensive, especially for applications that are otherwise working well. When an application has hundreds or thousands of dependencies it could easily have a dozen CVEs a year, and a few of those will inevitably come with breaking changes that need to be dealt with.

This happened in an application I was working with recently. An important library became unmaintained and was stuck using a vulnerable version of another library. Removing the unmaintained library would have meant a very risky refactor for my team that we didn’t have budget for so the issue just sat.

If you can’t spend time chasing down every CVE and know that you’ll eventually be asked to upgrade, then the safest approach is to always patch and accept the cost and stability risks. But there might be a better way.

Capability-based security

In an ideal world, we’d be able to give imported libraries access to the capabilities we need from them and nothing more, so even if they’re vulnerable they wouldn’t be able to do much.

Here are some example capabilities you might give to different libraries:

                    Filesystem                   Networking         Environment Vars
  JSON Decoder      Deny                         Deny               Deny
  SQLite Client     Read/write *.sqlite3 files   Deny               Deny
  Postgres Client   Deny                         Open TCP sockets   Read PGPASSWORD

The good part about a capability-based approach is that it allows you to more safely use untrusted code. In the example above, it wouldn’t matter if either database driver had a vulnerability that allowed remote hosts to execute a program, because you denied it that permission.
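To make the table concrete, here’s a minimal sketch of what declaring those capability sets could look like in code. The Capabilities type and its fields are invented for illustration; a real runtime would define a richer model:

    import java.util.List;

    class CapabilityExamples {
        // A toy capability description mirroring the table above. All names
        // here are made up; a real sandbox runtime would define its own model.
        record Capabilities(List<String> fileGlobs, boolean tcpConnect, List<String> envVars) {
            static Capabilities none() {
                return new Capabilities(List.of(), false, List.of());
            }
        }

        public static void main(String[] args) {
            Capabilities jsonDecoder = Capabilities.none();               // deny everything
            Capabilities sqliteClient =
                new Capabilities(List.of("*.sqlite3"), false, List.of()); // files only
            Capabilities postgresClient =
                new Capabilities(List.of(), true, List.of("PGPASSWORD")); // network + one env var
            System.out.println("postgres grants: " + postgresClient);
        }
    }

The important property is deny-by-default: a library gets an empty capability set unless the application developer explicitly grants more.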

If you wanted to achieve capability-based security today, you might try sandboxing at the language, process, or service level.

Traditionally, you might spin up multiple isolated processes or services where each is given the capabilities it needs. Think multiple containers in a Kubernetes pod, ptrace sandboxing, gVisor, or V8 Isolates. This approach works if:

  1. You know where to draw security boundaries,
  2. You have the time and capacity to understand your underlying sandboxing system, and
  3. You can correctly orchestrate these separate processes.

None of those are easy if they’re not a core part of your job.

On top of the work to actually secure your software, you still need to do the reporting work to prove things are sandboxed appropriately to the systems keeping track so the security overlords are happy with their dashboards.

Proposal

We should let developers sandbox portions of their application in a way that feels native to their ecosystem, abstracting away both the mechanism for calling code within the sandbox and the compliance reporting.

The runtime would allow profiling production deployments to help identify over-granted permissions and ratchet up compliance over time.
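As a toy illustration of that ratchet, assuming capabilities can be flattened into strings (the format below is made up): diff what was granted against what the profiler observed in production, and flag the difference.

    import java.util.Set;

    class CapabilityRatchet {
        public static void main(String[] args) {
            // What the developer granted when generating the sandbox wrapper.
            Set<String> granted = Set.of("fs:rw:*.sqlite3", "net:tcp-connect", "env:PGPASSWORD");

            // What the profiler observed the sandboxed code actually using.
            Set<String> observed = Set.of("fs:rw:*.sqlite3");

            // Anything granted but never used is a candidate for removal in
            // the next release, tightening the sandbox over time.
            granted.stream()
                   .filter(cap -> !observed.contains(cap))
                   .forEach(cap -> System.out.println("over-granted: " + cap));
        }
    }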

Practically, WebAssembly (WASM) seems like it will be a good approach for this – once the GC and threading proposals have the kinks worked out. WASM brings many benefits:

  • It should have comparable performance to Python/Java/Go/C#, which are common for application development,
  • It has APIs designed around capability based security, and
  • It’s portable enough that it can target the same architectures as traditional business applications.

Example

In this scenario we need to communicate with a vendor’s website using their client, which pulls in a bunch of stuff we don’t want to trust. We’ll use Java conventions, but the same approach should translate to other ecosystems.

We’d first wrap the functionality we need behind an interface, Client.class, that allows us to interact with our expected domain objects, mock the client in testing, and use dependency injection. In this case we separate out the build unit so it produces a shared library, Sandboxed.jar.
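As a sketch of that boundary, the interface might look like the following; VendorReport and fetchReport are invented stand-ins for whatever the vendor client actually exposes:

    // The narrow interface the application codes against. Everything behind it
    // can be swapped for a mock in tests or a sandboxed implementation later.
    public interface Client {
        VendorReport fetchReport(String accountId);

        // A minimal domain object; the real one would carry whatever we need.
        record VendorReport(String accountId, String payload) {}
    }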

[Diagram: the application's Main.class depends on the Client.class interface, which Sandboxed.jar implements]

If we imported and used this Sandboxed.jar library directly, it would behave normally.

Instead, we wrap the sandboxed library into another library using our sandboxing tool. This tool creates another library (Wrapper.jar) that has a dependency on a shared runtime and exposes a generated shim Client.class that behaves the same as the original, but actually instantiates and runs the sandboxed code using the shared WASM runtime (WASMRuntime.jar).

[Diagram: Main.class calls the generated Client.class shim in Wrapper.jar, which requires WASMRuntime.jar and runs Sandboxed.jar inside the WASM runtime]
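Here’s a rough sketch of what that generated shim could look like, reusing the Client interface from earlier. WasmRuntime, WasmInstance, and the module path are invented stand-ins for the shared runtime API in WASMRuntime.jar:

    // A sketch of the generated shim, not a real API.
    final class ClientShim implements Client {
        // Stubs standing in for the shared runtime that Wrapper.jar depends on.
        interface WasmInstance {
            String call(String function, String... args);
        }

        static final class WasmRuntime {
            static WasmInstance load(String modulePath, String capabilityManifest) {
                throw new UnsupportedOperationException("sketch only");
            }
        }

        private final WasmInstance instance;

        ClientShim(String capabilityManifest) {
            // Instantiate the WASM-compiled Sandboxed.jar, granting only the
            // capabilities baked in when the wrapper was generated.
            this.instance = WasmRuntime.load("/sandboxed.wasm", capabilityManifest);
        }

        @Override
        public Client.VendorReport fetchReport(String accountId) {
            // Marshal the call across the sandbox boundary and back.
            String payload = instance.call("fetchReport", accountId);
            return new Client.VendorReport(accountId, payload);
        }
    }

From the application’s point of view nothing changes: it still injects a Client and calls fetchReport.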

When we generate the wrapper library, we can also pass in the maximum capabilities we want to provide at runtime to sandbox the code. These could be best guesses from the programmer, and/or automatically determined by what the program actually used in the past.
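For example, those maximum capabilities could be written down in a small manifest that’s passed to the wrapping tool. The syntax and names below are invented for illustration:

    # Hypothetical capability manifest for Wrapper.jar; syntax made up.
    # Deny-by-default: anything not listed is unavailable inside the sandbox.
    network = tcp-connect:api.vendor.example:443
    env     = read:VENDOR_API_KEY
    # No filesystem entry, so all file access is denied.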

In order to make security scanning happy, this wrapper should expose the sandboxed library’s SBOM as its own, with some annotation indicating that the included libraries are sandboxed versions.
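For instance, in a CycloneDX-style SBOM the wrapper could surface the inner components with a property marking them as sandboxed. The component and property names below are made up; as far as I know there’s no standard annotation for this yet:

    {
      "bomFormat": "CycloneDX",
      "specVersion": "1.5",
      "components": [
        {
          "type": "library",
          "name": "vendor-client",
          "version": "2.3.1",
          "properties": [
            { "name": "example:sandboxed-by", "value": "wasm-wrapper" }
          ]
        }
      ]
    }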

The WASM runtime will include the shared functionality needed to run the sandboxed libraries, as well as the profiling code that keeps track of over- or under-granted capabilities to help developers right-size the grants long-term.

Prior Art

  • ncruces/go-sqlite3 is an SQLite3 implementation for Go that avoids C bindings by embedding a WASM build of SQLite3 and a WASM runtime. It integrates with Go’s standard database/sql driver interface and can be used normally, which shows the idea above has promise.
  • PGlite (pglite.dev) is Postgres compiled to WASM so it can be embedded like SQLite. It shows that even applications that weren’t designed to be embedded can be bundled cleanly in WASM.
  • google/wuffs is an alternative approach to handling safety. It requires developers to write in a restrictive language that can have certain security assurances proven and does source to source transpilation to generate the libraries. This requires a lot of work but is nice because it produces native speed software.
  • Service Weaver is a framework for writing monolithic applications that allows portions to be split off into independent services. It feels similar to a Google-internal service composition framework called Boq, but has a friendlier UX. If this approach were paired with a capability system, it would be very close to what we want.

Closing Thoughts

This approach won’t work for sandboxing every library. In fact, Log4J is an example of one that probably wouldn’t work because so much code directly depended on it.

That being said, another tool in our kit that gives us a new, well-lit avenue to mitigate risk and cut down on CVE noise would be a very welcome addition.