How to write a great protocol or file format

Lesser programmers borrow; great programmers steal. — 12 Feb 2021

If you’re starting a project to build a new piece of software, it’s critical to create a solid communications protocol between its pieces. That could mean clients and servers, instances of your software, plugins, or different versions over time.

When jumping into a project like this, you’ll probably be doing it because nothing else on the market fits your needs. Starting from that perspective, it can be very tempting to invent everything from the ground up to create a perfectly bespoke system.

Here’s a quick checklist of what your users will likely want as your protocol matures:

Draft state

When you’re still working on the implementation of your server and client, think about:

How you’re going to represent different statuses and classes of errors. These should include user errors, client errors, server errors, and temporary errors.
How server and client will negotiate versions of your protocol.
How you’re going to represent and encode the data.
How you’re going to offer a reference implementation.
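To make the first two items concrete, here is a minimal sketch in Python. The names `ErrorClass`, `is_retryable`, and `negotiate_version` are made up for illustration, and picking the highest shared version number is just one possible negotiation rule, not the only sensible one.

```python
from enum import Enum
from typing import Iterable, Optional


class ErrorClass(Enum):
    """Hypothetical error classes matching the four kinds listed above."""
    USER = "user"            # the person supplied bad input
    CLIENT = "client"        # the client software sent a bad request
    SERVER = "server"        # the server failed on a valid request
    TEMPORARY = "temporary"  # trying again later may succeed


def is_retryable(error_class: ErrorClass) -> bool:
    # Assumption: only temporary errors are worth retrying automatically.
    return error_class is ErrorClass.TEMPORARY


def negotiate_version(client_versions: Iterable[int],
                      server_versions: Iterable[int]) -> Optional[int]:
    # Pick the highest version both sides understand, or None if there is
    # no overlap and the connection should be refused.
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None
```

Even a rule this small has to be written down somewhere both sides can read, which is where the reference implementation comes in.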

MVP

As a bare minimum first pass, think about:

Encryption: this is a must in any modern format. You care about the privacy of your users, right?
Authentication: protocols should include a way to authenticate users. Ideally, yours will support multiple methods so it integrates well with external authentication systems (tokens, username/password combos, public/private keys).
Reverse proxy support: you don’t need to build this yet, but your protocol should be able to support proxying traffic based on DNS hostnames. Supporting this means you’ll be able to run multiple instances behind a single IP:Port combination. This opens up opportunities to:
- Run the service in a multitenant way.
- Use wildcard DNS addresses to dynamically spin up new instances.
- Make administration easier (single IP/port open for firewall rules) and cheaper too.
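The hostname-based proxying idea can be sketched in a few lines of Python. This assumes the protocol sends the requested hostname early enough for a proxy to read it. The function name `pick_backend`, the routing table, and the `example.test` addresses are all hypothetical.

```python
from typing import Dict, Optional


def pick_backend(hostname: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the backend address for a hostname, or None if unknown."""
    host = hostname.lower().rstrip(".")
    # An exact match wins.
    if host in routes:
        return routes[host]
    # Otherwise try a wildcard entry one level up, so
    # "tenant1.example.test" matches "*.example.test".
    _, _, parent = host.partition(".")
    return routes.get("*." + parent)


# Hypothetical routing table: many instances behind one IP:Port.
ROUTES = {
    "admin.example.test": "10.0.0.5:9000",
    "*.example.test": "10.0.0.6:9000",
}
```

With this table, `pick_backend("tenant1.example.test", ROUTES)` falls through to the wildcard entry, which is what lets new instances appear without touching firewall rules.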

Post MVP

After you’ve achieved the bare minimum, it’s likely you’ll want some or all of the following:

Format negotiation - to help clients/servers of different types/versions/needs.
Compression - allowing this to be tuned lets your administrators make decisions based on compute and network costs.
Client library generation - your format should be usable by a wide variety of clients. To keep them in sync, they should be generated.
Testing tooling - your format should have debugging tooling and test suites to validate client and server compatibility.
Multiplexing - your protocol may want to support multiplexing connections to avoid the overhead of establishing multiple connections.
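As a rough illustration of the first two items, here is a Python sketch of format negotiation and tunable compression. The function names and format labels are invented for this example. `zlib` stands in for whatever compression scheme you would actually pick, and "first client preference the server supports" is only one possible rule.

```python
import zlib
from typing import Iterable, Optional, Set


def negotiate_format(client_preferences: Iterable[str],
                     server_supported: Set[str]) -> Optional[str]:
    # Walk the client's list in preference order and take the first
    # format the server can also produce.
    for fmt in client_preferences:
        if fmt in server_supported:
            return fmt
    return None


def compress_payload(payload: bytes, level: int) -> bytes:
    # zlib levels run from 0 (no compression) to 9 (smallest output,
    # most CPU). Exposing the level is the "tuning" knob for admins.
    return zlib.compress(payload, level)
```

An administrator paying more for bandwidth than for CPU might set the level high, and one with the opposite costs might set it to 0.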

Conclusion

You could write it all from scratch, but maybe just use HTTP instead so you can get on with the meat of your project.
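For what it’s worth, here is a small Python sketch of how much of the checklist HTTP already answers. The mapping lists standard HTTP features. Treating 429 and 503 as "temporary" is my own reading of the checklist, and HTTP itself does not separate user errors from client errors. The names `HTTP_FEATURES` and `classify_status` are made up for this example.

```python
from http import HTTPStatus

# Checklist item -> the HTTP feature that covers it.
HTTP_FEATURES = {
    "Encryption": "TLS (HTTPS)",
    "Authentication": "Authorization header",
    "Reverse proxy support": "Host header",
    "Format negotiation": "Accept and Content-Type headers",
    "Compression": "Accept-Encoding and Content-Encoding headers",
    "Multiplexing": "HTTP/2 streams",
}


def classify_status(code: int) -> str:
    # Assumption: 429 and 503 count as temporary, since both can be
    # sent with a Retry-After header.
    if code in (HTTPStatus.TOO_MANY_REQUESTS, HTTPStatus.SERVICE_UNAVAILABLE):
        return "temporary"
    if 400 <= code < 500:
        return "client"
    if 500 <= code < 600:
        return "server"
    return "not an error"
```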