Everybody knows documentation is essential to any software engineering enterprise. And fo’ shizzle, everybody knows it gets deprioritised. An afterthought; written by engineers who are thinking, “I should probably write docs”. Nah. What they should be thinking is, “I get to write docs, cuz!” Because when you doc it right, it ain’t a chore. It’s what separates a project people use from a project people lose.
In this post, I’m walkin’ you through three real projects I’ve been involved in at Tweag; each one levelling up the documentation game. First, fixing docs that got out of hand: the reactive play. Then, planning docs from day one: the proactive play. And finally, making docs part of the code itself: the integrated play. By the end, I think you’ll agree: to doc it like it’s hot is the only way to gizzo.
When your README’s a monolith
Sometimes the docs are already a mess and you gotta clean house. That’s the reactive play.
That was the case with Topiary, Tweag’s universal formatting engine. It uses Tree-sitter grammars and queries to format code, with the formatting directives encoded in what Tree-sitter calls “capture names”. All of our formatting capture names needed to be documented, with their semantics described.
Moreover, our documentation covered usage instructions, which were checked against the --help output of each subcommand. Then there was our project motivation and design philosophy, language support, installation instructions, configuration details, usage guides…
…All in a single README.md which had grown to over 7,000 words. Way too big for the crib, homes. Ain’t nobody reading all that!
Drift and inconsistency were creeping in, making it harder for the team to maintain. Worse, it was straight-up hostile to users: how you gonna expect someone to sift through all that noise just to find what they need?
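One cheap guard against that kind of drift is the --help check mentioned above. Here's a minimal sketch of the idea in Rust; it is not Topiary's actual check, and the names `normalise` and `doc_contains_help` are hypothetical. It assumes the documentation embeds each subcommand's help text verbatim, give or take trailing whitespace.

```rust
/// Normalise text for comparison: strip trailing whitespace from every
/// line and drop leading/trailing blank lines.
fn normalise(text: &str) -> String {
    text.lines()
        .map(str::trim_end)
        .collect::<Vec<_>>()
        .join("\n")
        .trim_matches('\n')
        .to_string()
}

/// Returns `true` if `help` (the captured `--help` output of a
/// subcommand) appears verbatim in `doc`, ignoring trailing whitespace.
///
/// An empty `help` never matches, so a failed capture can't pass silently.
fn doc_contains_help(doc: &str, help: &str) -> bool {
    let help = normalise(help);
    !help.is_empty() && normalise(doc).contains(&help)
}
```

In a real test suite, `help` would be captured by running the binary (for example with `std::process::Command`) and `doc` read from the relevant chapter; the check then fails in CI the moment a flag changes without the docs following.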
Topiary OG Erin started work reconstructing the monolith into a book format using — as it’s a Rust project — mdBook. I picked up where he left off and finished it for Topiary v0.6.1. Yo, the Topiary Book was born!
But the move here wasn’t just splitting up the README.md and calling it a day. You need a crew and every member needs a role. That means a framework; something to keep it tight, maintainable and user-friendly. We rolled with Diátaxis, which identifies four distinct documentation types based on what the reader actually needs:
- Tutorials, for learners. For example, Yann’s step-by-step guides walk readers through creating a formatter from scratch for a toy language. Their aim is for readers to reach understanding actively, through engagement, rather than just passive reading.
- How-to guides, for readers who want to accomplish a specific goal. “Adding a new language” assumes you already know Topiary and gets straight to the point: register the grammar, create a query file, update the test suite, rinse and repeat.
- Explanation, for those who need a deeper understanding. For example, “Tree-sitter and its queries” explains the conceptual foundation — what Tree-sitter is, why Topiary uses it and how queries relate to formatting — without asking the reader to do anything.
- Reference, which describes what exists and how it behaves. Our capture names chapter documents every formatting directive Topiary recognises, what it does, its syntax and its edge cases. You’re not meant to read it cover-to-cover; it’s just there to look up whenever you need it.
Structure is a prerequisite for usefulness and frameworks exist so you don’t have to invent your own. Rolling your own ain’t gangsta, use what’s already out there…that’s real game.
When you have varied audiences
Cleaning up after the fact is one thing, but why not come correct from the start? That’s the proactive play: before you write a line of documentation, you ask who’s gonna read it and what they need. Then you build the structure around that.
So, while Topiary is a developer productivity tool with, broadly, a developer audience, the second project I want to chop it up about is different: an omics data acquisition tool for a pharmaceutical client’s computational biology needs.
This one had a whole different crowd to please. Bioinformaticians running data processing pipelines, IT staff handling installation and access control, administrators guiding users through workflows and developers who might extend the project down the line…including, potentially, yours truly, after returning from a long absence and having forgotten how everything works!
Their needs, vocabulary and assumptions barely overlap, so a single set of docs couldn’t serve them all without becoming an unfocused sprawl. So I split the documentation three ways:
- A technical manual, covering every subcommand, flag and configuration key. The kind of thing a user reaches for mid-pipeline, an administrator references when guiding colleagues, or IT consults when setting up the environment.
- A developer manual, as its mirror image: module architecture, type hierarchies, testing methodology and contribution workflow. All you need to dig the codebase, but were too shook to ask!
- A user manual sat between the two, covering key concepts, how-to guides and troubleshooting. Diátaxis was again the guiding framework here: the concepts section is explanation, the how-to guides are exactly that and the troubleshooting page addresses the practical edge cases that tripped people up during user acceptance testing.
Within the user manual, I also got to indulge in what you’ve probably gathered is my favourite move. I wove a narrative through the examples that borrowed, with some artistic licence, from Stevenson’s Strange Case of Dr. Jekyll and Mr. Hyde: an external collaborator’s data arrives from Dr. Jekyll’s lab, and the examples make oblique references to the novella throughout, ultimately identifying the evil-transcriptome for downstream analysis.
Does this make the documentation sillier than it needs to be? Maybe. But it makes the examples stick…and that ain’t just a vibe, it’s science, dawg: our brains are straight-up wired to retain information delivered through narrative far better than through isolated facts. A reader who skimmed the manual a month ago can still go, “that’s the example where forward and reverse reads are named ‘front door’ and ‘back door’” and find the section again. The story gives continuity across otherwise disconnected examples; where each section could stand alone, the recurring characters give readers a reason to follow the arc from data acquisition to analysis. And the scenario is deliberately awkward, which exercises more features than a vanilla example ever could.
Different people need different things, that’s just how it is. A bioinformatician never needs to know how the S3 client interface is structured, just as a future developer doesn’t need a walkthrough of dataset creation from NCBI metadata. When your audiences are distinct enough, the realest thing you can do is acknowledge that up front, rather than forcing everyone to wade through what ain’t for them…and there’s never any harm in bringing a little levity into the world!
When you need to have clarity
Planning ahead is smooth, but the smoothest move of all? Making the docs and the code one and the same. That’s the integrated play.
The previous case study included a developer manual, but my final example? That’s all developer; front to back. Scrawls is a Rust library implementing a verifiable file format for Cardano ledger state, as an independent implementation alongside a Haskell reference. Its users are Rust developers pulling in the crate as a dependency, so the documentation strategy needed to reflect that.
In Rust, the idiomatic answer to this is rustdoc: in-band documentation that lives alongside the types, functions and invariants it describes. Then, while there is still a README.md, it functions more as a landing page than a manual: a brief orientation, a feature summary and a handful of examples to get a new user from zero to something.
Docstrings ain’t new, of course — Doxygen has been around since the ’90s — but Rust’s ecosystem raises the bar. Between docs.rs publishing your crate’s documentation automatically and a community that straight-up expects thorough doc comments, skipping them feels less like a shortcut and more like showing up empty-handed. When the API changes, the respective documentation change should be right there in the diff, for reviewers to keep it real.
And here’s the thing: during implementation, the specification was still maturing and encoding its requirements into Rust exposed ambiguities that were lurking in the prose. Should certain orderings be strict? How should the Merkle tree be rolled up? Is this field optional, or merely absent? Each ambiguity became a clarification fed back into the spec. And each clarification, a documented precondition in the API.
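To make that concrete, here's a minimal sketch of what a documented precondition can look like in rustdoc, using the "should orderings be strict?" question as the example. This is not Scrawls's actual API: `OrderingError` and `check_strictly_ascending` are hypothetical names, and the choice to reject duplicates is an assumption made purely for illustration.

```rust
/// Error returned when keys are not in strictly ascending order.
#[derive(Debug, PartialEq, Eq)]
pub struct OrderingError {
    /// Index of the first key that is not greater than its predecessor.
    pub index: usize,
}

/// Checks that `keys` are in strictly ascending order.
///
/// # Preconditions
///
/// Ordering is *strict*: a key equal to its predecessor is rejected,
/// not silently deduplicated. An empty or single-element slice is
/// trivially ordered.
///
/// # Errors
///
/// Returns [`OrderingError`] carrying the index of the first offending
/// key.
pub fn check_strictly_ascending<T: Ord>(keys: &[T]) -> Result<(), OrderingError> {
    // Compare each adjacent pair; `i` is the index of the pair's first element.
    for (i, pair) in keys.windows(2).enumerate() {
        if pair[0] >= pair[1] {
            return Err(OrderingError { index: i + 1 });
        }
    }
    Ok(())
}
```

The point is where the answer lives: the strictness rule sits in the doc comment right above the code that enforces it, so changing one without the other shows up in the same diff.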
A specification is, at the end of the day, documentation too…and the same principle applies: vagueness is a bug. This ain’t documentation as prose; it’s documentation as a contract, forged on the streets between spec and implementation. Sometimes the most valuable work you can do is keep it tight, not make it long.
Document your code, ma
Reactive. Proactive. Integrated. Three different plays for three different games…though ideally you won’t need the first! And what ties them all together is that none of the documentation I’ve described was written reluctantly. It wasn’t tacked on after the fact because someone asked, “Where are the docs?” It was thought about — its structure, its audience, its precision — as part of the work itself.
That’s the shift I want to put you on to. Documentation ain’t a tax you pay for writing code; it is part of writing code. And when you approach it that way — when you reach for a framework instead of a blank page; when you ask “who’s this for?” before you start typing; when you treat ambiguity as a bug — then you’re doc’ing it like it’s hot! The result is something you’re genuinely proud of, not something you hope nobody reads too carefully.
Now if documentation alongside code is good, then documentation before code — as a design tool; a sketch in prose before you commit to implementation — is the next level. While I ain’t taken that step myself, my Scrawls experience, where the spec and the code kept each other honest, showed me how close that workflow already is.
In practice, pure docs-first has the same problem as pure test-driven development: you can’t document what you don’t know yet. But that feedback loop, where the docs sharpen the code and the code sharpens the docs, that’s the real endgame right there. You might notice this sounds a bit like vibe-coding, and it is…in the same way that an architect’s blueprint is a bit like a napkin sketch. Same ’hood, different zip codes, dawg. Something to aspire to, fo’ shizzle.
So now, just like my man S-to the N-to the double O-P, you too can say…
I got a Rollie on my arm and I’m pourin’ Chandon
And I write the best docs, ’cause I got it goin’ on.
With thanks to Simeon Carstens, Facundo Domínguez, Valentin Gagarin, Xavier Gongora, Arnaud Spiwack, Snoop Dogg and Pharrell Williams for their reviews and input on this post.
Behind the scenes
Chris is a principal software engineer (Python, Rust) and editor-in-chief of the Tweag technical blog; he is also the project steward of Topiary, a universal code formatting engine. He has extensive experience building scalable data processing pipelines across diverse domains, including 7+ years at the Wellcome Sanger Institute working on genomics research infrastructure and multiple pharmaceutical projects involving omics data pipelines. He excels at writing well-tested, maintainable code, with robust DevOps to keep services reliable.