Tweag

Announcing Topiary

9 March 2023, by Erin van der Veen, Nicolas Bacquey, Guillaume Genestier, Christopher Harrison, Tor Hovland

Topiary aims to be a universal formatter engine within the Tree-sitter ecosystem. Named after the art of clipping or trimming trees into fantastic shapes, it is designed for formatter authors and formatter users:

  • Authors can create a formatter for a language without having to write their own formatting engine, or even their own parser.

  • Users benefit from uniform, comparable code style, across multiple languages, with the convenience of a single formatter tool.

The core of Topiary is written in Rust, with declarative formatting rules for bundled languages written in the Tree-sitter query language. In this first release, we have concentrated on formatting OCaml code, capitalising on the OCaml expertise within the Topiary Team and our colleague, Nicolas Jeannerod.

All development and releases happen over in the Topiary GitHub repository.

Motivation

Coding style has historically been a matter of personal choice. This is inherently subjective, leading to bikeshedding over formatting choices, rather than meaningful discussion during review. Prescribed style guides, linters and ultimately automatic formatters — popularised by gofmt, whose developers had the insight to impose “good enough” uniform formatting on a codebase — have helped solve these issues.

This motivated research into developing a formatter for our Nickel language. However, its internal parser did not provide a syntax tree that retained enough context to allow the original program to be reconstructed after parsing. After creating a Tree-sitter grammar for Nickel, for syntax highlighting, we concluded that it would be possible to leverage Tree-sitter for formatting as well.

But why stop at Nickel? Topiary generalises this approach to any language that doesn’t employ semantic whitespace, by expressing formatting style rules in the Tree-sitter query language. (Languages with semantic whitespace require specialised formatters, such as our Haskell formatter Ormolu.) It thus aspires to be a “universal formatter engine” for such languages, enabling the fast development of formatters wherever a Tree-sitter grammar is available.

Design Principles

To that end, Topiary has been created with the following goals in mind:

  • Use Tree-sitter for parsing, to avoid writing yet another grammar for a formatter.
  • Expect idempotency. That is, formatting of already-formatted code shouldn’t change anything.
  • Bundled formatting styles should meet the following constraints:
    • Be compatible with attested formatting styles used for that language in the wild.
    • Be faithful to the author’s intent: if code has been written such that it spans multiple lines, that decision is preserved.
    • Minimise changes between commits, such that diffs focus mainly on the code that’s changed, rather than superficial artefacts.
    • Be well-tested and robust, such that they can be trusted on large projects.
  • For end users, the formatter should run efficiently and integrate with other developer tools, such as editors and language servers.
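
The idempotency goal above can be stated as a checkable property: formatting the formatter's own output must return it unchanged. The following is a minimal sketch in Python, not Topiary's test suite; `toy_format` is a hypothetical stand-in formatter that only collapses whitespace, and `is_idempotent` is an illustrative helper name.

```python
from typing import Callable


def is_idempotent(format_fn: Callable[[str], str], source: str) -> bool:
    """Return True if formatting already-formatted output changes nothing."""
    once = format_fn(source)
    twice = format_fn(once)
    return once == twice


def toy_format(source: str) -> str:
    """Hypothetical stand-in formatter: collapse whitespace runs to one space."""
    return " ".join(source.split())


def appending_format(source: str) -> str:
    """Hypothetical broken formatter: every pass adds a trailing space."""
    return source + " "


if __name__ == "__main__":
    print(is_idempotent(toy_format, "if   true\n  then 1"))  # stable after one pass
    print(is_idempotent(appending_format, "if true"))        # keeps changing
```

A check of this shape can be run over a corpus of sample files, so that any style rule whose output shifts on a second pass is caught early.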

How it Works

As long as a Tree-sitter grammar is defined for a language, Tree-sitter can parse it and build a concrete syntax tree. Tree-sitter also allows us to run queries against this tree. We can make use of these to target interesting subtrees (e.g., an if block or a loop), to which we can apply formatting rules. These cohere into a declarative definition of how that language should be formatted.

For example:

(
  [
    (infix_operator)
    "if"
    ":"
  ] @append_space
  .
  (_)
)

This will match any node that the grammar has identified as an infix_operator, or the anonymous nodes containing if or : tokens, immediately followed by any named node (represented by the (_) wildcard pattern). The query matches on subtrees of the same shape, where the annotated node within it will be “captured” with the name @append_space, one of many formatting rules we have defined. Our formatter runs through all matches and captures, and when we process any capture called @append_space, we append a space after the annotated node.

Before rendering the output, Topiary does some post-processing, such as squashing consecutive spaces and newlines, trimming extraneous whitespace, and ordering indentation and newline instructions consistently. This means that you can, for example, prepend and append spaces to if and true, and Topiary will still output if true with just one space between the words.
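
The space-squashing part of this post-processing can be pictured as follows. This is a simplified sketch in Python, not Topiary's actual implementation (which is written in Rust). The `Space` marker class, `squash_spaces`, and `render` are hypothetical names, and the sketch models only spaces, not newlines or indentation.

```python
class Space:
    """Hypothetical marker for a 'put a space here' instruction."""


SPACE = Space()


def squash_spaces(atoms):
    """Collapse runs of SPACE markers and trim them from both ends."""
    result = []
    for atom in atoms:
        # Skip a space instruction that directly follows another one.
        if atom is SPACE and result and result[-1] is SPACE:
            continue
        result.append(atom)
    # Trim extraneous leading and trailing space instructions.
    while result and result[0] is SPACE:
        result.pop(0)
    while result and result[-1] is SPACE:
        result.pop()
    return result


def render(atoms):
    """Render the squashed instruction list to text."""
    return "".join(" " if atom is SPACE else atom for atom in squash_spaces(atoms))


if __name__ == "__main__":
    # Spaces were both prepended and appended to "if" and "true".
    atoms = [SPACE, "if", SPACE, SPACE, "true", SPACE]
    print(render(atoms))
```

Because redundant instructions are collapsed at this stage, individual rules can each request the spacing they need without coordinating with one another.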

To make this more concrete, consider the expression 1+2. This has the following syntax tree, if it’s interpreted as OCaml, where the match described by the above query is highlighted in red:

[Figure: Syntax tree, with the match highlighted]

The @append_space capture instructs Topiary to append a space after the infix_operator, rendering 1+ 2. Repeating this process for every syntactic structure we care about — making judicious generalisations wherever possible — leads us to an overall formatting style for a language.
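
The effect of this capture on 1+2 can be imitated with a toy model. The sketch below is in Python and makes several assumptions: it flattens the syntax tree into a list of (kind, text) pairs, the node kind `number` is a hypothetical label, and `apply_append_space` is an illustrative function rather than part of Topiary.

```python
def apply_append_space(nodes, captured_kinds):
    """Append a space after each captured node that has a following sibling.

    `nodes` is a flat list of (kind, text) pairs standing in for a syntax
    tree; `captured_kinds` plays the role of the alternatives in the query.
    """
    pieces = []
    for index, (kind, text) in enumerate(nodes):
        pieces.append(text)
        # The query only matches when another node immediately follows.
        has_following_node = index + 1 < len(nodes)
        if kind in captured_kinds and has_following_node:
            pieces.append(" ")
    return "".join(pieces)


if __name__ == "__main__":
    expression = [("number", "1"), ("infix_operator", "+"), ("number", "2")]
    print(apply_append_space(expression, {"infix_operator", "if", ":"}))
```

Run on the three nodes of 1+2, only the infix_operator is captured, so a single space is added after the + and nothing else changes.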

As a formatter author, defining a style for a language is just a matter of building up these queries. End users can then apply them to their codebase with Topiary, to render their code in this style.

Related Tools

Topiary is not the first tool to use Tree-sitter beyond its original scope, nor is it the first tool that attempts to be a formatter for multiple languages (e.g., Prettier). This section contains some tools that we drew inspiration from, or used during the development of Topiary.

Tree-sitter Specific

Meta-Formatters

  • treefmt: A general formatter orchestrator, which unifies formatters under a common interface.
  • format-all: A formatter orchestrator for Emacs.
  • null-ls.nvim: An LSP framework for Neovim that facilitates formatter orchestration.

Getting Started

We’re really excited about Topiary and the potential it has in this space.

This first release concentrates on formatting support for OCaml, as well as simple languages, such as JSON and TOML. Experimental formatting support is also available for Nickel, Bash, Rust, and Tree-sitter’s own query language; these are under active development or serve a pedagogical end for formatter authors.

We would highly encourage you to try Topiary and invite you to check out the Topiary GitHub repository to see for yourself. Information on installing and using Topiary can be found in this repository, where we would also welcome contributions, feature requests, and bug reports.

About the authors

Erin van der Veen

A software engineer with experience in creating web-based functional programs, maintaining a functional programming language and designing and implementing tools enhancing developer productivity. Their personal goal is to get as many people as possible working with functional programming languages.

Nicolas Bacquey

Nicolas is a Software Engineer who works on design, implementation and maintenance of micro-services. Before joining Tweag, he worked in academia, where he studied automata of many sorts (be they cellular, graph-, or tree-). He has a PhD and an MSc in computer science, from Université de Caen Normandie.

Guillaume Genestier

Christopher Harrison

Chris is a recovering mathematician and software engineer at Tweag. He has spent his career working for academia, from both the business and the research sides of the industry. He particularly enjoys writing well-tested, maintainable code that serves a pragmatic end, with a side-helping of DevOps to keep services ticking with minimal fuss.

Tor Hovland

Tor is a Rust developer at Tweag who lives in Trondheim, Norway with his wife and two sons.

This article is licensed under a Creative Commons Attribution 4.0 International license.
