Single-line and multi-line formatting with Topiary

2 October 2025 — by Yann Hamdaoui


In a previous post, I introduced Topiary, a universal formatter (or one could say a formatter generator), and showed how to start a formatter for a programming language from scratch. This post is the second part of the tutorial, where we’ll explore more advanced features of Topiary that come in handy when handling real-life languages, and in particular the single-line and multi-line layouts. I’ll assume that you have a working setup to format our toy Yolo language. If you don’t, please follow the relevant sections of the previous post first.

Single-line and multi-line

A fundamental tenet of formatting is that you want to lay code out in different ways depending on whether it fits on one line or not. For example, in Nickel, or any functional programming language for that matter, it’s idiomatic to write small anonymous functions on one line, as in std.array.map (fun x => x * 2 + 1) [1,2,3]. But longer functions would rather look like:

fun x y z =>
  if x then
    y
  else
    z

This is true for almost any language construct that you can think of: you’d write a small boolean condition as is_a && is_b, but write a long validation expression as:

std.is_string value
&& std.string.length value > 5
&& std.string.length value < 10
&& !(std.string.is_match "\\d" value)

In Rust, with rustfmt, short method calls are formatted on one line as in x.clone().unwrap().into(), but they are spread over several lines when the line length is over a fixed threshold:

value
    .maybe_do_something(|x| x+1)
    .or_something_else(|_| Err(()))
    .into_iter()

You usually either want the single-line layout or the multi-line one. A hybrid solution wouldn’t be very consistent:

std.is_string value
&& std.string.length value > 5 && std.string.length value < 10
&& !(std.string.is_match "\\d" value)

Some formatters, such as Rust’s, choose the layout automatically depending on the length of the line. Long lines are wrapped and laid out in the multi-line style automatically, freeing the programmer from any micro decision. On the flip side, the programmer can’t force one style in cases where it’d make more sense.

Some other formatters, like our own Ormolu for Haskell, decide on the layout based on the original source code. For any syntactic construct, the programmer has two options:

1. Write it on one line, or
2. Write it on two lines or more.

Option 1 triggers the single-line layout, and option 2 the multi-line one. No effort is made to try to fit within reasonable line lengths. That’s up to the programmer.

As we will see, Topiary follows the same approach as Ormolu, although future support for optional line wrapping isn’t off the table¹.

Softlines

Less line breaks, please

Let’s see how our Yolo formatter handles the following source:

input income, status
output income_tax

income_tax := case { status = "exempted" => 0, _ => income * 0.2 }

Since the case is short, we want to keep it single-line. Alas, this gets formatted as:

input income, status
output income_tax

income_tax := case {
  status = "exempted" => 0,
  _ => income * 0.2
}

The simplest mechanism for multi-line-aware layout is to use softlines instead of spaces or hardlines. Let’s change the @append_hardline capture in the rule separating case branches to @append_spaced_softline:

; Put case branches on their own lines
(case
  "," @append_spaced_softline
)

As the name indicates, a spaced softline will result in a space for the single-line case, and a line break for the multi-line case, which is precisely what we want. However, if we try to format our example, we get the dreaded idempotency check failure, meaning that formatting once or twice in a row doesn’t give the same result, which is usually a red flag (and is why Topiary performs this check). What happens is that our braces { and } also introduce hardlines, so the double formatting goes like:

income_tax := case { status = "exempted" => 0, _ => income * 0.2 }

--> (case is single-line: @append_spaced_softline is a space)
income_tax := case {
  status = "exempted" => 0, _ => income * 0.2
}
--> (case is multi-line! @append_spaced_softline is a line break)
income_tax := case {
  status = "exempted" => 0,
  _ => income * 0.2
}
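
The failure traced above can be reproduced with a toy model. The following Python sketch is not Topiary code: toy_format is a hypothetical stand-in that hard-codes the two conflicting rules (braces always break, commas break only when the input case spans several lines), and is_idempotent assumes that the check amounts to comparing one formatting pass with two.

```python
def toy_format(source: str) -> str:
    """Hypothetical stand-in for the conflicting rules; not how Topiary works."""
    text = source.strip()
    # The softline after "," depends on whether the input case is multi-line.
    is_multiline = "\n" in text
    head, rest = text.split("{", 1)
    body, _ = rest.rsplit("}", 1)
    branches = [branch.strip() for branch in body.split(",")]
    separator = ",\n  " if is_multiline else ", "
    # The braces unconditionally introduce line breaks (the hardlines).
    return head.rstrip() + " {\n  " + separator.join(branches) + "\n}"


def is_idempotent(format_fn, source: str) -> bool:
    """True if formatting twice gives the same result as formatting once."""
    once = format_fn(source)
    return format_fn(once) == once


source = 'income_tax := case { status = "exempted" => 0, _ => income * 0.2 }'
first_pass = toy_format(source)
second_pass = toy_format(first_pass)
```

Here first_pass keeps both branches on one line between the braces, while second_pass splits them, so is_idempotent(toy_format, source) is False.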

We need to amend the rule for braces as well:

; Lay out the case skeleton
(case
  "{" @prepend_space @append_spaced_softline
  "}" @prepend_spaced_softline
)

Our original example is now left untouched, as desired. Note that softline annotations are expanded depending on the multi-lineness of the direct parent of the node they attach to (not of the subtree matched by the whole query, nor of the node itself). Topiary applies this logic because this is most often what you want. The parse tree of the multi-line version of income_tax:

income_tax := case {
  status = "exempted" => 0,
  _ => income * 0.2
}

is as follows (hiding irrelevant parts in [...]):

0:0  - 4:0    tax_rule
0:0  - 3:1      statement
0:0  - 3:1        definition_statement
0:0  - 0:10         identifier `income_tax`
0:11 - 0:13         ":="
0:14 - 3:1          expression
0:14 - 3:1            case
0:14 - 0:18             "case"
0:19 - 0:20             "{"
1:2  - 1:26             case_branch
                        [...]
1:26 - 1:27             ","
2:2  - 2:19             case_branch
                        [...]
3:0  - 3:1              "}"

The left part is the span of the node, in the format start_line:start_column - end_line:end_column. A node is multi-line simply if end_line > start_line. You can see that since "{" is not multi-line (it can’t be, as it’s only one character!), if Topiary considered the multi-lineness of the node itself, our previous "{" @append_spaced_softline would always act as a space.

What happens is that Topiary considers the direct parent instead, which is 0:14 - 3:1 case here, and is indeed multi-line.
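
The rule can be written down in a few lines. This is a minimal Python sketch of the decision as described in this section, not Topiary’s actual implementation; the Span type and the spaced_softline helper are names made up for illustration, and the spans are copied from the parse tree above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start_line: int
    start_column: int
    end_line: int
    end_column: int

    def is_multiline(self) -> bool:
        # A node is multi-line if it ends on a later line than it starts.
        return self.end_line > self.start_line


def spaced_softline(reference: Span) -> str:
    """A space if the reference node is single-line, a line break otherwise."""
    return "\n" if reference.is_multiline() else " "


open_brace = Span(0, 19, 0, 20)  # the "{" token itself
case_node = Span(0, 14, 3, 1)    # its direct parent, the `case` node
```

Resolving the softline against open_brace would always give a space, whereas resolving it against case_node gives a line break for this input.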

Both single-line and multi-line cases are now formatted as expected.

More line breaks, please

Let’s consider the dual issue, where line breaks are unduly removed. We’d like to allow inputs and outputs to span multiple lines, but the following snippet:

input
  income,
  status,
  tax_coefficient
output income_tax

is formatted as:

input income, status, tax_coefficient
output income_tax

The rule for spacing around input and output and the rule for spacing around , and identifiers both use @append_space. We can simply replace this with a spaced softline. Recall that a spaced softline turns into a space and thus behaves like @append_space in a single-line context, making it a proper substitution.

; Add spaced softline after `input` and `output` decl
[
  "input"
  "output"
] @append_spaced_softline


; Add a spaced softline after and remove space before the comma in an identifier
; list
(
  (identifier)
  .
  "," @prepend_antispace @append_spaced_softline
  .
  (identifier)
)

We also need to add new rules to indent multi-line lists of inputs or outputs.

; Indent multi-line lists of inputs.
(input_statement
  "input" @append_indent_start
) @append_indent_end

; Indent multi-line lists of outputs.
(output_statement
  "output" @append_indent_start
) @append_indent_end

A matching pair of indentation captures *_indent_start and *_indent_end will amount to a no-op if they are on the same line, so those rules don’t disturb the single-line layout.
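
To see why a matching pair on the same line is harmless, here is a toy renderer in Python. It is only a sketch under the assumption that indentation is emitted when a new line starts; the atom names INDENT_START, INDENT_END and LINE_BREAK are hypothetical and do not reflect Topiary’s internals.

```python
INDENT_START = object()
INDENT_END = object()
LINE_BREAK = object()


def render(atoms, indent_width: int = 2) -> str:
    """Render a flat list of atoms; indentation only shows up after a line break."""
    output = []
    level = 0
    at_line_start = False
    for atom in atoms:
        if atom is INDENT_START:
            level += 1
        elif atom is INDENT_END:
            level -= 1
        elif atom is LINE_BREAK:
            output.append("\n")
            at_line_start = True
        else:
            if at_line_start:
                output.append(" " * (indent_width * level))
                at_line_start = False
            output.append(atom)
    return "".join(output)


# Softlines already resolved: spaces in the first list, line breaks in the second.
single_line = render(["input", INDENT_START, " ", "income", ",", " ", "status", INDENT_END])
multi_line = render(["input", INDENT_START, LINE_BREAK, "income", ",", LINE_BREAK, "status", INDENT_END])
```

In single_line no line break occurs between the two indentation atoms, so the level change is never observed; in multi_line each identifier is indented by one level.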

Recall that as long as you don’t use anchors (.), additional nodes can be omitted from a Tree-sitter query: here, the first query will match an input statement with an "input" child somewhere, and any children before or after that (although in our case, there won’t be any children before).

Scopes

More (scoped) line breaks, please

Let us now consider a similar example, at least on the surface. We want to allow long arithmetic expressions to be laid out on multiple lines as well, as in:

input
  some_long_name,
  other_long_name,
  and_another_one
output result

result :=
  some_long_name
  + other_long_name
  + and_another_one

As before, result is smashed back into one line by our current formatter. This is unsurprising, since our keywords rule uses @prepend_space and @append_space. At this point, you start to get the trick: let’s use softlines! I’ll only handle + for simplicity. We remove "+" from the original keywords rule and add the following rule:

; (Multi-line) spacing around +
("+" @prepend_spaced_softline @append_space)

Ignoring indentation for now, the line wrapping seems to work. For the following example at least:

result :=
  some_long_name
  + other_long_name + and_another_one

which is reformatted as:

result := some_long_name
+ other_long_name
+ and_another_one

However, perhaps surprisingly, the following example:

result :=
some_long_name + other_long_name
+ and_another_one

is reformatted as:

result := some_long_name + other_long_name
+ and_another_one

The first addition hasn’t been split! To understand why, we have to look at how our grammar parses arithmetic expressions:

expression: $ => choice(
  $.identifier,
  $.number,
  $.string,
  $.arithmetic_expr,
  $.case,
),

arithmetic_expr: $ => choice(
  prec.left(1, seq(
    $.expression,
    choice('+', '-'),
    $.expression,
  )),
  prec.left(2, seq(
    $.expression,
    choice('*', '/'),
    $.expression,
  )),
  prec(3, seq(
    '(',
    $.expression,
    ')',
  )),
),

Even if you don’t understand everything, there are two important points:

Arithmetic expressions are recursively nested. Indeed, we can compose arbitrarily complex expressions, as in (foo*2 + 1) + (bar / 4 * 6).
They are parsed in a left-associative way.

This means that our big addition is parsed as: ((some_long_name "+" other_long_name) "+" and_another_one). In the first example, since the line break happens just after some_long_name in the original source, both the inner node and the outer one are multi-line. However, in the second example, the line break happens after other_long_name, meaning that the innermost arithmetic expression is contained in a single line, and the corresponding + isn’t considered multi-line. Indeed, you can see here that the parent of the first + is 7:0 - 7:32 arithmetic_expr, which fits entirely on line 7.

7:0  - 8:17           arithmetic_expr
7:0  - 7:32             expression
7:0  - 7:32               arithmetic_expr
7:0  - 7:14                 expression
7:0  - 7:14                   identifier `some_long_name`
7:15 - 7:16                 "+"
7:17 - 7:32                 expression
7:17 - 7:32                   identifier `other_long_name`
8:0  - 8:1              "+"
8:2  - 8:17             expression
8:2  - 8:17               identifier `and_another_one`
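
The same observation can be reproduced without Tree-sitter. The Python sketch below folds a chain of additions to the left and records, for each +, the line range of its parent node. It is a simplified model: parse_sum and Node are hypothetical names, only whitespace-separated + chains are handled, and the starting line number 7 is assumed for both inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    start_line: int
    end_line: int
    text: str

    def is_multiline(self) -> bool:
        return self.end_line > self.start_line


def parse_sum(source: str, first_line: int = 0):
    """Fold `a + b + c` to the left; return the parent node of each "+"."""
    tokens = [
        (line_number, token)
        for line_number, line in enumerate(source.splitlines(), start=first_line)
        for token in line.split()
    ]
    line, text = tokens[0]
    node = Node(line, line, text)
    plus_parents = []
    for index in range(1, len(tokens) - 1, 2):
        operator = tokens[index][1]
        right_line, right_text = tokens[index + 1]
        if operator != "+":
            raise ValueError(f"expected '+', got {operator!r}")
        # Left associativity: the node built so far becomes the left operand.
        node = Node(node.start_line, right_line, f"({node.text} + {right_text})")
        plus_parents.append(node)
    return plus_parents


first = parse_sum("some_long_name\n+ other_long_name + and_another_one", first_line=7)
second = parse_sum("some_long_name + other_long_name\n+ and_another_one", first_line=7)
```

In first, both parents span lines 7 to 8, so both + are in a multi-line context. In second, the parent of the first + starts and ends on line 7, which is why that + is rendered with a space.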

The solution here is to use scopes. A scope is a user-defined group of nodes associated with an identifier. Crucially, when using scoped softline captures such as @append_spaced_scoped_softline within a scope, Topiary will consider the multi-lineness of the whole scope instead of the multi-lineness of the (parent) node.

Let’s create a scope for all the nested sub-expressions of an arithmetic expression. Scopes work the same as other node groups in Topiary: we create them by using a matching pair of begin and end captures. We need to find a parent node that can’t occur recursively in an arithmetic expression. A good candidate would be definition_statement, which encompasses the whole right-hand side of the definition of an output:

; Creates a scope for the whole right-hand side of a definition statement
(definition_statement
  (#scope_id! "definition_rhs")
  ":="
  (expression) @prepend_begin_scope @append_end_scope
)

We must specify an identifier for the scope using the predicate scope_id. Identifiers are useful when several scopes might be nested or even overlap, and help readability in general.

We then amend our initial attempt at formatting multi-line arithmetic expressions:

; (Multi-line) spacing around +
(
  (#scope_id! "definition_rhs")
  "+" @prepend_spaced_scoped_softline @append_space
)

We use a scoped version of softlines, in which case we need to specify the identifier of the corresponding scope. The captured node must also be part of said scope. You can check that both examples (and multiple variations of them) are finally formatted as expected.
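
The difference between the two resolution strategies can be sketched in Python. This is an illustration of the behaviour described in this section rather than Topiary’s implementation: the Span type and the spaced_softline helper are hypothetical, the line numbers come from the parse tree of the second example, and the extent of the "definition_rhs" scope is assumed to be the whole right-hand side expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start_line: int
    end_line: int

    def is_multiline(self) -> bool:
        return self.end_line > self.start_line


def spaced_softline(reference: Span) -> str:
    """A space if the reference is single-line, a line break otherwise."""
    return "\n" if reference.is_multiline() else " "


# Parents of the two "+" tokens in the second example (line numbers only).
inner_parent = Span(7, 7)
outer_parent = Span(7, 8)
# Assumed extent of the "definition_rhs" scope: the whole right-hand side.
scope = Span(7, 8)

# Unscoped softlines look at each "+" token's direct parent.
unscoped = [spaced_softline(inner_parent), spaced_softline(outer_parent)]
# Scoped softlines look at the enclosing scope instead.
scoped = [spaced_softline(scope), spaced_softline(scope)]
```

With unscoped resolution the first + gets a space and the second a line break, which is the hybrid layout we wanted to avoid. With scoped resolution both get a line break.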

Conclusion

This second part of the Topiary tutorial has shown how to specify precisely an alternative formatting layout depending on whether an expression spans multiple lines or not. The main concepts at play here are multi-line versus single-line nodes, and scopes. There is an extension to this concept not covered here, measuring scopes, but standard scopes already go a long way for formatting a real-life language. If you’re looking for a comprehensive resource to help you write your formatter, the official Topiary book is for you. You can also find the complete code for this post in the companion repository. Happy hacking!

¹ See #700

Behind the scenes

Yann Hamdaoui

Yann is the head of the Programming Languages & Compiler group at Tweag. He's also leading the development of the Nickel programming language, a next-generation typed configuration language designed to manage the growing complexity of Infrastructure-as-Code and a candidate successor for the Nix language. You might also find him doing Nix or any other trickery to fight against non-reproducible and slow builds or CI.


This article is licensed under a Creative Commons Attribution 4.0 International license.