Adding algebraic data types to Nickel

5 September 2024 — by Yann Hamdaoui

Our Nickel language is a configuration language. It’s also a functional programming language. Functional programming isn’t a well-defined term: it can encompass anything from being vaguely able to pass functions as arguments and to call them (in that respect, C and JavaScript are functional) to being a statically typed, pure and immutable language based on the lambda-calculus, like Haskell.

However, if you ask a random developer, I can guarantee that one aspect will be mentioned every time: algebraic data types (ADTs) and pattern matching. They are the bread and butter of typed functional languages. ADTs are relatively easy to implement (for language maintainers) and easy to use. They’re part of the 20% of the complexity that makes for 80% of the joy of functional programming.

But Nickel didn’t have ADTs until recently. In this post, I’ll tell the story of Nickel and ADTs, starting from why they were initially lacking, the exploration of different possible solutions and the final design leading to the eventual retro-fitting of proper ADTs in Nickel. This post is intended for Nickel users, for people interested in configuration management, but also for anyone interested in programming language design and functional programming. It doesn’t require prior Nickel knowledge.

A quick primer on Nickel

Nickel is a gradually typed, functional, configuration language. From this point, we’ll talk about Nickel before the introduction of ADTs in the 1.5 release, unless stated otherwise. The core language features:

let-bindings: let extension = ".ncl" in "file.%{extension}"
first-class functions: let add = fun x y => x + y in add 1 2
records (JSON objects): {name = "Alice", age = 42}
static typing: let mult : Number -> Number -> Number = fun x y => x * y. By default, expressions are dynamically typed. A static type annotation makes a definition or an inline expression typechecked statically.
contracts look and act almost like types but are evaluated at runtime: { port | Port = 80 }. They are used to validate configurations against potentially complex schemas.

The lifecycle of a Nickel configuration is to be 1) written, 2) evaluated and 3) serialized, typically to JSON, YAML or TOML. An important guideline that we set first was that every native data structure (record, array, enum, etc.) should be trivially and straightforwardly serializable to JSON. In consequence, Nickel started with the JSON data model: records (objects), arrays, booleans, numbers and strings.

There’s one last primitive value: enums. As in C or in JavaScript, an enum in Nickel is just a tag. An enum value is an identifier with a leading ', such as in {protocol = 'http, server = "tweag.io"}. An enum is serialized as a string: the previous expression is exported to JSON as {"protocol": "http", "server": "tweag.io"}.

So why not just using strings? Because enums can better represent a finite set of alternatives. For example, the enum type [| 'http, 'ftp, 'sftp |] is the type of values that are either 'http, 'ftp or 'sftp. Writing protocol : [| 'http, 'ftp, 'sftp |] will statically (at typechecking time) ensure that protocol doesn’t take forbidden values such as 'https. Even without static typing, using an enum conveys to the reader that a field isn’t a free-form string.

Nickel has a match which corresponds to C or JavaScript’s switch:

is_http : [| 'http, 'ftp, 'sftp |] -> Bool =
  match {
    'http => true,
    _ => false,
  }

As you might notice, there are no ADTs in sight yet.

ADTs in a configuration language

While Nickel is a functional language, it’s first and foremost a configuration language, which comes with specific design constraints.

Because we’re telling the story of ADTs before they landed in Nickel, we can’t really use a proper Nickel syntax yet to provide examples. In what follows, we’ll use a Rust-like syntax to illustrate the examples: enum Foo<T> { Bar(i32), Baz(bool, T) } is an ADT parametrized by a generic type T with two constructors Bar and Baz, where the first one takes an integer as an argument and the other takes a pair of a boolean and a T. Concrete values are written as Bar(42) or Baz(true, "hello").

An unexpected obstacle: serialization

As said earlier, we want values to be straightforwardly serializable to the JSON data model.

Now, take a simple ADT such as enum Foo<T,U> = { SomePair(T,U), Nothing }. You can find reasonable serializations for SomePair(1,2), such as {"tag": "SomePair", "a": 1, "b": 2}. But why not {"flag": "SomePair", "0": 1, "1": 2} or {"mark": "SomePair", "data": [1, 2]}? While those representations are isomorphic, it’s hard to know the right choice for the right use-case beforehand, as it depends on the consumer of the resulting JSON. We really don’t want to make an arbitrary choice on behalf of the user.

Additionally, while ADTs are natural for a classical typed functional language, they might not entirely fit the configuration space. A datatype like enum Literal { String(String), Number(Number) } that can store either a string or a number is usually represented directly as an untagged union in a configuration, that {"literal": 5} or {"literal": "hello"}, instead of the less natural tagged union (another name for ADTs) {"literal": {"tag": "Number", "value": 5}}.

This led us to look at (untagged) union types instead. Untagged unions have the advantage of not making any choice about the serialization: they aren’t a new data structure, as are ADTs, but rather new types (and contracts) to classify values that are already representable.

The road of union types

A union type is a type that accepts different alternatives. We’ll use the fictitious \/ type combinator to write a union in Nickel (| is commonly used elswhere but it’s already taken in Nickel). Our previous example of a literal that can be either a string or a number would be {literal: Number \/ String}. Those types are broadly useful independently of ADTs. For example, JSON Schema features unions through the core combinator any_of.

Our hope was to kill two birds with one stone by adding unions both as a way to better represent existing configuration schemas, but also as a way to emulate ADTs. Using unions lets users represent ADTs directly as plain records using their preferred serialization scheme. Together with flow-sensitive typing, we can get as expressive as ADTs while letting the user decide on the encoding. Here is an example in a hypothetical Nickel enhanced with unions and flow-sensitive typing:

let sum
  : {tag = 'SomePair, a : Number, b : Number} \/ {tag = 'Nothing}
    -> Number
  = match {
    {tag = 'SomePair, a, b} => a + b,
    {tag = 'Nothing} => 0,
  }

Using unions and flow-sensitive typing as ADTs is the approach taken by TypeScript, where the previous example would be:

type Foo = { tag: "SomePair"; a: number; b: number } | { tag: "Nothing" }

function sum(value: Foo): number {
  switch (value.tag) {
    case "SomePair":
      return value.a + value.b
    case "Nothing":
      return 0
  }
}

In Nickel, any type must have a contract counter-part. Alas union and intersection contracts are hard (in fact, union types alone are also not a trivial feat to implement!). In the linked blog post, we hint at possible pragmatic solutions for union contracts that we finally got to implement for Nickel 1.8. While sufficient for practical union contracts, this is far from the general union types that could subsume ADTs. This puts a serious stop to the idea of using union types to represent ADTs.

What are ADTs really good for?

As we have been writing more and more Nickel, we realized that we have been missing ADTs a lot for library functions - typically the types enum Option<T> { Some(T), None } and Result<T,E> = { Ok(T), Error(E) } - where we don’t care about serialization. Those ADTs are “internal” markers that wouldn’t leak out to the final exported configuration.

Here are a few motivating use-cases.

`std.string.find`

std.string.find is a function that searches for a substring in a string. Its current type is:

String
-> String
-> { matched : String, index : Number, groups : Array String }

If the substring isn’t found, {matched = "", index = -1, groups []} is returned, which is error-prone if the consumer doesn’t defend against such values. We would like to return a proper ADT instead, such as Found {matched : String, index : Number, groups : Array String} or NotFound, which would make for a better and a safer interface¹.

Contract definition

Contracts are a powerful validation system in Nickel. The ability to plug in your own custom contracts is crucial.

However, the general interface to define custom contracts can seem bizarre. Custom contracts need to set error reporting data on a special label value and use the exception-throwing-like std.contract.blame function. Here is a simplified definition of std.number.Nat which checks that a value is natural number:

fun label value =>
  if std.typeof value == 'Number then
    if value % 1 == 0 && value >= 0 then
      value
    else
      let label = std.contract.label.with_message "not a natural" in
      std.contract.blame label
  else
    let label = std.contract.label.with_message "not a number" in
    std.contract.blame label

There are good (and bad) reasons for this situation, but if we had ADTs, we could cover most cases with an alternative interface where custom contracts return a Result<T,E>, which is simpler and more natural:

fun value =>
  if std.typeof value == 'Number then
    if value % 1 == 0 && value >= 0 then
      Ok
    else
      Error("not a natural")
  else
    Error("not a number")

Of course, we could just encode this using a record, but it’s just not as nice.

Let it go, let it go!

The list of other examples of using ADTs to make libraries nicer is endless.

Thus, for the first time, we decided to introduce a native data structure that isn’t serializable.

Note that this doesn’t break any existing code and is forward-compatible with making ADTs serializable in the future, should we change our mind and settle on one particular encoding. Besides, another feature is independently explored to make serialization more customizable through metadata, which would let users use custom (de)serializer for ADTs easily.

Ok, let’s add the good old-fashioned ADTs to Nickel!

The design

Structural vs nominal

In fact, we won’t exactly add the old-fashioned version. ADTs are traditionally implemented in their nominal form.

A nominal type system (such as C, Rust, Haskell, Java, etc.) decides if two types are equal based on their name and definition. For example, values of enum Alias1 { Value(String) } and enum Alias2 { Value(String) } are entirely interchangeable in practice, but Rust still doesn’t accept Alias1::Value(s) where a Alias2 is expected, because those types have distinct definitions. Similarly, you can’t swap a class for another in Java just because they have exactly the same fields and methods.

A structural type system, on the other hand, only cares about the shape of data. TypeScript has a structural type system. For example, the types interface Ball { diameter: number; } and interface Sphere { diameter: number; } are entirely interchangeable, and {diameter: 42} is both a Ball and a Sphere. Some languages, like OCaml² or Go³, mix both.

Nickel’s current type system is structural because it’s better equipped to handle arbitrary JSON-like data. Because ADTs aren’t serializable, this consideration doesn’t weight as much for our motivating use-cases, meaning ADTs could still be either nominal or structural.

However, nominal types aren’t really usable without some way of exporting and importing type definitions, which Nickel currently lacks. Structural ADTs look like the better choice for Nickel. We can build, typecheck, and match on ADTs locally without having to know or to care about any type declaration. Structural ADTs are a natural extension of Nickel (structural) enums, syntactically, semantically, and on the type level, as we will see.

While less common, structural ADTs do exist in the wild and they are pretty cool. OCaml has both nominal ADTs and structural ADTs, the latter being known as polymorphic variants. They are an especially powerful way to represent a non trivial hierarchy of data types with overlapping, such as abstract syntax trees or sets of error values.

Syntax

C-style enums are just a special case of ADTs, namely ADTs where constructors don’t have any argument. The dual conclusion is that ADTs are enums with arguments. We thus write the ADT Some("hello") as an enum with an argument in Nickel: 'Some "hello".

We apply the same treatment to types. [| 'Some, 'None |] was a valid enum type, and now [| 'Some String, 'None |] is also a valid type (which would correspond to Rust’s Option<String>).

There is a subtlety here: what should be the type inferred for 'Some now? In a structural type system, 'Some is just a free-standing symbol. The typechecker can’t know if it’s a constant that will stay as it is - and thus has the type [| 'Some |] - or a constructor that will be eventually applied, of type a -> [| 'Some a |]. This difficulty just doesn’t exist in a nominal type system like Rust: there, Option::Some refers to a unique, fixed ADT constructor that is known to require precisely one argument.

To make it work, 'Ok 42 isn’t actually a normal function application in Nickel: it’s an ADT constructor application, and it’s parsed differently. We just repurpose the function application syntax⁴ in this special case. 'Ok isn’t a function, and let x = 'Ok in x 42 is an error (applying something that isn’t a function).

You can still recover Rust-style constructors that can be applied by defining a function (eta-expanding, in the functional jargon): let ok = fun x => 'Ok x.

We restrict ADTs to a single argument. You can use a record to emulate multiple arguments: 'Point {x = 1, y = 2}.

ADTs also come with pattern matching. The basic switch that was match is now a powerful pattern matching construct, with support for ADTs but also arrays, records, constant, wildcards, or-patterns and guards (if side-conditions).

Typechecking

Typechecking structural ADTs is a bit different from nominal ADTs. Take the simple example (the enclosing : _ annotation is required to make the example statically typed in Nickel)

(
  let data = 'Ok 42 in
  let process = match {
    'Ok x => x + 1,
    'Error => 0,
  } in

  process data
) :  _

One possible solution is to introduce subtyping, which is able to express this kind of inclusion relation: here that [| 'Ok Number |] is included in [| 'Ok Number, 'Error |]. However, subtyping has some defects and is whole can of worms when mixed with polymorphism (which Nickel has).

Nickel rather relies on another approach called row polymorphism, which is the ability to abstract over not just a type, as in classical polymorphism, but a whole piece of an enum type. Row polymorphism is well studied in the literature, and is for example implemented in PureScript. Nickel already features row polymorphism for basic enum types and for records types.

Here is how it works:

let process : forall a. [| 'Ok Number, 'Error; a |] -> Number = match {
  'Ok x => x + 1,
  'Error => 0,
  _ => -1,
} in

process 'Other

Because there’s a catch-all case _ => -1, the type of process is polymorphic, expressing that it can handle any other variant beside 'Ok Number and 'Error (this isn’t entirely true: Ok String is forbidden for example, because it can’t be distinguished from Ok Number). Here, a can be substituted for a subsequence of an enum type, such as 'Foo Bool, 'Bar {x : Number}.

Equipped with row polymorphism, we can infer the type forall a. [| 'Ok Number; a |]⁵ for 'Ok 42. When typechecking process data in the original example, a will be instantiated to the single row 'Error and the example typechecks. You can learn more about structural ADTs and row polymorphism in the corresponding section of the Nickel user manual.

Conclusion

While ADTs are part of the basic package of functional languages, Nickel didn’t have them until relatively recently because of peculiarities of the design of a configuration language. After exploring the route of union types, which came to a dead-end, we settled on a structural version of ADTs that turns out to be a natural extension of the language and didn’t require too much new syntax or concepts.

ADTs already prove useful to write cleaner and more concise code, and to improve the interface of libraries, even in a gradually typed configuration language. Some concrete usages can be found in try_fold_left and validators already.

Unfortunately, we can’t change the type of std.string.find without breaking existing programs (at least not until a Nickel 2.0), but this use-case still applies to external libraries or future stdlib functions↩
In OCaml, Objects, polymorphic variants and modules are structural while records and ADTs are nominal.↩
In Go, interfaces are structural while structs are nominal.↩
Repurposing application is theoretically backward incompatible because 'Ok 42 was already valid Nickel syntax before 1.5, but it was meaningless (an enum applied to a constant) and would always error out at runtime, so it’s ok.↩
In practice, we infer a simpler type [| 'Ok Number; ?a |] where ?a is a unification variable which can still have limitations. Interestingly, we decided early on to not perform automatic generalization, as opposed to the ML tradition, for reasons similar to the ones exposed here. Doing so, we get (predicative) higher-rank polymorphism almost for free, while it’s otherwise quite tricky to combine with automatic generalization. It turned out to pay off in the case of structural ADTs, because it makes it possible to side-step those usual enum types inclusion issues (widening) by having the user add more polymorphic annotations. Or we could even actually infer the polymorphic type forall a. [| 'Ok Number; a |] for literals.↩

Behind the scenes

Yann Hamdaoui

Yann is the head of the Programming Languages & Compiler group at Tweag. He's also leading the development of the Nickel programming language, a next-generation typed configuration language designed to manage the growing complexity of Infrastructure-as-Code and a candidate successor for the Nix language. You might also find him doing Nix or any other trickery to fight against non-reproducible and slow builds or CI.

Tech Group

Nix

Sustainable software development with Nix

Tech Group

Programming Languages and Compilers

Research, create, improve and maintain programming languages and their tooling to enhance developer productivity and to deliver reliable, maintainable, correct and performant software with minimum effort.

Tech Group

Scalable Builds

Correct, efficient, and reliable builds are critical for developers to work and collaborate effectively.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← Deploying Buildbarn on Kubernetes with mTLS on the side Reflecting away from definitions in Liquid Haskell →