Writing static checks to an unsuspecting library with Liquid Haskell

11 June 2026 — by Xavier Góngora

This post presents a little epic to insert static checks in Haskell’s Diff package using Liquid Haskell (LH).¹ Static or compile-time checks are helpful to confirm formerly implicit assumptions in the implementation, providing an additional layer of assurance.

Making illegal states unrepresentable at an affordable cognitive cost is a staple of statically typed functional programming. Endeavors like Dependent Haskell and Liquid Haskell delve into this aspect. A distinctive feature of LH is that it works on top of regular Haskell code, meaning that the program can still be compiled after disabling it, thus making it possible to enforce properties without changing the source code. In what follows I’ll give you a glimpse of how the Liquid Haskell approach feels in practice and how far it can go.

Liquid Haskell was created by the UCSD Programming Systems group and these days is mainly maintained and further improved by my colleague Facundo Domínguez. Applying Liquid Haskell to strengthen libraries has precedent in the Haskell ecosystem, and it was in this spirit that Facundo suggested this project as we were pondering an attempt to statically check our in-house Ormolu, of which Diff is a transitive dependency and a more suitable commitment given the engineering time I could bestow upon it.²

`Diff` will never be the same

The Diff package is a small and (relatively) self-contained library implementing the Myers diff algorithm. As a provider of basic functionality in the Haskell ecosystem, adding formal guarantees to it is of intrinsic value to the community.

From the get-go, my objective was adding static checks to strengthen this library in a contribution guided by two opposing desiderata:

Minimize source changes
Maximize checked invariants

While the first is about testing how (non-)intrusive Liquid Haskell can be, the second is about its expressiveness. To put it bluntly, the ideal LH would be able to statically check all the existing invariants of an unsuspecting library using nothing more than specification annotations. Reality is not that kind, forcing me to compromise on both objectives, but I kept this mindset to help me see how close LH is to this ideal.

My first milestone was filling the mind gap between the Diff implementation and the referenced paper’s algorithm, through an in-depth study of the library, resulting in documentation contributions highlighting the most salient invariants (pre- and post-conditions) and assumptions.³

In general, it is by a careful threading of logic that a program is built into existence; the problem (and the source of well engineered solutions) is that the critical aspects of it lie within a theory in its writer’s mind, which tends to be lost across iterations, updates, refactors and people moving on. Both documentation and specification cannot completely solve this problem, but they can help.

For example, I added a post-condition to this function haddock

data PolyDiff a b = First a | Second b | Both a b

-- | Like 'getGroupedDiff' but accepts a custom equality predicate.
--
-- Postcondition: the output list is guaranteed to be /chunked/. i.e. no two adjacent
-- elements share the same constructor.
getGroupedDiffBy :: (a -> b -> Bool) -> [a] -> [b] -> [PolyDiff [a] [b]]

making the expected form of its output explicit. This allows a reader to get an immediate notion of what the implementation is supposed to accomplish in order to satisfy the caller’s expectations.

Similarly, data types often carry more meaning than what they actually encode, in which case documenting the implicit assumptions can help understand their intended use.

-- | Line Range: start, end and contents.
--
-- The following invariants hold:
--
-- > snd lrNumbers >= fst lrNumbers
-- > snd lrNumbers - fst lrNumbers + 1 == length lrContents
--
-- which imply @lrContents@ cannot be empty.
data LineRange = LineRange { lrNumbers :: (LineNo, LineNo)
                           , lrContents :: [String]
                           }

These haddocks are inspired by the kind of properties that LH can express. Nevertheless, their value doesn’t depend on providing static checks as they already save us from some arduous code path diving. Wouldn’t it be wonderful if the compiler could take those haddocks to heart? In a sense that’s what LH is about!

Engineering the static checks took me into a tight feedback loop between the documentation process, coming up with refactorings⁴ to make the code easier to check (which always implied easier to explain!) and the writing of LH specifications matching the documented invariants. This approach is in close sympathy with the doc it like it’s hot philosophy.

From dry code to liquid types

After installing LH, compilation failed due to new shiny errors, even though I hadn’t written a single LH specification yet. This is because LH inspects the bodies of all function definitions out of the box to prove that

Existing specifications are fulfilled
Recursive functions terminate

The first condition is not limited to local specifications; LH comes bundled with specifications for many boot package functions. For instance, many of Prelude’s partial functions are refined this way to be total, so LH tries to prove that all their uses are safe.

One prominent example is head, which was the only failure of the first condition in Diff: it was not certain that the list passed to head in its ses function is always non-empty, which can be found to be true from the algorithm specification and by following the composition of the involved processes. LH tries to build this knowledge from specifications, in the form of refinement types, found along the call stack. Such specifications are introduced using a special comment syntax {-@ ... @-} whose contents are processed to generate a set of constraints for an external SMT solver to verify. This allows us to mechanically check function specifications, formed out of pre- and post-conditions, and data invariants expressed as simple logical predicates at compile time. In what follows I’ll show some examples of LH specification annotations, but in most cases I won’t be explaining their syntax or fundamentals, trusting that their meaning within the general argument can be gathered from context. For further details please look at the spec reference documentation.

The same comment syntax is also used to set LH directives, like the ignore annotation I used to skip checks in the body of the offending ses function.

{-@ ignore ses @-}
ses :: (a -> b -> Bool) -> [a] -> [b] -> [DI]
ses eq as bs = path . head . dropWhile (\dl -> poi dl /= lena || poj dl /= lenb) .
            concat . iterate (dstep cd) . (:[]) . addsnake cd $
            DL {poi=0,poj=0,path=[]}
            where cd = canDiag eq as bs lena lenb
                  lena = length as; lenb = length bs

Turning now to the second condition: To prove termination of a recursive function, LH needs to be told of a size reduced towards a lower bound at each recursive call. This is called a termination metric.

Some recursive functions might be proved terminating without intervention, because when no explicit metric is given LH follows a simple heuristic: it checks for the first (non-function) argument with an associated size metric to be strictly decreasing and non-negative at each recursive call. LH has definitions of associated size metric for lists (their length) and integer values, which are considered metrics themselves when non-negative. Metrics get interesting when we have mutually recursive functions, as is the case for doPrefix and doSuffix, a pair of local functions whose job is to chop common lines of input to create the context windows that make a diff’s hunks. I introduced a lexicographic metric, annotated with the syntax / [metric1, metric2, ...] at the end of a function refinement, to prove their termination:

type Diff c = PolyDiff c c

{-@ doPrefix :: hunk : [Diff [c]] -> [Diff [c]] / [len hunk, 0] @-}
doPrefix :: [Diff [c]] -> [Diff [c]]
doPrefix [] = []
doPrefix [Both _ _] = []
doPrefix (Both xs ys : more) =
  Both (drop (length xs - contextSize) xs)
       (drop (length ys - contextSize) ys)
    : doSuffix more
doPrefix (d : ds) = d : doSuffix ds

{-@ doSuffix :: hunk : [Diff [c]] -> [Diff [c]] / [len hunk, 1]@-}
doSuffix :: [Diff [c]] -> [Diff [c]]
doSuffix [] = []
doSuffix [Both xs ys] = [Both (take contextSize xs) (take contextSize ys)]
doSuffix (Both xs ys : more)
  | length xs <= contextSize * 2 = Both xs ys : doPrefix more
  | otherwise =
      Both (take contextSize xs) (take contextSize ys) :
        doPrefix (Both (drop contextSize xs) (drop contextSize ys) : more)
doSuffix (d : ds) = d : doSuffix ds

Using this metric LH checks that either the input hunk (a list of diff elements) length is reduced after each recursive call, as it would do by its default heuristic, or considers a call to doPrefix (0) from doSuffix (1) to be a strict reduction. This second fallback metric is needed because of the third equation of doSuffix (second guard), where doPrefix is called with a list of equal length. Apart from this case, each (mutually) recursive call is done on the tail of the input and thus strictly decreasing.

Here I’ve presented instances of two general strategies to handle LH errors:

Fight: Fix the failing termination checks by introducing metrics and offending functions calls by adding specifications.
Flight: Disable checks by using an escape hatch, e.g. the {-@ lazy myRecursiveFunction @-} annotation to circumvent termination checking, the {-@ ignore myOffendingFunction @-} to disable all checks within a function’s body or the {-@ assume myFunction :: ... spec ... @-} to set a function specification as true without verification.

A priori it’s desirable to minimize the use of escape hatches, but they’re also tools to prioritize static checking efforts.

Invariant static checking

One thing that made Diff particularly suitable for this effort is that a detailed specification of it existed in the form of a research paper. Indeed, my first documentation contribution was making their connection explicit throughout. The Myers diff algorithm can be summarized as a breadth-first search for the shortest path across a bidimensional edit grid to an endpoint,⁵ the latter representing the complete transformation of one input to the other. The algorithm is in fact tersely expressed in the ses definition presented before; its name stands for “smallest edit script”, which is one of the output characterizations of the diff algorithm. What I found is that the idea of a wave front is the link between this implementation and the original algorithm. This statement is now supported by a static check showing that a wave front is transformed as the algorithm prescribes for its inner loop.

A wave front is defined as a list of nodes at the same depth, i.e. the edit trace length, which is iterated upon by the dstep function to return nodes one step deeper. This function is a direct implementation of the extension procedure used by the algorithm at each search step. Furthermore, the paper proves a pair of lemmas that result in a specific configuration of the node list after each iteration, which is related to the diagonals on the edit grid and checked by a wfDiags predicate that I wrote to specify it. The details of this condition aren’t essential here: while the paper leverages it to introduce a space optimization, the Haskell implementation doesn’t depend on it, but the configuration is preserved nonetheless.

I encoded the fixed depth of nodes and their diagonal configuration using refinement type aliases to obtain a wave front specification.

-- | A node representing the tip of a path in the edit grid.
data DL = DL
    { ...
    , path :: [DI]   -- ^ The edit trace accumulated so far
    } deriving (Show, Eq)

-- A node at a fixed edit trace length (depth).
{-@ type DLN D = { x : DL | len (path x) = D } @-}

-- | This function is used only in LH specs to check if
-- diagonal configuration holds for a node list.
wfDiags :: [DL] -> Bool
wfDiags = ...

-- All nodes in a wave front are at the same depth,
-- and satisfy the diagonal configuration.
{-@ type WaveFront D = {xs : [DLN D] | wfDiags xs} @-}

With this encoding, and a phantom parameter carrying the current depth, I specified dstep (called from ses) to match the algorithm behavior (which also includes the node list growing by one).

{-@
dstep
  :: (Nat -> Nat -> Bool)
  -> d : Nat
  -> {nodes : WaveFront d | len nodes > 0}
  -> {v : WaveFront (d + 1) | len v = len nodes + 1}
@-}
dstep
  :: (Int -> Int -> Bool) -- ^ Check for node coordinates producing a free edge
  -> Int                  -- ^ The current depth; used for the static check of the wave front invariant
  -> [DL]                 -- ^ A non-empty wave front of nodes at edit distance D
  -> [DL]                 -- ^ A non-empty wave front of nodes at edit distance D+1

Refinement type aliases become statically checked invariants when used in a function specification, and are verified to hold at each call site.

As a second example, let’s see the invariants of a Hunk, expressed again using refinement type aliases.

-- A valid list diff is such that any `Both` value has arguments of equal length.
{-@ type ValidListDiff a b = { d : PolyDiff [a] [b] | validListDiff d }@-}

-- | True when, for a 'Both' value, both sides have the same length.
-- 'First' and 'Second' trivially satisfy this.
-- Introduced for LH specifications.
validListDiff :: PolyDiff [a] [b] -> Bool
validListDiff (Both xs ys) = length xs == length ys
validListDiff _ = True

-- | True if the list does not contain adjacent 'PolyDiff's with the same constructor.
noStuttering :: [PolyDiff a b] -> Bool
noStuttering = ...

-- | A 'Hunk' is a list of adjacent 'Diff's.
--
-- No two consecutive elements in a 'Hunk' are both applications
-- of 'First', 'Second', or 'Both', i.e. the list does not stutter
-- on 'Diff' constructors.
type Hunk c = [Diff [c]]

{-@ type Hunk c = { h : [ValidListDiff c c] | noStuttering h} @-}

The interesting part here is the check for the noStuttering invariant in the specification of the main Hunk producing function. For brevity’s sake I won’t show this function, but let’s see what came to be of the specification of the previously presented doPrefix, that is part of it, for the check to pass.

{-@ doPrefix ::  h : Hunk c
             -> {v : [ValidListDiff c c] | noFFSS v
                 && ... other auxiliary post-conditions ... } / [len h, 0] @-}

Essentially, this function traverses a given Hunk and chops and splits Both elements to a context size argument. After doing so the Hunk “stutters” on such elements, so it stops being a Hunk in the refined sense, even though the Haskell types match. Note the regular type synonym and the refinement type synonym don’t coalesce: At the Haskell level the synonym is just a renaming, but in the specification it is shadowed by the refinement synonym (thus enforcing its invariants). The noFFSS helper characterizes the resulting list; it is like noStuttering, but just for the other PolyDiff constructors: First and Second. Other auxiliary post-conditions (not shown) stating that input and output lists shared head constructors were also necessary for this check.

The verification of these and other invariants followed a similar outline:

Identify and document the invariant
Encode it in refinements
Write the specifications
Please the compiler

Pleasing the compiler after adding a new specification was trickier for me than the usual Haskell type error propagation and fix workflow. Figuring out exactly what LH is aware of when checking a specification requires an intuition of how it builds a context; then it’s a matter of making the missing information available.

For instance, the last step to get back a Hunk after the doPrefix-doSuffix operation required passing a lemma within a local dead binding for the specification to be verified.

{-@ assume lemmaReverseNoStuttering
      :: xs:_ -> { noStuttering (reverse xs) = noStuttering xs } @-}
lemmaReverseNoStuttering :: Hunk c -> ()
lemmaReverseNoStuttering _ = ()

-- | Split a 'Diff' list at consecutive 'Both'-'Both' boundaries.
{-@ splitBothBoth :: {ds:[ValidListDiff c c] | noFFSS ds} -> [Hunk c] @-}
splitBothBoth :: [Diff [c]] -> [Hunk c]
splitBothBoth = go []
  where
    {-@ go
          :: g:Hunk c
          -> {xs : [ValidListDiff c c] | noFFSS xs && not (headAlike g xs) }
          -> [Hunk c] / [len xs]
      @-}
    go :: Hunk c -> [Diff [c]] -> [Hunk c]
    go g (x@Both{} : y@Both{} : xs) = reverse (x:g) : go [] (y:xs)
      where
        lemma = lemmaReverseNoStuttering (x:g)
    go g (x : xs) = go (x:g) xs
    go g [] = [reverse g]
      where
        lemma = lemmaReverseNoStuttering g

This binding ultimately gets optimized away by GHC, but LH requires it to satisfy the static checks. LH didn’t have a means to know that reverse preserves the noStuttering of a PolyDiff list, so I provided it.

I decided to assume the lemma above on the rationale that its validity is straight-forward, while its proof would probably not be, making the disease not worth the medicine.

Lifting a dam

After this work I’m flooded with many thoughts and feelings about LH from the user perspective, but also ideas for important future developments.

One particular source of difficulty I found is differentiating between the existing means of lifting a Haskell function into the logic: reflect, inline, measure and define. By default, Haskell functions like wfDiag cannot be used in the refinement type predicates. They have to be accompanied by an annotation that indicates how to make them available in the predicates, which I omitted in my examples for the sake of argument.

Existing documentation does a good job at explaining their requirements and purpose. Nevertheless, subtle differences in constraint generation and logical expression unfolding aren’t documented, and these details matter when choosing between them in certain cases. Addressing this could lead to unifying or deprecating some functionality, but at least specifying them at a finer grain and adding some use case examples could go a long way.

A more salient difficulty are the error messages. They can be baffling, featuring not very human friendly variable names spread across enormous lists of bindings forming their “context”. Skimming through this context is a skill that I would love to deprecate. Looking first at the “inferred type” and the “required type” part at the start of the message is a useful technique, which can provide a lead to the source of the problem.

I find refinement types appealing because they are powerful yet non-sophisticated enough to be intuitive. Nevertheless, getting a function specification checked can become intricate, requiring additional proving machinery like function definitions exclusively intended for refinement predicates, lemmas in dead bindings to pass additional constraints or even heavy refactoring. However, I think the upfront cost of entry can be easily balanced out by using the escape hatches to focus the effort investment. In the Diff package, for instance, some low hanging fruit could be picked right away after disabling the checks on error triggering functions, e.g. refining integer values to naturals or enforcing clear-cut relations between record fields, adding immediate value without additional machinery.

A drawback is that polymorphism seems at odds with the simplicity of refinements: the more we want to specify about a value or function, the more we narrow its type. That said, there seems to be a correlation between code complexity and LH verification complexity that is worth investigating further: changes that simplified the verification of an invariant tended to benefit the code quality independently of it.

Choosing between fight or flight for a given invariant is ultimately about balancing safety gains with added complexity, and in my experience the code structure is what tips the scale: it determined both the refactorings I needed and the checks I had to forgo. My guess is that the whole equation changes when refinement types are a first class consideration during design.

Clearing up the waters

Hopefully this little epic amounts to a useful case study that, by showing what using LH is like today, encourages you to add static checks to an existing codebase or experiment in your next project with LH in your toolbox, and the techniques I’ve shared help prioritize the approach.

I discussed some of LH pain points to offer a balanced view and propose further DX improvements. There’s much to be done, but it’s steadily getting there. My opinion is that Liquid Haskell is a viable option today to add formal guarantees to an unsuspecting codebase at a reasonable cost, as long as the palette of shapes and extent this can take is kept in mind during design. Know that you’re welcome to contribute to LH development and that we’re ready to help strengthen your codebase. Just reach out!

Finally, I would like to express my gratitude to Aleksandr Vershilov, Arnaud Spiwack and Christopher Harrison for reviewing this text, and notably to Facundo Domínguez whose close collaboration was instrumental to streamline this work.

At the time of writing, the static checks are about to be proposed for upstream integration. But they can be found in the Liquid Haskell test suite as well.↩
A nice perk of working at Tweag is being supported to do open source contributions during or in-between client projects.↩
Found in https://github.com/seereason/Diff/pull/21 and https://github.com/seereason/Diff/pull/23 ↩
Found in https://github.com/seereason/Diff/pull/24, https://github.com/seereason/Diff/pull/25, https://github.com/seereason/Diff/pull/26 and https://github.com/seereason/Diff/pull/27.↩
The coordinates of a node in the edit grid represent the size of the prefix consumed from the first input and the size of the produced prefix of the other input, respectively. Thus, the endpoint has coordinates matching both input lengths. The grid’s most relevant feature is that, in addition to vertical and horizontal edges (corresponding to deletions and additions, respectively), there are “free” diagonal edges wherever both inputs have matching elements.↩

Behind the scenes

Xavier Góngora

Xavier is an interdisciplinary researcher, software engineer and musician working at the intersection of mathematics, functional programming, and music technology. He is fascinated by programming languages as a medium for harnessing abstraction, enabling both computational precision and creative expression.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.