Reflecting away from definitions in Liquid Haskell

12 September 2024 — by Jonathan Arnoult

We’ve all been there: wasting a couple of days on a silly bug. Good news for you: formal methods have never been easier to leverage.

In this post, I will discuss the contributions I made during my internship to Liquid Haskell (LH), a tool that makes proving that your Haskell code is correct a piece of cake.

LH lets you write contracts for your functions inside your Haskell code. In other words, you write pre-conditions (what must be true when you call it) and post-conditions (what must always be true when you leave the function). These are then fed into an SMT solver that proves your code satisfies them! You may have to write a few lemmas to guide LH, but it makes verification easier than proving them completely in a proof assistant.

My contributions enhance the reflection mechanism, which allows LH to unfold function definitions in logic formulas when verifying a program. I have explored three approaches that are described in what follows.

The problem

Imagine that, in the course of your work, you wanted to define a function that inserts into an association list.

{-@
smartInsert
  :: k:String
  -> v:Int
  -> l:[(String, Int)]
  -> {res : [(String, Int)] |
        lookup k l = Just v || head res = (k , v)
     }
@-}
smartInsert :: String -> Int -> [(String, Int)] -> [(String, Int)]
smartInsert k v l
  | lookup k l == Just v = l
  | otherwise = (k, v) : l

LH runs as a compiler plugin. While the bulk of the compiler ignores the special comments {-@ ... @-}, LH processes the annotations therein.

The annotation that you see in the first snippet is the specification of smartInsert, with the post-condition establishing that the result of the function must have the pair (k, v) at the front, or the pair must be already present in the original list.

Let us say that you also want to use that smartInsert function later in the logic or proofs, so you want to reflect it to the logic. For that, you will introduce another annotation:

{-@ reflect smartInsert @-}

This annotation is telling LH that the equations of the Haskell definition of smartInsert can be used to unfold calls to smartInsert in logic formulas.

As a human, you may agree that the specification is valid for this implementation, but you get this error from the machine:

error:
Illegal type specification for `Test.smartInsert`
[...]
    Unbound symbol GHC.Internal.List.lookup --- perhaps you meant: GHC.Internal.Base.. ?

Do not despair! This tells you that lookup is not defined in the logic. Despite lookup being a respectable function in Haskell, defined in GHC.List, LH knows nothing about it. Not all functions in Haskell can simply be used in the logic, at least not without reflecting them first. Far from being discouraged, you decide to reflect it like the others, but you realize that lookup wasn’t defined in your own module, it comes from the Prelude! This makes reflection impossible, as LH points out:

error:
Cannot lift Haskell function `lookup` to logic
"lookup" is not in scope

If you consider for a moment, LH needs the definition of the function in order to reflect it. So it can only complain when it is asked to reflect a function whose definition is not available because it was defined in some library dependency.

This is a recurring problem, especially when working with dependencies, and this is exactly what I have been working on during this internship at Tweag, in three different ways, as described below.

Idea #1: Define our own reflection of the function

Your first thought might be: “if I cannot reflect lookup because it comes from a foreign library, I will just define my own version of it myself”. Even better would be if you could still link your custom definition of lookup to the original symbol. Creating this link was my first contribution.

Step one is to define the pretend function. For this to work out correctly in the end, its definition must be equivalent to the original definition of the imported function.

The definition of the pretend function might look like this:

myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((x, y):xys)
  | key == x  = Just y
  | otherwise = myLookup key xys

So far, so good. Of course, we give it a different name from the actual function, as they refer to different definitions, and we want to be able to refer to both so that we can link them together later.

Now, we reflect this myLookup function, which LH has no problem doing, since this reflect command is located in the same module as its definition.

{-@ reflect myLookup @-}

Then, the magic happens with this annotation that links the two lookups together:

{-@ assume reflect lookup as myLookup @-}

Read it as “reflect lookup, assuming that its definition is the same as myLookup”. This is enough to get the smartInsert function verified. Just for the record, here is the working snippet:

{-@ reflect myLookup @-}
myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((x, y):xys)
  | key == x  = Just y
  | otherwise = myLookup key xys

{-@ assume reflect lookup as myLookup @-}

{-@
reflect smartInsert
smartInsert
  :: k:String
  -> v:Int
  -> l:[(String, Int)]
  -> {res : [(String, Int)] |
       lookup k l = Just v || head res = (k , v)
     }
@-}
smartInsert :: String -> Int -> [(String, Int)] -> [(String, Int)]
smartInsert k v l
  | lookup k l == Just v = l
  | otherwise = (k, v) : l

The question you may be asking at this point is: why does it work?

In order to verify the code, LH has to prove side-conditions (called subtyping relations) between the actual output and the post-condition to be verified. For the first equation of smartInsert, it needs to be proved that

lookup k l = Just v && res = l
  =>
lookup k l = Just v || head res = (k , v)

For the second equation, it needs to be proved that

res = (k, v) : l
  =>
lookup k l = Just v || head res = (k , v)

Because we started with such a simple example, the reflection of lookup is actually unused here (even though LH conservatively insists on it). But that’s just a coincidence; in fact, we can use a more direct post-condition that does actually use the reflection:

{-@
smartInsert
  :: k:String
  -> v:Int
  -> l:[(String, Int)]
  -> {res : [(String, Int)] | lookup k res = Just v}
@-}

This time, the subtyping constraints require proving:

-- constraint for the first equation
lookup k l = Just v && res = l
  =>
lookup k res = Just v

-- constraint for the second equation
res = (k, v) : l
  =>
lookup k res = Just v

The first constraint can still be solved without going into the definition of lookup. But the second constraint isn’t something that we can prove for any definition of lookup. Thanks to reflection, we have the following unfoldings at our disposal:

lookup key l = myLookup k l

myLookup key l =
  if isEmpty l then Nothing
  else if key = fst (head l) then
    Just (snd (head l))
  else
    myLookup key (tail l)

The first equality is from assume-reflection. It links the pretend and actual functions. The second one is the reflection of myLookup.

With that in mind, let’s move on to prove the second constraint. We reduce the left-hand side to the right-hand side.

lookup k res
    = lookup k ((k, v):l)       (hypothesis)
    = myLookup k ((k, v) : l)   (lookup unfolding)
    = Just v                    (myLookup unfolding)

Q.E.D. Furthermore, you notice that the equation connecting lookup and myLookup was crucial. That is the gist of what we added to LH to make the proof work.

In addition to the implementation, I contributed a specification of assume-reflection that spells out the validation of the new annotation and the resolution rules when the same function is assume-reflected at different locations. It is worth noting that if there exist two assume-reflections in your imports that contradict each other, then one of them must be false, so your axiom environment will not be sound.

Idea #2: opaque reflection

We noted already that we didn’t truly need to know what lookup was about to prove the first, simpler specification, namely:

{-@
smartInsert
  :: k:String
  -> v:Int
  -> l:[(String, Int)]
  -> {res : [(String, Int)] |
       lookup k res = Just v || head res = (k, v)
     }
@-}

The only issue we had was that lookup was not defined in the logic. Similarly, it is possible that our own functions to be reflected use imported, unreflected functions whose content is irrelevant. We want to reflect the expressions of our functions, but do not care about the expression of some of the functions that appear inside them. Here, we want to reflect smartInsert, which contains lookup, but we don’t need to know exactly what lookup is about to prove our lemmas. Either lookup comes from a dependency, or it has a non-trivial implementation, or it uses primitives not implemented in Haskell.

We allowed this through what we call opaque reflection. Opaque reflection introduces a symbol, without any equation, for all the symbols in your reflections that aren’t defined yet in the logic.

For instance, when reflecting the definition of smartInsert,

smartInsert k v l
  | lookup k l == Just v = l
  | otherwise = (k, v) : l

LH looks for any free symbols in there that are not present in the logic. Here, it will see that lookup is something new to the logic, and it will introduce an uninterpreted function for it. Uninterpreted functions are symbols used by the SMT solver, for which it only knows it satisfies function congruence, i.e. that if two values are equal v = w, then when the function is applied to them, the result is still the same f v = f w.

As it turns out, we could also do that manually using the measure annotation. These annotations let you introduce an uninterpreted function in the logic yourself, and specify the refinement type of it.

For instance, we could define a measure like this:

{-@
measure GHC.Internal.List.lookup :: k:a -> xs:[(a, b)] -> Maybe b
GHC.Internal.List.lookup
  :: k:a
  -> xs:[(a, b)]
  -> {VV : Maybe b | VV == GHC.Internal.List.lookup k xs}
@-}

The measure annotation creates an uninterpreted function with the same name as the function in the Haskell code. The second line links both the uninterpreted and Haskell functions by strengthening the post-condition of the Haskell function with the uninterpreted function from the logic.

The new opaque reflection does all that for you automatically! It’s even more powerful when you think about imports. If two modules are opaque-reflecting the same function from some common import, the uninterpreted symbols are considered the same because they refer to the same thing.

Whereas, if you were to use measure annotations in both imports for the same external functions (say, lookup), and then to import those in another module, LH would complain about it. Indeed, there can not be two measures with identical names in scope. Since LH doesn’t know what you’re using those measures for, or whether they actually stand for the same uninterpreted function, it cannot resolve the ambiguity. The full specification is here.

Idea #3: Using the unfoldings

At this point, someone might object that Haskell can inline even imported functions when optimizing the code, so it must have access to the original definitions. As such, there is no need for assume-reflection or opaque-reflection, if we could just reflect the function definition wherever the optimizer finds it.

It is indeed the case for some functions, and under some circumstances (note the precautions I’m taking here), that some information about the implementation of functions is passed in interface files.

What are interface files? These are the files that contain the information that the other modules need to know. Part of this information is the unfoldings of the exported functions, in a syntax that is slightly different from the GHC’s CoreExprs, but can easily be converted to it.

After some experimentation, I observed that the unfoldings of many functions are available in interface files, unless prevented by the -fignore-interface-pragmas or -fomit-interface-pragmas flags (note that -O0 implies those flags, but -O1 does not). Since most packages are compiled with at least -O1, the unfolding of many functions are available without any further tuning. In particular, those functions that are small enough to be included in the interface files are available.

Once implemented, it suffices to use the same reflect annotation as before, but this time even for imported functions!

{-@ reflect flip -@}

LH will automatically detect if this function is defined in the current module or in the dependencies, and in the latter case it will look for possible unfoldings.

Unfortunately, these unfoldings turned out to have some drawbacks.

The presence of these unfoldings depends on some GHC flags, and heuristics from GHC. As such, it’s possible for a new version of a library to suddenly exclude an unfolding without the library author realizing it. This predicament is akin to that of the HERMIT tool, and it is difficult to solve without rebuilding the dependencies with custom configuration.
The unfoldings are based on the optimized version of the functions, which is sometimes harder to reason about. Also, it is subject to change if the GHC optimizations change, which means that any proof based on these unfoldings could be broken by a change to those optimizations.
Many functions are not possible to reflect as they are. If they use local recursive definitions, or lambda abstractions, LH cannot reflect them at the moment.
If the unfolding of a function depends on non-exported definitions, LH does not offer a mechanism to request these definitions to be reflected. Even if it did, this breaks encapsulation to some point, and makes our code dependent on internal implementation details of imported code, to the point where even a dot release could break the verification.
Reflections are still limited in their capabilities. At the time of writing, reflected functions cannot contain lambda abstractions or local recursive bindings. Recursive bindings are allowed, but local ones are not, since LH has no sense of locality (yet). Because unfoldings tend to have a lot of these, we cannot reflect them (yet).

For these reasons, further work and experimentation will be needed to make this approach truly useful. Nevertheless, we have included the implementation in a PR in the hope that it may be helpful in some cases, and that improving the capabilities of reflections in general will make it more and more valuable.

Conclusion

Liquid Haskell’s reflection is handy and powerful, but if your function uses some dependencies that are not yet reflected, you were stuck. We presented three ways to proceed: assert an equivalence between the imported function and a definition in the current module (ideally copy-pasted from the original source file), introduce some uninterpreted function in the logic for dependencies, or try to find the unfoldings of those dependencies in interface files.

All of these features have been implemented and pulled into Liquid Haskell. The implementation fits well into LH’s machinery, reusing the existing pipeline for uninterpreted symbols and reflections. We also added tests, especially for module imports, and checked the implementation against the numerous regression tests already in place. An enticing next step would be to improve the capabilities of reflection, which would also allow diving deeper into the reflection of unfoldings in interface files.

I hope this will improve the ease of proof-writing in LH, and that reading this post will encourage you to write more specifications and proofs about your code, seeing how much of a breeze it can be!

I would like to thank Tweag for this wonderful opportunity to work on Liquid Haskell; it has been an enriching internship that has allowed me to grow in Haskell experience and in contributing to large codebases. In particular, I’d like to express my heartfelt thanks to my supervisor, Facundo Domínguez, for his constant support, guidance, and invaluable assistance.

Behind the scenes

Jonathan Arnoult

Tech Group

Programming Languages and Compilers

Research, create, improve and maintain programming languages and their tooling to enhance developer productivity and to deliver reliable, maintainable, correct and performant software with minimum effort.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← Adding algebraic data types to Nickel Python Packaging in the Real World: Biomedical projects vs. PyPI →