We’ve all been there: wasting a couple of days on a silly bug. Good news for you: formal methods have never been easier to leverage.
In this post, I will discuss the contributions I made during my internship to Liquid Haskell (LH), a tool that makes proving that your Haskell code is correct a piece of cake.
LH lets you write contracts for your functions inside your Haskell code. In other words, you write pre-conditions (what must be true when you call it) and post-conditions (what must always be true when you leave the function). These are then fed into an SMT solver that proves your code satisfies them! You may have to write a few lemmas to guide LH, but it makes verification easier than proving them completely in a proof assistant.
My contributions enhance the reflection mechanism, which allows LH to unfold function definitions in logic formulas when verifying a program. I have explored three approaches that are described in what follows.
The problem
Imagine that, in the course of your work, you wanted to define a function that inserts into an association list.
{-@
smartInsert
:: k:String
-> v:Int
-> l:[(String, Int)]
-> {res : [(String, Int)] |
lookup k l = Just v || head res = (k , v)
}
@-}
smartInsert :: String -> Int -> [(String, Int)] -> [(String, Int)]
smartInsert k v l
| lookup k l == Just v = l
| otherwise = (k, v) : l
LH runs as a compiler plugin. While the bulk of the compiler ignores the
special comments {-@ ... @-}
, LH processes the annotations therein.
The annotation that you see in the first snippet is the specification of smartInsert
,
with the post-condition establishing that the result of the function must have the
pair (k, v)
at the front, or the pair must be already present in the
original list.
Let us say that you also want to use that smartInsert
function later in the logic or
proofs, so you want to reflect it to the logic. For that, you will introduce another
annotation:
{-@ reflect smartInsert @-}
This annotation is telling LH that the equations of the Haskell definition of
smartInsert
can be used to unfold calls to smartInsert
in logic formulas.
As a human, you may agree that the specification is valid for this implementation, but you get this error from the machine:
error:
Illegal type specification for `Test.smartInsert`
[...]
Unbound symbol GHC.Internal.List.lookup --- perhaps you meant: GHC.Internal.Base.. ?
Do not despair! This tells you that lookup
is not defined in the logic.
Despite lookup being a respectable function in Haskell, defined in GHC.List
,
LH knows nothing about it. Not all
functions in Haskell can simply be used in the logic, at least not without
reflecting them first. Far from being discouraged, you decide to reflect it
like the others, but you realize that lookup
wasn’t defined in your own
module, it comes from the Prelude! This makes reflection impossible, as LH
points out:
error:
Cannot lift Haskell function `lookup` to logic
"lookup" is not in scope
If you consider for a moment, LH needs the definition of the function in order to reflect it. So it can only complain when it is asked to reflect a function whose definition is not available because it was defined in some library dependency.
This is a recurring problem, especially when working with dependencies, and this is exactly what I have been working on during this internship at Tweag, in three different ways, as described below.
Idea #1: Define our own reflection of the function
Your first thought might be: “if I cannot reflect lookup
because it comes
from a foreign library, I
will just define my own version of it myself”. Even better would be
if you could still link your custom definition of lookup
to the original symbol.
Creating this link was my first contribution.
Step one is to define the pretend function. For this to work out correctly in the end, its definition must be equivalent to the original definition of the imported function.
The definition of the pretend function might look like this:
myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((x, y):xys)
| key == x = Just y
| otherwise = myLookup key xys
So far, so good. Of course, we give it a different name from the actual function, as they refer to different definitions, and we want to be able to refer to both so that we can link them together later.
Now, we reflect this myLookup
function, which LH has no problem doing,
since this reflect
command is located in the same module as its definition.
{-@ reflect myLookup @-}
Then, the magic happens with this annotation that links the two lookups together:
{-@ assume reflect lookup as myLookup @-}
Read it as “reflect lookup
, assuming that its definition is the same as myLookup
”.
This is enough to get the smartInsert
function verified. Just for the record,
here is the working snippet:
{-@ reflect myLookup @-}
myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((x, y):xys)
| key == x = Just y
| otherwise = myLookup key xys
{-@ assume reflect lookup as myLookup @-}
{-@
reflect smartInsert
smartInsert
:: k:String
-> v:Int
-> l:[(String, Int)]
-> {res : [(String, Int)] |
lookup k l = Just v || head res = (k , v)
}
@-}
smartInsert :: String -> Int -> [(String, Int)] -> [(String, Int)]
smartInsert k v l
| lookup k l == Just v = l
| otherwise = (k, v) : l
The question you may be asking at this point is: why does it work?
In order to verify the code, LH has to prove side-conditions (called subtyping
relations) between the actual output and the post-condition to be verified.
For the first equation of smartInsert
, it needs to be proved that
lookup k l = Just v && res = l
=>
lookup k l = Just v || head res = (k , v)
For the second equation, it needs to be proved that
res = (k, v) : l
=>
lookup k l = Just v || head res = (k , v)
Because we started with such a simple example,
the reflection of lookup
is actually unused here (even though LH conservatively insists on it).
But that’s just a coincidence; in fact, we can use a more direct
post-condition that does actually use the reflection:
{-@
smartInsert
:: k:String
-> v:Int
-> l:[(String, Int)]
-> {res : [(String, Int)] | lookup k res = Just v}
@-}
This time, the subtyping constraints require proving:
-- constraint for the first equation
lookup k l = Just v && res = l
=>
lookup k res = Just v
-- constraint for the second equation
res = (k, v) : l
=>
lookup k res = Just v
The first constraint can still be solved without going into the definition of
lookup
. But the second constraint isn’t something that we can prove for any
definition of lookup. Thanks to reflection, we have the following unfoldings
at our disposal:
lookup key l = myLookup k l
myLookup key l =
if isEmpty l then Nothing
else if key = fst (head l) then
Just (snd (head l))
else
myLookup key (tail l)
The first equality is from assume-reflection.
It links the pretend and actual functions. The second one is the reflection
of myLookup
.
With that in mind, let’s move on to prove the second constraint. We reduce the left-hand side to the right-hand side.
lookup k res
= lookup k ((k, v):l) (hypothesis)
= myLookup k ((k, v) : l) (lookup unfolding)
= Just v (myLookup unfolding)
Q.E.D. Furthermore, you notice that the equation connecting lookup
and
myLookup
was crucial. That is the gist of what we added to LH to make the
proof work.
In addition to the implementation, I contributed a specification of assume-reflection that spells out the validation of the new annotation and the resolution rules when the same function is assume-reflected at different locations. It is worth noting that if there exist two assume-reflections in your imports that contradict each other, then one of them must be false, so your axiom environment will not be sound.
Idea #2: opaque reflection
We noted already that we didn’t truly need
to know what lookup
was about to prove the first, simpler specification,
namely:
{-@
smartInsert
:: k:String
-> v:Int
-> l:[(String, Int)]
-> {res : [(String, Int)] |
lookup k res = Just v || head res = (k, v)
}
@-}
The only issue we had was that lookup
was not defined in the logic.
Similarly, it is possible that our own functions to be reflected use imported,
unreflected functions whose content is irrelevant. We want to reflect the
expressions of our functions, but do not care about the expression of some of
the functions that appear inside them. Here, we want to reflect smartInsert
,
which contains lookup
, but we don’t need to know exactly what lookup
is
about to prove our lemmas. Either lookup
comes from a dependency, or it has
a non-trivial implementation, or it uses primitives not implemented in Haskell.
We allowed this through what we call opaque reflection. Opaque reflection introduces a symbol, without any equation, for all the symbols in your reflections that aren’t defined yet in the logic.
For instance, when reflecting the definition of smartInsert
,
smartInsert k v l
| lookup k l == Just v = l
| otherwise = (k, v) : l
LH looks for any free symbols in there that are not present in the logic.
Here, it will see that lookup
is something new to the logic, and it will
introduce an uninterpreted function for it.
Uninterpreted functions are symbols used by the SMT solver, for which it only
knows it satisfies function congruence, i.e. that if two values are equal
v = w
, then when the function is applied to them, the result is still the
same f v = f w
.
As it turns out, we could also do that manually using the measure
annotation. These annotations let you introduce an uninterpreted function
in the logic yourself, and specify the refinement type of it.
For instance, we could define a measure like this:
{-@
measure GHC.Internal.List.lookup :: k:a -> xs:[(a, b)] -> Maybe b
GHC.Internal.List.lookup
:: k:a
-> xs:[(a, b)]
-> {VV : Maybe b | VV == GHC.Internal.List.lookup k xs}
@-}
The measure
annotation creates an uninterpreted function with the same name as the
function in the Haskell code. The second line links both the uninterpreted and
Haskell functions by strengthening the post-condition of the Haskell function
with the uninterpreted function from the logic.
The new opaque reflection does all that for you automatically! It’s even more powerful when you think about imports. If two modules are opaque-reflecting the same function from some common import, the uninterpreted symbols are considered the same because they refer to the same thing.
Whereas, if you were to use measure
annotations in both
imports for the same external functions (say, lookup
), and then to import those in
another module, LH would complain about it. Indeed, there can not be two measures
with identical names in scope. Since LH
doesn’t know what you’re using those measures for, or whether they actually
stand for the same uninterpreted function, it cannot resolve the ambiguity.
The full specification is here.
Idea #3: Using the unfoldings
At this point, someone might object that Haskell can inline even imported functions when optimizing the code, so it must have access to the original definitions. As such, there is no need for assume-reflection or opaque-reflection, if we could just reflect the function definition wherever the optimizer finds it.
It is indeed the case for some functions, and under some circumstances (note the precautions I’m taking here), that some information about the implementation of functions is passed in interface files.
What are interface files? These are the files that contain the information that
the other modules need to know. Part of this information is the unfoldings of
the exported functions, in a syntax that is slightly different from the GHC’s
CoreExpr
s, but can easily be converted to it.
After some experimentation, I observed that the unfoldings of many functions are
available in interface files, unless prevented by the -fignore-interface-pragmas or -fomit-interface-pragmas flags
(note that -O0
implies those flags, but -O1
does not).
Since most packages are compiled with at least -O1
, the unfolding of many functions are
available without any further tuning. In particular, those functions that are small
enough to be included in the interface files are available.
Once implemented, it suffices to use the same reflect
annotation as before,
but this time even for imported functions!
{-@ reflect flip -@}
LH will automatically detect if this function is defined in the current module or in the dependencies, and in the latter case it will look for possible unfoldings.
Unfortunately, these unfoldings turned out to have some drawbacks.
- The presence of these unfoldings depends on some GHC flags, and heuristics from GHC. As such, it’s possible for a new version of a library to suddenly exclude an unfolding without the library author realizing it. This predicament is akin to that of the HERMIT tool, and it is difficult to solve without rebuilding the dependencies with custom configuration.
- The unfoldings are based on the optimized version of the functions, which is sometimes harder to reason about. Also, it is subject to change if the GHC optimizations change, which means that any proof based on these unfoldings could be broken by a change to those optimizations.
- Many functions are not possible to reflect as they are. If they use local recursive definitions, or lambda abstractions, LH cannot reflect them at the moment.
- If the unfolding of a function depends on non-exported definitions, LH does not offer a mechanism to request these definitions to be reflected. Even if it did, this breaks encapsulation to some point, and makes our code dependent on internal implementation details of imported code, to the point where even a dot release could break the verification.
- Reflections are still limited in their capabilities. At the time of writing, reflected functions cannot contain lambda abstractions or local recursive bindings. Recursive bindings are allowed, but local ones are not, since LH has no sense of locality (yet). Because unfoldings tend to have a lot of these, we cannot reflect them (yet).
For these reasons, further work and experimentation will be needed to make this approach truly useful. Nevertheless, we have included the implementation in a PR in the hope that it may be helpful in some cases, and that improving the capabilities of reflections in general will make it more and more valuable.
Conclusion
Liquid Haskell’s reflection is handy and powerful, but if your function uses some dependencies that are not yet reflected, you were stuck. We presented three ways to proceed: assert an equivalence between the imported function and a definition in the current module (ideally copy-pasted from the original source file), introduce some uninterpreted function in the logic for dependencies, or try to find the unfoldings of those dependencies in interface files.
All of these features have been implemented and pulled into Liquid Haskell. The implementation fits well into LH’s machinery, reusing the existing pipeline for uninterpreted symbols and reflections. We also added tests, especially for module imports, and checked the implementation against the numerous regression tests already in place. An enticing next step would be to improve the capabilities of reflection, which would also allow diving deeper into the reflection of unfoldings in interface files.
I hope this will improve the ease of proof-writing in LH, and that reading this post will encourage you to write more specifications and proofs about your code, seeing how much of a breeze it can be!
I would like to thank Tweag for this wonderful opportunity to work on Liquid Haskell; it has been an enriching internship that has allowed me to grow in Haskell experience and in contributing to large codebases. In particular, I’d like to express my heartfelt thanks to my supervisor, Facundo Domínguez, for his constant support, guidance, and invaluable assistance.
About the author
If you enjoyed this article, you might be interested in joining the Tweag team.