For a while now, Tweag has committed in improving various aspects of Liquid Haskell (LH), a tool that gives the Haskell programmer both the ability to express properties about programs and to verify that they meet these expectations.
In this post we present a specific improvement that I integrated recently, which cuts down the maintenance cost to use LH when introducing assumptions about functions coming from large or multiple packages.
Problem: How to reason about dependencies
Let us suppose that we are trying to verify that the following function yields a non-empty list.
module M where
{-@ consPlus1 :: Int -> [Int] -> { v:[Int] | len v > 0 } @-}
consPlus1 :: Int -> [Int] -> [Int]
consPlus1 x xs = map (+1) (x:xs)
The definition starts with a comment dedicated to LH, delimited by {-@ ... @-}
.
It contains a type signature for consPlus1
with refinement types.
A refinement type denotes a subtype of another type,
characterized by a given predicate, like len v > 0
.
In this case the predicate of the result uses a function len
that
yields the length of a list in the logic. This predicate
states that consPlus1
yields a list with at least one element.
This function depends on the map
function from the base
library.
LH can infer that consPlus1
yields a non-empty list, if it knows that map
yields a non-empty list. Proving that map
yields a
non-empty list requires inspecting its definition. However, the
definition of map
is not in the current package, it comes from the base
dependency.
One option to reach a complete proof would be to add LH annotations to the
base
package, proving there that map
yields non-empty lists
when fed non-empty lists. This would require either convincing the
maintainers of base
to add these annotations, or creating a copy
of base
with the LH annotations and proofs that we need. Both
options are problematic.
If the maintainers of base
accept our LH annotations, they will be
expected to keep them up to date, as base
and LH evolve.
Moreover, they will need to figure out the most relevant set of
annotations and proofs worth maintaining among user inquiries.
If we copy the base
package and our project starts depending on
a modified dependency, we will be responsible for updating our copy
when the original base
package is modified. Moreover, if other
projects start using their own copy of base
, then we will end up
with a bunch of copies with different annotations each, which then
makes composing the projects a challenge.
The rest of the post is devoted to explain what LH did initially to improve upon this situation, and how I further improved it later on.
Idea: Separating annotations and definitions
Proving that map
yields some property of interest does often
require its definition. But if we are satisfied with just
assuming that the property holds without an actual proof, then
we could tell so to LH at the place where we need to use such
an assumption, without any need to modify the dependency itself.
Here’s how we could write our assumption about map
.
{-@ assume map :: (a -> b) -> xs:[a] -> { v:[a] | len xs = len v } @-}
The assumption starts with the assume
keyword, indicating that
the predicate in the return type is not to be proved. Then
follows an assumed type signature for map
with refinement types.
The parameters of the function can have a name, like xs
, that
can be used in the predicates of later arguments and in the
result. The predicate of the return type states that the input and
the output have the same length.
This works fine if we need to use the assumption only once, but if the assumption is needed at multiple places, it is inconvenient to copy the assumption at each usage site.
Could we put the assumption at some place where LH could find it when needed? This is a problem similar to the one in Python, where authors might want to add type signatures to the interfaces of preexisting libraries in dependencies. The first answer to this question was known as wrapper packages and they were introduced at the time LH became a GHC plugin in 2020.
Old solution: Wrapper packages
A wrapper package is a package that reexports all the definitions
from an existing package, and adds assumptions
about them. In the case of base
, we would
have a wrapper package liquid-base
that contained modules using
the same names of modules in base
. A minimal wrapper package for our running example would be this:
{-# LANGUAGE QualifiedImports #-}
module GHC.Base(module X) where
import "base" GHC.Base as X
{-@ assume map :: (a -> b) -> { xs:[a] | len xs > 0 } -> { v:[a] | len v > 0 } @-}
The module liquid-base:GHC.Base
is a drop-in replacement for
base:GHC.Base
that can introduce assumptions about functions in
GHC.Base
. More generally, the liquid-base
package is a
drop-in replacement for the base
package, because it would
define similar reexporting modules for every module in base
.
When we are verifying a program that depends on base
, we remove base
from
the dependencies and use liquid-base
instead. What’s good about this approach is
that we do not need to change the import declarations in our program
to get the new assumptions, and the relevant assumptions are in
scope at the call site of every function. Thus our dependencies in
a hypothetical file our_package.cabal
would be:
...
build-depends:
liquid-base
A disadvantage of this approach, however, is that the
programmer must manually track that the right version
of liquid-base
is used to replace base
, and so on for every
dependency on which assumptions need to be introduced.
Another disadvantage is that liquid-base
must really reexport
definitions from all modules in base
even if we only need to
add a single assumption for map
. Otherwise, liquid-base
could not be a replacement for base
. This imposes a maintenance
overhead that is proportional to the number of dependencies and their sizes.
This overhead needs to be paid when first creating a wrapper
package and then again when updating it to accommodate changes
in the wrapped package.
Lastly, the wrapper packages pollute and obfuscate the dependencies of
a project. A package that depends on base
doesn’t show it
explicitly since it depends instead on liquid-base
. The reader
has to know somehow or trust that liquid-base
really exports
the same Haskell definitions as base
.
In order to address the above problems, I proposed and implemented an alternative mechanism to share assumptions.
New solution: Packages with assumptions
Ideally, LH would be able to find assumptions without defining wrapper packages and without changing the original source code.
Since LH already allowed to put assumptions anywhere, the remaining problem was how to make LH find these assumptions. The user could straightforwardly add import declarations of the modules containing the assumptions, but it would be preferable to avoid modifying the original source code as much as possible. The new mechanism relies on a naming convention for modules that hold assumptions.
If the module we are verifying imports GHC.Base
, then with the
new mechanism LH will
look for assumptions in a module named GHC.Base_LHAssumptions
,
if it exists.
module GHC.Base_LHAssumptions where
import GHC.Base
{-@ assume GHC.Base.map :: (a -> b) -> { xs:[a] | len xs > 0 } -> { v:[a] | len v > 0 } @-}
And so on
for the rest of the imports: if module X
is imported, then extra
assumptions will be read from modules named X_LHAssumptions
if they
exists in any dependency of the current package.
We can thus provide a package liquid-base-assumptions
, which essentially
contains modules with assumptions. E.g.
GHC.Base_LHAssumptions
GHC.List_LHAssumptions
Prelude_LHAssumptions
...
Now, when we want to verify a package that depends on base
, we add a dependency on liquid-base-assumptions
to allow LH to find the _LHAssumptions
modules that it contains.
Dependencies could look as follows in our_package.cabal
:
...
build-depends:
base,
liquid-base-assumptions
If a module doesn’t export a function that needs assumptions, there
is no need to define a module _LHAssumptions
, and certainly no
need to define modules that just reexport definitions from elsewhere,
which is precisely the overhead we wanted to avoid.
An additional simplification is that we can save the user the
trouble of picking a version of the assumptions that matches the
base
version in use. This is because the version of base
is mostly dictated by the version of GHC that the project uses,
and LH knows which GHC version is in use when it is built.
More generally, LH can guess the assumptions to use for every
package whose version is tied to the compiler (i.e. it is a boot
package). Because of this, the _LHAssumptions
modules
corresponding to boot packages now come included in the
liquidhaskell
package. Then the dependencies in
our_package.cabal
become again
...
build-depends:
base
Final remarks
The mechanism based on the naming convention was integrated recently in LH and can be used from the latest version in the github repository. Because dozens of reexporting modules were eliminated, the time needed to build and test LH has nearly halved.
The implementation passes the test suite of LH, and now it is time
to experiment a bit with larger examples.
For instance, it is plausible that situations will arise where
assumptions need to be modified, and provisions will need to be
taken to inhibit the implicit loading of _LHAssumptions
modules or
override some of the imported assumptions. The solution in Python,
which is similar to the one in LH, does anticipate cases like these.
Special thanks to Niki Vazou and Ranjit Jhala for helping me navigate the sometimes tricky design space to arrive at the current solution. Suggestions to improve the management of assumptions are welcome, so please, don’t hesitate to reach out and share your thoughts.
About the author
Facundo is a software engineer supporting development and research projects at Tweag. Prior to joining Tweag, he worked in academia and in industry, on a varied assortment of domains, with an overarching interest in programming languages.
If you enjoyed this article, you might be interested in joining the Tweag team.