In November 2022, Tweag engineers merged a WebAssembly back end into the Glasgow Haskell Compiler (GHC). The back end includes a new translation for control flow, which enables GHC to avoid depending on external tools like Binaryen. Because the translation is new, we wanted to test it before submitting a merge request. And classic unit testing was not a good fit—we would have needed to know what the WebAssembly code was expected to be generated from any given fragment of Haskell, and that’s a job for a compiler, not an engineer. Fortunately, we don’t care how generated code is written; we care about how it behaves.
Code’s behavior can be tested by running it, but we wanted to test the translation before integrating it with GHC. So we simulated translated code by tracing possible executions symbolically.
Translation from Cmm to WebAssembly
In GHC’s low-level intermediate form, Cmm, control flow is represented as an arbitrary graph. But in WebAssembly, control flow is represented as structured code:
In both representations, each letter A, B, C, or D stands for an
action that can change the state of the machine. And after the
action, the machine decides what action to perform next.
The decision made after action A
is conditional; it is based on an observation of a value that
depends on the state of the machine. That value is the value of an
expression: a condition in a conditional branch (Cmm) or a condition
associated with an if
statement (WebAssembly).
Cmm’s control flow is expressed through unconditional branches,
conditional branches, and Switch
. WebAssembly’s
control flow is expressed through multiple syntactic forms: blocks,
if
statements, loop
statements, return
, and some multipurpose
br
forms. A br
form can exit any block or if
in which it
appears, and it can branch to the start of any loop
in which it
appears. As an example, the loop
, br
, and return
forms are used in the
translation of a loop:
Translations from Cmm to WebAssembly, including the two shown here, can be tested while leaving the actual decisions and observations in abstract form (letters A through F, not actual code).
Reasoning about correctness
If a translation from a source function to a target function is correct, then when both functions are started in the same initial state, they take the same actions and they finish in the same final state. (In truth, the functions are started in related states and take related actions, but this post keeps things simple.) If an action Ai designates a function from states to states, then the final state after a sequence of actions operating on an initial state σ0 can be written as
σo▹A1▹A2▹⋯▹An
where ▹ is a reverse function-application symbol, as in Elm. Such states can be compared symbolically.
After an action Ai is taken, the subsequent action Ai+1 is
determined by a decision, which is made by control logic on the basis of the current machine state.
But the simulator’s states are so abstract that it doesn’t
how the control logic will make the decision—for
example, at a conditional branch, the simulator doesn’t know what the
value of the Boolean condition will be, so it doesn’t know which way
the decision will go. But it does know that a Boolean condition must observed to be True
or False
, so it simulates both alternatives—and it records
the observed condition.
A translation is deemed correct if for both source and target functions, the simulator records the same sequence of actions and observations. That condition suffices because
-
Every decision can be identified with the action that follows it.
-
If two functions start in the same state and they take the same action, they wind up in the same successor state.
-
If two functions observe the value of the same expression in the same state, they observe the same value.
In both Cmm and WebAssembly, control logic can observe just two forms of value: Boolean and
integer. A Boolean observation determines the decision made by a
conditional branch or if
statement. An integer observation
determines the decision made by a Switch
or a br_table
,
which are used to implement Haskell case
expressions.
Testing a single run
A translation is tested by simulating runs of both source code and target code. Each simulated run produces a list of events, where an event is an action or an observation:
data Event stmt expr = Action stmt
| Predicate expr Bool
| Switch expr (Integer,Integer) Integer
An Action
value records what action (stmt
) was simulated.
A Predicate
value records an observation of the Boolean value of the
given expression (expr
), from a
conditional branch or an if
statement. A Switch
value
records an observation of the given integer expression, e.g., in the
translation of a case
expression. A Switch
also records an
integer pair that specifies the range of possible observations (bounds
of a jump table).
The list of events is produced by interpreting Cmm code or WebAssembly code in a monad that supplies observations and remembers events:
class MonadFail m => ControlTestMonad stmt expr m where
takeAction :: stmt -> m ()
evalPredicate :: expr -> m Bool
evalEnum :: expr -> (Integer,Integer) -> m Integer
The takeAction
function simply adds an action event to the list of remembered events, but the
evalPredicate
and evalEnum
functions do more: each one not only
remembers an event but also returns an observation, which is drawn from
the monad’s supply.
The monad is used by an interpreter for Cmm functions:
evalGraph :: forall stmt expr m .
ControlTestMonad stmt expr m
=> (Label -> Block CmmNode O O -> stmt)
-> (Label -> CmmExpr -> expr)
-> CmmGraph
-> m ()
The first two arguments determine the representations of stmt
and
expr
, which are left up to client code. The third argument, of type
CmmGraph
, represents a Cmm function as a control-flow graph.
Function evalGraph
starts interpreter run
at the given graph’s entry label:
evalGraph stmt expr g = run (g_entry g)
where run :: Label -> m ()
run label = do
takeAction @stmt @expr (stmt label (actionOf label))
case lastNode (blockOf label) of
CmmBranch l -> run l
CmmCondBranch e t f _ -> do
b <- evalPredicate @stmt @expr (expr label e)
run (if b then t else f)
CmmSwitch e targets -> do
i <- evalEnum @stmt @expr (expr label e) $
extendRight $ switchTargetsRange targets
run $ labelIn i targets
CmmCall { cml_cont = Nothing } -> return ()
... more cases ...
Inside run
, auxiliary
function actionOf
returns the action associated with a labeled node in the control-flow graph, and lastNode
extracts the decision-making instruction from the end of the basic block that is stored at the
node. (Implementations of auxiliary functions
are not shown.)
Monad m
is instantiated as BitConsumer stmt expr
.
A bit consumer is run by supplying a sequence of Booleans:
eventsFromBits :: BitConsumer stmt expr () -> [Bool] -> [Event stmt expr]
The Booleans determine the observations supplied by the
monad. (This design makes it possible to test translations using
randomly generated bit strings, perhaps using QuickCheck.)
The BitConsumer
’s evalPredicate
method supplies a Boolean observation by
taking one Boolean from the sequence. The evalEnum
method supplies
an integer observation by taking as many Booleans as are needed to
code for an integer in the given range.
Target code is represented by a value of type WasmControl stmt expr
,
which is interpreted by
evalWasm
:
evalWasm :: ControlTestMonad stmt expr m => WasmControl stmt expr -> m ()
When evalGraph
is applied to a Cmm function and evalWasm
is applied to its translation, both bit consumers can be run
on the same sequence of bits.
If the resulting lists of events differ, there is a fault in the translator.
Identifying test inputs
To identify sequences of Booleans to pass to eventsFromBits
,
we consider what execution paths we wish to test.
Most functions contain more than one path, all of which
ought to be tested, but if a function contains loops, all
the paths can’t be tested: there are infinitely many. However! Our tests
compare symbolic executions.
And symbolic executions compose.
Therefore if a simulation of every loop and every loop-free path meets
the correctness criterion, the simulation of any possible path will
also meet the correctness criterion.
Paths are enumerated by a depth-first search of the Cmm
control-flow graph. At each node in the control-flow graph, the path
enumerator visits all possible successor nodes. And when a node has more
than one successor—that is, when the node makes a nontrivial decision—each successor is associated with an
observation. For example, a conditional node has two successors,
which are respectively associated with the observations True
and
False
. The enumerator also remembers the action taken at each node, so
it produces a list of paths, where each path is a sequence of
events:
type CmmPath = [Event Stmt Expr]
cmmPaths :: CmmGraph -> [CmmPath]
Function cmmPaths
returns every path through the control-flow graph
in which at most one node appears more than once. In other words, the
search stops every time it reaches a node that already appears on the
path so far, but it does include that node. This termination condition
ensures that every loop is among the paths that are enumerated, and so
is every non-looping path.
After the Cmm paths are enumerated, each one is converted to a sequence of
bits, which can then be fed to a BitConsumer
built from the WebAssembly interpreter.
A path is converted to bits in the EntropyTransducer
module, which also
exports a function that is used invert the integer conversion during
symbolic execution.
traceBits :: [Event a b] -> [Bool]
rangeSelect :: (Integer, Integer) -> [Bool] -> Maybe (Integer, [Bool])
Function rangeSelect
consumes only as many bits as are needed to
code for an integer in the given range. Any bits left over are
returned along with the integer. If there aren’t enough input bits to
code for an integer, rangeSelect
returns Nothing
.
Results
The infrastructure described above was used to test several components of GHC’s new translator:
- A simple translation from a reducible Cmm control-flow graph to a WebAssembly function
- An optimizing version of that translation
- A “node splitter” that converts an arbitrary control-flow graph to a reducible control-flow graph
- A peephole optimizer for WebAssembly
The first three components are now part of GHC.
Testing uncovered a handful of faults, the most notable of which were as follows:
-
A misunderstanding about the node-splitting algorithm resulted in an infinite loop. The fault was corrected by reimplementing the algorithm.
-
When generating code for a
case
expression that scrutinizes typeInt
, GHC specifies a jump table of size 264. Attempting to generate a WebAssembly table with that many entries made GHC run out of memory. The fault was corrected by callingcmmImplementSwitchPlans
. This function converts an insanely large jump table to a decision tree, each leaf of which holds a jump table of reasonable size. -
When generating
loop
forms, the translation to WebAssembly mistakenly put some code before the loop header; this code should have gone after the loop header. The fault was corrected by swapping two lines of code in the translator.
Testing also failed to uncover a notable fault: I misread the WebAssembly specification and thought that reaching the end
of a loop went back to the beginning of the loop. In fact, it exits. This fault was not uncovered until the translation was integrated into GHC. The fault was corrected by changing one line in
in the translator: I removed a function call that made “fallthrough context” of a loop equal to its entry point. To make the tests pass again, I made a corresponding change to
evalWasm
: on encountering the end of a loop, the corrected code no longer pushes the loop back onto its evaluation stack.
Conclusion
GHC is a big beast, and it’s hard to wrangle. We’re pleased that we were able to test the trickiest part of the WebAssembly code generator before fully integrating it with GHC and a live WebAssembly platform. And the tests remain present, so if someone forgets to enable the node-splitting algorithm in the production back end (cough, cough), we can quickly rule out the translation as the source of the fault.
About the author
Norman is visiting Tweag on sabbatical from Tufts University. He is a longtime teacher and researcher who works by building interesting software and then writing papers about it. His best-known work is on machine-level software: compilers, assemblers, linkers, and debuggers, including code generators for both Standard ML of New Jersey and the Glasgow Haskell Compiler. He is also the designer and developer of the Noweb tool for literate programming. In his day job, Norman has developed new university courses in programming languages, machine-level programming, programming-language implementation, and technical writing, as well as various special-topics courses. He is also the author of the forthcoming textbook *Programming Languages: Build, Prove, and Compare.*
If you enjoyed this article, you might be interested in joining the Tweag team.