Testing Control-Flow Translations in GHC

1 June 2023 — by Norman Ramsey

In November 2022, Tweag engineers merged a WebAssembly back end into the Glasgow Haskell Compiler (GHC). The back end includes a new translation for control flow, which enables GHC to avoid depending on external tools like Binaryen. Because the translation is new, we wanted to test it before submitting a merge request. And classic unit testing was not a good fit—we would have needed to know what the WebAssembly code was expected to be generated from any given fragment of Haskell, and that’s a job for a compiler, not an engineer. Fortunately, we don’t care how generated code is written; we care about how it behaves.

Code’s behavior can be tested by running it, but we wanted to test the translation before integrating it with GHC. So we simulated translated code by tracing possible executions symbolically.

Translation from Cmm to WebAssembly

In GHC’s low-level intermediate form, Cmm, control flow is represented as an arbitrary graph. But in WebAssembly, control flow is represented as structured code:

In both representations, each letter A, B, C, or D stands for an action that can change the state of the machine. And after the action, the machine decides what action to perform next. The decision made after action A is conditional; it is based on an observation of a value that depends on the state of the machine. That value is the value of an expression: a condition in a conditional branch (Cmm) or a condition associated with an if statement (WebAssembly).

Cmm’s control flow is expressed through unconditional branches, conditional branches, and Switch. WebAssembly’s control flow is expressed through multiple syntactic forms: blocks, if statements, loop statements, return, and some multipurpose br forms. A br form can exit any block or if in which it appears, and it can branch to the start of any loop in which it appears. As an example, the loop, br, and return forms are used in the translation of a loop:

Translations from Cmm to WebAssembly, including the two shown here, can be tested while leaving the actual decisions and observations in abstract form (letters A through F, not actual code).

Reasoning about correctness

If a translation from a source function to a target function is correct, then when both functions are started in the same initial state, they take the same actions and they finish in the same final state. (In truth, the functions are started in related states and take related actions, but this post keeps things simple.) If an action $A_i$ designates a function from states to states, then the final state after a sequence of actions operating on an initial state $\sigma_0$ can be written as

$\sigma_o \mathbin{\triangleright} A_1 \mathbin{\triangleright} A_2 \mathbin{\triangleright} \cdots \mathbin{\triangleright} A_n$

where $\mathbin{\triangleright}$ is a reverse function-application symbol, as in Elm. Such states can be compared symbolically.

After an action $A_i$ is taken, the subsequent action $A_{i+1}$ is determined by a decision, which is made by control logic on the basis of the current machine state. But the simulator’s states are so abstract that it doesn’t how the control logic will make the decision—for example, at a conditional branch, the simulator doesn’t know what the value of the Boolean condition will be, so it doesn’t know which way the decision will go. But it does know that a Boolean condition must observed to be True or False, so it simulates both alternatives—and it records the observed condition.

A translation is deemed correct if for both source and target functions, the simulator records the same sequence of actions and observations. That condition suffices because

Every decision can be identified with the action that follows it.
If two functions start in the same state and they take the same action, they wind up in the same successor state.
If two functions observe the value of the same expression in the same state, they observe the same value.

In both Cmm and WebAssembly, control logic can observe just two forms of value: Boolean and integer. A Boolean observation determines the decision made by a conditional branch or if statement. An integer observation determines the decision made by a Switch or a br_table, which are used to implement Haskell case expressions.

Testing a single run

A translation is tested by simulating runs of both source code and target code. Each simulated run produces a list of events, where an event is an action or an observation:

data Event stmt expr = Action stmt
                     | Predicate expr Bool
                     | Switch expr (Integer,Integer) Integer

An Action value records what action (stmt) was simulated. A Predicate value records an observation of the Boolean value of the given expression (expr), from a conditional branch or an if statement. A Switch value records an observation of the given integer expression, e.g., in the translation of a case expression. A Switch also records an integer pair that specifies the range of possible observations (bounds of a jump table).

The list of events is produced by interpreting Cmm code or WebAssembly code in a monad that supplies observations and remembers events:

class MonadFail m => ControlTestMonad stmt expr m where
  takeAction    :: stmt -> m ()
  evalPredicate :: expr -> m Bool
  evalEnum      :: expr -> (Integer,Integer) -> m Integer

The takeAction function simply adds an action event to the list of remembered events, but the evalPredicate and evalEnum functions do more: each one not only remembers an event but also returns an observation, which is drawn from the monad’s supply.

The monad is used by an interpreter for Cmm functions:

evalGraph :: forall stmt expr m .
             ControlTestMonad stmt expr m
            => (Label -> Block CmmNode O O -> stmt)
            -> (Label -> CmmExpr -> expr)
            -> CmmGraph
            -> m ()

The first two arguments determine the representations of stmt and expr, which are left up to client code. The third argument, of type CmmGraph, represents a Cmm function as a control-flow graph.

Function evalGraph starts interpreter run at the given graph’s entry label:

evalGraph stmt expr g = run (g_entry g)
  where run :: Label -> m ()
        run label = do
          takeAction @stmt @expr (stmt label (actionOf label))
          case lastNode (blockOf label) of
            CmmBranch l -> run l
            CmmCondBranch e t f _ -> do
                b <- evalPredicate @stmt @expr (expr label e)
                run (if b then t else f)
            CmmSwitch e targets -> do
                i <- evalEnum @stmt @expr (expr label e) $
                     extendRight $ switchTargetsRange targets
                run $ labelIn i targets
            CmmCall { cml_cont = Nothing } -> return ()
            ... more cases ...

Inside run, auxiliary function actionOf returns the action associated with a labeled node in the control-flow graph, and lastNode extracts the decision-making instruction from the end of the basic block that is stored at the node. (Implementations of auxiliary functions are not shown.)

Monad m is instantiated as BitConsumer stmt expr. A bit consumer is run by supplying a sequence of Booleans:

eventsFromBits :: BitConsumer stmt expr () -> [Bool] -> [Event stmt expr]

The Booleans determine the observations supplied by the monad. (This design makes it possible to test translations using randomly generated bit strings, perhaps using QuickCheck.) The BitConsumer’s evalPredicate method supplies a Boolean observation by taking one Boolean from the sequence. The evalEnum method supplies an integer observation by taking as many Booleans as are needed to code for an integer in the given range.

Target code is represented by a value of type WasmControl stmt expr, which is interpreted by evalWasm:

evalWasm :: ControlTestMonad stmt expr m => WasmControl stmt expr -> m ()

When evalGraph is applied to a Cmm function and evalWasm is applied to its translation, both bit consumers can be run on the same sequence of bits. If the resulting lists of events differ, there is a fault in the translator.

Identifying test inputs

To identify sequences of Booleans to pass to eventsFromBits, we consider what execution paths we wish to test. Most functions contain more than one path, all of which ought to be tested, but if a function contains loops, all the paths can’t be tested: there are infinitely many. However! Our tests compare symbolic executions. And symbolic executions compose. Therefore if a simulation of every loop and every loop-free path meets the correctness criterion, the simulation of any possible path will also meet the correctness criterion.

Paths are enumerated by a depth-first search of the Cmm control-flow graph. At each node in the control-flow graph, the path enumerator visits all possible successor nodes. And when a node has more than one successor—that is, when the node makes a nontrivial decision—each successor is associated with an observation. For example, a conditional node has two successors, which are respectively associated with the observations True and False. The enumerator also remembers the action taken at each node, so it produces a list of paths, where each path is a sequence of events:

type CmmPath = [Event Stmt Expr]
cmmPaths :: CmmGraph -> [CmmPath]

Function cmmPaths returns every path through the control-flow graph in which at most one node appears more than once. In other words, the search stops every time it reaches a node that already appears on the path so far, but it does include that node. This termination condition ensures that every loop is among the paths that are enumerated, and so is every non-looping path.

After the Cmm paths are enumerated, each one is converted to a sequence of bits, which can then be fed to a BitConsumer built from the WebAssembly interpreter. A path is converted to bits in the EntropyTransducer module, which also exports a function that is used invert the integer conversion during symbolic execution.

traceBits :: [Event a b] -> [Bool]
rangeSelect :: (Integer, Integer) -> [Bool] -> Maybe (Integer, [Bool])

Function rangeSelect consumes only as many bits as are needed to code for an integer in the given range. Any bits left over are returned along with the integer. If there aren’t enough input bits to code for an integer, rangeSelect returns Nothing.

Results

The infrastructure described above was used to test several components of GHC’s new translator:

A simple translation from a reducible Cmm control-flow graph to a WebAssembly function
An optimizing version of that translation
A “node splitter” that converts an arbitrary control-flow graph to a reducible control-flow graph
A peephole optimizer for WebAssembly

The first three components are now part of GHC.

Testing uncovered a handful of faults, the most notable of which were as follows:

A misunderstanding about the node-splitting algorithm resulted in an infinite loop. The fault was corrected by reimplementing the algorithm.
When generating code for a case expression that scrutinizes type Int, GHC specifies a jump table of size $2^{64}$ . Attempting to generate a WebAssembly table with that many entries made GHC run out of memory. The fault was corrected by calling cmmImplementSwitchPlans. This function converts an insanely large jump table to a decision tree, each leaf of which holds a jump table of reasonable size.
When generating loop forms, the translation to WebAssembly mistakenly put some code before the loop header; this code should have gone after the loop header. The fault was corrected by swapping two lines of code in the translator.

Testing also failed to uncover a notable fault: I misread the WebAssembly specification and thought that reaching the end of a loop went back to the beginning of the loop. In fact, it exits. This fault was not uncovered until the translation was integrated into GHC. The fault was corrected by changing one line in in the translator: I removed a function call that made “fallthrough context” of a loop equal to its entry point. To make the tests pass again, I made a corresponding change to evalWasm: on encountering the end of a loop, the corrected code no longer pushes the loop back onto its evaluation stack.

Conclusion

GHC is a big beast, and it’s hard to wrangle. We’re pleased that we were able to test the trickiest part of the WebAssembly code generator before fully integrating it with GHC and a live WebAssembly platform. And the tests remain present, so if someone forgets to enable the node-splitting algorithm in the production back end (cough, cough), we can quickly rule out the translation as the source of the fault.

Behind the scenes

Norman Ramsey

Norman is visiting Tweag on sabbatical from Tufts University. He is a longtime teacher and researcher who works by building interesting software and then writing papers about it. His best-known work is on machine-level software: compilers, assemblers, linkers, and debuggers, including code generators for both Standard ML of New Jersey and the Glasgow Haskell Compiler. He is also the designer and developer of the Noweb tool for literate programming. In his day job, Norman has developed new university courses in programming languages, machine-level programming, programming-language implementation, and technical writing, as well as various special-topics courses. He is also the author of the forthcoming textbook *Programming Languages: Build, Prove, and Compare.*

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.