Coverage-guided fuzzing of Haskell programs for cheap

15 June 2023 — by Cheng Shao

If you’re a Haskell developer, it’s likely you’re already familiar with the concept of property-based testing, and have first hand experience with a framework like QuickCheck or hedgehog. You might also have heard of the term “fuzzing” in various places, and it sounds just like what we’ve already been doing for so long, right? Just generate a bunch of random test inputs, in the hope of exposing an unknown bug?

Well, actually there’s more to it than that! Recently, I had some fun exploring coverage-guided fuzzers like AFL++ and libFuzzer. Along the way, I discovered a simple trick that allows us to compile Haskell code in a manner that these fuzzers can handle. The experience was akin to unlocking a hidden skill. I can say, without a doubt, that coverage-guided fuzzing can work wonders. This blog post shall guide you through fuzzing a first example…in Haskell!

Special thanks to Jade Lovelace for her discussion on the fediverse with me, without whom this blog post would probably not have seen the light of day.

What the actual fuzz?

At a high level, the job of fuzzers might seem quite similar to that of QuickCheck: they both generate a multitude of random test inputs for your code. There are subtle differences though:

QuickCheck generates test inputs as Haskell values directly to test a Haskell function. The interfaces of these fuzzers are more generic and low-level, generating only blobs of data that are passed either through stdin to your program or as arguments to your functions.
QuickCheck uses a Haskell function to express a property. Consequently, any “interesting” behavior worth reporting is a violation of these user-specified properties at runtime. However, for fuzzers like those mentioned above, an “interesting” behavior is synonymous with the program “crashing”.

The differences above may not matter much. The code being fuzzed can deserialize the input blob to get a structured value, and it can crash if and only if a user-specified property is violated.

You might be wondering, if we already have QuickCheck, what’s the point of this blog post? Now, consider the following Haskell function (borrowed from its OCaml sibling):

fuzzMe :: String -> IO ()
fuzzMe ('s':'e':'c':'r':'e':'t':' ':'c':'o':'d':'e':[]) = fail "uh oh"
fuzzMe _ = pure ()

While the input space is huge, only one specific input string would trigger the bug. In this case, QuickCheck wouldn’t help much, the chance that "secret code" occurs among the generated cases is close to zero.

Coverage-guided fuzzing

Property-based testing frameworks like QuickCheck are actually examples of “black-box” fuzzing. The fuzzer knows nothing about the program being tested, and only observes program behavior by their outputs. The pros and cons of black-box fuzzing are clear: it’s easy to implement, easy to plug into user code, and proven useful in the field. However, as the fuzzMe example above demonstrates, finding interesting test inputs via black-box fuzzing is sometimes nearly impossible.

What if the program is no longer a black box to the fuzzer? In addition to observing what output is generated for a particular input, the fuzzer also observes how the program produces that output. For instance, it might observe a complete control-flow log that shows what functions are called and in what order. This is “grey-box” fuzzing as implemented by fuzzers like AFL++ and libFuzzer.

Now, there’s a tradeoff here. If the program being fuzzed spends more time bookkeeping its execution state more accurately, fewer test cases get run per second. In practice, grey-box fuzzers perform coverage-guided fuzzing:

At compile-time, the compiler performs lightweight instrumentation on user code. Unique counters are assigned to locations like function entry/exit points and control-flow graph edges. The compiler inserts code that increments the counter whenever that path is taken at runtime.
Normally, the coverage-related counters are used to generate a detailed HTML report that shows how much of your code is actually covered by your test suite. However, the fuzzer can also use them as an inexpensive approximation of the complete control-flow log.
The fuzzer generates new cases by mutating existing ones. As long as a new case makes the program take a new execution path, the fuzzer will notice it and use it as a basis for generating future cases.

Can coverage-guided fuzzing crack the fuzzMe nut? Conceptually yes. Among the randomly generated test cases, it’s unlikely that "secret code" will appear at first, but there will be cases that begin with 's'. Control-flow is vastly different between non-'s' cases and the 's' cases due to pattern matching in fuzzMe, the fuzzer would notice this difference and try exploring the 's' direction, thus leading to the discovery of 'e' and so on.

As explained in the previous section, a coverage-guided fuzzer requires compile-time instrumentation to be performed on user code. With appropriate command-line flags, the clang compiler does this when compiling C/C++ to LLVM IR, and it’s the basis for fuzzers like AFL++ and libFuzzer.

By default, GHC compiles Haskell to Cmm, then to assembly or to LLVM IR before generating the object file. However, clang can’t instrument these low-level IRs yet. While it’s totally possible for GHC to perform instrumentation in its own backends, no one has implemented this functionality yet.

Fortunately, GHC can be built with an “unregisterised” mode. Under this mode, GHC compiles Cmm to C, then to an object file. This is how we can get the instrumentation we want for cheap!

The rest of this blog post will focus on libFuzzer, which has decent performance under its default settings and is also easier to get started with, since it is a part of the LLVM project and ships with LLVM/Clang releases.

Now, it’s time to kickstart a GHC build and take your coffee break:

$ git clone --recursive https://gitlab.haskell.org/ghc/ghc.git
$ cd ghc
$ ./boot
$ ./configure --enable-unregisterised --with-system-libffi CC=/usr/lib/llvm-16/bin/clang CXX=/usr/lib/llvm-16/bin/clang++ LD=/usr/lib/llvm-16/bin/ld.lld CONF_GCC_LINKER_OPTS_STAGE2="--ld-path=/usr/lib/llvm-16/bin/ld.lld"
$ hadrian/build --flavour=perf+no_profiled_libs --docs=none -j

The options to ./configure make GHC produce C code then use the clang toolchain to compile and link. The locations of the executables and libraries is for clang 16 at its usual location; adapt for your version of clang.

If you haven’t built GHC before, don’t worry, the hadrian documentation is a good starting point.

Fuzzing your first Haskell function

The commands above merely build an unregisterised GHC without any instrumentation. That’s right, it is possible and even desirable to only instrument interesting parts of the codebase. Larger instrumentation coverage may actually make fuzzing less efficient, since it’s more likely to introduce noise: drift in the coverage counters that aren’t meaningful control-flow changes that we care about.

Now, it’s time to put fuzzMe in a complete module:

module Test (test) where

import qualified Data.ByteString.Unsafe as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as T
import Foreign
import Foreign.C
import GHC.IO

test :: Ptr CChar -> CSize -> IO Int
test p len = catchAny (do
  s <- BS.unsafePackCStringLen (p, fromIntegral len)
  fuzzMe $ T.unpack $ T.decodeUtf8Lenient s
  pure 0) (\_ -> pure 1)

foreign export ccall test :: Ptr CChar -> CSize -> IO Int

We’ve wrapped fuzzMe into a test function that takes a buffer as input, and outputs 1 if and only if it’s an interesting case that results in a crash. We don’t care about UTF-8 decoding failures, therefore we use decodeUtf8Lenient. The test function is exported as a C function, enabling us to call it in the C glue code:

#include <Rts.h>

STATIC_INLINE void hs_init_with_rtsopts_(char *argv[]) {
  int argc;
  for (argc = 0; argv[argc] != NULL; ++argc) {
  }
  hs_init_with_rtsopts(&argc, &argv);
}

int LLVMFuzzerInitialize(int *argc, char ***argv) {
  char *my_argv[] = {"my_program_name", "+RTS", "--install-signal-handlers=no", "-V0", "-H64m", "-RTS", NULL};
  hs_init_with_rtsopts_(my_argv);
  return 0;
}

extern HsInt test(HsPtr, HsWord);

int
LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  int r = test(Data, Size);
  if(r != 0) { __builtin_trap(); }
  return 0;
}

Some points worth explaining here:

To work with libFuzzer, we need to implement LLVMFuzzerTestOneInput. This function accepts a blob input and crashes if it’s interesting, returning 0 otherwise. You may want to take a look at libFuzzer’s public interface.
Users can also implement LLVMFuzzerInitialize which runs once before the fuzzing loop starts. Here, we use this to initialize the GHC RTS with the correct flags:
- --install-signal-handlers=no, because libFuzzer needs to set up various signal handlers and we don’t want to mess with those.
- -V0, which disables the RTS timer and makes Haskell context switching deterministic. When fuzzing, the less non-determinism, the better.
- -H64m, which allows starting with a large heap size to reduce the chances of garbage collection.
There are a whole bunch of other tricks to tune the RTS flags to maximize performance, but let’s settle for these basics for now.

Now, it’s time to actually compile and link stuff:

$ /tmp/ghc/_build/stage1/bin/ghc -O2 -optc-fsanitize=fuzzer -optl-fsanitize=fuzzer -no-hs-main Test.hs test.c -o test
$ ./test

We pass -fsanitize=fuzzer to clang at compile-time to enable instrumentation, and at link-time to link libFuzzer’s own driver that provides its own main implementation. The resulting test executable can be used to fuzz fuzzMe, accepting command-line flags documented in the libFuzzer documentation.

Go ahead and give it a try! Even when running it single-process and without any corpus of initial test cases, it’ll find the "secret code" within a minute.

Where to go from here

In a larger project, you may want to apply instrumentation to certain cabal packages. This can be achieved by passing ghc-options: -optc-fsanitize=fuzzer for those packages, and just like fuzzMe, you need to write a small C wrapper as a cabal executable component that glues the Haskell world and the libFuzzer driver.

It’s even possible to tell hadrian to instrument parts of boot libs:

$ hadrian/build --flavour=perf+no_profiled_libs+no_dynamic_libs "stage1.base.ghc.hs.opts += -optc-fsanitize=fuzzer-no-link" "stage1.*.ghc.link.opts += -optl-fsanitize=fuzzer-no-link" --docs=none -j

The above command instruments the entire base library, without linking libFuzzer into the final ghc executables. Instrumenting base can be useful when the control-flow we care about goes into base. Reusing the fuzzMe example, if you change it to:

fuzzMe "secret code" = fail "uh oh"

It defeats our default fuzzing setup! If you take a look at the Cmm dump, GHC no longer pattern matches on individual characters; it emits "secret code" as a top-level String and emits a call to GHC.Base.eqString, therefore we need to instrument base to make it work again.

This blog post demonstrates that coverage-guided fuzzers like AFL++ and libFuzzer don’t only benefit low-level languages like C/C++. We can get fuzzing support in Haskell for cheap, merely by messing with GHC build flags. In the future, it may be possible to add the instrumentation logic to GHC directly (or via some kind of backend plugins), and even have nice integration with existing property-based testing frameworks! Let’s wait and see.

Behind the scenes

Cheng Shao

Cheng is a Software Engineer who specializes in the implementation of functional programming languages. He is the project lead and main developer of Tweag's Haskell-to-WebAssembly compiler project codenamed Asterius. He also maintains other Haskell projects and makes contributions to GHC(Glasgow Haskell Compiler). Outside of work, Cheng spends his time exploring Paris and watching anime.

Tech Group

Programming Languages and Compilers

Research, create, improve and maintain programming languages and their tooling to enhance developer productivity and to deliver reliable, maintainable, correct and performant software with minimum effort.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← Announcing halide-haskell - a Haskell interface for the Halide image and array processing language Assumptions for Liquid Haskell in the large →