If you’re a Haskell developer, it’s likely you’re already familiar with the concept of property-based testing, and have first hand experience with a framework like QuickCheck or hedgehog. You might also have heard of the term “fuzzing” in various places, and it sounds just like what we’ve already been doing for so long, right? Just generate a bunch of random test inputs, in the hope of exposing an unknown bug?
Well, actually there’s more to it than that! Recently, I had some fun exploring coverage-guided fuzzers like AFL++ and libFuzzer. Along the way, I discovered a simple trick that allows us to compile Haskell code in a manner that these fuzzers can handle. The experience was akin to unlocking a hidden skill. I can say, without a doubt, that coverage-guided fuzzing can work wonders. This blog post shall guide you through fuzzing a first example…in Haskell!
Special thanks to Jade Lovelace for her discussion on the fediverse with me, without whom this blog post would probably not have seen the light of day.
What the actual fuzz?
At a high level, the job of fuzzers might seem quite similar to that of QuickCheck: they both generate a multitude of random test inputs for your code. There are subtle differences though:
- QuickCheck generates test inputs as Haskell values directly to test a Haskell function. The interfaces of these fuzzers are more generic and low-level, generating only blobs of data that are passed either through stdin to your program or as arguments to your functions.
- QuickCheck uses a Haskell function to express a property. Consequently, any “interesting” behavior worth reporting is a violation of these user-specified properties at runtime. However, for fuzzers like those mentioned above, an “interesting” behavior is synonymous with the program “crashing”.
The differences above may not matter much. The code being fuzzed can deserialize the input blob to get a structured value, and it can crash if and only if a user-specified property is violated.
You might be wondering, if we already have QuickCheck, what’s the point of this blog post? Now, consider the following Haskell function (borrowed from its OCaml sibling):
fuzzMe :: String -> IO ()
fuzzMe ('s':'e':'c':'r':'e':'t':' ':'c':'o':'d':'e':[]) = fail "uh oh"
fuzzMe _ = pure ()
While the input space is huge, only one specific input string would
trigger the bug. In this case, QuickCheck wouldn’t help much, the
chance that "secret code"
occurs among the generated cases is close
to zero.
Coverage-guided fuzzing
Property-based testing frameworks like QuickCheck are actually
examples of “black-box” fuzzing. The fuzzer knows nothing about the
program being tested, and only observes program behavior by their
outputs. The pros and cons of black-box fuzzing are clear: it’s easy
to implement, easy to plug into user code, and proven useful in the
field. However, as the fuzzMe
example above demonstrates, finding
interesting test inputs via black-box fuzzing is sometimes nearly
impossible.
What if the program is no longer a black box to the fuzzer? In addition to observing what output is generated for a particular input, the fuzzer also observes how the program produces that output. For instance, it might observe a complete control-flow log that shows what functions are called and in what order. This is “grey-box” fuzzing as implemented by fuzzers like AFL++ and libFuzzer.
Now, there’s a tradeoff here. If the program being fuzzed spends more time bookkeeping its execution state more accurately, fewer test cases get run per second. In practice, grey-box fuzzers perform coverage-guided fuzzing:
- At compile-time, the compiler performs lightweight instrumentation on user code. Unique counters are assigned to locations like function entry/exit points and control-flow graph edges. The compiler inserts code that increments the counter whenever that path is taken at runtime.
- Normally, the coverage-related counters are used to generate a detailed HTML report that shows how much of your code is actually covered by your test suite. However, the fuzzer can also use them as an inexpensive approximation of the complete control-flow log.
- The fuzzer generates new cases by mutating existing ones. As long as a new case makes the program take a new execution path, the fuzzer will notice it and use it as a basis for generating future cases.
Can coverage-guided fuzzing crack the fuzzMe
nut? Conceptually yes.
Among the randomly generated test cases, it’s unlikely that "secret code"
will appear at first, but there will be cases that begin with
's'
. Control-flow is vastly different between non-'s'
cases and
the 's'
cases due to pattern matching in fuzzMe
, the fuzzer would
notice this difference and try exploring the 's'
direction, thus
leading to the discovery of 'e'
and so on.
Building a GHC for fuzzing-related instrumentation
As explained in the previous section, a coverage-guided fuzzer requires compile-time instrumentation to be performed on user code. With appropriate command-line flags, the clang compiler does this when compiling C/C++ to LLVM IR, and it’s the basis for fuzzers like AFL++ and libFuzzer.
By default, GHC compiles Haskell to Cmm, then to assembly or to LLVM IR before generating the object file. However, clang can’t instrument these low-level IRs yet. While it’s totally possible for GHC to perform instrumentation in its own backends, no one has implemented this functionality yet.
Fortunately, GHC can be built with an “unregisterised” mode. Under this mode, GHC compiles Cmm to C, then to an object file. This is how we can get the instrumentation we want for cheap!
The rest of this blog post will focus on libFuzzer, which has decent performance under its default settings and is also easier to get started with, since it is a part of the LLVM project and ships with LLVM/Clang releases.
Now, it’s time to kickstart a GHC build and take your coffee break:
$ git clone --recursive https://gitlab.haskell.org/ghc/ghc.git
$ cd ghc
$ ./boot
$ ./configure --enable-unregisterised --with-system-libffi CC=/usr/lib/llvm-16/bin/clang CXX=/usr/lib/llvm-16/bin/clang++ LD=/usr/lib/llvm-16/bin/ld.lld CONF_GCC_LINKER_OPTS_STAGE2="--ld-path=/usr/lib/llvm-16/bin/ld.lld"
$ hadrian/build --flavour=perf+no_profiled_libs --docs=none -j
The options to ./configure
make GHC produce C code then use the clang toolchain to compile and link.
The locations of the executables and libraries is for clang 16 at its usual location; adapt for your version of clang.
If you haven’t built GHC before, don’t worry, the hadrian documentation is a good starting point.
Fuzzing your first Haskell function
The commands above merely build an unregisterised GHC without any instrumentation. That’s right, it is possible and even desirable to only instrument interesting parts of the codebase. Larger instrumentation coverage may actually make fuzzing less efficient, since it’s more likely to introduce noise: drift in the coverage counters that aren’t meaningful control-flow changes that we care about.
Now, it’s time to put fuzzMe
in a complete module:
module Test (test) where
import qualified Data.ByteString.Unsafe as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as T
import Foreign
import Foreign.C
import GHC.IO
test :: Ptr CChar -> CSize -> IO Int
test p len = catchAny (do
s <- BS.unsafePackCStringLen (p, fromIntegral len)
fuzzMe $ T.unpack $ T.decodeUtf8Lenient s
pure 0) (\_ -> pure 1)
foreign export ccall test :: Ptr CChar -> CSize -> IO Int
We’ve wrapped fuzzMe
into a test
function that takes a buffer as
input, and outputs 1
if and only if it’s an interesting case that
results in a crash. We don’t care about UTF-8 decoding failures,
therefore we use decodeUtf8Lenient
. The test
function is exported
as a C function, enabling us to call it in the C glue code:
#include <Rts.h>
STATIC_INLINE void hs_init_with_rtsopts_(char *argv[]) {
int argc;
for (argc = 0; argv[argc] != NULL; ++argc) {
}
hs_init_with_rtsopts(&argc, &argv);
}
int LLVMFuzzerInitialize(int *argc, char ***argv) {
char *my_argv[] = {"my_program_name", "+RTS", "--install-signal-handlers=no", "-V0", "-H64m", "-RTS", NULL};
hs_init_with_rtsopts_(my_argv);
return 0;
}
extern HsInt test(HsPtr, HsWord);
int
LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
int r = test(Data, Size);
if(r != 0) { __builtin_trap(); }
return 0;
}
Some points worth explaining here:
-
To work with libFuzzer, we need to implement
LLVMFuzzerTestOneInput
. This function accepts a blob input and crashes if it’s interesting, returning 0 otherwise. You may want to take a look at libFuzzer’s public interface. -
Users can also implement
LLVMFuzzerInitialize
which runs once before the fuzzing loop starts. Here, we use this to initialize the GHC RTS with the correct flags:--install-signal-handlers=no
, because libFuzzer needs to set up various signal handlers and we don’t want to mess with those.-V0
, which disables the RTS timer and makes Haskell context switching deterministic. When fuzzing, the less non-determinism, the better.-H64m
, which allows starting with a large heap size to reduce the chances of garbage collection.
There are a whole bunch of other tricks to tune the RTS flags to maximize performance, but let’s settle for these basics for now.
Now, it’s time to actually compile and link stuff:
$ /tmp/ghc/_build/stage1/bin/ghc -O2 -optc-fsanitize=fuzzer -optl-fsanitize=fuzzer -no-hs-main Test.hs test.c -o test
$ ./test
We pass -fsanitize=fuzzer
to clang at compile-time to enable
instrumentation, and at link-time to link libFuzzer’s own driver that
provides its own main
implementation. The resulting test
executable can be used to fuzz fuzzMe
, accepting command-line flags
documented in the libFuzzer documentation.
Go ahead and give it a try! Even when running it single-process and
without any corpus of initial test cases, it’ll find the "secret code"
within a minute.
Where to go from here
In a larger project, you may want to apply instrumentation to certain
cabal packages. This can be achieved by passing ghc-options: -optc-fsanitize=fuzzer
for those packages, and just like fuzzMe
,
you need to write a small C wrapper as a cabal executable component
that glues the Haskell world and the libFuzzer driver.
It’s even possible to tell hadrian to instrument parts of boot libs:
$ hadrian/build --flavour=perf+no_profiled_libs+no_dynamic_libs "stage1.base.ghc.hs.opts += -optc-fsanitize=fuzzer-no-link" "stage1.*.ghc.link.opts += -optl-fsanitize=fuzzer-no-link" --docs=none -j
The above command instruments the entire base
library, without
linking libFuzzer
into the final ghc
executables. Instrumenting
base
can be useful when the control-flow we care about goes into
base
. Reusing the fuzzMe
example, if you change it to:
fuzzMe "secret code" = fail "uh oh"
It defeats our default fuzzing setup! If you take a look at the Cmm
dump, GHC no longer pattern matches on individual characters; it emits
"secret code"
as a top-level String
and emits a call to
GHC.Base.eqString
, therefore we need to instrument base
to make it
work again.
This blog post demonstrates that coverage-guided fuzzers like AFL++ and libFuzzer don’t only benefit low-level languages like C/C++. We can get fuzzing support in Haskell for cheap, merely by messing with GHC build flags. In the future, it may be possible to add the instrumentation logic to GHC directly (or via some kind of backend plugins), and even have nice integration with existing property-based testing frameworks! Let’s wait and see.
About the author
Cheng is a Software Engineer who specializes in the implementation of functional programming languages. He is the project lead and main developer of Tweag's Haskell-to-WebAssembly compiler project codenamed Asterius. He also maintains other Haskell projects and makes contributions to GHC(Glasgow Haskell Compiler). Outside of work, Cheng spends his time exploring Paris and watching anime.
If you enjoyed this article, you might be interested in joining the Tweag team.