Convert Cabal-based projects to Bazel automatically

28 July 2021 — by Facundo Domínguez, Andreas Herrmann

If you have a large Haskell code base, organized in multiple Cabal packages, with many system dependencies, and which takes very long to build, then this post is for you. We describe herein gazelle_cabal, a new tool that generates Haskell rules to build with the Bazel build tool. It saves the maintainer the trouble of writing these rules to begin with, and then keeping them synchronized with the Cabal files whenever they are modified.

Developer teams who want to make a gradual transition to Bazel can continue to specify builds via Cabal files while still using Bazel to build their artifacts.

Why Bazel?

Bazel is known to offer caching for builds of multi-language projects. Once the cache is hot, builds can avoid rebuilding many intermediate artifacts, shortening the overall build times. This reduces the time and the cost of running continuous integration systems and improves developer productivity by decreasing the time it takes to rebuild when switching branches of the project in a versioned control system.

Some support for incremental builds is implemented by most modern build tools, cabal-install and stack being no exception. However, the tools specialized for Haskell have poor support for working with dependencies written in other languages, and they readily discard old artifacts when rebuilding them, which entails subsequent rebuilds when reverting changes.

Moving to Bazel is often a major investment for an existing project, though. The recipes to build each artifact need to be rewritten, and the ways of the new build system need to be learned. The gazelle_cabal tool helps with some of that effort by extending the gazelle tool, which provides infrastructure to generate and update Bazel configuration files in general.

Generating rules

In the happy path, one configures gazelle_cabal in a given repo, and then invokes

$ bazel run //:gazelle

The above will generate BUILD.bazel files next to each Cabal file, containing the rules necessary to build the various components of the Cabal package.

If the Cabal file reads

cabal-version:      2.4
name:               package-a
version:            0.1.0.0
...


library
    ...
executable executable-a
    ...
test-suite test-a
    ...
benchmark bench-a
    ...

The BUILD.bazel file will look like

haskell_library(
    name = "package-a",
    ...
)
haskell_binary(
    name = "executable-a",
    ...
)
haskell_test(
    name = "test-a",
    ...
)
haskell_binary(
    name = "bench-a",
    ...
)

Additionally, one could invoke the following command to declare in the WORKSPACE file all of the Haskell dependencies that the Cabal packages need.

$ bazel run //:gazelle-update-repos

Which generates the following rule in the WORKSPACE file.

stack_snapshot(
    name = "stackage",
    components = {
        "tasty-discover": [ "lib", "exe:tasty-discover" ],
        ...
    },
    packages = [
        "base",
        "tasty",
        "tasty-discover",
        "tasty-hunit",
        "void",
        ...
    ],
)

In this case, the user is expected to add other necessary attributes. For instance, the stack_snapshot rule requires either a snapshot or a local_snapshot attribute.

Updating rules

Unlike other file generators, gazelle_cabal and gazelle extensions in general don’t overwrite the generated files on subsequent runs. They rather blend updates to the rules with the contents of the existing files. This simplifies considerably the customization of the output, which otherwise would need to be specified with command line flags or additional configuration files.

As an example, suppose we want to skip building a library on some particular configuration. One way to deal with that in Bazel is to specify a tag.

haskell_library(
    name = "package-a",
    srcs = ...,
    tags = ["skip-ci"],
)

The next time that gazelle_cabal runs, it may modify other attributes, but it will know to preserve the tags attribute and any other attribute that doesn’t need an update.

Even when attributes need to be updated, some parts of them can still be preserved. A typical example is the package list in the WORKSPACE file.

stack_snapshot(
    name = "stackage",
    ...
    packages = [
        "aeson",  # keep
        "base",
        "inspection-testing",
        "optparse-applicative",  # keep
        "path",  # keep
        "path-io",  # keep
        "tasty",
        "tasty-discover",
        "tasty-hunit",
        "void",
    ],
    snapshot = "lts-18.1",
)

Here, gazelle_cabal is free to add and remove any packages in the packages attribute of the stack_snapshot rule as long as they aren’t marked with a #keep comment. When a #keep comment is used, the corresponding package name needs to be retained in all updates. And this applies to any list of strings or labels in any attribute managed by the tool.

In the case of stack_snapshot this is handy to manage dependency lists, where some packages are required by Cabal files, and some packages are required by non-managed Haskell rules in the same repository.

Also, note the snapshot attribute above, which is required by the stack_snapshot rule and gazelle_cabal leaves untouched.

Long time users of rules_haskell may feel reminded of Hazel, which was the tool used to import Stackage dependencies into Bazel builds before it was replaced by the Cabal rules.

Hazel was similar to gazelle_cabal in that it parsed Cabal files and generated haskell_library or haskell_binary targets to build these Cabal packages with Bazel. However, Hazel was intended for a different use-case. Namely, importing external dependencies into the project and building them with Bazel. This meant that it had to be able to fully automatically generate working build definitions for all required external dependencies.

Many packages are easy to translate, however, some packages make use of advanced Cabal features such as custom setup scripts or configure scripts in ways that can be difficult to translate fully automatically. The Cabal rules avoid these issues by building external dependencies with Cabal, meaning no conversion is required. They are still the recommended way to build external Haskell dependencies.

This Gazelle extension, on the other hand, is intended for Cabal packages situated in your code base, where changes to the Cabal file or code for compatibility are more convenient to make. As mentioned before, Gazelle can also preserve user defined adjustments to generated Bazel rules when needed.

Of course this raises the question, why not just use the Cabal rules for these packages?

Firstly, if using Cabal rules, it would still be up to the user to write on each rule the list of Haskell dependencies that each package needs to build, whereas gazelle_cabal can take care of that.

Also, the regular Haskell rules are better suited for the main code base than the Cabal rules. For example, only regular Haskell rules can be loaded into GHCi by source using haskell_repl. Cabal rules, in contrast, can only be loaded as precompiled libraries.

Regular Haskell rules can also generate finer-grained actions providing faster incremental builds. The Cabal rules, on the other hand, generate large monolithic actions (e.g. generating haddock documentation together with linking static and dynamic libraries) and have to do additional work for compatibility between Bazel and Cabal. This overhead is not a big issue for third party dependencies that are rarely changed and most often fetched from cache. However, it is an issue for targets that are changed frequently.

Closing remarks

This project was possible thanks to the generous funding from Symbiont and their test bed for the initial implementation. We made a case of gazelle_cabal as a tool to help development teams transitioning to Bazel builds. As the tool is adopted by other projects, we look forward to receiving contributions and smoothing the user experience.

Behind the scenes

Facundo Domínguez

Facundo is a software engineer supporting development and research projects at Tweag. Prior to joining Tweag, he worked in academia and in industry, on a varied assortment of domains, with an overarching interest in programming languages.

Andreas Herrmann

Andreas is a physicist turned software engineer. He leads the Bazel team, and maintains Tweag's open source Bazel rule sets and the capability package. He is passionate about functional programming, and hermetic and reproducible builds. He lives in Zurich and is active in the local Haskell community.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.