Coverity is a proprietary static code analysis tool that can be used to find code quality defects in large-scale, complex software. It supports a number of languages and frameworks and is trusted to help ensure compliance with standards such as MISRA, AUTOSAR, and ISO 26262. Coverity provides integrations with several build systems, including Bazel; however, the official Bazel integration fell short of the expectations of our client, who wanted to leverage the Bazel remote cache in order to speed up Coverity analysis and be able to run it in normal merge request workflows. We took on that challenge.
The Coverity workflow
To understand the rest of the post it is useful to first become familiar with the Coverity workflow, which is largely linear and can be summarized as a sequence of steps:
- Configuration. In this step `cov-configure` is invoked, possibly several times with various combinations of key compiler flags (such as `-nostdinc++`, `-fPIC`, `-std=c++14`, and others). This produces a top-level XML configuration file accompanied by a directory tree that contains configurations corresponding to the key flags that were provided. Coverity is then able to pick the right configuration by dispatching on those key flags when it sees them during the build.
- Compilation. This is typically done by invoking `cov-build` or its lower-level cousin `cov-translate`. The distinction between these two utilities will become clear in the next section. For now, it is enough to point out that these tools translate original invocations of the compiler into invocations of `cov-emit` (Coverity’s own compiler), which in turn populate an SQLite database with intermediate representations and all the metadata about the built sources, such as symbols and include paths.
- Analysis. This step amounts to one or more invocations of `cov-analyze`. This utility is practically a black box, and it can take considerable time to run, depending on the number of translation units (think compilation units in C/C++). The input for the analysis is the SQLite database that was produced in the previous step. `cov-analyze` populates a directory tree that can then be used for report generation or uploading of identified software defects.
- Reporting. Listing and registration of found defects are performed with `cov-format-errors` and `cov-commit-defects`, respectively. These tools require the directory structure that was created by `cov-analyze` in the previous step. (A condensed shell sketch of the whole sequence follows this list.)
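To make these steps concrete, here is a condensed sketch of the workflow on a plain Make-based project. The flags and paths are illustrative rather than exact: the `cov-configure` switches, the stream name, and the server URL are assumptions made for the sake of the example.

```bash
IDIR=/path/to/idir                      # Coverity intermediate directory

# 1. Configuration: register the compiler (options shown are illustrative).
cov-configure --config cov-conf/coverity_config.xml --compiler gcc --comptype gcc

# 2. Compilation: wrap the normal build; compiler calls are turned into
#    cov-emit invocations that populate the emit database.
cov-build --dir "$IDIR" make -j"$(nproc)"

# 3. Analysis: whole-program analysis over the emit database.
cov-analyze --dir "$IDIR"

# 4. Reporting: list defects locally and/or upload them to the server.
cov-format-errors --dir "$IDIR"
cov-commit-defects --dir "$IDIR" --stream my-stream --url https://coverity.example.com
```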
Existing integrations
A successful integration of Coverity into Bazel has to provide a way to perform step 2 of the workflow, since this is the only step that is intertwined with the build system. Step 1 is relatively quick, and it does not make a lot of difference how it is done. Step 3 is opaque and build-system-agnostic because it happens after the build; furthermore, there is no obvious way to improve its granularity and cacheability. Step 4 is similar in that regard.
The official integration works by wrapping a Bazel invocation with `cov-build`:

```
$ cov-build --dir /path/to/idir --bazel bazel build //:my-bazel-target
```

Here, `--dir` specifies the intermediate directory where the SQLite database along with some metadata will be stored. In the `--bazel` mode of operation, `cov-build` tricks Bazel by intercepting the invocations of the compiler, which are replaced by invocations of `cov-translate`. Compared to `cov-build`, which is a high-level wrapper, `cov-translate` corresponds more closely to individual invocations of a compiler. It identifies the right configuration from the collection of configurations created in step 1 and then uses it to convert the command line arguments of the original compiler into the command line arguments of `cov-emit`, which it then invokes.
The main problem with the official integration is that it does not support caching: `bazel build` has to run from scratch with caching disabled in order to make sure that all invocations of the compiler are performed and none are skipped. Another nuance of the official integration is that one has to build a target of a supported kind, e.g. `cc_library`. If you bundle your build products together in some way (e.g. in an archive), you cannot simply build the top-level bundle as you normally would. Instead, you need to identify every compatible target of interest in some other way.
Because of this, our client did not use the official Coverity integration for Bazel. Instead, they would run Bazel with the `--subcommands` option, which makes Bazel print how it invokes all the tools that participate in the build. This long log would then be parsed and converted into a Makefile in order to leverage Coverity’s Make integration in `cov-build` instead. This approach still suffered from long execution times due to the lack of caching: it ran as a nightly build and took over 12 hours, which wasn’t suitable for a regular merge request’s CI pipeline.
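For context, the legacy setup looked roughly like the sketch below. The log-to-Makefile converter is a hypothetical placeholder for the client’s actual parsing script.

```bash
# Capture every subcommand Bazel runs (printed on stderr)...
bazel build --subcommands //:my-bazel-target 2>&1 | tee bazel-subcommands.log

# ...convert the compiler invocations into a Makefile (hypothetical script)...
./subcommands-to-makefile.py bazel-subcommands.log > coverity.mk

# ...and let Coverity's Make integration drive the translated build.
cov-build --dir /path/to/idir make -f coverity.mk
```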
Our approach
The key insight that allowed us to make the build process cacheable is the observation that individual invocations of `cov-translate` produce SQLite databases (`emit-db`s) that are granular and relatively small. These individual `emit-db`s can then be merged in order to form the final, big `emit-db` that can be used for running `cov-analyze`. Therefore, our plan was the following:
- Create a special-purpose Bazel toolchain that starts as a copy of the toolchain used for “normal” compilation, in order to match the way the original compiler (`gcc` in our case) is invoked.
- Instead of `gcc` and some other tools such as `ar`, have it invoke our own wrapper that drives invocations of `cov-translate` (a sketch of such a wrapper follows this list).
- The only useful output of the compilation step is the `emit-db` database. On the other hand, in `CppCompile` actions Bazel normally expects `.o` files to be produced, so we just rename our `emit-db` SQLite database to whatever object file Bazel is expecting.
- In the linking step, we use the `add` subcommand of `cov-manage-emit` in order to merge individual `emit-db`s.
- Once the final bundle is built, we iterate through all eligible database files and merge them together once again in order to obtain the final `emit-db` that will be used for analysis.
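As an illustration, a compiler wrapper along these lines might look like the sketch below. This is not our actual implementation: the `cov-translate` options, the layout of the intermediate directory, and the `normalize-emit-db` helper are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical compiler wrapper registered in the Coverity toolchain.
set -euo pipefail

# Find the object file that Bazel expects (the argument following -o).
out_obj=""
args=("$@")
for i in "${!args[@]}"; do
  if [[ "${args[$i]}" == "-o" ]]; then
    out_obj="${args[$((i + 1))]}"
  fi
done

# Run cov-translate on the original gcc command line, using a fresh
# intermediate directory for this single translation unit.
idir="$(mktemp -d)"
COV_HOST=coverity cov-translate --dir "$idir" gcc "$@"

# Normalize the per-TU emit-db for reproducibility (hypothetical helper; the
# need for this step is explained below), then hand the database to Bazel
# under the name of the expected object file.
normalize-emit-db "$idir"
cp "$idir"/emit/*/emit-db "$out_obj"
```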
From Bazel’s point of view, we are simply compiling C/C++ code; however, this is a way to perform a cacheable Coverity build. If you are a regular reader of our blog, you may have noticed certain similarities to the approach we used in our CTC++ Bazel integration documented here and here.
Difficulties
The plan may sound simple, but in practice there was nothing simple about it. The key difficulty was the fact that the Coverity tooling was not designed with reproducibility in mind. At every step, starting from configuration generation (which is simply a `genrule` in our case), we had to inspect and adjust the produced outputs in order to make sure that they do not include any volatile data that could compromise reproducibility. The key output in the Coverity workflow, the `emit-db` database, contained a number of pieces of volatile information:
- Start time, end time, and process IDs of `cov-emit` invocations.
- Timestamps of source files.
- Command line IDs of translation units.
- Host name; luckily, this can be overwritten easily by setting the `COV_HOST` environment variable.
- Absolute file names of source files. These are split into path segments, with each segment stored as a separate database entry, meaning the number of entries in the `FileName` table varies with the depth of the path to the execroot, breaking hermeticity. Deleting entries is not viable since their IDs are referenced elsewhere. Our solution was to identify the variable prefix leading to the execroot and reset all segments in that prefix to a fixed string. The prefix length proved to be constant for each execution mode, allowing us to use `--strip-path` arguments in `cov-analyze` to remove them during analysis.
- In certain cases, some files from the `include` directories would be added to the database with altered Coverity-generated contents that would also include absolute paths.
We had to meticulously overwrite all of these, which was only possible because `emit-db` is in the SQLite format and there is open source tooling that makes it possible to edit it. If the output of `cov-emit` were in a proprietary format, we almost certainly wouldn’t have been able to deliver as efficient a solution for our client.
In practice, normalization of `emit-db` happens in two stages (sketched below):

- We run some SQL commands that reset the volatile pieces of data inside the database. As we were iterating on these “corrective instructions”, we made sure that we eliminated all instances of volatility by using the `sqldiff` utility, which can print differences in schema and data between tables.
- We dump the resulting database with the `.dump` command, which exports the SQL statements necessary to recreate the database with the same schema and data. Then we re-load these statements and thus obtain a binary database file that is bit-by-bit reproducible. This is necessary because simply editing a database by running SQL commands on it does not ensure that the result is binary reproducible, even if there is no difference in the actual contents.
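For illustration, the two stages boil down to something like the following. The table and column names in the UPDATE statements are hypothetical stand-ins rather than Coverity’s actual schema; the `sqlite3` and `sqldiff` invocations use the real tools.

```bash
db=emit-db

# Stage 1: reset volatile fields in place (table/column names are made up).
sqlite3 "$db" <<'SQL'
UPDATE TranslationUnit SET start_time = 0, end_time = 0, process_id = 0;
UPDATE SourceFile      SET timestamp = 0;
SQL

# While iterating on the corrective instructions, diff against a database
# from a previous run to spot any remaining volatility.
sqldiff reference/emit-db "$db"

# Stage 2: dump and re-load to obtain a bit-by-bit reproducible file.
sqlite3 "$db" .dump > emit-db.sql
rm "$db"
sqlite3 "$db" < emit-db.sql
```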
Performance considerations
Since `emit-db`s are considerably larger than typical object files, we found it highly desirable to use compression for individual SQLite databases, both for those that result from initial `cov-translate` invocations and for the merged databases that are created by `cov-manage-emit` in the linking step. Zstandard proved to be a good choice: it is fast and makes our build outputs up to 4 times smaller. Without compression, we risked filling the remote cache quickly; besides, the bigger the files, the slower the I/O operations.
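In the wrapper, this amounts to a couple of `zstd` calls around the database files, roughly as follows (the compression level and file names are illustrative):

```bash
# Compress the per-TU database before exposing it as a build output...
zstd -q -3 < emit-db > "$out_obj"

# ...and decompress cached outputs again before merging them in the linking step.
zstd -d -q < "$out_obj" > emit-db
```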
We were tempted to minimize the size of the database even further by exploring whether there is anything in `emit-db` that can be safely removed without affecting our client’s use case. Alas, every piece of information stored was required by Coverity during the analysis phase, and our attempts to truncate some of the tables led to failures in `cov-analyze`. It is worth noting that the `sqlite3_analyzer` utility (part of the SQLite project) can be used to produce a report that explains which objects store the most data. This way we found that there is an index that contributes about 20% of the database size; however, deleting it severely degrades the performance of record inserts, which is a key operation during the merging of `emit-db`s.
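`sqlite3_analyzer` takes a database file and prints a per-table and per-index breakdown of disk usage, which is how we spotted the large index (the path below is illustrative):

```bash
sqlite3_analyzer path/to/emit-db | less
```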
Linking steps, which in our approach amount to merging of `emit-db` databases, produce particularly large files. In order to reduce the amount of I/O we perform and avoid filling the remote cache too quickly, we’ve marked all operations that deal with these large files as `no-remote`. This is accomplished with the following lines in our `.bazelrc` file:
```
common:coverity --modify_execution_info=CppArchive=+no-remote,CppLink=+no-remote,CcStrip=+no-remote
# If you wish to disable RE for compilation actions with coverity, uncomment
# the following line:
# common:coverity --modify_execution_info=CppCompile=+no-remote-exec
```

On the other hand, we did end up running `CppCompile` actions with remote execution (RE) because it proved to be twice as fast as running them on CI agents. Making
RE possible required us to identify the exact collection of files from the
Coverity installation that are required during invocations of the Coverity
tooling. Once we observed RE working correctly, we were confident that the
Bazel definitions we use are hermetic.
Merging of individual `emit-db` databases found in the final bundle (i.e. after the build has finished) proved to be time-consuming. This operation cannot be parallelized, since database information cannot be written in parallel. The time required for this step grows linearly with the number of translation units (TUs) being inserted; therefore, it makes sense to pick the largest database and then merge the smaller ones into it. One could entertain the possibility of skipping merging altogether and instead running smaller analyses on individual `emit-db`s, but this does not seem advisable: Coverity performs a whole-program analysis and would lose valuable information that way. For example, one TU may be exercised in different ways by two different applications, and the analysis results for this TU cannot be correctly merged.
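The size-ordered merge can be sketched as follows. The directory layout and the exact shape of the `cov-manage-emit add` invocation are assumptions; the approach only relies on the existence of the `add` subcommand.

```bash
# Pick the largest intermediate directory and fold the smaller ones into it.
largest=$(du -s idirs/* | sort -rn | head -n 1 | cut -f 2)
for idir in idirs/*; do
  [[ "$idir" == "$largest" ]] && continue
  cov-manage-emit --dir "$largest" add "$idir"   # assumed invocation shape
done
```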
The analysis phase is a black box, and it can easily become the bottleneck of the entire workflow, thus making it impractical to run in merge request pipelines. A common solution for speeding up the analysis in merge request pipelines is to identify the files that were edited and limit the analysis to only these files with the `--tu-pattern` option, which supports a simple language for telling Coverity what to care about during analysis. We added support for this approach to our solution by automatically finding the files changed in the current merge request and passing these on to `--tu-pattern`. This restricted analysis still requires the `emit-db`s for the entire project, but most of them will be cached.
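In CI, deriving the pattern can be as simple as the sketch below. The `file()` predicate comes from Coverity’s TU pattern language; the way the predicates are combined, the CI variable, and the lack of regex escaping are simplifications and assumptions.

```bash
# Build a --tu-pattern expression from the files touched in the merge request.
pattern=""
while read -r f; do
  p="file('${f}')"                          # regex matched against the TU's path
  pattern="${pattern:+${pattern}||}${p}"
done < <(git diff --name-only "origin/${TARGET_BRANCH}...HEAD" -- '*.c' '*.cc' '*.cpp')

cov-analyze --dir "$IDIR" --strip-path "$PWD" --tu-pattern "$pattern"
```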
The results
The solution that we delivered is in the form of a `bazel run`-able target that depends on the binary bundle that needs to be analyzed. It can be invoked like this:

```
bazel run //bazel/toolchains/coverity:workflow --config=coverity -- ...
```

This solution can be used both for the nightly runs of Coverity and in the merge request pipelines. We have confirmed that the results that are produced by our solution match the findings of Coverity when it is run in the “traditional” way. A typical run in a merge request pipeline takes about 22 minutes when a couple of C++ files are edited. The time is distributed as follows:

- 8 minutes: building, similar to build times for normal builds (this step is sped up by caching)
- 10 minutes: the final merging of `emit-db`s
- 4 minutes: analysis, uploading defects, reporting (this step is sped up by `--tu-pattern`)
The execution time can grow, of course, if the edits are more substantial. The key benefit of our approach is that the Coverity build is now cacheable and therefore can be included in merge request CI pipelines.
Conclusion
In summary, integrating Coverity static analysis with Bazel in a cacheable,
reproducible, and efficient manner required a deep understanding of both
Bazel and Coverity, as well as a willingness to address the nuances of
proprietary tooling that got in the way. By leveraging granular `emit-db`
databases, normalizing volatile data, and optimizing for remote execution
and compression, we were able to deliver a solution that fits well into the
client’s CI workflow and supports incremental analysis in merge request
pipelines.
While the process involved overcoming significant challenges, particularly around reproducibility and performance, the resulting workflow enables fast static analysis without sacrificing the benefits of Bazel’s remote cache. We hope that sharing our approach will help other teams facing similar challenges and inspire improvements when integrating other static analysis tools with Bazel.
Behind the scenes
Mark is a build system expert with a particular focus on Bazel. As a consultant at Tweag he has worked with a number of large and well-known companies that use Bazel or decided to migrate to it. Other than build systems, Mark's background is in functional programming and in particular Haskell. His personal projects include high-profile Haskell libraries, tutorials, and a technical blog.
Alexey is a build systems software engineer who cares about code quality, engineering productivity, and developer experience.
If you enjoyed this article, you might be interested in joining the Tweag team.