Tweag
News
Capabilities
Careers
Research
Blog
Contact
Modus Create

Accessing external resources reliably with Bazel

2 April 2026 — by Alexey Tereshenkov

We say that a system is reliable if it continues to function correctly when events outside the system affect it. Many factors can impact the reliability of Bazel builds, especially dependencies on external services. In this post, we’ll focus on what can go wrong when your build needs resources you don’t control and what you can do to reduce the risk of build failures.

Depending on external resources

Some build actions triggered by Bazel might access resources that are external to your organization. For Bazel builds, this typically applies to build rules (to build your first-party code) or repository rules (utilities and tools those rules might need). To know what external resources your builds depend on, you need to make these requests visible. Bazel emits data about network requests via the Build Event Protocol (BEP), which can be written to disk; if you operate a remote cache service, your provider might also offer a BEP viewer. You can also use the --experimental_repository_resolved_file flag to produce resolved information about all Starlark repository rules that were executed.

Building a target that depends on a repository rule such as this:

http_archive(
    name = "yq_cli",
    build_file = "@//tools/yq:BUILD.bazel.gen",
    sha256 = "7583d471d9bfe88e32005e9d287952382df0469135f691e044443f610d707f4d",
    url = "https://github.com/mikefarah/yq/releases/download/v4.47.1/yq_linux_amd64.tar.gz",
)

would result in the following build event (the snippet below is copied from the BEP output):

...
children {
  fetch {
    url: "https://github.com/mikefarah/yq/releases/download/v4.47.1/yq_linux_amd64.tar.gz"
  }
}
...
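Grepping raw BEP output for fetch events gets tedious at scale. If you write the BEP to disk as newline-delimited JSON (e.g., with --build_event_json_file=bep.json), a short script can collect all fetch URLs. This is a minimal sketch that assumes fetch events carry their URL under an id.fetch.url field, matching the JSON form of the event shown above:

```python
import json

def fetch_urls(bep_lines):
    """Collect fetch URLs from BEP events written as newline-delimited JSON."""
    urls = set()
    for line in bep_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Fetch events identify the downloaded URL in their event id.
        url = event.get("id", {}).get("fetch", {}).get("url")
        if url:
            urls.add(url)
    return sorted(urls)

# Synthetic BEP line (shape assumed; a real file has one JSON event per line):
sample = ['{"id": {"fetch": {"url": "https://github.com/mikefarah/yq/releases/download/v4.47.1/yq_linux_amd64.tar.gz"}}}']
print(fetch_urls(sample))
```

Feeding the resulting URL list into a host-level summary is then a one-liner away.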

To get an idea of what kinds of artifacts a Bazel build for a reasonably large project might fetch, let’s build a few open-source projects — Envoy, Redpanda, and datadog-agent. These are some of the domains from which at least one resource was fetched when building all targets from these projects:

bcr.bazel.build             cdn.azul.com                dl.google.com
dl.grafana.com              dl.min.io                   download.gnome.org
files.pythonhosted.org      gcr.io                      github.com
go.dev                      mirror.bazel.build          mirrors.kernel.org
raw.githubusercontent.com   pkgconfig.freedesktop.org   pypi.org
static.crates.io            static.rust-lang.org        s3.amazonaws.com
www.antlr.org               www.colm.net                www.lua.org
www.sqlite.org              www.tcpdump.org

While most of your external dependencies are going to be declared in build metadata files such as MODULE.bazel (or legacy WORKSPACE), some network requests are going to be made by build targets such as genrules (e.g., by calling curl) or toolchains (e.g., a pip call to the PyPI index). We’ll see a worked example of this later in the post.

Common problems

In general, it is advisable to rely on MODULE.bazel or WORKSPACE mechanisms for accessing external dependencies instead of doing so via build or test actions. Bazel, by design, does not provide first-class support for downloads within build actions, so when you interact with external systems this way, you are limited in how you can manage and account for those requests.

Therefore, the complete list of online resources accessed during a build — those accounted for by BEP and those that are not — might be much longer. After doing a full build, it can be helpful to audit the network requests made, both to discover what resources were fetched and to compile a complete inventory of external hosts your build depends on.

Any of these external dependencies can run into some common problems:

  • Outages: no service provides 100% uptime guarantee and some providers, sadly, have incidents all too often.
  • Removed artifacts: an archive file might be deleted due to retention policy.
  • Rate limiting: many concurrent builds coming from the same cluster can accidentally trigger API or download rate limits, especially with public registries.
  • Checksum drift: content of an artifact at a given URL can change, intentionally or maliciously, causing checksum mismatches.

This post focuses on strategies to either remove these external dependencies from the critical path, or make failures graceful and recoverable.

Remedies

The remedies below are intentionally “stackable”: you can start with low-effort safeguards (e.g., checksums and retries) and progress toward stronger guarantees (e.g., mirrors and network blocking). If you’re skimming, you can pick one external host that concerns you (e.g., github.com or pypi.org) and follow the options that would let you depend on it more reliably.

Using checksums

External resources may not only vanish or become inaccessible, but also change in place: a provider might do in-place updates of its releases, or a malicious actor might attempt to inject code. To guard against this, couple every artifact you download from the Internet with its SHA-256 digest (unless there is a strong guarantee of immutability from the provider). Even though the sha256 attribute is optional when declaring external dependencies such as http_archive, omitting the SHA-256 for remote files to be fetched is considered a security risk.
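Conceptually, the check is simple: hash the downloaded bytes and compare against the pinned digest. A minimal sketch of the idea (the helper below is hypothetical, illustrating the check rather than Bazel's own code):

```python
import hashlib

def verify_sha256(data: bytes, expected: str) -> None:
    """Hash downloaded bytes and fail loudly on mismatch
    (hypothetical helper, not Bazel's implementation)."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")

payload = b"artifact-contents"
pinned = hashlib.sha256(payload).hexdigest()

verify_sha256(payload, pinned)  # unchanged artifact: passes silently
try:
    verify_sha256(b"tampered bytes", pinned)  # drifted artifact: rejected
except ValueError as err:
    print("download blocked:", err)
```

A pinned digest turns silent content drift into an immediate, diagnosable build failure.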

Using GitHub releases

As the majority of build rules and open-source tools used by projects built with Bazel are hosted on GitHub, there are some special concerns that are worth mentioning.

A public GitHub repository might be moved, deleted, or become private (this happened in 2025 with rules_mypy). If you do have to rely on external rulesets hosted on GitHub, make sure they are hosted under the bazel-contrib organization (or help get them migrated at some point) to avoid surprises.

Checksums of dynamically generated archives might change; this has caused Bazel outages before, in 2023, and there was some confusion about whether the stability of such archives is guaranteed. There are also edge cases, such as when a Git repository is renamed. Since Bazel builds rely on the stability of archives (for reproducibility and caching, among other reasons), it is best to play it safe and use only releases instead of source downloads.

Using retries

It is possible that some of your dependencies need to be obtained from an online resource that is known to be unstable. What’s worse, you may not even be able to cache it (or host yourself): for example, imagine needing to download a short-lived license file for a commercial product from the manufacturer’s server when starting a build. To make downloading this file (via a repository rule) more likely to succeed, consider using the --experimental_repository_downloader_retries flag to specify the maximum number of attempts to retry upon a download error.
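For example, allowing a few extra attempts in your .bazelrc looks like this (the retry count here is an arbitrary illustration; pick a value that suits your infrastructure):

```
# .bazelrc
common --experimental_repository_downloader_retries=3
```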

Placing binaries under version control

This varies a lot between organizations and the programming languages concerned, but a common approach that is adopted by most organizations is to check in the source code that is used to build a binary, and not the binary itself.

Many engineers would be strongly opposed to checking in any binary, as version control systems (VCS) are designed and optimised for managing source code. Nonetheless, some organizations choose to place binary libraries that are external dependencies of their first-party code under version control. This has been seen occasionally in Java projects where .jar libraries (which nowadays can be managed with Maven or Gradle) were checked in. Today, this arguably makes sense only for legacy projects, air-gapped or classified networks, and vendored native libraries that are hard to rebuild.

Unless you are able to provide top-notch automation for keeping your third-party dependencies checked in under version control up-to-date, patched, and compliant with any licensing constraints, it might be best to rely on a private artifact cache for hosting third-party dependencies.

Internal repository manager

As your organization grows, you will likely need to invest in a tool that would allow you to organize your resources such as external tools and third-party code packages into repositories. There are lots of commercial solutions on the market such as JFrog Artifactory, Sonatype Nexus, AWS CodeArtifact, and GitLab package registry to name a few.

With a repository manager, once you discover a dependency on an external artifact, you would upload it manually to your internal binary repository and update your build metadata accordingly:

# MODULE.bazel
http_archive(
  name = "tool",
  ...
  urls = [
    "https://artifacts.company.com/artifactory/project/tools/tool-1.2.3.tar.gz",
    "https://www.project.org/source/1.2.3/tool-1.2.3.tar.gz",
  ]
)

URLs in the urls attribute are tried in order until one succeeds. It is recommended to list your internal binary repository first; if that mirror happens to be down, your build can still succeed, provided that the upstream host (project.org in this case) is up and running.

Bazel downloader configuration

You could also let your binary repository manager be the only place where Bazel builds can fetch resources from if you don’t want to depend on external artifacts in any way at all. This can be achieved by providing a configuration file for the remote downloader using the --downloader_config flag.

For example, a simple use case may be to block GitHub and instead rewrite fetches to go to an Artifactory instance. This can be done with the following downloader configuration:

rewrite github.com/([^/]+)/([^/]+)/releases/download/([^/]+)/(.*) artifacts.my-company.com/artifactory/github-releases-mirror/$1/$2/releases/download/$3/$4

# if you still have to rely on dynamically generated archives instead of releases
rewrite github.com/([^/]+)/([^/]+)/archive/(.+)\.(tar\.gz|zip) artifacts.my-company.com/artifactory/github-releases-mirror/$1/$2/archive/$3.$4

However, support for using Bazel’s downloader needs to be enabled in Bazel rulesets by their authors. For instance, in rules_python, the pip extension now supports pulling information from a PyPI compatible mirror which means that the Bazel downloader can be used for downloading Python wheels.

Take a look at downloader configurations used in other open-source projects to explore how others set up access to external resources and to learn the nuances of the configuration syntax.

Blocking network requests

Additional control of network access can be achieved by blocking some network requests in CI agents using custom firewall rules or other tools of that nature. However, as mentioned earlier, Bazel’s downloader configuration can only rewrite or block requests that Bazel is aware of. This means that not all network traffic in a Bazel build is Bazel-managed traffic.

To illustrate this, let’s declare a dependency on the gawk binary. When running gawk, its sources are going to be fetched from the GNU FTP server. Let’s also add a genrule that will download an archive from the same FTP server:

# MODULE.bazel
bazel_dep(name = "gawk", version = "5.3.2")

# BUILD.bazel
genrule(
    name = "diffutils",
    outs = ["diffutils-3.12.tar.xz"],
    cmd = """wget -O "$@" https://ftp.gnu.org/gnu/diffutils/diffutils-3.12.tar.xz""",
)

We’ll configure Bazel to use a downloader configuration that blocks fetches from that FTP server:

# bazel_downloader.cfg
block ftp.gnu.org

# .bazelrc
common --downloader_config=bazel_downloader.cfg

When attempting to run the gawk binary from the ruleset, an error is expectedly raised since accessing the server is blocked:

$ bazel run @gawk
...
ERROR: java.io.IOException: Configured URL rewriter blocked all URLs:
[https://ftp.gnu.org/gnu/gawk/gawk-5.3.2.tar.xz]

However, building a genrule still succeeds because the downloader configuration does not apply here:

$ bazel build //src:diffutils
...
INFO: From Executing genrule //src:diffutils:
--2026-01-19 10:48:54--  https://ftp.gnu.org/gnu/diffutils/diffutils-3.12.tar.xz
Resolving ftp.gnu.org (ftp.gnu.org)... 209.51.188.20, 2001:470:142:3::b
Connecting to ftp.gnu.org (ftp.gnu.org)|209.51.188.20|:443... connected.
HTTP request sent, awaiting response... 200 OK
Saving to: 'bazel-out/k8-fastbuild/bin/src/diffutils-3.12.tar.xz'

External network requests of this nature are hard to audit in a large codebase since they won’t show up as structured fetch events in BEP output. To mitigate this, prefer using repository rules and Bzlmod extensions for any downloads instead of ad hoc shell commands. Going a step further, you might want to consider forbidding direct calls to applications that might make network requests (such as curl or wget) in genrule targets, unless explicitly approved. Where unavoidable, configure targets to access internal repositories instead of public endpoints.
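One way to catch ad hoc downloads early is a rough lint over your BUILD files. The sketch below is a heuristic only — a real audit would parse BUILD files with a proper Starlark parser rather than regular expressions:

```python
import re

# Flag lines that invoke common download tools (heuristic, illustrative only).
DOWNLOAD_TOOLS = re.compile(r"\b(curl|wget)\b")

def audit_build_file(text: str) -> list:
    """Return the lines of a BUILD file that mention curl or wget."""
    return [line.strip() for line in text.splitlines() if DOWNLOAD_TOOLS.search(line)]

sample_build = '''
genrule(
    name = "diffutils",
    outs = ["diffutils-3.12.tar.xz"],
    cmd = """wget -O "$@" https://ftp.gnu.org/gnu/diffutils/diffutils-3.12.tar.xz""",
)
'''
print(audit_build_file(sample_build))
```

Wired into CI as a pre-merge check, this makes undeclared network access a reviewable event rather than a silent dependency.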

Sandboxing

When builds run in a Bazel sandbox, actions execute in a container (using Linux namespaces) that isolates them from the host. In addition to making the entire filesystem read-only (except for the sandbox directory), you can also forbid actions from accessing the network. This is useful when you want to confirm that a build makes no network requests, such as when running unit tests or integration tests that are not supposed to make any network calls. See the Bazel tags requires-network and block-network to learn how to control network access for individual build targets.
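For instance, a test that should stay offline can be tagged accordingly (the target and file names below are illustrative):

```
# BUILD.bazel
py_test(
    name = "parser_test",
    srcs = ["parser_test.py"],
    tags = ["block-network"],  # deny network access inside the sandbox
)
```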

Keep in mind that cached results of build actions can still be fetched even when the network is blocked in a sandbox. So if artifacts needed for a build were previously uploaded to the Bazel cache, you won't know whether a particular build needs any network resources unless you run it without cache access. Also, none of the sandbox flags affect any cache: these flags are not expected to change the output of hermetic actions, and making them part of a cache key would worsen the cache's effectiveness.

With the network disabled in a sandbox, the genrule target we declared earlier fails to build:

$ bazel build //src:diffutils --spawn_strategy=linux-sandbox --nosandbox_default_allow_network
...
ERROR: Executing genrule //src:diffutils failed: (Exit 4): bash failed: ...
Resolving ftp.gnu.org (ftp.gnu.org)... failed: Temporary failure in name resolution.
wget: unable to resolve host address 'ftp.gnu.org'
Target //src:diffutils failed to build
...

Mirrors

Since Bazel 8.4, you can also use the --module_mirrors flag to mirror the source archives. To take advantage of this, add --module_mirrors=https://bcr.cloudflaremirrors.com in your .bazelrc file. Keep in mind that this only applies to registry sources and not to other resources fetched by Bazel (such as downloads happening in the repository rules context).

Note that for Bazel builds, the Bazel Central Registry (BCR) only stores metadata for a Bazel module; the actual artifacts are usually fetched from URLs that point to files hosted online (most often on GitHub).

BCR itself is a sort of external dependency for your builds, too. Even though it’s hosted on production-grade infrastructure at Google, it can still be impacted by outages and operational mishaps. The SSL certificate for mirror.bazel.build has expired, causing worldwide CI breakages, at least twice: once in 2022 and again in 2025. Refer to Postmortem for bazel.build SSL certificate expiry to learn more.

Configuring Bazel to use https://bcr.cloudflaremirrors.com as a mirror for modules from the BCR helps, but the Cloudflare mirror doesn't cover the registry itself. So if you want to go the extra mile, you might also consider setting up your own BCR index registry and pointing Bazel at that instead. If this is not feasible, write a playbook for incident response around build outages caused by external dependencies, so teams don't have to improvise under pressure.

Pull-through cache

If your repository manager supports it, you could let your builds download external resources while saving every fetched resource into the cache as well. On subsequent builds, the resources are fetched from the cache, if available. This lets you turn random external downloads into a controlled internal dependency without requiring you to pre-vendor everything up front.

If your CI agents are in the same network or cloud region (depending on your infrastructure setup), this can also speed up builds by having downloads complete faster. Not relying on external resources also makes your Bazel builds a lot more secure, as your CI agents will only download data from a trusted source.

If using an off-the-shelf solution, such as the popular JFrog Artifactory, is not possible, there are some other options. Bazel picks up proxy addresses from the HTTP_PROXY and HTTPS_PROXY environment variables and uses these to download files over HTTP and HTTPS, respectively (if specified). This means you might have success with caching proxy solutions such as Squid and Charles or by combining Nginx and Varnish HTTP reverse proxies. Routing requests through a proxy might also help to avoid rate limiting issues since the external service will see fewer direct requests.
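As a configuration sketch, routing Bazel's downloads through a caching proxy might look like this (the hostname and port are placeholders for your own proxy):

```
# Bazel reads these environment variables for HTTP(S) downloads
export HTTP_PROXY=http://proxy.internal.example:3128
export HTTPS_PROXY=http://proxy.internal.example:3128
bazel build //...
```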

With this configuration, your downloader configuration file would look something like this:

# point all downloads at the mirror
rewrite (.*) {caching-service-url}/$1

# use the original location if the mirror is down
rewrite (.*) $1

For a completely custom solution, take a look at the Bazel downloader mirror from Monogon, which can be used to mirror Bazel dependencies to cloud bucket storage such as S3 or GCS. Bazel's remote asset API lets you use an existing remote cache (content-addressable storage, CAS) as a downloader cache as well. The cache provider service needs to support it, but many existing solutions, both commercial and open-source, are compatible.

The --experimental_remote_downloader flag can be specified to provide a Remote Asset API endpoint URI to be used as a remote download proxy. To get started, consider using bazel-remote, which has out-of-the-box support for this use case. Make sure to provide the sha256 for the assets to fetch so that they can be cached just like any other CAS object. The remote caching service will automatically download assets from their URLs if they are not already in the CAS, and cache them thereafter.
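For example, pointing Bazel at a bazel-remote instance might look like this (the endpoint is a placeholder for your own deployment):

```
# .bazelrc
common --experimental_remote_downloader=grpc://cache.internal.example:9092
```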

Bazel 9 adds support for remote repository caches which make Bazel builds (at least those requiring previously cached assets) extra resilient to external access issues. During outages of external hosting services, those organizations that didn’t have a central repository manager where repository rules artifacts could be stored had to extract files from cache directories on local developer machines and save them to an accessible location within the internal network.

Now these artifacts will be saved into a remote cache similarly to build output results. To confirm that your remote repository cache works as expected, you can use the --repository_disable_download flag after doing a clean build (which should succeed as it will reuse the remote cache entries uploaded in the previous build).
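A quick way to exercise this check (a sketch that assumes the first build populates the remote cache):

```
# first build uploads repository artifacts to the remote cache
bazel build //...

# a clean rebuild must now succeed without performing any downloads
bazel clean --expunge
bazel build //... --repository_disable_download
```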

Chaos testing

Finally, instead of waiting for the next GitHub outage, you can test your resilience by intentionally breaking access to certain external hosts. In a staging CI environment, temporarily block access to key external systems with firewall rules, and verify that your mirrors and caches are used as expected, that builds either still succeed or fail fast with clear error messages, and that your runbooks are correct and sufficient.

Conclusion

Bazel projects often depend on external services in subtle ways, and any instability or change in those services can break otherwise healthy builds. You can significantly improve build reliability by making all downloads explicit and verifiable, routing them through managed infrastructure, and tightening how and when network access is allowed. Resilient Bazel builds come from treating external dependencies as first-class operational risks and turning unpredictable third-party failures into controlled, recoverable events.

Behind the scenes

Alexey Tereshenkov

Alexey is a build systems software engineer who cares about code quality, engineering productivity, and developer experience.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

© 2025 Modus Create, LLC
