Announcing Skyscope

4 May 2023 — by Ben Radford

Skyscope is a new tool from the Scalable Build Systems team at Tweag. You can use it to visualise and explore Bazel build graphs in your web browser. More specifically, it lets you import a snapshot of a Skyframe graph (which might contain hundreds of thousands of nodes) and then focus on a particular area of interest. For example, this image was produced by running Skyscope on its own build graph:

Motivation

The Bazel documentation gives a good overview of Skyframe that is worth reading if you’re interested, so we won’t go into the details here. Essentially though, Skyframe is the underlying model that Bazel uses to determine what actions it needs to perform when you run a command. The main Skyframe data structure is a dependency graph where the nodes are entities like ConfiguredTarget, FileState and ActionExecution.

Bazel provides high-level ways to access this information (e.g. bazel query or bazel cquery) and mostly those methods are sufficient. However, it can sometimes be helpful to get at the raw and unfiltered Skyframe graph, even if just to learn more about the internals of Bazel. This can be done with the bazel dump --skyframe command. The problem is that this command produces a huge volume of fairly opaque textual data.

It would be great to feed this data into Graphviz so it could be examined visually (like you can do with bazel query --output graph), but with hundreds of thousands of nodes that’s a non-starter. Graphviz would likely run until the heat death of the universe, and even if it did finish, the layout would be too tangled to make any sense of.

The idea behind Skyscope is to mitigate these problems by focusing on a more manageable subgraph. This is done by rendering only a small subset of all the nodes and edges in the graph and hiding the rest. You can enter a search pattern to find nodes to add to this subset. You can also click on various parts of the graph to add and remove nodes interactively.
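To make the "visible subset" idea concrete, here is a minimal Python sketch. It is not Skyscope's implementation: the node names, the glob-style matching, and the helper names find_nodes and render_visible are all assumptions made for illustration. The point is only that Graphviz is handed the visible nodes and the edges between them, never the whole graph.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Set[str]]  # node -> the nodes it depends on


def find_nodes(graph: Graph, pattern: str) -> List[str]:
    """Return every node (source or dependency) whose name matches a glob pattern."""
    nodes = set(graph)
    for deps in graph.values():
        nodes |= deps
    return sorted(node for node in nodes if fnmatchcase(node, pattern))


def render_visible(graph: Graph, visible: Iterable[str]) -> str:
    """Emit Graphviz DOT for the visible nodes and the edges between them only."""
    shown = set(visible)
    lines = ["digraph skyframe {"]
    lines += [f'  "{node}";' for node in sorted(shown)]
    for source in sorted(shown):
        # Edges to hidden nodes are dropped, which keeps the layout small.
        for target in sorted(graph.get(source, set()) & shown):
            lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical toy graph; real Skyframe node keys look different.
toy_graph: Graph = {
    "TargetCompletion://app": {"ConfiguredTarget://app"},
    "ConfiguredTarget://app": {"ConfiguredTarget://lib", "FileState:app.go"},
    "ConfiguredTarget://lib": {"FileState:lib.go"},
}

print(render_visible(toy_graph, find_nodes(toy_graph, "ConfiguredTarget:*")))
```

Adding or removing a node interactively then amounts to changing the visible set and rendering again.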

Trying Skyscope

There are a few different ways of getting Skyscope, but all require the graphviz, curl and jq packages to be installed on your system as a prerequisite. In this first example we will be looking at the Skylib repository. You can adapt the commands for your own Bazel repository, but if you want to follow along with the example you should begin by cloning and building Skylib:

git clone https://github.com/bazelbuild/bazel-skylib.git
cd bazel-skylib && bazel build //distribution:bazel-skylib
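If you are unsure whether the prerequisites are installed, a quick check along these lines can help. This is a sketch that assumes the Graphviz package exposes its usual dot executable on your PATH; the helper name missing_tools is hypothetical and not part of Skyscope.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# "dot" stands in for the graphviz package (an assumption about how it is installed).
missing = missing_tools(["dot", "curl", "jq"])
if missing:
    print("Missing prerequisites: " + ", ".join(missing))
else:
    print("All prerequisites found.")
```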

Then the easiest method¹ of getting Skyscope is to append this snippet to the WORKSPACE file:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "skyscope",
    sha256 = "5544313ec77adbc96856c4cdfb3dfc6b5409e05790860ae19c7d321fb585490b",
    urls = ["https://github.com/tweag/skyscope/releases/download/v0.2.7/skyscope.zip"]
)
load("@skyscope//:repository.bzl", "configure_skyscope")
configure_skyscope()

Having done that, you can run the following commands to import a snapshot of the build graph:

# Clear the graph by stopping the Bazel server.
bazel shutdown

# Populate the graph again.
bazel build //distribution:bazel-skylib

# Import a snapshot of the graph.
bazel run -- @skyscope//:import

The fewer dependencies the target you build has, the faster the import process will be. Once it is complete you will be prompted to open a link in your browser to view the graph. In the main search box, enter rule labels or filenames to find and display as nodes:

You can freely explore adjacent nodes by clicking them. Please consult the documentation for detailed usage instructions (if nothing else, it’s worth reading the section on exploring the graph to familiarise yourself with the basic interface).

We will now take a look at a couple of common use cases across a few different repositories. The procedure to install Skyscope in these is the same as above.

1. Find a dependency path from one target to another

Much like the somepath function, you can use Skyscope to find dependency paths between targets. The example below is for the TensorFlow repository and the import was prepared as follows:

# Clear the Skyframe graph.
bazel shutdown

# Populate the graph with ConfiguredTarget nodes for tf-reduce and its dependencies.
bazel cquery //tensorflow/compiler/mlir:tf-reduce

# Import a snapshot of the graph with extra context for targets under the //tensorflow package.
bazel run -- @skyscope//:import --query=//tensorflow/... --no-aquery

The import process will take a few minutes to index all the paths. Once it is complete, you can search for the tf-reduce and kernels.cc targets and make them visible:

Click on Open to make the nodes on the dependency path visible. If you hover over a node, extra context will be displayed in a tooltip:
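Separately from the interface, the notion of a dependency path can be sketched in a few lines of Python. This is an illustrative breadth-first search over a made-up dependency map, not how Skyscope or Bazel compute paths; the function name some_path and the labels in the toy graph are assumptions. Because it is breadth-first, it returns one of the shortest paths, which somepath in Bazel does not promise.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def some_path(deps: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return one shortest dependency path from start to goal, or None if there is none."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for dep in sorted(deps.get(node, ())):
            if dep not in parents:
                parents[dep] = node
                queue.append(dep)
    return None


# Hypothetical labels, used only to exercise the function.
toy_deps = {
    "//app:bin": {"//app:lib", "//util:log"},
    "//app:lib": {"//core:kernels"},
    "//util:log": set(),
}

print(some_path(toy_deps, "//app:bin", "//core:kernels"))
```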

2. Discover the actions needed to transform a source file into a binary

The previous example looked only at ConfiguredTarget nodes (populated by bazel cquery) so let’s now take a look at a graph that has some ActionExecution nodes. In this example we will import from the Bazelisk repository:

# Clear the Skyframe graph.
bazel shutdown

# Populate the graph with the actions needed to build bazelisk.
bazel build //:bazelisk

# Import a snapshot of the graph (with default extra context).
bazel run -- @skyscope//:import

Begin by searching for and displaying the TargetCompletion node for //:bazelisk. Then make the FileState and ConfiguredTarget nodes for bazelisk.go visible too:

If you open the dependency paths you can see the GoCompilePkg and GoLink actions that produce the bazelisk binary. Also note that these actions have a dependency on the go_binary rule that created them:

Future direction

The biggest limitation at the moment is probably the import speed. For small workspaces and rules with relatively few dependencies it only takes a couple of seconds, but larger workspaces can take several minutes to import and index. As yet no attention has been given to performance optimisation, so there are likely some easy gains to be had here.

A related issue is that the whole graph is loaded into memory before being written to SQLite. That was a convenient way to prototype, but now it would be good to switch to a streaming-based approach where the Skyframe data is parsed and written to the database in chunks. This would drastically reduce peak memory usage and make it possible to import from big repositories on systems with less RAM. For example, on my laptop, trying to import Envoy causes the process to get OOM-killed² at around the 10 GB mark.
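As a rough illustration of what a streaming import could look like, the Python sketch below parses edges lazily and writes them to SQLite one chunk at a time, so only a single chunk is held in memory. Everything specific here is an assumption: the "source -> target" line format is invented and is not the real bazel dump output, and the edge table and function names are hypothetical rather than Skyscope's schema.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, Tuple

Edge = Tuple[str, str]


def parse_edges(lines: Iterable[str]) -> Iterator[Edge]:
    """Lazily parse 'source -> target' lines (an invented format), skipping other lines."""
    for line in lines:
        source, separator, target = line.partition(" -> ")
        if separator:
            yield source.strip(), target.strip()


def import_in_chunks(conn: sqlite3.Connection, edges: Iterable[Edge],
                     chunk_size: int = 10_000) -> int:
    """Insert edges chunk by chunk, committing after each one; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edge (source TEXT NOT NULL, target TEXT NOT NULL)"
    )
    iterator = iter(edges)
    total = 0
    while True:
        chunk = list(islice(iterator, chunk_size))  # at most chunk_size edges in memory
        if not chunk:
            break
        with conn:  # commits the chunk, or rolls it back on error
            conn.executemany("INSERT INTO edge (source, target) VALUES (?, ?)", chunk)
        total += len(chunk)
    return total
```

Passing an open file object to parse_edges would keep the input side lazy too, since files are iterated line by line.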

As for new features, the issue tracker has a few ideas but I’d like to see if this is something people find useful before deciding what to work on next, so please give it a try if you can. Feedback and suggestions are very much welcome!

¹ While convenient, there are some drawbacks to this method. The Skyframe graph will contain nodes related to running Skyscope itself, and depending on your workspace configuration bazel run can cause parts of the graph to become invalidated. It should work for the examples here but if you encounter this issue, you can manually install a release instead.
² Terminated by the Out-of-Memory manager for using too much memory.

This article is licensed under a Creative Commons Attribution 4.0 International license.
