Skyscope is a new tool from the Scalable Build Systems team at Tweag. You can use it to visualise and explore Bazel build graphs in your web browser. More specifically, it lets you import a snapshot of a Skyframe graph (which might contain hundreds of thousands of nodes) and then focus on a particular area of interest. For example, this image was produced by running Skyscope on its own build graph:
Motivation
The Bazel documentation gives a good overview of Skyframe that is worth reading if you’re interested, so we won’t go into the details here. Essentially though, Skyframe is the underlying model that Bazel uses to determine what actions it needs to perform when you run a command. The main Skyframe data structure is a dependency graph where the nodes are entities like ConfiguredTarget, FileState and ActionExecution.
Bazel provides high-level ways to access this information (e.g. bazel query or bazel cquery), and mostly those methods are sufficient. However, it can sometimes be helpful to get at the raw, unfiltered Skyframe graph, even if just to learn more about the internals of Bazel. This can be done with the bazel dump --skyframe command. The problem is that this command produces a huge volume of fairly opaque textual data.
It would be great to feed this data into Graphviz so it could be examined visually (like you can do with bazel query --output graph), but with hundreds of thousands of nodes that's a non-starter. Graphviz would likely run until the heat death of the universe, and even if it did finish, the layout would be too tangled to make any sense of.
The idea behind Skyscope is to mitigate these problems by focusing on a more manageable subgraph. This is done by rendering only a small subset of all the nodes and edges in the graph and hiding the rest. You can enter a search pattern to find nodes to add to this subset. You can also click on various parts of the graph to add and remove nodes interactively.
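To make the idea concrete, here is a minimal Python sketch of rendering only a chosen subset of a large graph. A search pattern selects the visible nodes, and only edges whose endpoints are both visible are emitted as Graphviz DOT. The graph representation (a dict mapping each node key to the keys it depends on), the node key strings and the function names are all hypothetical illustrations, not Skyscope's actual data model or code.

```python
import re


def search_nodes(graph: dict[str, list[str]], pattern: str) -> set[str]:
    """Return the keys of all nodes whose key matches the regex pattern."""
    regex = re.compile(pattern)
    return {node for node in graph if regex.search(node)}


def visible_subgraph_dot(graph: dict[str, list[str]], visible: set[str]) -> str:
    """Render only the visible nodes, and only edges between two visible nodes, as DOT."""
    lines = ["digraph G {"]
    for node in sorted(visible):
        lines.append(f'  "{node}";')
    for node in sorted(visible):
        for dep in graph.get(node, []):
            if dep in visible:  # edges leading to hidden nodes are not rendered
                lines.append(f'  "{node}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


# A tiny hypothetical graph; a real Skyframe dump has hundreds of thousands of nodes.
graph = {
    "ConfiguredTarget://:app": ["ConfiguredTarget://:lib", "FileState:app.go"],
    "ConfiguredTarget://:lib": ["FileState:lib.go"],
    "FileState:app.go": [],
    "FileState:lib.go": [],
}

visible = search_nodes(graph, r"ConfiguredTarget")
dot = visible_subgraph_dot(graph, visible)
print(dot)
```

Because the DOT text only ever contains the visible subset, Graphviz has a small, tractable layout problem no matter how large the full graph is.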
Trying Skyscope
There are a few different ways of getting Skyscope, but all of them require the graphviz, curl and jq packages to be installed on your system. In this first example we will be looking at the Skylib repository. You can adapt the commands for your own Bazel repository, but if you want to follow along with the example, you should begin by cloning and building Skylib:
git clone https://github.com/bazelbuild/bazel-skylib.git
cd bazel-skylib && bazel build //distribution:bazel-skylib
Then the easiest method[1] of getting Skyscope is to append this snippet to the WORKSPACE file:
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "skyscope",
sha256 = "5544313ec77adbc96856c4cdfb3dfc6b5409e05790860ae19c7d321fb585490b",
urls = ["https://github.com/tweag/skyscope/releases/download/v0.2.7/skyscope.zip"]
)
load("@skyscope//:repository.bzl", "configure_skyscope")
configure_skyscope()
Having done that, you can run the following commands to import a snapshot of the build graph:
# Clear the graph by stopping the Bazel server.
bazel shutdown
# Populate the graph again.
bazel build //distribution:bazel-skylib
# Import a snapshot of the graph.
bazel run -- @skyscope//:import
The fewer dependencies the target you build has, the faster the import process will be. Once the import is complete, you will be prompted to open a link in your browser to view the graph. In the main search box, enter rule labels or filenames to find and display as nodes:
You can freely explore adjacent nodes by clicking them. Please consult the documentation for detailed usage instructions (if nothing else, it’s worth reading the section on exploring the graph to familiarise yourself with the basic interface).
We will now take a look at a couple of common use cases across a few different repositories. The procedure to install Skyscope in these is the same as above.
1. Find a dependency path from one target to another
Much like the somepath function of bazel query, you can use Skyscope to find dependency paths between targets. The example below is for the TensorFlow repository, and the import was prepared as follows:
# Clear the Skyframe graph.
bazel shutdown
# Populate the graph with ConfiguredTarget nodes for tf-reduce and its dependencies.
bazel cquery //tensorflow/compiler/mlir:tf-reduce
# Import a snapshot of the graph with extra context for targets under the //tensorflow package.
bazel run -- @skyscope//:import --query=//tensorflow/... --no-aquery
The import process will take a few minutes to index all the paths. Once it is complete, you can search for the tf-reduce and kernels.cc targets and make them visible:
Click on Open to make the nodes on the dependency path visible. If you hover over a node, extra context will be displayed in a tooltip:
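Conceptually, finding a dependency path is a graph search. The Python sketch below shows one way to do it: a breadth-first search that returns a shortest path from one node to another, similar in spirit to somepath. It illustrates the idea under stated assumptions and is not Skyscope's implementation, which indexes paths during import. The adjacency-dict representation, the labels and the some_path name are hypothetical.

```python
from collections import deque
from typing import Optional


def some_path(graph: dict[str, list[str]], start: str, end: str) -> Optional[list[str]]:
    """Breadth-first search along dependency edges; returns a shortest path or None."""
    parent: dict[str, Optional[str]] = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Follow parent links back to start, then reverse into start -> end order.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for dep in graph.get(node, []):
            if dep not in parent:
                parent[dep] = node
                queue.append(dep)
    return None


# Hypothetical target-level edges (dependent -> dependencies).
graph = {
    "//tools:reduce": ["//lib:passes", "//lib:util"],
    "//lib:passes": ["//kernels:kernels"],
    "//lib:util": [],
    "//kernels:kernels": [],
}

print(some_path(graph, "//tools:reduce", "//kernels:kernels"))
# ['//tools:reduce', '//lib:passes', '//kernels:kernels']
```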
2. Discover the actions needed to transform a source file into a binary
The previous example looked only at ConfiguredTarget nodes (populated by bazel cquery), so let's now take a look at a graph that has some ActionExecution nodes. In this example we will import from the Bazelisk repository:
# Clear the Skyframe graph.
bazel shutdown
# Populate the graph with the actions needed to build bazelisk.
bazel build //:bazelisk
# Import a snapshot of the graph (with default extra context).
bazel run -- @skyscope//:import
Begin by searching for and displaying the TargetCompletion node for //:bazelisk. Then make the FileState and ConfiguredTarget nodes for bazelisk.go visible too:
If you open the dependency paths, you can see the GoCompilePkg and GoLink actions that produce the bazelisk binary. Also note that these actions have a dependency on the go_binary rule that created them:
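Opening the dependency paths between two nodes reveals every node on some path between them, not just one. A simple way to compute that set is to intersect the nodes reachable forwards from the start with the nodes that can reach the end. Filtering the result by node type then picks out the actions. This is a sketch that assumes the whole graph fits in memory as an adjacency dict. It is not necessarily how Skyscope does it, and the "Type:key" strings and toy graph are hypothetical, not the real Bazelisk graph.

```python
def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """All nodes reachable from start (including start) by following edges."""
    seen = {start}
    stack = [start]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip every edge so we can search from dependencies back to dependents."""
    rev: dict[str, list[str]] = {node: [] for node in graph}
    for node, deps in graph.items():
        for dep in deps:
            rev.setdefault(dep, []).append(node)
    return rev


def nodes_on_paths(graph: dict[str, list[str]], start: str, end: str) -> set[str]:
    """Nodes lying on at least one path from start to end."""
    return reachable(graph, start) & reachable(reverse(graph), end)


# Hypothetical, heavily simplified graph with "<NodeType>:<key>" strings.
graph = {
    "TargetCompletion://:app": ["ActionExecution:link"],
    "ActionExecution:link": ["ActionExecution:compile", "ConfiguredTarget://:app"],
    "ActionExecution:compile": ["FileState:main.go", "ConfiguredTarget://:app"],
    "ConfiguredTarget://:app": ["ConfiguredTarget://:main.go"],
    "ConfiguredTarget://:main.go": [],
    "FileState:main.go": [],
}

on_path = nodes_on_paths(graph, "TargetCompletion://:app", "FileState:main.go")
actions = sorted(n for n in on_path if n.startswith("ActionExecution:"))
print(actions)  # ['ActionExecution:compile', 'ActionExecution:link']
```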
Future direction
The biggest limitation at the moment is probably the import speed. For small workspaces and rules with relatively few dependencies it only takes a couple of seconds, but larger workspaces can take several minutes to import and index. As yet no attention has been given to performance optimisation, so there are likely some easy gains to be had here.
A related issue is that the whole graph is loaded into memory before being written to SQLite. That was a convenient way to prototype, but now it would be good to switch to a streaming-based approach where the Skyframe data is parsed and written to the database in chunks. This would drastically reduce peak memory usage and make it possible to import from big repositories on systems with less RAM. For example, on my laptop, trying to import Envoy causes the process to get OOM killed[2] at around the 10 GB mark.
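A streaming import of this kind might look roughly like the following Python sketch. It parses records lazily from a text stream and writes them to SQLite in fixed-size batches, so peak memory is bounded by the batch size rather than by the size of the graph. The one-edge-per-line input format, the edge table schema, the function names and the batch size are hypothetical, chosen only to keep the example self-contained. They are not the real bazel dump --skyframe format or Skyscope's schema.

```python
import io
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, TextIO


def parse_edges(stream: TextIO) -> Iterator[tuple[str, str]]:
    """Lazily yield (node, dependency) pairs from lines like 'A -> B' (hypothetical format)."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        node, dep = line.split(" -> ", 1)
        yield node, dep


def import_in_chunks(db: sqlite3.Connection, edges: Iterable[tuple[str, str]],
                     chunk_size: int = 10_000) -> int:
    """Insert edges batch by batch, committing after each chunk; returns the edge count."""
    db.execute("CREATE TABLE IF NOT EXISTS edge (node TEXT NOT NULL, dep TEXT NOT NULL)")
    total = 0
    it = iter(edges)
    while True:
        chunk = list(islice(it, chunk_size))  # at most chunk_size edges held in memory
        if not chunk:
            break
        db.executemany("INSERT INTO edge (node, dep) VALUES (?, ?)", chunk)
        db.commit()
        total += len(chunk)
    return total


# In practice the stream would be read incrementally (e.g. from sys.stdin); here it is a small sample.
dump = io.StringIO("A -> B\nA -> C\nB -> C\n")
db = sqlite3.connect(":memory:")
count = import_in_chunks(db, parse_edges(dump), chunk_size=2)
print(count, db.execute("SELECT COUNT(*) FROM edge").fetchone()[0])  # 3 3
```

Because parse_edges is a generator and islice pulls only one chunk at a time, the full dump is never materialised in memory.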
As for new features, the issue tracker has a few ideas, but I'd like to see if this is something people find useful before deciding what to work on next, so please give it a try if you can. Feedback and suggestions are very much welcome!
[1] While convenient, this method has some drawbacks. The Skyframe graph will contain nodes related to running Skyscope itself, and, depending on your workspace configuration, bazel run can cause parts of the graph to become invalidated. It should work for the examples here, but if you encounter this issue, you can manually install a release instead.
[2] Terminated by the out-of-memory (OOM) killer for using too much memory.
About the author
A programmer who is passionate about platform engineering and automation.