We have shown the benefits of using a shared build cache as well as using remote build execution (RBE) to offload builds to a remote build farm. Our customers are interested in leveraging RBE to improve developer experience and reduce continuous integration (CI) run times, which has given us the opportunity to learn all aspects of deploying different RBE solutions. I would like to share how to deploy one of them, Buildbarn, and secure all communication within it.
What is it and why do we care?
We want developers to be productive. Being productive means spending as little time as possible waiting for build/test feedback, and not having to switch to a different task while the build is running.
Remote caching
One part of achieving this is to never build the same thing twice. Tools like Bazel support caching the result of every action, every tool execution. While many tools support storing results in a local directory, Bazel tracks the actions and their inputs with high granularity, resulting in more frequent “cache hits”. This is already a good gain for a single developer working on one machine. However, Bazel also supports conducting builds in a controlled environment with identical tooling and using a remote cache that can be shared between team members and CI, taking things a significant step further. You won’t have to rebuild anything that has already been built by your colleagues or by CI, which means starting up on a new machine, onboarding a new team member or reproducing issues becomes faster.
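As a sketch of what this looks like in practice (the cache endpoint below is a placeholder, not something we set up in this post), enabling a shared cache for a Bazel workspace is a couple of lines in .bazelrc:
# Placeholder endpoint; point this at your shared cache service.
echo "build --remote_cache=grpcs://cache.example.com:443" >> .bazelrc
# Upload local build results so that colleagues and CI can reuse them.
echo "build --remote_upload_local_results=true" >> .bazelrc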
Remote build execution
The second part of keeping developers productive is allowing them to use the right tools for the job. They still often need to build new things, and their local machine may not be the fastest, may not have enough charge, or may have the wrong architecture or OS. Remote build execution extends remote caching by executing actions on shared builders when their results are not cached already. This allows setting up a shared pool of the necessary hardware or virtual compute for both developers and CI. In Bazel this is implemented using the RBE API.
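For illustration, switching from remote caching to remote execution is mostly a matter of pointing Bazel at an RBE endpoint; the address here is again a placeholder, and we’ll configure a real Buildbarn endpoint later in this post:
# Placeholder endpoint; any RBE API server (such as Buildbarn) can serve this role.
echo "build --remote_executor=grpcs://rbe.example.com:443" >> .bazelrc
# Run more actions in parallel than the local machine has cores,
# since the actions execute on the remote builders.
echo "build --jobs=64" >> .bazelrc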
RBE implementations
Since the last post, RBE for Google Cloud Platform (GCP) has disappeared, and several new self-service and commercial services have been created. The RBE API has also gained popularity with different build systems, including Bazel (where it started), Buck2, and BuildStream. It is also used in projects that cannot change their build systems easily but can use reclient to wrap all build actions and forward them to an RBE service. Examples of such a setup include Android, Fuchsia and Chromium.
We’ll focus on one of the open-source RBE API servers, Buildbarn.
Securing remote cache and builds
Any shared infrastructure implies some security risks. When sending code to be built remotely we expose it on the network, where it can be intercepted or altered. When reading from the cache, we trust it to contain valid, unaltered results. When setting up a pool of compute resources, we expect them to be used only for building our code, and not for enriching third parties. All these expectations mean that we require all communications with remote infrastructure and within it to be encrypted and authenticated. The industry standard for achieving this is mTLS: Transport Layer Security (TLS) protocol with mutual authentication. It uses public key infrastructure (PKI) to allow both clients and servers to verify each other’s identities before sending any data, and makes sure that the data sent on one side matches the data received on the other side.
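As a quick sanity check you can run once such an endpoint exists (a sketch, assuming the TLS-enabled frontend we expose at 127.0.0.1:30080 later in this post), openssl can show whether a server asks clients for a certificate during the handshake:
# Connect without presenting a client certificate; the handshake output shows
# the server certificate and, for servers requesting client certificates,
# an "Acceptable client certificate CA names" section.
openssl s_client -connect 127.0.0.1:30080 -showcerts </dev/null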
Overview
In this extended blog post we’ll start by showing how to deploy Buildbarn on a Kubernetes cluster running in a local VM and configure a simple Bazel example to use it. Then we’ll turn on mTLS with the help of cert-manager for all Buildbarn pieces communicating with one another, and, finally, configure Bazel on a developer or CI machine to authenticate over the RBE API with a certificate and verify the one presented by the build server.
This blog post contains a lot of code snippets that let you follow the
installation process step by step. If you copy each command into your terminal
in order, you should see the same results as described. If you prefer to jump
to the final result and look at the complete picture, you can check out our
fork of the upstream
buildbarn/bb-deployments
repository and follow the
instructions there.
Deploying Buildbarn
In this section we’ll create a local Buildbarn deployment on a Kubernetes cluster running in a VM. First we’ll create a local VM with Kubernetes using an example config provided by lima. Then we’ll configure persistent volumes for Buildbarn storage inside that VM. After that we’ll use the Kubernetes example from a repository provided by Buildbarn to deploy Buildbarn itself.
Setting up a Kubernetes instance
If you already have access to a Kubernetes cluster that you can use, you can skip this section. Here we’ll deploy a local VM with Kubernetes running in it. The subsequent steps assume that you’re using a local VM, so you’ll have to adjust some parameters accordingly if your setup differs.
I’ve found that the easiest and most portable way to get Kubernetes running
locally is to use the lima (Linux Machines) project. You can follow the
official docs to install it. I prefer using Nix and
direnv, so I’ve created a .envrc
file containing the single line use nix, and a
shell.nix
with the following contents:
{ nixpkgs ? builtins.getFlake "nixpkgs"
, system ? builtins.currentSystem
, pkgs ? nixpkgs.legacyPackages.${system}
}:
pkgs.mkShell {
packages = with pkgs; [
kubectl
lima-bin
jq
];
}
Then you just need to run direnv allow
and it will fetch the necessary
packages and make them available in your shell.
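For reference, creating the file and trusting it looks like this (assuming direnv is already hooked into your shell):
# .envrc contains the single line "use nix"; direnv allow marks it as trusted.
echo "use nix" > .envrc
direnv allow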
Now we can create a Lima VM from the k8s
template. We remove mounts
from
the template to specify our own later. We also need to add some special options
for running on macOS:
limactl create template://k8s --name k8s --tty=false \
--set '.provision |= . + {"mode":"system","script":"#!/bin/bash
for d in /mnt/fast-disks/vol{0,1,2,3}; do sudo mkdir -p $d; sudo mount --bind $d $d; done"}' \
$([ "$(uname -s)" = "Darwin" ] && { echo "--vm-type vz"; [ "$(uname -m)" = "arm64" ] && echo "--rosetta"; })
The arguments here are:
- --name k8s sets a name for the new VM; it defaults to the template name, but let’s keep it explicit
- --set '.provision ...' uses a jq expression to add an additional provision step to the resulting YAML file, creating the necessary mountpoints for persistent volumes
- --tty=false disables console prompts and confirmations
- for macOS we also add --vm-type vz to use the native macOS Virtualization framework instead of QEMU for a faster VM
- for Apple Silicon we also add --rosetta to enable the translation layer, allowing us to run x86_64 containers in the VM with little overhead
You can start the final VM and check if it is ready with:
limactl start k8s
export KUBECONFIG=~/.lima/k8s/copied-from-guest/kubeconfig.yaml
kubectl get node
It will take some time to bootstrap Kubernetes, after which it should show you
one node called lima-k8s
with Ready
status:
NAME STATUS ROLES AGE VERSION
lima-k8s Ready control-plane 4m54s v1.29.2
Buildbarn will need some PersistentVolumes to store data. Let’s teach Kubernetes to use the mounts that we created earlier for that. First, configure a storage class:
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-disks
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF
It should respond with storageclass.storage.k8s.io/fast-disks created.
Then start a local volume provisioner from sig-storage-local-static-provisioner:
curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-storage-local-static-provisioner/master/deployment/kubernetes/example/default_example_provisioner_generated.yaml | kubectl apply -f -
Run kubectl get pv to see that it created four volumes. They may take several
seconds to appear. You can check the provisioner’s logs for any errors with
kubectl logs daemonset/local-volume-provisioner.
Deploying Buildbarn
bb-deployments provides a Kustomize template to deploy Buildbarn. Let’s clone it, patch one service so that we can run it locally, and deploy:
git clone https://github.com/buildbarn/bb-deployments.git
pushd bb-deployments/kubernetes
cat >> kustomization.yaml <<EOF
# patch frontend service to not require external load balancers
patches:
- target:
kind: Service
name: frontend
patch: |
- op: replace
path: /spec/type
value: NodePort
- op: add
path: /spec/ports/0/nodePort
value: 30080
EOF
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"
The last command will wait for everything to start. We’ve filtered out all messages about resources that it doesn’t know how to wait for.
To check that the Buildbarn frontend is accessible, we can use
grpc-client-cli. Add it to the list in shell.nix, save it and run:
grpc-client-cli -a 127.0.0.1:30080 health
It should report that it is SERVING:
{
"status": "SERVING"
}
We can exit the bb-deployments
directory now:
popd
In this section we’ve deployed Buildbarn and verified that its API is accessible. Now we’ll move on to setting up a small Bazel project to use it. Then we’ll configure mTLS on Buildbarn, and finally configure Bazel to work with mTLS.
Using Buildbarn
Let’s set up a small Bazel project to use our Buildbarn instance. In this section we’ll use the Bazel examples repo and show how to build it with Bazel both locally and with RBE. We’ll also see how remote caching speeds up builds by caching intermediate results.
We will be using Bazelisk to fetch and run the upstream distribution of
Bazel. First we’ll need to install Bazelisk by adding bazelisk to shell.nix.
If you are running NixOS, you will have to create an FHS
environment to run Bazel. If you are running macOS and don’t
have the Xcode command line tools installed, you also need to provide the necessary
libraries to the bazel invocation. Add this to your shell.nix:
pkgs.mkShell {
packages = with pkgs; [
...
bazelisk
];
env = pkgs.lib.optionalAttrs pkgs.stdenv.isDarwin {
BAZEL_LINKOPTS = with pkgs.darwin.apple_sdk;
"-F${frameworks.Foundation}/Library/Frameworks:-L${objc4}/lib";
BAZEL_CXXOPTS = "-I${pkgs.libcxx.dev}/include/c++/v1";
};
# fhs is only used on NixOS
passthru.fhs = (pkgs.buildFHSUserEnv {
name = "bazel-userenv";
runScript = "zsh"; # replace with your shell of choice
targetPkgs = pkgs: with pkgs; [
libz # required for bazelisk to unpack Bazel itself
];
}).env;
}
Then on NixOS you can run nix-shell -A fhs to enter an environment where
directories like /bin, /usr and /lib are set up the way tools built for other
Linux distributions expect.
Now we can clone the Bazel examples repo and enter the simple C++ example in it:
git clone --depth 1 https://github.com/bazelbuild/examples
pushd examples/cpp-tutorial/stage1
On macOS we’ll need to configure compiler and linker flags to look for libraries in the Nix store:
echo "build:macos --action_env=BAZEL_CXXOPTS=${BAZEL_CXXOPTS}" >> .bazelrc
echo "build:macos --action_env=BAZEL_LINKOPTS=${BAZEL_LINKOPTS}" >> .bazelrc
We will be building remotely for the Linux platform later, so we should specify a concrete platform and toolchain to use for Linux:
echo "build:linux --platforms=@aspect_gcc_toolchain//platforms:x86_64_linux" >> .bazelrc
echo "build:linux --extra_execution_platforms=@aspect_gcc_toolchain//platforms:x86_64_linux" >> .bazelrc
And then build and run the example locally:
bazelisk run //main:hello-world
You should see output like:
Starting local Bazel server and connecting to it...
INFO: Analyzed target //main:hello-world (38 packages loaded, 165 targets configured).
INFO: Found 1 target...
Target //main:hello-world up-to-date:
bazel-bin/main/hello-world
INFO: Elapsed time: 7.545s, Critical Path: 0.94s
INFO: 8 processes: 6 internal, 2 processwrapper-sandbox.
INFO: Build completed successfully, 8 total actions
INFO: Running command line: bazel-bin/main/hello-world
Hello world
Note that if we run bazelisk run //main:hello-world
again, it’ll be much
faster, because Bazel only spends a fraction of a second on computing the
action graph and making sure that nothing needs to be rebuilt:
...
INFO: Elapsed time: 0.113s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
...
We can also run bazelisk clean
to remove previous output and re-run it to
make sure we can rebuild from scratch.
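Concretely:
# Remove all previous outputs, then rebuild and run from a cold local cache.
bazelisk clean
bazelisk run //main:hello-world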
Now let’s try building it using Buildbarn. First we need to configure execution properties to match the ones set up in Buildbarn’s worker config:
echo "build:remote --remote_default_exec_properties OSFamily=linux" >> .bazelrc
echo "build:remote --remote_default_exec_properties container-image=docker://ghcr.io/catthehacker/ubuntu:act-22.04@sha256:5f9c35c25db1d51a8ddaae5c0ba8d3c163c5e9a4a6cc97acd409ac7eae239448" >> .bazelrc
Then we should tell Bazel to use Buildbarn as a remote executor:
echo "build:remote --remote_executor grpc://127.0.0.1:30080" >> .bazelrc
Now we can build it with
bazelisk build --config=linux --config=remote //main:hello-world. Note that
it will take some time to extract the Linux compiler and supplemental files
first:
INFO: Invocation ID: d70b9d30-1865-4d1f-8d52-77c6fc5ec607
INFO: Build options --extra_execution_platforms, --incompatible_enable_cc_toolchain_resolution, and --platforms have changed, discarding analysis cache.
INFO: Analyzed target //main:hello-world (3 packages loaded, 6315 targets configured).
INFO: Found 1 target...
Target //main:hello-world up-to-date:
bazel-bin/main/hello-world
INFO: Elapsed time: 96.249s, Critical Path: 52.72s
INFO: 5 processes: 3 internal, 2 remote.
INFO: Build completed successfully, 5 total actions
As you can see, two actions were executed remotely: compilation and linking. But
we can find the result locally in bazel-bin/main/hello-world
(and run it if
we’re on an appropriate platform):
% file bazel-bin/main/hello-world
bazel-bin/main/hello-world: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 4.9.0, not stripped
Now if we clean local caches and rebuild, we can see that it reuses results already stored in Buildbarn (remote cache hits):
% bazelisk clean
INFO: Invocation ID: d655d3f2-071d-48ff-b3e9-e0b1c61ae5fb
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
% bazelisk build --config=linux --config=remote //main:hello-world
INFO: Invocation ID: d38526d8-0242-4b91-92da-20ddd110d3ae
INFO: Analyzed target //main:hello-world (41 packages loaded, 6315 targets configured).
INFO: Found 1 target...
Target //main:hello-world up-to-date:
bazel-bin/main/hello-world
INFO: Elapsed time: 0.663s, Critical Path: 0.07s
INFO: 5 processes: 2 remote cache hit, 3 internal.
INFO: Build completed successfully, 5 total actions
We can exit the examples
directory now:
popd
In this section we’ve configured a Bazel project to be built using our Buildbarn instance. Now we’ll configure mTLS on Buildbarn and then finally reconfigure this Bazel project to access Buildbarn using mTLS.
Configuring TLS in Buildbarn
We want each component of Buildbarn to have its own automatically generated certificate and use it to connect to other components. On the other side, each component that accepts connections should verify that the incoming connection is accompanied by a valid certificate as well. In this section we’ll use cert-manager to generate certificates and its CSI driver to request them and propagate them to Buildbarn components in a more secure way, keeping private keys on the node where they are used. Then we’ll configure Buildbarn components to verify both sides of each connection. Here’s how this process looks for the frontend and storage containers, for example:
Node 1 │ Kubernetes API │ Node 2
│ │
┌─────────────────────────┐ │ │ ┌─────────────────────────┐
│ Frontend pod │ │ mTLS │ │ Storage pod │
│ bb-storage process │<───────────────────────────────────────>│ bb-storage process │
├─────────────────────────┤ │ ┌──────────────┐ │ ├─────────────────────────┤
│ CSI volume ca.crt │ │ │ cert-manager │ │ │ ca.crt CSI volume │
│ tls.key tls.crt │ │ └─────┬────────┘ │ │ tls.crt tls.key │
└──────────^─────────^────┘ │ │ fills out │ └───^─────────^───────────┘
│ │ │ V │ │ │
generates stores │ apiVersion: cert-manager.io/v1 │ stores generates
│ │ kind: CertificateRequest │ │
┌┴─────────┴─┐ creates spec: ┌┴─────────┴─┐
│ CSI driver │────────> request: LS0tLS... │ CSI driver │
└────────────┘ status: └────────────┘
^ retrieves certificate: ...
└─────────── ca: ...
- The CSI driver sees the CSI volume and generates a key in tls.key in there.
- The CSI driver uses the key from tls.key to generate a Certificate Signing Request (CSR) and creates a CertificateRequest resource in the Kubernetes API with it.
- cert-manager signs the CertificateRequest with the CA certificate and puts both the resulting certificate and the CA certificate in the CertificateRequest’s status.
- The CSI driver stores them in tls.crt and ca.crt respectively in the CSI volume.
- The bb-storage process in the frontend pod uses the certificate and key from tls.crt and tls.key to establish a TLS connection to the storage pod, verifying that the latter presents a valid certificate signed by the CA certificate from ca.crt.
- On the storage side tls.key, tls.crt and ca.crt are filled out in a similar manner.
- The bb-storage process in the storage pod verifies the incoming certificate with the CA certificate from ca.crt and presents the certificate from tls.crt to the frontend.
Notice how with this approach secret keys never leave the node where they are generated and used, and the connection between frontend and storage pods is authenticated on both ends.
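Once the pieces below are deployed, you can inspect what was actually issued without entering any pod; this is a sketch that picks an arbitrary CertificateRequest created by the CSI driver and decodes its certificate with openssl:
# Take the first CertificateRequest in the buildbarn namespace and show
# the subject and DNS names of the certificate that was issued for it.
kubectl -n buildbarn get certificaterequest \
  -o jsonpath='{.items[0].status.certificate}' | base64 -d | \
  openssl x509 -noout -subject -ext subjectAltName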
Installing cert-manager
To generate certificates for our Buildbarn we need to install and configure cert-manager itself and its CSI driver. cert-manager is responsible for generating and updating certificates requested via Kubernetes API objects. The CSI driver lets users create special volumes in pods where private keys are generated locally and certificates are requested from cert-manager and provided to the pod.
First, let’s fetch all necessary manifests and add them to our deployment. The cert-manager project publishes a ready-to-use Kubernetes manifest, so we can manually fetch it:
pushd bb-deployments/kubernetes
curl -LO https://github.com/cert-manager/cert-manager/releases/download/v1.14.3/cert-manager.yaml
And then add it to the resources section of our kustomization.yaml:
resources:
- ...
- cert-manager.yaml
Unfortunately, the cert-manager CSI driver doesn’t directly provide a Kubernetes
manifest, but rather a Helm chart. Add kubernetes-helm to your shell.nix
and then run:
helm template -n cert-manager -a storage.k8s.io/v1/CSIDriver https://charts.jetstack.io/charts/cert-manager-csi-driver-v0.7.1.tgz > cert-manager-csi-driver.yaml
-a storage.k8s.io/v1/CSIDriver makes sure that the chart uses the latest version
of the Kubernetes API to register itself.
Then we can add it to the resources section of our kustomization.yaml:
resources:
- ...
- cert-manager.yaml
- cert-manager-csi-driver.yaml
Let’s deploy and wait for everything to start. We will use cmctl to check
that cert-manager is working correctly, so you’ll need to add it to shell.nix.
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"
cmctl check api --wait 10m
kubectl get csinode -o yaml
cmctl should report The cert-manager API is ready, and the last command
should output your only node with one driver called csi.cert-manager.io
installed:
namespace/buildbarn unchanged
namespace/cert-manager created
...
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
...
The cert-manager API is ready
apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
...
name: lima-k8s
...
spec:
drivers:
- name: csi.cert-manager.io
nodeID: lima-k8s
topologyKeys: null
kind: List
metadata:
resourceVersion: ""
If it says drivers: null, re-run kubectl get csinode -o yaml a bit later to
allow more time for driver deployment and startup.
Creating a CA certificate
First we need to create a CA certificate and an Issuer that cert-manager will
use to generate certificates for our needs. Note that to generate a self-signed
certificate we’ll also need to create another issuer. Put this in ca.yaml
:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: selfsigned
namespace: buildbarn
spec:
selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: ca
namespace: buildbarn
spec:
isCA: true
commonName: ca
secretName: ca
privateKey:
algorithm: ECDSA
size: 256
issuerRef:
name: selfsigned
kind: Issuer
group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: ca
namespace: buildbarn
spec:
ca:
secretName: ca
Then add it to the resources section of our kustomization.yaml:
resources:
- ...
- ca.yaml
Then apply it and check the issuers’ status:
kubectl apply -k .
kubectl -n buildbarn get issuers -o wide
Both issuers should be there, and the ca issuer should have the Signing CA verified status:
NAME READY STATUS AGE
ca True Signing CA verified 14s
selfsigned True 14s
If it says something like secrets "ca" not found, it means it needs some time
to generate the certificate. Re-run kubectl -n buildbarn get issuers -o wide.
Generating certificates for Buildbarn components
As mentioned before, we will be generating certificates for each component
using cert-manager’s CSI driver. To do this, we need to add a volume to each
pod and mount it into the main container so that the service can read it. We
also need to pass the CA certificate into all these containers to verify the other side
of each connection. Unfortunately, Buildbarn doesn’t support reading it
from a file, so we’ll have to pass it statically via the config.
Let’s prepare this config file using this command that reads the CA certificate
via the Kubernetes API and formats it using jq
into a JSON string:
kubectl -n buildbarn get certificaterequests ca-1 -o jsonpath='{.status.ca}' | base64 -d | jq --raw-input --slurp . > config/ca-cert.jsonnet
Now we can configure all pods by adding the following patches in
kustomization.yaml
:
patches:
- ...
- target:
kind: Deployment
namespace: buildbarn
patch: |
- op: add
path: /spec/template/spec/volumes/-
value:
name: tls-cert
csi:
driver: csi.cert-manager.io
readOnly: true
volumeAttributes:
csi.cert-manager.io/issuer-name: ca
- op: add
path: /spec/template/spec/containers/0/volumeMounts/-
value:
mountPath: /cert
name: tls-cert
readOnly: true
- target:
kind: Deployment
namespace: buildbarn
name: frontend
patch: |
- op: add
path: /spec/template/spec/volumes/0/configMap/items/-
value:
key: ca-cert.jsonnet
path: ca-cert.jsonnet
- op: add
path: /spec/template/spec/volumes/1/csi/volumeAttributes/csi.cert-manager.io~1dns-names
value: frontend,frontend.${POD_NAMESPACE},frontend.${POD_NAMESPACE}.svc.cluster.local
- op: add
path: /spec/template/spec/volumes/1/csi/volumeAttributes/csi.cert-manager.io~1ip-sans
value: 127.0.0.1
- target:
kind: Deployment
namespace: buildbarn
name: browser
patch: |
- op: add
path: /spec/template/spec/volumes/0/configMap/items/-
value:
key: ca-cert.jsonnet
path: ca-cert.jsonnet
- op: add
path: /spec/template/spec/volumes/1/csi/volumeAttributes/csi.cert-manager.io~1dns-names
value: browser,browser.${POD_NAMESPACE},browser.${POD_NAMESPACE}.svc.cluster.local
- target:
kind: Deployment
namespace: buildbarn
name: scheduler-ubuntu22-04
patch: |
- op: add
path: /spec/template/spec/volumes/0/configMap/items/-
value:
key: ca-cert.jsonnet
path: ca-cert.jsonnet
- op: add
path: /spec/template/spec/volumes/1/csi/volumeAttributes/csi.cert-manager.io~1dns-names
value: scheduler,scheduler.${POD_NAMESPACE}
- target:
kind: Deployment
namespace: buildbarn
name: worker-ubuntu22-04
patch: |
- op: add
path: /spec/template/spec/volumes/1/configMap/items/-
value:
key: ca-cert.jsonnet
path: ca-cert.jsonnet
- op: add
path: /spec/template/spec/volumes/3/csi/volumeAttributes/csi.cert-manager.io~1dns-names
value: worker,worker.${POD_NAMESPACE}
- target:
kind: StatefulSet
namespace: buildbarn
name: storage
patch: |
- op: add
path: /spec/template/spec/volumes/0/configMap/items/-
value:
key: ca-cert.jsonnet
path: ca-cert.jsonnet
- op: add
path: /spec/template/spec/volumes/-
value:
name: tls-cert
csi:
driver: csi.cert-manager.io
readOnly: true
volumeAttributes:
csi.cert-manager.io/issuer-name: ca
csi.cert-manager.io/dns-names: ${POD_NAME}.storage,${POD_NAME}.storage.${POD_NAMESPACE}
- op: add
path: /spec/template/spec/containers/0/volumeMounts/-
value:
mountPath: /cert
name: tls-cert
readOnly: true
To avoid repetition, the first patch is applied to all Deployment objects, and
consecutive patches only add the proper list of DNS names for each certificate.
Note that many of those DNS names will not be used as only some of these
services actually accept connections. For the frontend
Deployment we also add the 127.0.0.1 IP address so that it can be accessed via a port
forwarded to localhost, as we currently do from the host machine. For the storage
StatefulSet we configure a unique DNS name for each Pod because they are contacted
directly and not through a common service. For each of these we also add ca-cert.jsonnet
to the list of files used from the configuration ConfigMap. We also need to add
it to the ConfigMap itself by adding it to the list in
config/kustomization.yaml
:
configMapGenerator:
- name: buildbarn-config
namespace: buildbarn
files:
- ...
- ca-cert.jsonnet
We can apply all these changes with:
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"
Now you can fetch the list of CertificateRequest objects to see their statuses:
kubectl -n buildbarn get certificaterequest
It will output one request for the ca
certificate named ca-1
and a bunch of
requests generated for each pod:
NAME APPROVED DENIED READY ISSUER REQUESTOR AGE
14468f64-909f-43d1-b67d-07b0844c0683 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m
1d9e41a6-e58f-4c13-b9e6-0b1ba1d5a4f6 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
2c2f1177-81fc-45e5-8487-9b66bc0d6f73 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
31fdb0ef-0c0b-4a06-94af-fb17875ee05d True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
376d0933-c0e9-4d39-b5c6-b76071c65966 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m58s
3967cdd6-7d48-4814-8cec-542041182dd0 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
464a1f35-f0ba-4236-aeec-294f880d9675 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m57s
5181e602-276e-413e-8888-76c4bd1ede21 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m57s
6f02092d-b8a3-4eb7-8ff2-5e4a433d59bb True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
710a458e-6ba0-4a44-87ab-5115b5a2c213 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m58s
753c4653-71ae-447e-bbe5-022ce35cee9d True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
8bcbb5a0-4575-40ad-b842-9c86bde8fdb8 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m56s
8df59bf5-ed23-47af-bfcc-3cf8a9053b9b True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
b47fff23-40b4-43ed-8e34-35d988eb434d True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m56s
be72bdc6-c61d-4f1b-928e-f743df0f6188 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m57s
c14a52d5-dc20-4626-afe6-975442103d8b True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m
ca-1 True True selfsigned system:serviceaccount:cert-manager:cert-manager 3d22h
ceabf1ab-06a7-47c0-855a-2009bbbd2418 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m
Using certificates
Now that we’ve generated all necessary certificates and made them available to
all pods, we can configure all components to use them. We’ll use similar
stanzas for each service, so let’s first add some helper functions to the top
of config/common.libsonnet
:
local localKeyPair = {
files: {
certificate_path: '/cert/tls.crt',
private_key_path: '/cert/tls.key',
refresh_interval: '3600s',
},
};
local grpcClientWithTLS = function(address) {
address: address,
tls: {
server_certificate_authorities: import 'ca-cert.jsonnet',
client_key_pair: localKeyPair,
},
};
local oneListenAddressWithTLS = function(address) [{
listenAddresses: [address],
authenticationPolicy: {
tls_client_certificate: {
client_certificate_authorities: import 'ca-cert.jsonnet',
validation_jmespath_expression: '`true`',
metadata_extraction_jmespath_expression: '`{}`',
},
},
tls: {
server_key_pair: localKeyPair,
},
}];
And then expose these functions to use in other configs at the end of the file:
...
grpcClientWithTLS: grpcClientWithTLS,
oneListenAddressWithTLS: oneListenAddressWithTLS,
}
Note that the local certificate and key files will be reloaded every hour per the
refresh_interval setting, but the CA certificate will need to be reconfigured
manually every time it is renewed.
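If the CA certificate is renewed, a manual refresh would look like this sketch, reusing the extraction command from above:
# Re-extract the current CA certificate into the config and roll it out again.
kubectl -n buildbarn get certificaterequests ca-1 -o jsonpath='{.status.ca}' | base64 -d | jq --raw-input --slurp . > config/ca-cert.jsonnet
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"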
Also note that we accept all valid certificates by setting
validation_jmespath_expression
to `true`
. This expression can be
configured later for each service if needed.
Now we’re ready to configure the Buildbarn services.
Storage
Let’s start with storage. The client side configuration is the same for all
services that connect to it and is stored in config/common.libsonnet
. Replace
lines like this one:
backend: { grpc: { address: 'storage-0.storage.buildbarn:8981' } },
with a call to our new function:
backend: { grpc: grpcClientWithTLS('storage-0.storage.buildbarn:8981') },
Keep the address the same (storage-0
and storage-1
should remain in place).
Now in config/storage.jsonnet
replace these GRPC server configuration lines:
grpcServers: [{
listenAddresses: [':8981'],
authenticationPolicy: { allow: {} },
}],
With a call to another function:
grpcServers: common.oneListenAddressWithTLS(':8981'),
Again, make sure that the address itself stays the same.
Now let’s apply it and wait for all pods to restart:
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"
Let’s check that the storage service is still accessible via the frontend service by rebuilding our example project:
pushd ../../examples/cpp-tutorial/stage1
bazelisk clean
bazelisk build --config=linux --config=remote //main:hello-world
popd
It should show that it fetched output from the remote cache:
...
INFO: 5 processes: 2 remote cache hit, 3 internal.
...
Scheduler
The scheduler exposes at least four GRPC endpoints, but we’ll cover only the
client (frontend) and worker sides as we don’t use other endpoints yet. Just
like with storage, you should replace clientGrpcServers
and
workerGrpcServers
settings with calls to oneListenAddressWithTLS
in
config/scheduler.jsonnet
, passing the addresses themselves as an argument:
...
clientGrpcServers: common.oneListenAddressWithTLS(':8982'),
workerGrpcServers: common.oneListenAddressWithTLS(':8983'),
...
The scheduler itself only connects to storage, and that part has already been
configured in config/common.libsonnet.
Workers
Workers only connect to the scheduler and storage. With the latter already
configured, we only need to change the scheduler setting in
config/worker-ubuntu22-04.jsonnet:
...
scheduler: common.grpcClientWithTLS('scheduler:8983'),
...
Frontend
The frontend listens for incoming connections from clients and fans them out, either
to storage or to the scheduler. Storage access has already been covered, so we
only need to replace the grpcServers and schedulers settings in
config/frontend.jsonnet:
grpcServers: common.oneListenAddressWithTLS(':8980'),
schedulers: {
'': {
endpoint: common.grpcClientWithTLS('scheduler:8982') {
addMetadataJmespathExpression: |||
{
"build.bazel.remote.execution.v2.requestmetadata-bin": incomingGRPCMetadata."build.bazel.remote.execution.v2.requestmetadata-bin"
}
|||,
},
},
},
Note that we preserve all addresses and keep the additional
addMetadataJmespathExpression
field that augments requests to the scheduler.
Applying it all
Now we can apply all these settings with:
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"
All deployments should eventually roll out and work. This means that all internal communications between Buildbarn components are encrypted and authenticated.
In this section we’ve achieved our goal of securing Buildbarn deployment using mTLS. Now all that’s left is to reconfigure Bazel to use and verify certificates while accessing Buildbarn’s RBE API endpoint.
Configuring certificates on the client
So far we’ve configured Buildbarn to always use TLS-encrypted connections. This
means that our current client setup will no longer work, because it
doesn’t expect TLS. In this section we’ll generate a client certificate
using the cmctl tool, configure Bazel to both validate the server certificate
and present this new client certificate when communicating with Buildbarn, and show
the final complete example.
First, as mentioned, if we run Bazel with the current client configuration it will fail because it uses an unencrypted connection to an encrypted endpoint:
pushd ../../examples/cpp-tutorial/stage1
bazelisk clean
bazelisk build --config=linux --config=remote //main:hello-world
The error will look like this:
INFO: Invocation ID: dc8188ca-e77f-4884-a596-612779c6ae33
ERROR: Failed to query remote execution capabilities: UNAVAILABLE: Network closed for unknown reason
To configure the client to use an encrypted connection, we need to replace
the grpc
protocol with grpcs
in .bazelrc
and try again:
sed -i s/grpc/grpcs/ .bazelrc
bazelisk build --config=linux --config=remote //main:hello-world
Now the error will indicate that something else is missing - in this case, a client certificate:
INFO: Invocation ID: 7dcb900f-17eb-4dbb-ab9c-df9c70bc2c92
ERROR: Failed to query remote execution capabilities: UNAVAILABLE: io exception
Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
To address that, we need to generate client certificates and configure Bazel to use them.
Generating the client certificate
We will use cert-manager and its CLI client cmctl to generate a certificate
for our client. First, we need to create a Certificate object template in
cert-template.yaml:
cat > cert-template.yaml <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
spec:
commonName: client
usages:
- client auth
privateKey:
algorithm: ECDSA
size: 256
issuerRef:
name: ca
kind: Issuer
group: cert-manager.io
EOF
Then we can use it to create the actual certificate:
cmctl create certificaterequest -n buildbarn client --from-certificate-file cert-template.yaml --fetch-certificate
It will use this certificate template as if it were created in Kubernetes: it
will generate a key in client.key, create a Certificate Signing Request (CSR)
from it, embed that in a cert-manager CertificateRequest and send it, wait for
the server to sign it, and finally retrieve the resulting certificate to
client.crt.
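You can inspect what was issued with standard openssl commands; the file name matches the one created above:
# Show who the certificate identifies, who signed it, and its validity window.
openssl x509 -in client.crt -noout -subject -issuer -dates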
We also need a CA certificate to verify server certificates. We can use the same command we used for Buildbarn configuration here:
kubectl -n buildbarn get certificaterequests ca-1 -o jsonpath='{.status.ca}' | base64 -d > ca.crt
You can make sure that the client certificate is signed with this CA certificate by
adding openssl to shell.nix and running:
openssl verify -CAfile ca.crt client.crt
It will output client.crt: OK
if everything is correct.
Building with certificates
All that’s left is to tell Bazel to use these certificates to connect to
Buildbarn. We’ll need to convert the private key to PKCS#8 format for Bazel and
add these settings to .bazelrc:
openssl pkcs8 -topk8 -nocrypt -in client.key -out client.pem
echo "build:remote --tls_certificate=ca.crt" >> .bazelrc
echo "build:remote --tls_client_certificate=client.crt" >> .bazelrc
echo "build:remote --tls_client_key=client.pem" >> .bazelrc
Now let’s clean the Bazel cache and run the build:
bazelisk clean
bazelisk build --config=linux --config=remote //main:hello-world
You will see that the remote cache is in use, which means that TLS has been configured successfully:
...
INFO: Elapsed time: 0.601s, Critical Path: 0.10s
INFO: 5 processes: 2 remote cache hit, 3 internal.
...
To make sure that the actual build also works, we can change the source file a bit and re-run the build:
echo >> main/hello-world.cc
bazelisk build --config=linux --config=remote //main:hello-world
It will now take some time and actually show that it has built one action remotely:
...
INFO: Elapsed time: 15.866s, Critical Path: 15.69s
INFO: 2 processes: 1 internal, 1 remote.
...
Conclusion
We’ve shown how to deploy Buildbarn on Kubernetes, how to configure mTLS between all its components, and how to use TLS authentication with RBE API clients using Bazel as an example. This is a starting configuration that can be improved in several aspects not covered here:
- The Buildbarn browser and the scheduler web UIs are neither exposed nor encrypted;
- cert-manager is not configured to limit access to certificate generation, meaning that anyone with access to the Kubernetes API has access to all its capabilities;
- no limits are imposed on client certificates beyond being valid;
- there is no automation for client certificate renewal;
- and only certificates are used for authentication, which is secure but could be enhanced or replaced with OAuth, which is more flexible and provides finer-grained control.
All these are interesting topics that would each deserve their own blog post.