Welcome to our series on Bazel remote caching: understanding how Bitrise enables you to fully harness the power of Bazel's build and test automation capabilities.
Whether you are new to Bazel or an existing user who wants to learn more about how it works under the hood, you are in the right place. You can use Bazel without knowing most of what we cover here, but we think it is pretty cool and worth a deep dive.
What to expect in this series:
- Bazel remote caching API technical deep dive
- Server-side implementation of the remote cache API at Bitrise
- Remote execution
In this first post, we introduce Bazel's remote caching, specifically the API specification and client-side implementation, to enhance build times through cache reuse across different environments.
What is Bazel?
Bazel is an open-source build and test automation tool designed to support multiple languages and platforms, often used for large, multi-language projects with complex dependencies. Its remote caching capabilities allow it to store and retrieve build outputs from a remote server, enabling faster build times and incremental builds across different machines and environments.
If you already use Bazel, with Bitrise remote build cache for Bazel you can set up remote caching in just a few minutes without being familiar with its internal details, and you can do that regardless of which CI provider you use. If you want to enjoy the additional performance and cost benefits of colocated build machines and build cache, you can use it with Bitrise CI, but it is not a requirement: Bitrise remote build cache for Bazel will work perfectly fine with any other CI vendor or self-hosted CI.
Let's dive into Bazel's internal details, starting with the API specification and the client-side implementation.
Specification v2.3.0
Google released a Protobuf-based specification that outlines how caching clients and remote servers interact. Bazel is one such client, but the API could also work with other build tools. For instance, we're considering updating our Bitrise remote build cache for Gradle plugin to align with this specification, given its similarities to our current approach. There are various server-side implementations; however, we'll focus on Bitrise's design and implementation in our next post.
The specification may initially appear daunting, so I've summarized the key concepts into simple diagrams below, concentrating solely on caching and omitting aspects related to remote execution.
The core entity is `Action`, which is a `Command` (with arguments, environment variables, etc.) that will be executed on a given platform (e.g. Linux, macOS).
All entities but one are referenced by their digest (hash). For example, each `Action` has a `command_digest` field with the value of the hash of the related `Command` entity.
The only exception is `ActionResult`, which represents the results (`stdout`, `stderr`, exit code, produced files and directories, etc.) of an already executed `Action` and is referenced by the digest of the `Action`. These enable a client flow such as:
- Execute the `Action` (let's say it has a digest of `a1`), and upload its `ActionResult` under key `a1`. The `ActionResult` itself will probably reference other blobs produced as part of the `Command` execution (let's say they are uploaded under and referenced by digests `b1`, `b2`, and `b3`).
- Whenever the same exact `Action` (the same exact `Command` on the same exact platform) is executed, the client can skip it by hashing the `Action` and looking for the already existing `ActionResult` for digest `a1`. Related blobs (`b1`, `b2`, and `b3`) can also be read from the cache using their digest as a key.
To showcase what these entities might correspond to, let's take a look at the example above. An `Action` was executed, running the `dotnet build` `Command` on the Windows platform. It resulted in three files (`some.dll`, `other.dll`, and `my.exe`), printed the `Build succeeded.` message to its `stdout`, and terminated with a `0` exit code.
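To make these relationships more concrete, here is a minimal sketch in Java (the language Bazel itself is written in) that models the example with simplified stand-in records rather than the real protobuf-generated classes. The type names mirror the spec's messages, but details such as the `OutputFile` shape, the `OSFamily` platform property, and the `toString()`-based serialization are illustrative assumptions; real clients hash the canonical protobuf encoding, with SHA-256 as the spec's default digest function.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;

public class RemoteCacheEntities {
    // A digest is a (hash, size) pair; the spec identifies almost every entity this way.
    record Digest(String hash, long sizeBytes) {}

    // Simplified stand-ins for the spec's Command, Action, and ActionResult messages.
    record Command(List<String> arguments, Map<String, String> platformProperties) {}
    record Action(Digest commandDigest) {}
    record OutputFile(String path, Digest digest) {}
    record ActionResult(List<OutputFile> outputFiles, int exitCode, Digest stdoutDigest) {}

    // SHA-256 over the serialized bytes. Real clients hash the canonical protobuf
    // encoding; hashing toString() here just keeps the sketch self-contained.
    static Digest digestOf(Object entity) throws Exception {
        byte[] bytes = entity.toString().getBytes(StandardCharsets.UTF_8);
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(bytes);
        return new Digest(HexFormat.of().formatHex(hash), bytes.length);
    }

    public static void main(String[] args) throws Exception {
        // The `dotnet build` Command, executed on the Windows platform.
        Command command = new Command(List.of("dotnet", "build"), Map.of("OSFamily", "Windows"));

        // The Action references the Command only by its digest (command_digest in the spec).
        Action action = new Action(digestOf(command));
        Digest actionDigest = digestOf(action); // "a1" in the flow described earlier

        // The result: three output files, "Build succeeded." on stdout, exit code 0.
        ActionResult result = new ActionResult(
            List.of(new OutputFile("some.dll", digestOf("some.dll bytes")),
                    new OutputFile("other.dll", digestOf("other.dll bytes")),
                    new OutputFile("my.exe", digestOf("my.exe bytes"))),
            0,
            digestOf("Build succeeded."));

        // ActionResult is the one entity not keyed by its own digest: it is stored
        // under the digest of the Action that produced it.
        System.out.println("cache[" + actionDigest.hash() + "] = " + result);
    }
}
```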
The specification also documents the usual client flow. It starts with calling the `GetCapabilities` endpoint to check which compression, digest algorithm, API version, etc., are supported by the server (the client should either conform to them or, in case of incompatibility, not use the remote cache API at all). It continues with building an `Action` graph (which `Action`s to execute, and in which order, in case they depend on each other).
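As a rough illustration of that capability negotiation, the sketch below shows a client deciding whether it can use the remote cache at all, based on what the server advertises. The `ServerCapabilities` record and its fields are simplified stand-ins for the message returned by `GetCapabilities`, and the version check is deliberately naive.

```java
import java.util.List;
import java.util.Set;

public class CapabilitiesCheck {
    // Simplified stand-in for the capabilities a server reports via GetCapabilities.
    record ServerCapabilities(Set<String> digestFunctions,
                              long maxBatchTotalSizeBytes,
                              List<String> supportedCompressors,
                              String highApiVersion) {}

    // The client either conforms to what the server advertises or skips the remote cache.
    static boolean compatible(ServerCapabilities caps) {
        boolean digestOk = caps.digestFunctions().contains("SHA256");
        boolean versionOk = caps.highApiVersion().compareTo("2.0") >= 0; // naive string compare
        return digestOk && versionOk;
    }

    public static void main(String[] args) {
        ServerCapabilities caps = new ServerCapabilities(
            Set.of("SHA256"), 4 * 1024 * 1024, List.of("identity", "zstd"), "2.3");
        System.out.println(compatible(caps)
            ? "Remote cache enabled (SHA-256, batches up to " + caps.maxBatchTotalSizeBytes() + " bytes)"
            : "Incompatible server; skipping the remote cache");
    }
}
```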
For each `Action`, the client checks whether there is already a cached result. It calls the `GetActionResult` endpoint of the remote cache API, passing the digest of the given `Action` (which is the key of the desired `ActionResult`). Remote caches have finite space, so they usually evict old entries based on some policy (e.g. LRU/LFU) to make room for new entries. The server should ensure that all blobs referenced by the `ActionResult` (`stdout`, `stderr`, produced files) are also available in the remote cache (further increasing their lifetimes to avoid their eviction before reading them).
If an `ActionResult` is found in the remote cache, it also downloads the related blobs by their digests using the Content Addressable Storage (CAS) with one of the related endpoints. As per the specification, the `BatchReadBlobs` endpoint (supports reading multiple blobs at once) should be used for small blobs, and the `ByteStream` API's `Read` endpoint (supports streaming reads of a single blob) should be used for large blobs.
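A simplified sketch of that size-based routing follows. The `CasReader` interface is a hypothetical wrapper around the two read endpoints, not an actual client API, and a production client would additionally split the batched reads so each request stays under the server's advertised batch size limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadPath {
    record Digest(String hash, long sizeBytes) {}

    // Hypothetical wrapper around the two CAS read endpoints from the spec.
    interface CasReader {
        Map<Digest, byte[]> batchReadBlobs(List<Digest> digests); // BatchReadBlobs
        byte[] streamRead(Digest digest);                         // ByteStream Read
    }

    // Download every blob referenced by an ActionResult, routing each blob to the
    // endpoint the spec recommends for its size.
    static Map<Digest, byte[]> downloadReferencedBlobs(CasReader cas,
                                                       List<Digest> referenced,
                                                       long batchThreshold) {
        Map<Digest, byte[]> blobs = new HashMap<>();
        List<Digest> small = new ArrayList<>();
        for (Digest digest : referenced) {
            if (digest.sizeBytes() <= batchThreshold) {
                small.add(digest);                         // small blobs: read together
            } else {
                blobs.put(digest, cas.streamRead(digest)); // large blobs: stream one by one
            }
        }
        if (!small.isEmpty()) {
            blobs.putAll(cas.batchReadBlobs(small));
        }
        return blobs;
    }
}
```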
If no `ActionResult` is found, the `Action` is executed locally, and its results are uploaded to the remote cache as part of a three-step process.
- It checks what blobs are missing from the remote cache using the `FindMissingBlobs` API endpoint (as it might be the case that two different actions produced some overlapping blobs).
- Missing blobs are uploaded using the CAS with either the `BatchUpdateBlobs` or the `ByteStream` API's `Write` endpoint (similarly to the read path).
- The last step is calling `UpdateActionResult`, referencing the corresponding `Action` and related blobs that were already uploaded previously.
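Putting the three steps together, the write path can be sketched roughly as below. The `RemoteCache` interface is a hypothetical abstraction over the spec's endpoints (not Bazel's actual classes), and error handling is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WritePath {
    record Digest(String hash, long sizeBytes) {}
    record ActionResult(int exitCode, Map<String, Digest> outputFiles) {}

    // Hypothetical wrapper around the three endpoints used on the write path.
    interface RemoteCache {
        List<Digest> findMissingBlobs(List<Digest> digests);               // FindMissingBlobs
        void uploadBlob(Digest digest, byte[] data);                       // BatchUpdateBlobs or ByteStream Write
        void updateActionResult(Digest actionDigest, ActionResult result); // UpdateActionResult
    }

    static void cacheLocalResult(RemoteCache cache, Digest actionDigest,
                                 ActionResult result, Map<Digest, byte[]> producedBlobs) {
        // 1. Ask the server which of the produced blobs it does not have yet
        //    (another action may already have uploaded overlapping blobs).
        List<Digest> missing = cache.findMissingBlobs(new ArrayList<>(producedBlobs.keySet()));

        // 2. Upload only the missing blobs to the CAS.
        for (Digest digest : missing) {
            cache.uploadBlob(digest, producedBlobs.get(digest));
        }

        // 3. Publish the ActionResult under the Action's digest, so future executions
        //    of the same Action become cache hits.
        cache.updateActionResult(actionDigest, result);
    }
}
```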
A client-side implementation: Bazel v7.0.2
Bazel is open source with a large and active community behind it. We looked into its codebase to better understand what to expect on the API side and prepare for deep-dive debugging if users report issues with their workloads.
I will walk you through it very briefly with diagrams that use the notation above (trying to simplify many layers of abstraction in the Java code).
Everything starts with `BAZEL_MODULES`, which is a long list of loosely coupled modules (classes extending `BlazeModule`) that are loaded by the Bazel runtime. Each module is responsible for a specific feature of Bazel. We are interested in remote caching, which is implemented by the `RemoteModule`. It calls the `GetCapabilities` endpoint on the configured remote caching API, checking whether it is compatible with the Bazel client, and configures it for usage throughout the rest of the execution.
Soon after bootstrapping the modules, Bazel starts building and executing the action graph in parallel using a component called Skyframe. For each action, it checks whether it was already executed and cached remotely (in which case local execution can be skipped by reading the existing `ActionResult` and related blobs) or whether it should be executed locally and cached remotely.
The read path starts by calling the `GetActionResult` endpoint on the remote cache API.
- Suppose a result isn't present for the given `Action` in the cache. In that case, it signals this to the caller, returning an empty `CacheHandle` that can be used to upload the `ActionResult` later on.
- If it is successful, Bazel downloads all of the blobs referenced by the `ActionResult` (all of which should also be present in the cache according to the spec) using the `Read` endpoint of the `ByteStream` API (never using `BatchReadBlobs`): `stdout` and `stderr`, all corresponding files, etc. (the resource names these streaming reads use follow the convention sketched below).
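The `ByteStream` calls address blobs through resource names rather than dedicated request fields. To the best of my reading of the spec, reads follow a `{instance_name}/blobs/{hash}/{size}` convention, and uploads insert a client-generated UUID so an interrupted upload can later be resumed under the same name. A tiny sketch of building those names (the instance name is a made-up example):

```java
import java.util.UUID;

public class ByteStreamResourceNames {
    // Resource name a client passes to ByteStream Read for a CAS blob:
    //   {instance_name}/blobs/{hash}/{size_bytes}
    static String readResourceName(String instanceName, String hash, long sizeBytes) {
        return instanceName + "/blobs/" + hash + "/" + sizeBytes;
    }

    // Resource name for a ByteStream Write upload; the client generates the UUID:
    //   {instance_name}/uploads/{uuid}/blobs/{hash}/{size_bytes}
    static String writeResourceName(String instanceName, String hash, long sizeBytes) {
        return instanceName + "/uploads/" + UUID.randomUUID() + "/blobs/" + hash + "/" + sizeBytes;
    }

    public static void main(String[] args) {
        System.out.println(readResourceName("example-instance", "a1b2c3", 1024));
        System.out.println(writeResourceName("example-instance", "a1b2c3", 1024));
    }
}
```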
Writing the results of locally executed actions starts by calling the `FindMissingBlobs` endpoint of the API. For all missing blobs, the `Write` endpoint of the `ByteStream` API is used (never `BatchUpdateBlobs`), either by streaming a file from the filesystem or by passing it from memory (depending on the blob type). Interrupted writes are resumed using the `QueryWriteStatus` endpoint of the API, which should return the number of bytes already successfully written to the cache, enabling Bazel to continue writing only the missing parts.
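A rough sketch of that resumption logic is below, assuming a hypothetical `ByteStreamClient` wrapper (not Bazel's actual classes) and ignoring edge cases such as empty blobs or uploads that already completed.

```java
import java.util.Arrays;

public class ResumableWrite {
    // Hypothetical wrapper around the ByteStream Write and QueryWriteStatus endpoints.
    interface ByteStreamClient {
        long committedSize(String resourceName); // QueryWriteStatus: bytes already stored
        void write(String resourceName, long writeOffset, byte[] chunk, boolean finishWrite);
    }

    static final int CHUNK_SIZE = 64 * 1024;

    // Upload a blob, skipping any prefix the server already stored during a
    // previous, interrupted attempt at the same resource name.
    static void upload(ByteStreamClient byteStream, String resourceName, byte[] blob) {
        long offset = byteStream.committedSize(resourceName);
        while (offset < blob.length) {
            int end = (int) Math.min(offset + CHUNK_SIZE, blob.length);
            byte[] chunk = Arrays.copyOfRange(blob, (int) offset, end);
            byteStream.write(resourceName, offset, chunk, end == blob.length);
            offset = end;
        }
    }
}
```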
Once all referenced blobs are written, Bazel persists the result for the given `Action` by calling the `UpdateActionResult` endpoint.
Conclusion
At Bitrise, we continuously enhance our build caching capabilities to provide the best support possible. This article explored the Bazel remote caching API specification and its client-side implementation. Next, we will delve into the server-side implementation of the remote cache API at Bitrise.
Interested in optimizing your Bazel builds? For a comprehensive guide, visit our DevCenter.
Not using Bitrise Build Cache yet? Start streamlining your builds with Bazel or Gradle by signing up for a 30-day free trial at Bitrise—no strings attached. Alternatively, feel free to talk to a mobile expert.
Join our team! Explore career opportunities on our careers page.