This article was originally published on the Nextdoor Engineering Blog.
Two of the most useful tools for making software development run smoothly are CI & CD. Over the years our stack has evolved from a humble Jenkins box into a cloud-native platform. In this post, we’ll share our learnings from this journey.
A little history
We started our iOS CI/CD on on-premises infrastructure: Macs running Jenkins. Build performance was great because every build ran on bare-metal machines, and we had full control over the build instances. However, this setup had its limitations:
- Maintenance: Keeping everything up to date from Jenkins to the machine itself was a huge overhead for us. Engineers spent hours fixing issues like network and hardware failures.
- Unstable builds: Because builds were not isolated from one another, the cache left behind by an old build frequently made a new build fail. Small, unpredictable events like network failures caused a lot of instability.
- Lack of elasticity: Because of the limited number of Macs we had, our queues grew larger and larger as we increased the number of features on our app and the engineers on our team. We had a hard time spinning up new machines.
- Unable to remotely access the build servers: Setting up remote access to the build servers was a challenge for us. The workarounds we made to enable remote SSH were fragile and could have become security liabilities.
Choosing a cloud solution
Given all the challenges described above, we decided to move our iOS CI/CD builds to a cloud solution. We set the following success criteria:
- Leverage the cloud to achieve elasticity. Reduce the cost of creating new instances by removing the maintenance burden of having our own infrastructure.
- Make builds stable.
- Improve the user interface so workflows are easy to pick up and understand by other engineers.
- Create the ability to test any new Xcode beta version shortly after release.
- Create the ability to cache build data per branch to speed up builds.
- Enable remote access into the build servers.
- Simplify code signing management.
We considered a few services, including CircleCI, Bitrise, TravisCI, and GitHub Actions, and selected Bitrise because it met our success criteria.
Trade-offs
No technical decision has a perfect solution, so it is important to establish which compromises you are willing to make.
In our case, migrating from on-premises Mac and Jenkins infrastructure to a cloud-based solution involved one of those compromises. For us, slightly longer build times are a tradeoff we are happy to make in order to get more stable builds.
Here is a benchmark comparing the build times of two of our CI jobs before and after the migration:
From the table above, it is clear that cloud builds take longer, but that is offset by the shorter queue times. Not a bad tradeoff for increased build stability and reduced maintenance costs!
Tips & tricks
1. Create Shareable Workflows
Be sure to create utility workflows (or normal workflows) that can be easily plugged into your primary workflows. You can configure these workflows through environment variables. For example, here we have three utility workflows: _swiftlint, _build_ipa, and _deploy_to_firebase. These three utility workflows can be plugged into both the primary and rc workflows. This allows the primary and rc workflows to produce different IPAs and deploy to different Firebase App Distribution channels without any changes to the utility workflows themselves.
Let’s look at what _deploy_to_firebase looks like:
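#!/usr/bin/env bash
# Stop at the first failing command.
set -e

# SCHEME, FIREBASE_APP_ID, and FIREBASE_TEST_GROUP are supplied by the calling workflow.
bundle exec fastlane deploy_to_firebase \
  scheme:"${SCHEME}" \
  firebase_app_id:"${FIREBASE_APP_ID}" \
  groups:"${FIREBASE_TEST_GROUP}"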
This syntax allows us to use different environment variables for each workflow, as you can see in the screenshots below:
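Because each workflow supplies its own values, a variable that is left unset only shows up when the deploy step runs. One way to catch that earlier is to check the inputs before calling fastlane. The following is a minimal sketch, not our actual script: require_env and the deploy_to_firebase wrapper function are hypothetical helpers written for illustration, while the three variable names and the fastlane lane come from the script above.

```shell
#!/usr/bin/env bash
# Sketch only: require_env is a hypothetical helper, not part of Bitrise or fastlane.

# Report every listed variable that is unset or empty.
# Returns 0 when all are present and 1 when any are missing.
require_env() {
  local missing=0 name value
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    # eval is acceptable here because the names are hard-coded by the caller.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "Missing required environment variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Run the deploy lane only when the calling workflow has supplied every input.
deploy_to_firebase() {
  require_env SCHEME FIREBASE_APP_ID FIREBASE_TEST_GROUP || return 1
  bundle exec fastlane deploy_to_firebase \
    scheme:"${SCHEME}" \
    firebase_app_id:"${FIREBASE_APP_ID}" \
    groups:"${FIREBASE_TEST_GROUP}"
}
```

With this in place, the script body would end with a call to deploy_to_firebase, and a workflow that forgot to define one of the variables would fail with a message naming it instead of passing an empty value to fastlane.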
2. Use Remote Access With Screen Sharing
Remote access with Screen Sharing is very useful when troubleshooting a build issue, especially for tasks like validating certificates. Inspecting the Keychain through its UI is much easier than through the CLI because we don’t need to know all the commands; the familiar Mac user interface is enough.
Conclusions
We’ve been running our new cloud iOS CI/CD for several months now, and we’re very happy with the results. Instead of spending hours troubleshooting CI/CD, our engineers can now focus on making our products better for our customers.
Let us know if you have any questions or comments about our cloud CI/CD with Bitrise. And, if you’re interested in solving challenging problems like this, come join us! Visit our career page to learn more!
Special thanks to the Client Platform team, Developer Experience team, and iOS engineers who have helped in this fun journey.