Hello again! In this post I’ll write a few words about how we started using Terraform Cloud and gave our engineers access to it.
Our dream is to allow teams to manage their own resources in a self-service manner, and for that they need a platform that automates infrastructure management. I touched on why we want this in the previous post. Now, let’s jump in and look at the next step in building out the internal platform! 🚀
Terraform Cloud
We love Terraform. It’s a tool created by HashiCorp that uses a declarative language (HCL) to describe infrastructure as code (IaC), and it can read, create, update, delete, and manage resources in a provider-agnostic way. We use it daily to create complex architectures with internal and external dependencies, apply changes smartly with the least possible impact, and perform specific tweaks while making sure the end result is an intact, functioning system in all cases.
While Terraform itself is a great tool, we are aiming to build a platform that allows all engineers to collaborate, so we need a scalable solution. Behind the scenes, Terraform keeps a state that records the resources it manages, and it determines exactly which changes to apply by comparing the desired configuration with that state (refreshed against the actual infrastructure). For this reason, we must keep the state intact, avoid concurrent operations on it, and manage access properly. As with the project structure described in the previous post, we want teams to be able to work fully independently, but limit their scope to the systems they own and reduce the potential blast radius of a misconfiguration. 🔒
Enter Terraform Cloud, the hosted version of Terraform. It provides a way to collaboratively plan and automatically apply changes to the infrastructure (i.e. it’s CI/CD for Terraform). It also has extensive enterprise features like teams and access levels. This is exactly what we need to enable engineers to manage their own resources: they just need access to a workspace that creates the resources they define in HCL. Workspaces are essentially projects in Terraform Cloud that are managed independently, have separate states, and can run concurrently and automatically in the cloud. To keep this scalable and easy to manage, we always define access at the team level, so every workspace is linked to an owner team.
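As a rough illustration of how a configuration is tied to a workspace, here is a minimal sketch using the cloud block (available since Terraform 1.1). The organization and workspace names are hypothetical placeholders, not our real ones. With the default remote execution mode, plan and apply then run in that workspace, and its state is stored and locked there rather than on someone’s laptop.

```hcl
# Minimal sketch: attach this configuration to a Terraform Cloud workspace.
# "example-org" and "team-a-billing-api" are hypothetical placeholders.
terraform {
  required_version = ">= 1.1.0" # the cloud block was introduced in Terraform 1.1

  cloud {
    organization = "example-org"

    workspaces {
      # Each workspace keeps its own state, so this configuration's state
      # is isolated from every other team's workspaces.
      name = "team-a-billing-api"
    }
  }
}
```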
Define teams
Several things are associated with teams in our internal platform. As discussed in the previous post about the project structure, each team has a folder in GCP that contains all the projects of the systems it owns. Because IAM policies set on a folder are inherited by the projects inside it, a single GCP IAM rule is enough for each team (and moving projects between teams is quite easy). These projects should also have their own workspaces on Terraform Cloud, responsible for managing the resources in them in an automated way (more on these in a later post), and teams need access to these workspaces. But of course, first we have to create the teams on Terraform Cloud and populate them with the actual team members.
At Bitrise we use Google Workspace as our identity provider. That means each person has a Google identity (email) with the company domain, and teams are defined as Google groups. To set up teams' access to their GCP folders, we only have to create IAM rules for the groups themselves. Terraform Cloud, however, does not natively support Google groups, so we have to define teams there separately and invite members individually.
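The "one IAM rule per team" idea can be sketched like this: a folder for the team and a single folder-level binding for its Google group, which every project created in that folder inherits. The folder name, the group address, and the roles/editor role are assumptions for illustration, not our actual setup.

```hcl
# Sketch: one GCP folder per team, with a single IAM binding for the
# team's Google group. All names and the role are hypothetical.
variable "parent" {
  type        = string
  description = "Parent of the team folders, e.g. organizations/123456789 or folders/123456789."
}

resource "google_folder" "team_a" {
  display_name = "team-a"
  parent       = var.parent
}

# Projects created under this folder inherit this binding, so moving a
# project to another team's folder changes who can access it.
resource "google_folder_iam_member" "team_a" {
  folder = google_folder.team_a.name # has the form folders/{folder_id}
  role   = "roles/editor"
  member = "group:team-a@example.com"
}
```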
We strive to codify everything we can in Terraform, and teams and their access to the platform were no exception. We created a separate workspace to manage the internal platform’s project structure and started defining teams there, using dedicated modules for team composition, access levels, GCP folders, etc. In the end, we have a single central codebase that sets up teams and workspaces based on a central descriptor. For Terraform Cloud resources (such as teams, access, invitations, etc.), there is a provider (tfe) that lets us manage these settings in Terraform. I want to highlight how awesome it is to be able to manage Terraform Cloud from a workspace running on Terraform Cloud 🤯
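A central descriptor could look something like the following hypothetical sketch: a map from team name to its Google group and owned systems, fanned out to a per-team module with for_each. The ./modules/team module and its inputs are assumptions for illustration; the descriptor could equally live in a YAML file and be read with yamldecode(file(...)).

```hcl
# Hypothetical central descriptor: team name -> Google group and owned systems.
locals {
  teams = {
    "team-a" = {
      google_group = "team-a@example.com"
      systems      = ["billing-api", "billing-worker"]
    }
    "team-b" = {
      google_group = "team-b@example.com"
      systems      = ["search"]
    }
  }
}

# One instance of a (hypothetical) local module per team. Inside, the module
# would create the tfe_team, its workspace access and the team's GCP folder.
module "team" {
  source   = "./modules/team"
  for_each = local.teams

  name         = each.key
  google_group = each.value.google_group
  systems      = each.value.systems
}
```

Adding a team then becomes a one-entry change to the descriptor, which is what keeps the setup manageable as the number of teams grows.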
Creating a team is as easy as creating a tfe_team resource, and we can assign workspaces to teams with the tfe_team_access resource. Then, to add a member to a team, we create a tfe_team_organization_member resource that references the user’s organization membership. But it’s the invitation to the organization on Terraform Cloud (i.e. creating that membership) where things start to get interesting.
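Here is a sketch of those three resources for a single team, assuming the workspace ID and the organization membership IDs are passed in as variables (in a real setup they would come from the workspace and membership resources). The team name and the "write" access level are illustrative choices. Iterating over member emails rather than membership IDs keeps the for_each keys known at plan time, even when the IDs themselves are only known after apply.

```hcl
variable "organization" {
  type = string
}

variable "workspace_id" {
  type        = string
  description = "ID of a workspace owned by the team."
}

variable "member_emails" {
  type        = set(string)
  description = "Emails of the team's members."
}

variable "membership_ids" {
  type        = map(string)
  description = "email -> tfe_organization_membership ID, for every invited person."
}

resource "tfe_team" "this" {
  name         = "team-a" # hypothetical team name
  organization = var.organization
}

# Grant the team access to a workspace it owns ("read", "plan", "write" or "admin").
resource "tfe_team_access" "owned" {
  team_id      = tfe_team.this.id
  workspace_id = var.workspace_id
  access       = "write"
}

# Add each member to the team, referencing their organization membership.
resource "tfe_team_organization_member" "members" {
  for_each = var.member_emails

  team_id                    = tfe_team.this.id
  organization_membership_id = var.membership_ids[each.key]
}
```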
Inviting team members
To invite team members to the organization, we had to create a tfe_organization_membership resource for each person in each Google group. To make it more interesting, we had to create exactly one such resource per person, regardless of how many teams they belong to (so that each person is invited exactly once). So we created a separate module just for these resources: it collects all members of all teams and creates one resource per member using for_each over the set of people (a set is a built-in Terraform type whose elements are unique, so people in several teams appear only once).
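A minimal sketch of such a module, assuming its input is a map from team name to member emails (the variable names are hypothetical). The deduplication is just flatten() followed by toset(), and the output exposes the membership IDs keyed by email for the team membership resources.

```hcl
variable "organization" {
  type = string
}

# Hypothetical input: team name -> member emails. A person may appear in
# several teams.
variable "team_members" {
  type = map(list(string))
}

locals {
  # flatten() merges every team's list into one list; toset() then drops
  # duplicates, so each person appears exactly once.
  all_members = toset(flatten(values(var.team_members)))
}

# One organization invitation per unique person, keyed by email.
resource "tfe_organization_membership" "this" {
  for_each = local.all_members

  organization = var.organization
  email        = each.value
}

# email -> membership ID, e.g. for tfe_team_organization_member resources.
output "membership_ids" {
  value = { for email, m in tfe_organization_membership.this : email => m.id }
}
```

For example, with team_members = { "team-a" = ["alice@example.com", "bob@example.com"], "team-b" = ["bob@example.com"] }, only two memberships are created, since Bob is deduplicated.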
To keep the management burden to a minimum, we wanted to list only the teams' Google groups in the team descriptor file and query the actual members through Google’s APIs. Unfortunately, calling the Cloud Identity APIs requires special permissions, which we had to grant to the service account that our Terraform code uses to authenticate with the google provider. To achieve this, we asked a super admin to help us with the following flow.
First, we had to get our Google Workspace customer ID (not to be confused with Terraform workspaces!) from the Google Admin Console. Using that ID, we could list the available admin roles by calling the roles endpoint of the Admin SDK Directory API. We selected the Groups Reader role (the bare minimum we need for this task) and assigned it to the service account by passing the role’s ID to the role assignments endpoint. After this setup, the service account running the project setup workspace was able to query Google group members and invite people to Terraform Cloud 🎉
Note that since we don’t have a super admin service account (nor do we want one), and these endpoints have no corresponding Terraform resources, we had to set this up manually. That is an acceptable bootstrapping step that we only had to do once.
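With the Groups Reader role in place, reading group members from Terraform could look roughly like the sketch below. It assumes a google provider version that includes the google_cloud_identity_group_lookup and google_cloud_identity_group_memberships data sources, authenticated as the service account above; the team_groups variable is a hypothetical stand-in for the team descriptor. The resulting map (team name -> member emails) could then feed the deduplicated invitations described above.

```hcl
# Hypothetical descriptor input: team name -> Google group email.
variable "team_groups" {
  type = map(string)
}

# Resolve each group email to its Cloud Identity resource name (groups/{id}).
data "google_cloud_identity_group_lookup" "team" {
  for_each = var.team_groups

  group_key {
    id = each.value
  }
}

# List the memberships of each group. Nested groups are not expanded here.
data "google_cloud_identity_group_memberships" "team" {
  for_each = var.team_groups

  group = data.google_cloud_identity_group_lookup.team[each.key].name
}

locals {
  # team name -> member emails
  team_members = {
    for team, group in data.google_cloud_identity_group_memberships.team :
    team => [for m in group.memberships : m.preferred_member_key[0].id]
  }
}
```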
Next step: workspaces
Now that teams can access Terraform Cloud and the workspaces of the systems they own, they can start creating their resources in a self-service and automated way. But to actually run their code, those workspaces have to be set up to manage infrastructure on Kubernetes and GCP (and through various other providers). We’ll discuss this in a later post. Until then, hang tight! Or maybe check out our current job openings 😉