So far in this series, we have been using Terraform on our own. We defined the infrastructure, created a plan, created the infrastructure itself, and saved our code to GitHub. We had the choice to store and maintain state locally.
However, using Terraform in a team where more than one developer contributes to the Terraform code can get a bit tricky. There are different approaches to using Terraform collaboratively, especially for teams that belong to organisations. In this post, we will explore one of these approaches and see how it differs from working as a solo developer.
Introducing Terraform into a team can be a bit difficult. It is not just a matter of introducing a new language for infrastructure development; it also involves adopting new processes and authorization models to manage remote executions and state. As with DevOps, this transition demands a change in the team's mindset.
One of the most important challenges is the internal migration, where infrastructure managed in the traditional way needs to be brought under Terraform management. This can be done in several ways. For example, new projects can be prioritized to start using Terraform directly, while import activities for existing project infrastructure are grouped and planned with the different environments in mind.
Organisations typically have an established way of doing things, and they value their infrastructure highly since it is the backbone of their business. Strict security protocols are in place, compliance and governance frameworks play a huge role, and there are procedures for handling legal and legislative obligations. These factors affect how a company's infrastructure is built. Compare this with a startup that has just begun deploying its services in the cloud, and the difference is clear.
Organisations have years of learning experience, and they try to stay up to date with the latest developments in all of the above areas. A project run in such an organisation needs infrastructure that complies with the organisation's policies and practices. When using the public cloud, they are extra cautious in designing their infrastructure.
This results in organisations developing their own templates for everything: storage, compute, network, and even databases. For example, organisations use their own hardened images to run their various types of virtual servers, or define networking rules for access and security policies.
When introducing Terraform in an organisational setup, it is advisable to have templates for the different infrastructure components, pre-configured with the organisation's standards and policies. In Terraform terms, these can be modules published to internal repositories for project teams to use.
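As a rough illustration of what "pre-configured with standards and policies" can look like, a central module might restrict its inputs to approved values. The sketch below is an assumption for illustration only: the variable name, the list of allowed instance sizes, and the error message are hypothetical and do not represent any real organisational standard.

```hcl
# Hypothetical input of a centrally maintained compute module.
# The allowed values are placeholders for whatever the organisation has approved.
variable "instance_type" {
  description = "Instance size for the virtual server."
  type        = string
  default     = "t3.micro"

  validation {
    # Reject any size that is not on the (assumed) approved list.
    condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
    error_message = "The instance_type value must be one of the approved sizes."
  }
}
```

With a rule like this inside the module, a project that requests an unapproved size gets an error at plan time rather than after deployment.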
As we know, modules are reusable IaC. If a central team is made responsible for maintaining and publishing new modules for projects to use, it saves the time otherwise spent re-configuring the same infrastructure and repeating the hardening processes. A central (Git) repository of Terraform modules should exist for reuse at the project level.
This does not mean that real infrastructure corresponding to those modules has to exist. The modules can simply be blueprints that projects start building upon. Having this repository in place gives organisations central and unified control over their infrastructure at the project level.
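A project could consume such a blueprint roughly as follows. This is a minimal sketch: the repository URL, the `network` subdirectory, the `v1.0.0` tag, and the two input variables are all hypothetical. Only the `git::` source syntax with a `//` subdirectory and a `ref` argument is standard Terraform.

```hcl
# Hypothetical central repository, subdirectory, and version tag.
module "network" {
  # "git::" tells Terraform to fetch the module from a Git repository;
  # "//network" selects a subdirectory and "ref" pins a tag or branch.
  source = "git::https://git.example.com/platform/terraform-modules.git//network?ref=v1.0.0"

  # Hypothetical input variables exposed by the module.
  environment = "staging"
  cidr_block  = "10.0.0.0/16"
}
```

Pinning `ref` to a tag means a project only picks up a newer blueprint when it deliberately changes the version.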
If a project needs functionality that does not exist in the central repository, it may build its own modules that comply with the standards. Internally, the central management team can help certify those modules and publish them to the repository for future use.
Doing all this makes sense only if a good number of projects, or the majority of them, work with cloud infrastructure. The setup is not suitable, and is merely an overhead, if only a one-off project deals with the cloud. But at a time when many organisations are pursuing digital transformation, it cannot be ignored.
Projects generally run in a time-bound manner. In such a situation, help from accelerators is much appreciated, not only to meet client expectations but also to excel at delivery in every way possible.
Projects should be expected to use the central repository of blueprints when building their environments. The published Terraform modules can be cloned and reused in every project as needed. This saves a lot of time, since the compliance and security practices are followed by default. Projects may also choose to build or customize more on top of the existing modules.
Whenever a project is run or a product is built, the work goes through staging environments before it reaches production. These are essentially separate environments, which means they have different sets of infrastructure. A staging environment may be a scaled-down version of production, but the idea is to imitate production behaviour for testing purposes.
Application code is usually maintained in a version control system (VCS) like Git. A VCS allows developers to create branches, which essentially means creating a copy of the master/main branch and trying out new code changes on it. Before the branch is merged into the main branch, testing takes place, and if it succeeds, a pull request is made to the owner of the repository.
It works the same way with Terraform code. However, with Terraform there is one additional component: the remote backend. It is important and needs to be taken into consideration because Terraform execution depends heavily on state information.
We use remote backends mainly to store Terraform state and to execute Terraform apply and destroy commands. This allows the state to be shared among multiple developers in the team, so no single person is responsible for it. Maintaining state remotely serves 2 purposes -
Sharing of state
Remember, as discussed before, we include the state file in the .gitignore file of Git/VCS, which means state information is not shared with other developers via VCS. Imagine a situation where two developers clone different commits of the Terraform IaC, implement their changes, and run terraform apply using their own versions of the state file. This is a recipe for disaster, as the two Terraform executions would work from inconsistent state and cause unexpected results and race conditions.
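For illustration, a shared state location is declared in the terraform settings block. The sketch below uses the S3 backend and assumes that the bucket already exists. The bucket name, key, and region are hypothetical placeholders.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-org-terraform-state" # hypothetical, must already exist
    key     = "project-a/terraform.tfstate" # hypothetical path of the state object
    region  = "eu-west-1"                   # hypothetical region
    encrypt = true                          # store the state object encrypted at rest
  }
}
```

After `terraform init`, every developer who uses this configuration reads and writes the same state object instead of a local file.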
To maintain different environments using Terraform, we can make use of workspaces. Workspaces work much like Git branches. They create a separate copy of the environment that is not related to the main copy. Developers can use workspaces to test their infrastructure-as-code changes separately before they merge the changes back.
However, merging the changes back into production does not happen via workspaces. Once the new configurations are tested, the (developer-specific) workspace can be destroyed. To align with project environments, it is good practice to manage every environment via its own workspace; more on this in a moment.
As far as VCS is concerned, there is only one repository where the IaC for the project is stored. This means every project always has one master branch representing its production environment. Depending on the number of sub-production environments required, branches can be created on the VCS to match the number and names of those environments.
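Within the configuration, the name of the selected workspace is available as `terraform.workspace`, which can be used to vary settings per environment. The following is a minimal sketch. The workspace name `production`, the server counts, and the `myapp` prefix are assumptions made for illustration.

```hcl
locals {
  # terraform.workspace holds the name of the currently selected workspace.
  is_production = terraform.workspace == "production"

  # Assumed sizing: production runs three servers, every other workspace runs one.
  server_count = local.is_production ? 3 : 1

  # Prefix resource names with the workspace so environments do not collide.
  name_prefix = "myapp-${terraform.workspace}"
}
```

With the CLI, a workspace is created with `terraform workspace new staging` and selected with `terraform workspace select staging`. The same code then produces a scaled-down copy of the infrastructure in the non-production workspace.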
We will take Terraform Cloud as an example for workspaces. Terraform Cloud supports remote state storage and remote execution of Terraform commands. It provides features like workspaces, state file locking, and VCS integration to implement a collaboration practice. It is also possible to apply access controls that determine which workspaces can automatically trigger the deployment of infrastructure changes and where manual controls are needed.
Workspaces on Terraform Cloud can also be related to VCS branches. There could be one workspace per sub-prod branch. Terraform Cloud allows VCS integration where you associate a particular branch of the repository with a particular workspace. These relationships help in deciding which infrastructure changes, when committed to the branch, should be applied automatically and which ones should be controlled.
Terraform Cloud also offers state locking. When several developers commit their changes to the sub-prod VCS branch, the earliest commit is deployed first, and the rest of the commits are queued until that commit has been applied successfully.
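One way to express this branch-to-workspace mapping as code is the `tfe` provider's `tfe_workspace` resource. It can equally be set up through the Terraform Cloud UI. The sketch below is only an illustration: the organization, workspace, and repository names are hypothetical, and the provider is assumed to be authenticated with a Terraform Cloud API token.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

variable "oauth_token_id" {
  description = "ID of the VCS connection already configured in Terraform Cloud."
  type        = string
}

# Sub-prod workspace: changes committed to the staging branch are applied automatically.
resource "tfe_workspace" "staging" {
  name         = "project-a-staging" # hypothetical
  organization = "example-org"       # hypothetical
  auto_apply   = true

  vcs_repo {
    identifier     = "example-org/project-a-infra" # hypothetical <org>/<repo>
    branch         = "staging"
    oauth_token_id = var.oauth_token_id
  }
}

# Production workspace: runs wait for a manual confirmation before applying.
resource "tfe_workspace" "production" {
  name         = "project-a-production" # hypothetical
  organization = "example-org"
  auto_apply   = false

  vcs_repo {
    identifier     = "example-org/project-a-infra"
    branch         = "main"
    oauth_token_id = var.oauth_token_id
  }
}
```

Here `auto_apply` is what separates the changes that are applied automatically from the ones that stay under manual control.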
Working with Terraform Cloud is also convenient because the backend is configured as part of the terraform settings code, as opposed to using AWS S3 as the backend, where you need to run an additional command to set up an S3 bucket as the remote backend every time the repository is cloned on a new system.
Using Terraform to deliver infrastructure requires certain processes to be in place. As mentioned earlier, the teams and roles that may exist include a central management team for managing blueprints, project teams, developers who are responsible for designing (developing) the IaC, and project-specific roles that control execution access on the production workspace.
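A minimal sketch of that settings code is shown below, assuming Terraform 1.1 or later, where the `cloud` block is available. Older versions use `backend "remote"` instead. The organization and workspace names are hypothetical.

```hcl
terraform {
  cloud {
    organization = "example-org" # hypothetical organization name

    workspaces {
      name = "project-a-staging" # hypothetical workspace name
    }
  }
}
```

After authenticating with `terraform login`, running `terraform init` connects the working directory to that workspace.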