Structuring Terraform Code

Introduction

Terraform is a powerful Infrastructure-as-Code (IaC) tool that allows you to define and manage all kinds of infrastructure in a declarative way. How you choose to structure your Terraform code can have a significant impact on the long-term maintainability and scalability of your infrastructure.

Anatomy of a Terraform project

Terraform code is organized into modules. A module is a container for multiple resources that are used together. A module is formed any time you have .tf files in a directory. Think of a module as a class in object-oriented programming. It can have its own variables, outputs, and resources. Modules can be nested, meaning that a module can call another module.

Terraform projects always have at least one "root" module, which is where you invoke terraform init and terraform apply. The root module is the entry point for your Terraform configuration and Terraform will only create resources that are defined in the root module or any modules that are called from the root module.

Any Terraform module can contain the following components:

Resources: The infrastructure components Terraform will create and manage.
Data sources: Used to fetch data from external sources (eg: fetching the latest AMI ID from AWS).
Variables: Used to parameterize your Terraform code and make it more reusable.
Outputs: Used to expose values from your Terraform code that can be used by other modules or by the user after running terraform apply.
Module calls: Used to call other modules from within a module, allowing you to reuse code and create more complex infrastructure.
Providers: Used to configure the providers that Terraform will use to manage your infrastructure (eg: AWS, Azure, Google Cloud).
Locals: Used to define local values that can be used within a module. Locals are similar to variables, but they are not meant to be set by the user. They are typically used for intermediate calculations or to simplify complex expressions.
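
As a rough illustration of how these components fit together, here is a minimal sketch of a single module. The resource names, variable names, and the AMI name filter are assumptions chosen for illustration; they are not taken from any particular project.

```hcl
# Variables: inputs that a calling module (or the user) can set.
variable "environment" {
  type        = string
  description = "Name of the environment, for example dev or prod."
}

variable "instance_type" {
  type    = string
  default = "t3.micro"
}

# Locals: intermediate values that are not meant to be set by the user.
locals {
  name_prefix = "web-${var.environment}"
}

# Data source: look up the latest matching AMI instead of hard-coding an ID.
# The name filter is an assumed pattern; adjust it for the image you want.
data "aws_ami" "latest" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}

# Resource: the infrastructure Terraform creates and manages.
resource "aws_instance" "web" {
  ami           = data.aws_ami.latest.id
  instance_type = var.instance_type

  tags = {
    Name = "${local.name_prefix}-instance"
  }
}

# Output: a value exposed to the calling module or to the user.
output "instance_id" {
  value = aws_instance.web.id
}
```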

Additionally, the root module can also contain:

Backend configuration: Used to configure where Terraform will store its state file, which is critical for collaboration and state management.
.tfvars files: Used to provide variable values to Terraform. These files can be used to set different variable values for different environments (eg: dev.tfvars, staging.tfvars, prod.tfvars).
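
A minimal sketch of how a .tfvars file lines up with variable declarations, assuming a hypothetical instance_count variable. The two files are shown in one block for brevity; in practice they are separate files in the root module directory.

```hcl
# --- variables.tf: declares the inputs ---
variable "environment" {
  type = string
}

variable "instance_count" {
  type    = number
  default = 1
}

# --- prod.tfvars: assigns values for one environment ---
# Selected at run time with: terraform apply -var-file=prod.tfvars
environment    = "prod"
instance_count = 3
```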

Conventions

Within a single module (directory), all .tf files are created equal. There are no hard and fast rules about what you need to name these files, or what goes in them. However, following a consistent convention can help improve readability. It's useful to be able to glance at a directory and understand what kind of code is contained within it. With that in mind, here are some common conventions people use:

init.tf - contains initialization code, such as backend configuration and provider setup.
main.tf - contains the main resources, data, and locals for the module.
[other names].tf - contains any other resources, grouped by type or function (eg: network.tf, compute.tf, databases.tf).
variables.tf - contains variable definitions.
outputs.tf - contains output definitions.
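
Under this convention, an init.tf in a root module might look roughly like the following sketch. The provider version constraint, bucket name, state key, and region are placeholder assumptions.

```hcl
# init.tf: Terraform settings, backend, and provider setup.
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Where state is stored. Bucket, key, and region are placeholder values.
  backend "s3" {
    bucket = "example-terraform-state"
    key    = "dev/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}
```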

What's hard to change later?

Moving resources from one module to another (eg: moving aws_instance.web to module.module2.aws_instance.web will cause it to be fully recreated unless you use a moved block).
Resource names (eg: renaming aws_instance.web to aws_instance.app will also cause it to be recreated unless you use a moved block).
Restructuring your root module.
Specific properties of specific resources (eg: changing the AMI of an aws_instance will cause it to be fully recreated).
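
For the first item in this list, a moved block (available in Terraform 1.1 and later) can record the address change so the existing object is kept. The sketch below assumes the resource block for aws_instance.web has already been relocated into a child module called module2, matching the example above.

```hcl
# Previously declared in the root module as:
#   resource "aws_instance" "web" { ... }
# The resource block now lives inside the child module "module2".

module "module2" {
  source = "./modules/module2"
}

# Tells Terraform that the existing object has a new address, so the plan
# shows a move instead of a destroy followed by a create.
moved {
  from = aws_instance.web
  to   = module.module2.aws_instance.web
}
```

The same block shape covers a plain rename, with from set to aws_instance.web and to set to aws_instance.app. Running terraform plan before applying is a reasonable way to confirm that the move is detected.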

Option 1: One "configurable" root module

init.tf
main.tf
variables.tf
outputs.tf
modules/
  module1/
    main.tf
    variables.tf
    outputs.tf
  module2/
    main.tf
    variables.tf
    outputs.tf

In this configuration, you use a single root module for all environments. You can use variables to configure the behavior of the root module and its child modules. This approach is simple and easy to understand, but it can become difficult to manage as your infrastructure grows.
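
A minimal sketch of what "configurable" can mean here, assuming a hypothetical enable_module2 flag and a hypothetical environment input on both child modules. The files are shown in one block for brevity.

```hcl
# --- variables.tf ---
variable "environment" {
  type = string
}

variable "enable_module2" {
  type    = bool
  default = false
}

# --- main.tf ---
module "module1" {
  source = "./modules/module1"

  # "environment" is a hypothetical input of module1.
  environment = var.environment
}

# Created only in environments that opt in through the flag.
module "module2" {
  source = "./modules/module2"
  count  = var.enable_module2 ? 1 : 0

  environment = var.environment
}
```

Every per-environment difference has to be modeled as a variable like enable_module2, which is where this approach starts to strain as the differences multiply.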

Pros:

Simpler to understand and manage for small projects.
Each environment is guaranteed to get the same code.

Cons:

Per-environment differences that can't be expressed through variables are not possible, because every environment runs the same code.
Backend configuration dictates where your state is stored, and it can't use variables. As a result, you must pass that configuration on the command line (through the -backend-config option of terraform init), which can lead to mistakes and confusion.
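
The backend limitation can be sketched as follows. The file name dev.s3.tfbackend, the bucket, the key, and the region are all hypothetical; the two files are shown in one block for brevity.

```hcl
# --- init.tf ---
# The backend block is left empty because it cannot reference
# variables such as var.environment.
terraform {
  backend "s3" {}
}

# --- dev.s3.tfbackend (hypothetical file) ---
# Supplies the missing settings for one environment.
# Passed in with: terraform init -backend-config=dev.s3.tfbackend
bucket = "example-terraform-state"
key    = "dev/terraform.tfstate"
region = "us-east-1"
```

Initializing with the wrong file for the environment you intend to change is the kind of mistake this con refers to.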

Option 2: Separate root modules for each environment

env/
  dev/
    init.tf
    main.tf
    variables.tf
    outputs.tf
  staging/
  prod/
modules/
  cdn/
    main.tf
    variables.tf
    outputs.tf
  oidc-client/
    main.tf
    variables.tf
    outputs.tf

In this configuration, you have separate root modules for each environment. Each environment can have its own configuration, including invoking totally different modules. This approach allows for more flexibility and can be easier to manage as your infrastructure grows, but it can also lead to code duplication and inconsistencies between environments if you're not careful.
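
A minimal sketch of one environment's root module under this layout. The module inputs (domain_name, client_name) are hypothetical; the real modules define their own variables.

```hcl
# --- env/dev/main.tf ---
module "cdn" {
  source = "../../modules/cdn"

  # Hypothetical input.
  domain_name = "dev.example.com"
}

# In this sketch only dev invokes this module; env/prod/main.tf could
# omit it or call a different module entirely.
module "oidc_client" {
  source = "../../modules/oidc-client"

  # Hypothetical input.
  client_name = "dev-app"
}
```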

Pros:

Each environment can have its own configuration, including invoking totally different modules.
Backend configuration can be hard-coded in the root module, which reduces the risk of mistakes and makes it easier to use.

Cons:

More complicated to understand for newcomers.
Code duplication between environments can lead to inconsistencies and maintenance overhead (eg: multiple sets of providers).

Option 3: Multiple separate root modules for each environment, with shared child modules

env/
  dev/
    common/
        init.tf
        main.tf
        variables.tf
        outputs.tf
    application/
        init.tf
        main.tf
        variables.tf
        outputs.tf
  staging/
  prod/
modules/
  cdn/
    main.tf
    variables.tf
    outputs.tf
  oidc-client/
    main.tf
    variables.tf
    outputs.tf

In this configuration, each environment has multiple root modules (here, common and application) rather than just one. Doing this allows you to split up a deployment into phases, applying first the common root module for the environment, then the application root module. This can be useful for managing dependencies between resources and ensuring that certain resources are created before others.

A common example of where you might use this is in an ECS-based application, where the ECR repository needs to be created before you can push the Docker image or deploy the ECS service.

In this case, the common module would create the ECR repository, and the application module would deploy the ECS service. This allows you to split the deployment in two, with a push of the Docker image in between. This can be visualized in the following diagram:

This approach works best when you limit the number of resources in the common module. Because the deployment can potentially fail at building the Docker image, you don't want to have a lot of resources in the common module that would be left dangling if the deployment fails at that point. In general, common should be reserved for stable, long-lived resources.

By splitting up root modules, you can manage the dependencies between these resources more effectively and ensure that the ECR repository is created before you attempt to push the Docker image or deploy the ECS service.
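
For the ECS example, the common root module could be as small as the following sketch. The repository name and the output name ecr_repository_url are assumptions for illustration.

```hcl
# --- env/dev/common/main.tf ---
# Stable, long-lived resources only.
resource "aws_ecr_repository" "app" {
  # Hypothetical repository name.
  name = "example-app"
}

# --- env/dev/common/outputs.tf ---
# Exposed so a later phase can find the repository.
output "ecr_repository_url" {
  value = aws_ecr_repository.app.repository_url
}

# Deployment phases, run from the repository root:
#   1. terraform -chdir=env/dev/common apply
#   2. build the Docker image and push it to ecr_repository_url
#   3. terraform -chdir=env/dev/application apply
```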

Managing dependencies between root modules

By design, in this configuration you will have resources in the common root module that are needed by the application root module. Because these are separate root modules, you can't directly reference common resources in application. You have two options for managing these dependencies:

Use outputs from the common module and pass them as variables to the application module. For example, using GitHub Actions, you could have a workflow that first applies the common module, then captures the outputs and passes them as variables to the application module in the next step.
Use terraform_remote_state data sources in the application module to fetch outputs from the common module. This requires that the common module is applied first and that its state is stored in a remote backend that the application module can access.

The first approach works well for simple dependencies, but it can become unwieldy if you have a lot of outputs to pass between modules. The second approach is more scalable and allows you to manage dependencies more effectively, but it also creates an implicit dependency between the structure of the common outputs and the application code that reads them.
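
Both approaches can be sketched from the application side. This assumes common exposes an output named ecr_repository_url and stores its state in an S3 backend; the bucket, key, region, and the latest image tag are placeholders. In practice you would pick one approach, not both.

```hcl
# --- First approach: env/dev/application/variables.tf ---
# The pipeline reads the value from common (for example with
# "terraform output -raw ecr_repository_url") and passes it in,
# for example through the TF_VAR_ecr_repository_url environment variable.
variable "ecr_repository_url" {
  type = string
}

# --- Second approach: env/dev/application/main.tf ---
# Reads common's outputs directly from its remote state.
# The config values must match the backend that common actually uses.
data "terraform_remote_state" "common" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "dev/common/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  # Fails at plan time if common renames or removes this output.
  image = "${data.terraform_remote_state.common.outputs.ecr_repository_url}:latest"
}
```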

Pros:

Each environment can have its own configuration, including invoking totally different modules.
Backend configuration can be hard-coded in the root module, which reduces the risk of mistakes and makes it easier to use.
Allows you to split up a deployment into phases, which can be useful for managing dependencies between resources.

Cons:

Most complicated to understand for newcomers.
More code duplication (eg: multiple sets of providers).

Choosing an approach

Any of the above approaches can work, but we recommend the following decision tree to help you choose which one is best for your use case:
