Practical Terraform: You're Doing it Wrong (Part 2)
This is Part 2 of the set of practical Terraform tips. If you missed Part 1, check it out here:
We'll explore several more Terraform pitfalls and how to avoid them so your infra teams can succeed long-term!
1. Too many conditional resources
Have you ever seen a module that creates too many conditional resources using syntax like `for_each`, `count`, and ternary operators? These are less readable and generate a lot of complexity for maintainers.
For example:
# file: modules/storage/main.tf
variable "create_s3_bucket" {
  type        = bool
  description = "Whether to create an S3 bucket as well"
  default     = false
}

resource "aws_s3_bucket" "this" {
  count = var.create_s3_bucket ? 1 : 0
  ...
}

# Other storage resources... EFS volumes, Backup configuration, etc.
...
Used in moderation, this can be a fine way to add flexibility to your module. However, take it too far, and you'll have a Frankenstein of faux-array references like `aws_s3_bucket.this[0]`, or worse: `var.create_s3_bucket ? aws_s3_bucket.this[0] : ""`. It also makes your configuration more difficult to read and reason about.
When it becomes too much, it's better to decouple these conditional components into a separate module. Instead of using ternary operators, you either instantiate the module or you don't.
2. Resources that don't belong in a Module
We will only travel a short distance from the previous example. Let's say we have the above module with the conditionally-created S3 bucket. We may also have a few folders for deploying to Dev, Test, and Prod environments:
# file: deployments/dev/main.tf
module "storage" {
  source           = "../../modules/storage"
  create_s3_bucket = true
  env              = "dev"
}

# file: deployments/test/main.tf
module "storage" {
  source           = "../../modules/storage"
  create_s3_bucket = false
}

# file: deployments/prod/main.tf
module "storage" {
  source           = "../../modules/storage"
  create_s3_bucket = false
}
In this scenario, we create the S3 bucket only in one environment. This is typical of infrastructure resources unique to one environment's purposes, such as Developer sandboxes, QA tooling, and ad-hoc troubleshooting devices.
Using conditional resources with variables is the wrong way to solve this use case; if a resource is specific to one environment, you should only create the resource in that environment's deployment. Either of the following refactors would work in this example:
1. Create a new module named `s3_storage` and move the S3 bucket and related resources inside. Remove the variable and `count` syntax, then instantiate the module only in the dev deployment.
2. Don't use a module for the S3 bucket and related resources. Simply put them in the `deployments/dev/` folder directly.
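As a quick sketch of the first option (the `s3_storage` module name is hypothetical), the dev deployment instantiates the new module directly, while test and prod simply omit it:

```hcl
# file: deployments/dev/main.tf
module "storage" {
  source = "../../modules/storage"
  env    = "dev"
}

# No conditional variable or count needed: the S3 resources live in their
# own module, instantiated only in the environment that needs them.
module "s3_storage" {
  source = "../../modules/s3_storage"
  env    = "dev"
}
```

The test and prod deployments keep only the `storage` module, and the ternary logic disappears entirely.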
3. Sandbox Environments
This next tip is a godsend for parallelization and will also increase your confidence in your ability to spin up and tear down the infrastructure.
We often write Terraform with our environments in mind, like Dev, Test, Stage, Prod, etc. We create resources with the environment keyword in the resource name, such as naming a Lambda function `sqs-ingestor-${var.env}-${var.region}`, which creates `sqs-ingestor-dev-us-east-1`. This may be fine for your team, especially on a smaller scale and when you work by yourselves; however, what do you do when your colleague needs to test their version of the Lambda function while you're testing your feature branch?
This is where Terraform workspaces come in very handy. Say the two developers are myself and Sara Hollis, working on different new features simultaneously. If we plan the Terraform with parallelized work-streams in mind, we can name the Lambda function using the built-in `${terraform.workspace}` value, which contains the name of the current workspace.
resource "aws_lambda_function" "sqs_ingestor" {
  function_name = "sqs-ingestor-${terraform.workspace}"
  ...
  tags = {
    Name = "sqs-ingestor-${terraform.workspace}"
  }
}
Then I proceed with my work:
zcking> git checkout feature/new-redundancy-options
zcking> terraform workspace new dev-zcking
zcking> terraform workspace select dev-zcking
zcking> terraform init && terraform apply
While Sara does the same for her work:
shollis> git checkout feature/json-schema-evolution
shollis> terraform workspace new dev-shollis
shollis> terraform workspace select dev-shollis
shollis> terraform init && terraform apply
Both will apply the resources successfully and in parallel; two Lambda functions will exist afterward. Terraform calls these workspaces, but I also like to refer to them as sandboxes because we each have our own isolated environment to play in.
Note: This is usually feasible, but every use case is different. Sometimes, you may prefer to keep certain resources global or shared, such as ECS/EKS clusters, databases, and others—usually for cost and data reasons.
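Workspaces can also drive configuration differences, which helps keep sandboxes cheap. As an illustrative sketch (the instance types and AMI here are arbitrary placeholders), you can branch on `terraform.workspace` so only the prod workspace gets production sizing:

```hcl
locals {
  # Sandbox workspaces get small, inexpensive instances; only the
  # "prod" workspace gets production sizing.
  instance_type = terraform.workspace == "prod" ? "m5.large" : "t3.micro"
}

resource "aws_instance" "worker" {
  ami           = "ami-12345678"
  instance_type = local.instance_type

  tags = {
    Name = "worker-${terraform.workspace}"
  }
}
```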
4. Remote States
My final tip for creating a more practical Terraform configuration is querying remote state files with a data source. This is where you programmatically query resource information stored in another remote Terraform state file, such as from another environment or team. By querying a remote state file, you can reuse outputs and modularize your project further without requiring the resources to be managed in the same deployable scope of Terraform code.
For example, we may deploy core networking infrastructure like our VPC with one state file:
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_vpc" "main" { ... }

output "vpc_id" {
  value = aws_vpc.main.id
}
Now in a separate Terraform project, we can query the remote state and access the output to deploy an EC2 instance into the VPC:
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# The security group attaches to the VPC via the remote state output
resource "aws_security_group" "web_sg" {
  name   = "web-sg"
  vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
}

resource "aws_instance" "web" {
  ami                    = "ami-12345678"
  instance_type          = "t2.micro"
  vpc_security_group_ids = [aws_security_group.web_sg.id]
}
You may wonder, "Couldn't I just use a data source like `aws_vpc` rather than interrogate the Terraform state?" Technically, yes, and in your case, you may prefer that simplicity.
The tradeoff is that when using a traditional data source, you will need to include filters based on resource ID, tags, or other attributes; furthermore, some resources do not offer a data source to look up the infrastructure. Querying a remote state defers to the source of truth—the Terraform that deployed the dependency resources, like the VPC in this case.
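For comparison, here is a sketch of the traditional data-source approach (the `Name` tag value is an assumption about how the VPC was tagged by whoever deployed it):

```hcl
# Look up the VPC by tag instead of reading the remote state.
# This only works if the tag is applied consistently and uniquely.
data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["core-networking-vpc"]
  }
}

resource "aws_security_group" "web_sg" {
  name   = "web-sg"
  vpc_id = data.aws_vpc.main.id
}
```

If the tag ever changes or is duplicated, this lookup breaks or becomes ambiguous, which is exactly the fragility the remote-state approach avoids.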
Conclusion
I hope you enjoyed this expansion of my tips for practical Terraform-ing!
Follow for more content like this!