Hello, tech enthusiasts, and greetings to all the data engineers tirelessly exploring the vast realm of IT. In this space, we constantly strive to integrate different IT domains and build reliable solutions while overcoming the challenges any IT project may encounter. In my previous articles, I discussed many tools and aspects essential to our work, and there is more valuable information to come. So let's keep learning and stay committed to the journey of improvement.
As a data engineer, it’s crucial to note that provisioning resources and managing the system’s infrastructure are integral aspects of our role. Therefore, it’s essential for us to familiarize ourselves with these tasks. That’s why today, I want to discuss Terraform. Don’t be intimidated by any tool; as data engineers, we adapt quickly and can tackle any challenge.
I’m certain that everyone, even those just starting out, understands that running a script or performing any task requires a system: a computer or a server, something to host our solution. In our domain as data engineers and data scientists, provisioning infrastructure is crucial for tasks like data processing, building machine learning models, or data visualization. The traditional ways of doing this are cumbersome and time-consuming, especially for testing simple scenarios. That’s why we owe a debt of gratitude to HashiCorp for their outstanding work in providing us with the excellent tool known as Terraform. What exactly is Terraform, and how can it be applied in practical, real-life scenarios?
Here, we won’t claim to introduce anything magical; we’ll simply adhere to the definition in the official Terraform documentation, rephrased in a straightforward manner. There are three types of environments: cloud, on-premise, and hybrid. In the cloud environment, you can deploy servers and entire infrastructures through a user interface without dealing with maintenance, physical installation, and similar challenges. The on-premise environment is the traditional approach, where you are accountable for every aspect of the infrastructure and have to manage everything yourself. Lastly, there’s the hybrid environment, a combination of both: you might begin on-premise and, as you scale up and seek to optimize costs, leverage the assistance of the cloud.
What kind of resources or infrastructure can Terraform provision?
In the picture below, we can observe what Terraform is capable of provisioning. Additionally, it has the ability to provision custom resources based on the specific use case.
How does it work in real life?
In practical terms, Terraform operates through three main steps:
1. Write: In this stage, we define the resources we want using Terraform configuration. For instance, we might specify the creation of an EC2 instance within an AWS VPC (Virtual Private Cloud). The configuration details are outlined in the Terraform files.
2. Plan: Terraform generates an execution plan in this step. It assesses the configuration and provides information on what actions it will take. This includes whether it will create new resources, remove existing ones, or modify the infrastructure based on the provided configuration. The plan acts as a preview of changes.
3. Apply: The Apply step is where Terraform executes the plan and brings the defined infrastructure into existence. It creates or modifies resources according to the plan and updates the state file to reflect the current state of the infrastructure. The state file is crucial for Terraform to understand the existing state and track changes over time.
Now, let’s do things ourselves. In this guide, we’ll create, change, and destroy infrastructure: we’ll start by creating an EC2 instance in AWS, then we’ll update it, and finally we’ll delete it.
Requirements:
Step 1: Write the configuration of the desired state
As you can see in the picture below, we’ll use the HashiCorp Configuration Language (HCL) to write configuration files. Specifically, we’ll create a `main.tf` file in the “terraform-ec2” folder. This file will contain the configuration to define a single AWS EC2 instance.
The file consists of three distinct parts: Terraform configuration, Provider configuration, and Resource configuration. Each section plays a specific role in defining and orchestrating the desired infrastructure.
Resource configuration: here we define our infrastructure as resources; in our case, we define the EC2 instance as a resource.
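Putting the three parts together, a minimal `main.tf` could look like the sketch below. The region, provider version, instance type, and tag value are assumptions for illustration (the actual values are in the screenshot); the AMI ID is the Amazon Linux image used in this guide.

```hcl
# Terraform configuration: pin the required provider.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed version constraint
    }
  }
}

# Provider configuration: which cloud and region to target.
provider "aws" {
  region = "us-east-1" # assumed region for this walkthrough
}

# Resource configuration: a single EC2 instance.
resource "aws_instance" "app_server" {
  ami           = "ami-065ab11fbd3d0323d" # Amazon Linux AMI from this guide
  instance_type = "t2.micro"              # assumed instance type

  tags = {
    Name = "terraform-ec2-demo" # assumed tag value
  }
}
```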
Step 2: Initialize the working directory
After writing the configuration, run `terraform init` in the same folder as your configuration file (`main.tf`). This command initializes your Terraform project: it downloads the required provider plugins and prepares a place to store Terraform’s state data.
terraform init
After running the `terraform init` command, several files and folders are generated in the working directory, notably the `.terraform` directory (holding the downloaded provider plugins) and the `.terraform.lock.hcl` dependency lock file:
Step 3: Validate the resource declaration
In this step, we’re validating and organizing the configuration. During the planning phase, we execute the `terraform plan` command. This command outlines the resources that will be created. If we take a look, we can see that an EC2 instance is among the resources slated for creation. This planning phase gives us a preview of the changes that Terraform is about to make.
terraform plan
Step 4: Create the infrastructure and apply the plan
terraform apply
As we can see, the EC2 instance is created.
Now, we’re moving into the phase of modifying the infrastructure and applying the changes. Recognizing that infrastructure is continually evolving, Terraform helps manage these changes by adjusting the Terraform configuration files. A new execution plan will be formulated to bring your infrastructure to the desired state.
Specifically, we are changing the Amazon Machine Image (AMI) of our instance. Initially, we used the default Amazon Linux AMI (ami-065ab11fbd3d0323d), and now we’re switching to the Ubuntu AMI (ami-04e601abe3e1a910f). This change involves modifying the `ami` field in the resource section of the configuration. This update reflects the dynamic nature of infrastructure management with Terraform.
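Concretely, this is a one-line edit to the `ami` argument in the resource block; the other arguments shown here are illustrative:

```hcl
resource "aws_instance" "app_server" {
  # Previously: ami = "ami-065ab11fbd3d0323d" (Amazon Linux)
  ami           = "ami-04e601abe3e1a910f" # new Ubuntu AMI
  instance_type = "t2.micro"              # assumed instance type, unchanged
}
```

Because the AMI cannot be changed on a running instance, Terraform will plan a replacement rather than an in-place update.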
Then we will apply the changes with the command `terraform apply`.
The `-/+` sign in the Terraform output indicates that the instance will be destroyed and recreated. This happens when a significant change, like switching the AMI, requires a new instance instead of a modification of the existing one. Terraform handles these changes intelligently, reaching the desired state while minimizing disruption to the infrastructure.
Everything went smoothly, and we now have our new EC2 instance with the Ubuntu image. Let’s confirm this in the AWS UI to visually ensure that our desired changes are reflected there.
Before proceeding, let’s discuss Terraform variables. Up to now, we’ve hardcoded values in the resource section, but when dealing with multiple resources like RDS, VPC, ECS, and more, Terraform variables become essential. To implement this, we’ll create a file named ‘variables.tf’ in the same directory as ‘main.tf’.
What we will do is move the tag value to the `variables.tf` file as follows:
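A `variables.tf` along these lines would do it; the variable name and default value are assumptions for illustration, since the actual values appear in the screenshot:

```hcl
# variables.tf: declare the tag value as an input variable.
variable "instance_name" {
  description = "Value of the Name tag for the EC2 instance"
  type        = string
  default     = "terraform-ec2-demo" # assumed default
}
```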
Next, let’s make the corresponding changes in the `main.tf` file.
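In `main.tf`, the hardcoded tag value is replaced by a reference to the variable (the variable name `instance_name` is an assumption for illustration):

```hcl
resource "aws_instance" "app_server" {
  ami           = "ami-04e601abe3e1a910f" # Ubuntu AMI from the previous step
  instance_type = "t2.micro"              # assumed instance type

  tags = {
    Name = var.instance_name # value now comes from variables.tf
  }
}
```

This way, changing the tag later means editing one variable default (or passing `-var="instance_name=..."` on the command line) instead of touching the resource block.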
Now, it’s time to apply the changes :
And this is what we get:
Step 6: The last step is to destroy the infrastructure
terraform destroy
The result:
As I said, being a data engineer implies having many skills. It is not only about building pipelines; you have to take care of the whole system, dealing with networking resources, the storage layer, and many other aspects of the infrastructure.