
Ancud Blog


Terraform for data engineers

Hello, tech enthusiasts! Greetings to all the data engineers who are tirelessly exploring the magnificent realm of IT. In this space, we constantly strive to integrate different IT domains and construct reliable solutions while overcoming the challenges that any IT project may encounter. In my previous articles, I discussed numerous tools and aspects essential to our work, and there is more valuable information yet to come. So, let’s continue learning and remain committed to the journey of improvement.

As a data engineer, it’s crucial to note that provisioning resources and managing the system’s infrastructure are integral aspects of our role. Therefore, it’s essential for us to familiarize ourselves with these tasks. That’s why today, I want to discuss Terraform. Don’t be intimidated by any tool; as data engineers, we adapt quickly and can tackle any challenge.


 
Fig 1: Terraform logo

I. Introduction

I’m certain that everyone, even those starting out, understands that when running a script or performing any task, a system is essential. We require a computer or a server, something to host our solution. In our domain as data engineers and data scientists, providing infrastructure is crucial for tasks like data processing, building machine learning models, or conducting data visualization. The traditional methods of doing such tasks are truly cumbersome and time-consuming, especially for testing simple scenarios. That’s why we owe a debt of gratitude to HashiCorp for their outstanding work in providing us with the excellent tool known as Terraform. What exactly is Terraform, and how can it be applied in practical, real-life scenarios?

II. In-Depth Exploration

Here, we won’t claim to introduce anything magical and will simply adhere to the definition in the official Terraform documentation. However, we’ll aim to rephrase it in a straightforward manner. There are three types of environments we are aware of: cloud, on-premise, and hybrid environments. In the cloud environment, you can deploy servers and the entire infrastructure using a user interface without having to deal with the maintenance, physical installations, and similar challenges. The on-premise environment represents the traditional approach where you are accountable for all aspects of the infrastructure, requiring you to manage everything. Lastly, there’s the hybrid environment, a combination of both. You might begin in an on-prem environment and, as you scale up and seek to optimize costs, leverage the assistance of the cloud.

What kind of resources or infrastructure can Terraform provision?

In the picture below, we can observe what Terraform is capable of provisioning. Additionally, it has the ability to provision custom resources based on the specific use case.


 
Fig 2: Terraform resources
  1. Virtual servers: You can use Terraform to create and manage virtual servers on various platforms.
  2. Networking components: This includes items such as virtual networks, subnets, firewalls, and load balancers.
  3. Databases and storage solutions: Terraform can manage storage components like object storage, disks, and data storage solutions, as well as databases, such as SQL or NoSQL databases.
  4. Container orchestration: Terraform supports orchestrating containerized applications, deploying clusters using tools like Kubernetes.
  5. Security components: This includes provisioning security groups, encryption, and other security-related configurations.

How does it work in real life?

In practical terms, Terraform operates through three main steps:

1. Write:
In this stage, we define the resources we want using Terraform configuration. For instance, we might specify the creation of an EC2 instance within an AWS VPC (Virtual Private Cloud). The configuration details are outlined in the Terraform files.

2. Plan:
Terraform generates an execution plan in this step. It assesses the configuration and provides information on what actions it will take. This includes whether it will create new resources, remove existing ones, or modify the infrastructure based on the provided configuration. The plan acts as a preview of changes.

3. Apply:
The Apply step is where Terraform executes the plan and brings the defined infrastructure into existence. It creates or modifies resources according to the plan and updates the state file to reflect the current state of the infrastructure. The state file is crucial for Terraform to understand the existing state and track changes over time.

III. Tutorial

Now, let’s do things ourselves. In this guide, we’ll create, change, and destroy infrastructure. We’ll start by creating an EC2 instance in AWS, then we’ll update it, and finally, we’ll delete it.


 
Fig 3: Tutorial architecture

Requirements

  • An AWS Free Tier account
  • Terraform installed locally

Step 1: Write the configuration of the desired state

As you can see in the picture below, we’ll use the HashiCorp Configuration Language (HCL) to write configuration files. Specifically, we’ll create a `main.tf` file in the “terraform-ec2” folder. This file will contain the configuration to define a single AWS EC2 instance.


 
Fig 4: screenshot of main.tf file

The file consists of three distinct parts: Terraform configuration, Provider configuration, and Resource configuration. Each section plays a specific role in defining and orchestrating the desired infrastructure.

  • Terraform section: In the Terraform section, we can specify certain behaviors of Terraform itself, such as setting the minimum required Terraform version. This ensures that the configuration is compatible with the specified Terraform version and avoids potential issues.

 
Fig 5: terraform section
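Since the screenshot may not render here, a minimal sketch of such a terraform section follows; the exact version constraints are illustrative assumptions, not values taken from the original file:

```hcl
terraform {
  # Pin the AWS provider so `terraform init` fetches a known release line.
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"  # assumed constraint
    }
  }

  # Refuse to run with Terraform releases older than this.
  required_version = ">= 1.2.0"  # assumed constraint
}
```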
  • Provider configuration: Terraform uses providers to connect to remote systems. In our case, we will use AWS as the provider.

 
Fig 6: provider section
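As a sketch of the provider section (the region shown is an assumption; credentials are typically supplied via the AWS CLI configuration or environment variables rather than the file itself):

```hcl
# Configure the AWS provider. Authentication is picked up from the
# environment or the shared AWS credentials file.
provider "aws" {
  region = "eu-central-1"  # illustrative region
}
```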

  • Resource configuration: Here we define our infrastructure as resources. In our case, we are defining an EC2 instance as a resource.


 
Fig 7 : resource section
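A resource section along these lines would declare the instance; the resource label, instance type, and tag value are illustrative assumptions, while the AMI ID is the Amazon Linux image mentioned later in the article:

```hcl
# Declare a single EC2 instance as a resource.
resource "aws_instance" "app_server" {
  ami           = "ami-065ab11fbd3d0323d"  # default Amazon Linux AMI
  instance_type = "t2.micro"               # assumed Free Tier size

  tags = {
    Name = "terraform-ec2-demo"  # hypothetical tag value
  }
}
```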

Step 2: Initialize the working directory

After writing the configuration, run `terraform init` in the same folder as your configuration file (main.tf). This command initializes your Terraform project: it downloads the required provider plugins and prepares a place to store Terraform’s internal data.

terraform init

 
Fig 8 : terraform init command

After running the terraform init command, several files and folders are generated in the working directory:

  1. .terraform folder: This folder stores the downloaded provider plugins and modules. It’s hidden on Unix-like systems.
  2. .terraform.lock.hcl: This lock file records the provider versions that were selected, so future runs reuse the same versions.

Note that the state files are not created by init: terraform.tfstate, which keeps track of the current state of your infrastructure, and its backup terraform.tfstate.backup only appear once terraform apply has run. The state file is crucial for Terraform to understand the existing state when planning and applying changes.

 
Fig 9 : terraform init output

Step 3: Validate the resource declaration

In this step, we’re validating and organizing the configuration. During the planning phase, we execute the `terraform plan` command. This command outlines the resources that will be created. If we take a look, we can see that an EC2 instance is among the resources slated for creation. This planning phase gives us a preview of the changes that Terraform is about to make.

terraform plan 


 
Fig 10 : terraform plan output

Step 4: Create the infrastructure and apply the plan

terraform apply 

 
Fig 11 : terraform apply

As we can see, the EC2 instance has been created.


 
Fig 12 : Aws EC2 UI

Now, we’re moving into the phase of modifying the infrastructure and applying the changes. Recognizing that infrastructure is continually evolving, Terraform helps manage these changes by adjusting the Terraform configuration files. A new execution plan will be formulated to bring your infrastructure to the desired state.

Specifically, we are changing the Amazon Machine Image (AMI) of our instance. Initially, we used the default Amazon Linux AMI (ami-065ab11fbd3d0323d), and now we’re switching to the Ubuntu AMI (ami-04e601abe3e1a910f). This change involves modifying the `ami` field in the resource section of the configuration. This update reflects the dynamic nature of infrastructure management with Terraform.
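In configuration terms, the change amounts to editing a single field; sketched below with the same assumed resource label and instance type as before:

```hcl
resource "aws_instance" "app_server" {
  # Changed from the Amazon Linux AMI ami-065ab11fbd3d0323d.
  ami           = "ami-04e601abe3e1a910f"  # Ubuntu AMI
  instance_type = "t2.micro"

  tags = {
    Name = "terraform-ec2-demo"  # hypothetical tag value
  }
}
```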


 
Fig 13 : main.tf changed

Then we apply the changes with the command `terraform apply`.


 
Fig 14 : terraform apply output

The `-/+` sign in the Terraform output indicates that the instance will be both destroyed and recreated. This happens when a significant change, like switching the AMI, requires the creation of a new instance instead of modifying the existing one. Terraform handles these changes automatically, ensuring the desired state is achieved while minimizing disruption to the infrastructure.


 
Fig 15 : terraform apply output

Everything went smoothly, and we now have our new EC2 instance with the Ubuntu image. Let’s confirm this in the AWS UI to visually ensure that our desired changes are reflected there.


 
Fig 16 : AWS UI

Step 5: Use Terraform variables

Before proceeding, let’s discuss Terraform variables. Up to now, we’ve hardcoded values in the resource section, but when dealing with multiple resources like RDS, VPC, ECS, and more, Terraform variables become essential. To implement this, we’ll create a file named ‘variables.tf’ in the same directory as ‘main.tf’.


 
Fig 17 : new project architecture

We will move the tag value to the variables.tf file as follows:


 
Fig 18 : variables.tf content
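A variables.tf of roughly this shape would hold the tag value; the variable name and default are illustrative assumptions:

```hcl
# Input variable holding the Name tag of the EC2 instance.
variable "instance_name" {
  description = "Value of the Name tag for the EC2 instance"
  type        = string
  default     = "terraform-ec2-demo"  # hypothetical default
}
```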

Next, let’s make the corresponding changes in the ‘main.tf’ file.


 
Fig 19 : main.tf changed
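The corresponding change in main.tf replaces the hardcoded tag with a variable reference, sketched here under the same assumed names:

```hcl
resource "aws_instance" "app_server" {
  ami           = "ami-04e601abe3e1a910f"
  instance_type = "t2.micro"

  tags = {
    Name = var.instance_name  # previously a hardcoded string
  }
}
```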

Now, it’s time to apply the changes:

terraform apply 

And this is what we get:


 
Fig 20 :

Step 6: The last step is to destroy the infrastructure

terraform destroy

 

The result:


 

IV. Conclusion

As I said, being a data engineer requires a broad set of skills. It is not only about building pipelines: you have to take care of the whole system, from networking resources to the storage layer and every other aspect of the infrastructure.

Author: Chiheb Mhamdi