
Ancud Blog

Welcome to the Ancud Blog. Here you will find a variety of interesting articles on a wide range of topics. Dive into our world of knowledge!


CDC with the Debezium Kafka connector (step by step & implementation from scratch)

Streaming data architectures

If you are a data engineer handling a lot of databases, Debezium is the right tool to capture changes and transactions. In this article I'm going to explain how to do it step by step, and guess what folks, you can try it yourself, because everything will be deployed on your local environment.

At our company, Ancud IT, it's our daily business to help our customers build highly scalable data infrastructure to get the maximum value out of their data. But which technologies should you use, and how do you start?

To answer these questions, we want to share our experience with you.

Before heading deeper into the tutorial, let's define CDC and where it fits in modern projects.

  • Definition: Change data capture (CDC) refers to the process of identifying and capturing changes as they are made in a database or source application, then delivering those changes in real time to a downstream process, system, or data lake. [1]
  • Usage: CDC is widely implemented in systems that need to stay synchronized and be ready for zero-downtime cloud migrations.
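To make that concrete, here is a simplified sketch of the kind of change event Debezium publishes when a row is updated. The field names follow Debezium's event envelope (before, after, source, op); the table and values are invented for illustration:

{
  "before": { "id": 1004, "first_name": "Anne" },
  "after":  { "id": 1004, "first_name": "Anne Marie" },
  "source": { "connector": "mysql", "db": "inventory", "table": "customers" },
  "op": "u",
  "ts_ms": 1671530000000
}

The before/after pair is what lets downstream systems replay or reconcile state without ever querying the source database again.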

CDC is arguably the best solution for moving data across different networks and systems in real time, and because it delivers changes as they happen, it also enables processing and analytics on this real-time data. Let me explain this further with the following architecture:


Figure 1: Streaming data architectures

 

In the architecture above, I tried to break down event and streaming architectures; here is what to remember as engineers:

  • CDC captures the changes in the data sources. Those changes are published through a platform such as Apache Kafka, which transfers the data to a processing engine such as Apache Spark or stores it in a streaming engine such as Confluent KSQL. In the end, the data is used by real-time analytics tools such as Apache Flink, StreamingSQL, … (a small consumer sketch follows below).
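To give you a taste of what consuming such a change stream looks like, once the pipeline is up you could tail a change topic with Kafka's stock console consumer. This assumes the Kafka CLI tools are on your PATH and a broker listens on localhost:9092; the topic name is illustrative and follows Debezium's server.database.table naming convention:

kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic dbserver1.inventory.customers --from-beginning

Each line printed is one change event like the JSON sketch above.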


Architecture of the concept:


Figure 2: Architecture of the concept

Requirements:

  • Only Docker Desktop installed on your machine (a quick sanity check follows below)
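As a minimal check that Docker is installed and the daemon is actually running, you can run:

docker --version
docker info

If docker info reports an error, start Docker Desktop first and try again.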

The steps, briefly:

  • 1: Start the ZooKeeper container.
  • 2: Start the Kafka server container.
  • 3: Start the MySQL database from which Debezium will capture changes.
  • 4: Start a MySQL client to connect to the database container we've already launched.
  • 5: Start Kafka Connect and link it to the MySQL database and Kafka.

Note: each step should be executed in a separate terminal.

Enough talking! Let's start the tutorial:

 

The first step is to start ZooKeeper, but before that, let's understand what ZooKeeper is.

ZooKeeper: a top-level software project developed by Apache, used to maintain naming and configuration data and to provide flexible and robust synchronization within distributed systems [2].

docker run -it --rm --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 quay.io/debezium/zookeeper:2.0

Don't worry, I've got you. You want to understand the arguments? Sure, no problem.
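Here is a quick rundown of each argument; the port roles follow standard ZooKeeper conventions:

  • -it : run the container interactively with a terminal attached, so ZooKeeper's log output stays visible.
  • --rm : remove the container automatically once it stops.
  • --name zookeeper : give the container a fixed name, so the containers we start next can reach it via --link.
  • -p 2181:2181 : ZooKeeper's client port, which Kafka will connect to.
  • -p 2888:2888 / -p 3888:3888 : follower and leader-election ports used inside a ZooKeeper ensemble (not strictly needed for our single node, but exposed by convention).
  • quay.io/debezium/zookeeper:2.0 : Debezium's prebuilt ZooKeeper image, tag 2.0.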