Kafka JDBC Connector Practice / Orange Pi 5 Max Cluster Environment
Perform PostgreSQL table replication using Kafka JDBC Connector.
1. Practice Environment Setup
1.1. Overall Practice Environment
![[Figure 1] Kafka Connect JDBC Connector Practice Environment](/blog-software/docs/record/kafka-jdbc-connector-orangepi5-cluster/images/environment.png)
[Figure 1] Kafka Connect JDBC Connector Practice Environment
The environment for replicating the users table between PostgreSQL databases through the Kafka JDBC Connector is as shown in [Figure 1].
PostgreSQL : Performs the role of data storage.
- kafka_connect_src Database, Users Table : Source table for retrieving data.
- kafka_connect_dst Database, Users Table : Destination table for storing retrieved data.
Kafka Connect : Performs the role of exchanging data between Kafka and PostgreSQL.
- postgresql-src-connector Source JDBC Connector : JDBC connector that sends data from source table to Kafka.
- postgresql-dst-connector Destination JDBC Connector : JDBC connector that stores data retrieved from Kafka into destination table.
Kafka : Performs the role of exchanging data between the JDBC connectors. Also performs the role of storing Kafka Connect's operational state.
- postgresql-users Topic : Topic for storing replicated data.
- connect-cluster-configs : Topic for storing Kafka Connect’s configuration information.
- connect-cluster-offsets : Topic for storing Kafka Connect’s offset information.
- connect-cluster-status : Topic for storing Kafka Connect’s status information.
Strimzi Kafka Operator : Operator for managing Kafka and Kafka Connect.
Refer to the following links for the overall practice environment setup.
- Orange Pi 5 Max based Kubernetes Cluster Construction : https://ssup2.github.io/blog-software/docs/record/orangepi5-cluster-build/
- Orange Pi 5 Max based Kubernetes Data Platform Construction : https://ssup2.github.io/blog-software/docs/record/kubernetes-data-platform-orangepi5-cluster/
1.2. Kafka Connect JDBC Connector Image Creation
```shell
docker build -t ghcr.io/ssup2-playground/k8s-data-platform_kafka-connect-jdbc-connector:0.48.0-kafka-4.1.0 .
docker push ghcr.io/ssup2-playground/k8s-data-platform_kafka-connect-jdbc-connector:0.48.0-kafka-4.1.0
```

Build and push a container image containing the Kafka JDBC Connector using the Dockerfile in [File 1].
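The contents of [File 1] are not reproduced here. As a rough sketch, a Dockerfile that bundles the JDBC Connector plugin into the Strimzi Kafka image usually follows the pattern below; the base image tag is inferred from the image tag above, and the local plugin directory name is an assumption:

```dockerfile
# Sketch only: [File 1] may differ. Assumes the JDBC connector JARs were
# downloaded into ./plugins/kafka-connect-jdbc/ beforehand.
FROM quay.io/strimzi/kafka:0.48.0-kafka-4.1.0
USER root:root
# Strimzi's Kafka Connect loads plugins from /opt/kafka/plugins.
COPY ./plugins/kafka-connect-jdbc/ /opt/kafka/plugins/kafka-connect-jdbc/
USER 1001
```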
1.3. PostgreSQL Configuration
```shell
# Create Source Database
kubectl exec -it postgresql-0 -n postgresql -- psql -U postgres -c "CREATE DATABASE kafka_connect_src;"

# Create Destination Database
kubectl exec -it postgresql-0 -n postgresql -- psql -U postgres -c "CREATE DATABASE kafka_connect_dst;"

# Create users Table in Source Database
kubectl exec -it postgresql-0 -n postgresql -- psql -U postgres -d kafka_connect_src -c "
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) NOT NULL,
    age INTEGER
);"
```

Create the kafka_connect_src source database and the kafka_connect_dst destination database in PostgreSQL, and create a users table in the kafka_connect_src source database.
2. Kafka, Kafka Connect Configuration
2.1. Kafka Configuration
```shell
kubectl apply -f kafka.yaml
```

Apply the Kafka Manifest in [File 2] to have the Strimzi Kafka Operator configure Kafka.
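[File 2] is not shown above. With Strimzi 0.48.0, a KRaft-based Kafka cluster is typically declared as a KafkaNodePool plus a Kafka resource; the sketch below is a hypothetical minimal example, with the names, namespace, replica count, and storage size all assumptions:

```yaml
# Sketch only: [File 2] may differ.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: kafka                       # assumed node pool name
  namespace: kafka                  # assumed namespace
  labels:
    strimzi.io/cluster: kafka
spec:
  replicas: 3
  roles:
    - controller
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 10Gi
        deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka                       # assumed cluster name
  namespace: kafka
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 4.1.0
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
  entityOperator:
    topicOperator: {}
```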
2.2. Kafka Connect Configuration
```shell
kubectl apply -f kafka-connect.yaml
```

Apply the Kafka Connect Manifest in [File 3] to have the Strimzi Kafka Operator configure Kafka Connect. Version 4.1.0 is specified, and the connect-cluster Group ID is used. Kafka Connect's operational state is stored through the connect-cluster-offsets, connect-cluster-configs, and connect-cluster-status topics.
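Based on the settings just described (version 4.1.0, the connect-cluster Group ID, the three internal topics, and the image built in section 1.2), a KafkaConnect manifest along the lines of [File 3] might look like the following sketch; the bootstrap server address, replica count, and converters are assumptions:

```yaml
# Sketch only: [File 3] may differ.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: kafka-connect              # assumed name
  namespace: kafka                 # assumed namespace
  annotations:
    # Lets KafkaConnector resources manage connectors instead of the REST API.
    strimzi.io/use-connector-resources: "true"
spec:
  version: 4.1.0
  replicas: 1                      # assumed
  image: ghcr.io/ssup2-playground/k8s-data-platform_kafka-connect-jdbc-connector:0.48.0-kafka-4.1.0
  bootstrapServers: kafka-kafka-bootstrap:9092   # assumed service name
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
    key.converter: org.apache.kafka.connect.json.JsonConverter    # assumed
    value.converter: org.apache.kafka.connect.json.JsonConverter  # assumed
```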
2.3. Kafka Connect JDBC Connector Configuration
```shell
kubectl apply -f postgresql-src-connector.yaml
```

Apply the Kafka Connector Manifest in [File 4] to have Kafka Connect send data from the users table in the kafka_connect_src source database to Kafka.
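[File 4] is not reproduced here. A source connector for this setup is typically declared as a KafkaConnector resource using Confluent's JdbcSourceConnector; the sketch below is hypothetical, with the connection URL, credentials, and incrementing mode as assumptions. Note that topic.prefix postgresql- combined with the users table yields the postgresql-users topic described in [Figure 1]:

```yaml
# Sketch only: [File 4] may differ.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: postgresql-src-connector
  namespace: kafka                         # assumed namespace
  labels:
    strimzi.io/cluster: kafka-connect      # assumed Kafka Connect name
spec:
  class: io.confluent.connect.jdbc.JdbcSourceConnector
  tasksMax: 1
  config:
    connection.url: jdbc:postgresql://postgresql.postgresql:5432/kafka_connect_src  # assumed service address
    connection.user: postgres              # assumed credentials
    connection.password: postgres          # assumed credentials
    table.whitelist: users
    mode: incrementing                     # assumed: track new rows by the id column
    incrementing.column.name: id
    topic.prefix: postgresql-
```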
```shell
kubectl apply -f postgresql-dst-connector.yaml
```

Apply the Kafka Connector Manifest in [File 5] to have Kafka Connect store data retrieved from the postgresql-users topic in Kafka into the users table in the kafka_connect_dst destination database.
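[File 5] is likewise not shown. The destination side is typically a KafkaConnector resource using Confluent's JdbcSinkConnector; the sketch below is hypothetical, with the connection URL, credentials, and upsert/auto-create settings as assumptions:

```yaml
# Sketch only: [File 5] may differ.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: postgresql-dst-connector
  namespace: kafka                         # assumed namespace
  labels:
    strimzi.io/cluster: kafka-connect      # assumed Kafka Connect name
spec:
  class: io.confluent.connect.jdbc.JdbcSinkConnector
  tasksMax: 1
  config:
    connection.url: jdbc:postgresql://postgresql.postgresql:5432/kafka_connect_dst  # assumed service address
    connection.user: postgres              # assumed credentials
    connection.password: postgres          # assumed credentials
    topics: postgresql-users
    table.name.format: users               # strip the topic prefix for the table name
    insert.mode: upsert                    # assumed: idempotent writes keyed on id
    pk.mode: record_value
    pk.fields: id
    auto.create: "true"                    # assumed: create the users table if absent
```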
2.4. Data Replication Practice
```shell
kubectl exec -it postgresql-0 -n postgresql -- psql -U postgres -d kafka_connect_src -c "
INSERT INTO users (name, email, age) VALUES
    ('John Doe', 'john@ssup2.com', 30),
    ('Jane Smith', 'jane@ssup2.com', 25),
    ('Bob Johnson', 'bob@ssup2.com', 35),
    ('Alice Brown', 'alice@ssup2.com', 28),
    ('Charlie Wilson', 'charlie@ssup2.com', 32);"
```

Add data to the users table in the kafka_connect_src source database.
```shell
kubectl exec -it postgresql-0 -n postgresql -- psql -U postgres -d kafka_connect_dst -c "SELECT * FROM users;"
```

```text
 id |      name      |       email       | age
----+----------------+-------------------+-----
  1 | John Doe       | john@ssup2.com    |  30
  2 | Jane Smith     | jane@ssup2.com    |  25
  3 | Bob Johnson    | bob@ssup2.com     |  35
  4 | Alice Brown    | alice@ssup2.com   |  28
  5 | Charlie Wilson | charlie@ssup2.com |  32
```

Check whether the data has been replicated to the users table in the kafka_connect_dst destination database.
3. References
- Strimzi Kafka Operator : https://togomi.tistory.com/66