Building a Kubernetes Data Platform on a Raspberry Pi 5 / Jetson Nano Cluster

1. Installation Environment

[Figure 1] Cluster Spec

[Figure 2] Cluster 구성 요소

2. OS Installation

2.1. Raspberry Pi 5

[Figure 3] Raspberry Pi Imager

Install Ubuntu Server 24.04 using Raspberry Pi Imager.

[Figure 4] Raspberry Pi Imager General

[Figure 5] Raspberry Pi Imager Services

Configure the hostname, username, password, timezone, and SSH server, then write the OS image to the uSD card.

  • Hostname : see the hostnames in [Figure 1]
  • Username/Password : temp/temp

2.2. Jetson Nano

Install the OS on the uSD card by following the Install Guide.

3. Network Configuration

3.1. Raspberry Pi 5

Switch to the root user.

sudo -s

Referring to the network information in [Figure 1], configure a static IP in the /etc/netplan/50-cloud-init.yaml file as follows.

network:
  ethernets:
    eth0:
      addresses:
        - [IP Address]/24
      nameservers:
        addresses:
          - 8.8.8.8
      routes:
        - to: default
          via: 192.168.1.1
  version: 2
[File 1] /etc/netplan/50-cloud-init.yaml
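Editing the file alone does not change the running network configuration; a quick apply-and-verify sequence, assuming the file above is in place:

```shell
# Validate and apply the netplan configuration
netplan try        # applies temporarily and rolls back unless confirmed
netplan apply      # applies permanently
ip addr show eth0  # verify the static address is assigned
```

`netplan try` is safer over SSH, since it reverts automatically if the new settings cut the connection.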

3.2. Jetson Nano

Switch to the root user.

sudo -s

Referring to the network information in [Figure 1], configure a static IP.

nmcli con mod "Wired connection 1" \
  ipv4.addresses "[IP Address]/24" \
  ipv4.gateway "192.168.1.1" \
  ipv4.dns "8.8.8.8" \
  ipv4.method "manual"
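`nmcli con mod` only updates the stored profile; the connection has to be re-activated for the static address to take effect. A minimal check:

```shell
# Re-activate the connection so the new static IP takes effect
nmcli con up "Wired connection 1"
# Verify the stored address, gateway, and DNS settings
nmcli -g ipv4.addresses,ipv4.gateway,ipv4.dns con show "Wired connection 1"
```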

4. containerd, kubelet Installation

4.1. Raspberry Pi 5

Load the kernel modules.

cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

modprobe overlay
modprobe br_netfilter

Configure the sysctl parameters.

cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

sysctl --system
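Whether the modules loaded and the parameters took effect can be confirmed before moving on; for example:

```shell
# Check that both kernel modules are loaded
lsmod | grep -E 'overlay|br_netfilter'
# Check that the bridge and forwarding parameters are active
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward
```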

Install containerd.

apt update
apt install -y containerd
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
systemctl restart containerd.service
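The `sed` edit above switches containerd to the systemd cgroup driver, which kubelet expects on systemd-based distributions; the result can be double-checked with:

```shell
# Confirm the systemd cgroup driver is enabled in containerd
grep 'SystemdCgroup' /etc/containerd/config.toml
# Confirm containerd restarted cleanly
systemctl is-active containerd
```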

Install kubelet and kubeadm.

apt-get update
apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
chmod 644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo chmod 644 /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet=1.30.8-1.1 kubeadm=1.30.8-1.1
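Since the packages are pinned to 1.30.8-1.1, it may also be worth holding them so a routine `apt upgrade` does not bump them unexpectedly; a sketch:

```shell
# Pin kubelet/kubeadm so routine apt upgrades do not move them
apt-mark hold kubelet kubeadm
# List held packages; kubelet and kubeadm should appear
apt-mark showhold
```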

4.2. Jetson Nano

Disable swap memory.

swapoff -a
mv /etc/systemd/nvzramconfig.sh /etc/systemd/nvzramconfig.sh.back
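kubelet refuses to start while swap is active by default, so it is worth confirming that zram swap stays off, especially after a reboot:

```shell
# Empty output from swapon means no swap devices are active
swapon --show
# The Swap line should read 0B across the board
free -h | grep -i swap
```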

Load the kernel modules.

cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

modprobe overlay
modprobe br_netfilter

Configure the sysctl parameters.

cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

sysctl --system

Install containerd along with the NVIDIA Container Toolkit.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt-get update
apt-get install -y nvidia-container-toolkit containerd
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
nvidia-ctk runtime configure --runtime=containerd --set-as-default
systemctl restart containerd
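Whether `nvidia-ctk` actually made the NVIDIA runtime the default can be checked in the generated config, assuming the default config layout:

```shell
# The default runtime should now be "nvidia"
grep 'default_runtime_name' /etc/containerd/config.toml
# Confirm containerd restarted cleanly with the new runtime
systemctl is-active containerd
```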

Install kubelet and kubeadm.

apt-get update
mkdir -p /etc/apt/keyrings
apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
chmod 644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo chmod 644 /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet=1.30.8-1.1 kubeadm=1.30.8-1.1

5. Kubernetes Cluster Setup

5.1. Master Node (Raspberry Pi 5)

Install kubectl.

apt-get install -y kubectl=1.30.8-1.1

Initialize the Kubernetes cluster.

cat <<EOF | tee kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
certificateValidityPeriod: 876000h
caCertificateValidityPeriod: 876000h
kubernetesVersion: "v1.30.8"
networking:
  podSubnet: "10.244.0.0/16"
EOF

kubeadm init --config kubeadm-config.yaml

Copy the kubectl config file.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Install the flannel CNI plugin.

kubectl apply -f https://github.com/flannel-io/flannel/releases/download/v0.26.2/kube-flannel.yml
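Before joining the worker nodes, it helps to confirm flannel came up and the master reached Ready; for example:

```shell
# flannel runs as a DaemonSet in the kube-flannel namespace
kubectl -n kube-flannel get pods
# The master node should report Ready once the CNI is up
kubectl get nodes
```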

5.2. Compute, Storage, GPU Nodes

SSH into each node and join it to the Kubernetes cluster.

kubeadm join 192.168.1.71:6443 --token e5t05s.1z4zbpm3oxdhskya --discovery-token-ca-cert-hash sha256:01c2bf6ead65ea0e9c39186d92a51baa9aa6dc6963b900cd825d7e14dcb08fba
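The bootstrap token above expires after 24 hours by default; if it has lapsed, a fresh join command can be generated on the Master Node at any time:

```shell
# Prints a complete kubeadm join command with a new token and the CA cert hash
kubeadm token create --print-join-command
```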

Remove the control-plane taint and label from the Master Node.

kubectl taint node dp-master node-role.kubernetes.io/control-plane:NoSchedule-
kubectl label node dp-master node-role.kubernetes.io/control-plane-

Assign a role to each node.

kubectl label node dp-master node-role.kubernetes.io/master=""
kubectl label node dp-compute-01 node-role.kubernetes.io/compute=""
kubectl label node dp-compute-02 node-role.kubernetes.io/compute=""
kubectl label node dp-storage-01 node-role.kubernetes.io/storage=""
kubectl label node dp-storage-02 node-role.kubernetes.io/storage=""
kubectl label node dp-gpu-01 node-role.kubernetes.io/gpu=""
kubectl label node dp-gpu-02 node-role.kubernetes.io/gpu=""
kubectl label node dp-master node-group.dp.ssup2="master"
kubectl label node dp-compute-01 node-group.dp.ssup2="compute"
kubectl label node dp-compute-02 node-group.dp.ssup2="compute"
kubectl label node dp-storage-01 node-group.dp.ssup2="storage"
kubectl label node dp-storage-02 node-group.dp.ssup2="storage"
kubectl label node dp-gpu-01 node-group.dp.ssup2="gpu"
kubectl label node dp-gpu-02 node-group.dp.ssup2="gpu"

Check the node roles.

kubectl get nodes
NAME            STATUS   ROLES                  AGE   VERSION
dp-compute-01   Ready    compute                16d   v1.30.8
dp-compute-02   Ready    compute                16d   v1.30.8
dp-gpu-01       Ready    gpu                    16d   v1.30.8
dp-gpu-02       Ready    gpu                    16d   v1.30.8
dp-master       Ready    control-plane,master   16d   v1.30.8
dp-storage-01   Ready    storage                16d   v1.30.8
dp-storage-02   Ready    storage                16d   v1.30.8

Configure CoreDNS to run only on the Master Node.

kubectl patch deployment coredns -n kube-system -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-group.dp.ssup2":"master"}}}}}'
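The patch can be verified by checking where the CoreDNS pods were rescheduled:

```shell
# Both coredns replicas should now be running on dp-master
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
```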

6. Data Component Installation

# MetalLB
helm upgrade --install --create-namespace --namespace metallb metallb metallb -f metallb/values.yaml
kubectl apply -f metallb/ip-address-pool.yaml
kubectl apply -f metallb/l2-advertisement.yaml

# Cert Manager
helm upgrade --install --create-namespace --namespace cert-manager cert-manager cert-manager -f cert-manager/values.yaml

# PostgreSQL (ID/PW: postgres/root123!)
helm upgrade --install --create-namespace --namespace postgresql postgresql postgresql -f postgresql/values.yaml
kubectl -n postgresql exec -it postgresql-0 -- bash -c 'PGPASSWORD=root123! psql -U postgres -c "create database dagster;"'
kubectl -n postgresql exec -it postgresql-0 -- bash -c 'PGPASSWORD=root123! psql -U postgres -c "create database metastore;"'
kubectl -n postgresql exec -it postgresql-0 -- bash -c 'PGPASSWORD=root123! psql -U postgres -c "create database ranger;"'
kubectl -n postgresql exec -it postgresql-0 -- bash -c 'PGPASSWORD=root123! psql -U postgres -c "create database mlflow;"'
kubectl -n postgresql exec -it postgresql-0 -- bash -c 'PGPASSWORD=root123! psql -U postgres -c "create database mlflow_auth;"'

# Redis (ID/PW: default/default)
helm upgrade --install --create-namespace --namespace redis redis redis -f redis/values.yaml

# ArgoCD (ID/PW: default/default)
helm upgrade --install --create-namespace --namespace argo-cd argo-cd argo-cd -f argo-cd/values.yaml

# Yunikorn
helm upgrade --install --create-namespace --namespace yunikorn yunikorn yunikorn -f yunikorn/values.yaml

# KEDA
helm upgrade --install --create-namespace --namespace keda keda keda -f keda/values.yaml

# Longhorn
helm upgrade --install --create-namespace --namespace longhorn longhorn longhorn -f longhorn/values.yaml

# MinIO (ID/PW: root/root123!)
helm upgrade --install --create-namespace --namespace minio minio minio -f minio/values.yaml
brew install minio/stable/mc
mc alias set dp http://$(kubectl -n minio get service minio -o jsonpath="{.status.loadBalancer.ingress[0].ip}"):9000 root root123!
mc mb dp/spark/logs
mc mb dp/dagster/pipelines
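The created buckets and prefixes can be listed to confirm the `mc mb` calls succeeded:

```shell
# List the buckets under the dp alias and the prefixes inside them
mc ls dp
mc ls dp/spark
mc ls dp/dagster
```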


# ZincSearch (ID/PW: admin/Rootroot123!)
helm upgrade --install --create-namespace --namespace zincsearch zincsearch zincsearch -f zincsearch/values.yaml

INDEX_MAPPING='{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "description": {
        "type": "text"
      },
      "created_at": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}'
curl -s -X PUT "http://$(kubectl -n zincsearch get service zincsearch -o jsonpath='{.status.loadBalancer.ingress[0].ip}'):4080/ranger" -u 'admin:Rootroot123!' -H "Content-Type: application/json" -d "$INDEX_MAPPING"
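Whether the ranger index was created can be checked against ZincSearch's index listing endpoint, reusing the same service IP lookup (the exact response shape depends on the ZincSearch version):

```shell
# List indexes; "ranger" should appear in the response
ZINC_IP=$(kubectl -n zincsearch get service zincsearch -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl -s -u 'admin:Rootroot123!' "http://${ZINC_IP}:4080/api/index"
```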

# Nvidia Device Plugin
helm upgrade --install --create-namespace --namespace nvidia-device-plugin nvidia-device-plugin nvidia-device-plugin -f nvidia-device-plugin/values.yaml

# Nvidia Jetson Exporter

# Prometheus
helm upgrade --install --create-namespace --namespace prometheus prometheus prometheus -f prometheus/values.yaml

# Prometheus Node Exporter
helm upgrade --install --create-namespace --namespace prometheus-node-exporter prometheus-node-exporter prometheus-node-exporter -f prometheus-node-exporter/values.yaml

# kube-state-metrics
helm upgrade --install --create-namespace --namespace kube-state-metrics kube-state-metrics kube-state-metrics -f kube-state-metrics/values.yaml

# Loki
helm upgrade --install --create-namespace --namespace loki loki loki -f loki/values.yaml

# Promtail
helm upgrade --install --create-namespace --namespace promtail promtail promtail -f promtail/values.yaml

# Grafana (ID/PW: admin/root123!)
helm upgrade --install --create-namespace --namespace grafana grafana grafana -f grafana/values.yaml

# Ranger
helm upgrade --install --create-namespace --namespace ranger ranger ranger -f ranger/values.yaml

# Hive Metastore
helm upgrade --install --create-namespace --namespace hive-metastore hive-metastore hive-metastore -f hive-metastore/values.yaml

# Kafka (ID/PW: user/user)
helm upgrade --install --create-namespace --namespace kafka kafka kafka -f kafka/values.yaml
helm upgrade --install --create-namespace --namespace kafka kafka-ui kafka-ui -f kafka-ui/values.yaml

# Airflow (ID/PW: admin/admin)
helm upgrade --install --create-namespace --namespace airflow airflow airflow -f airflow/values.yaml

# Dagster
helm upgrade --install --create-namespace --namespace dagster dagster dagster -f dagster/values.yaml

# Spark Operator
helm upgrade --install --create-namespace --namespace spark-operator spark-operator spark-operator -f spark-operator/values.yaml

# Spark History Server
helm upgrade --install --create-namespace --namespace spark-history-server spark-history-server spark-history-server -f spark-history-server/values.yaml

# Flink Kubernetes Operator
helm upgrade --install --create-namespace --namespace flink-kubernetes-operator flink-kubernetes-operator flink-kubernetes-operator -f flink-kubernetes-operator/values.yaml

# Trino (ID: root)
helm upgrade --install --create-namespace --namespace trino trino trino -f trino/values.yaml

# JupyterHub (ID/PW: root/root123!)
helm upgrade --install --create-namespace --namespace jupyterhub jupyterhub jupyterhub -f jupyterhub/values.yaml

# MLflow (ID/PW: root/root123!)
helm upgrade --install --create-namespace --namespace mlflow mlflow mlflow -f mlflow/values.yaml

References