Deploying a Kubernetes v1.33 Cluster on Ubuntu 24.04 with kubeadm
- This guide uses an offline installation; the online commands provided alongside each step can be used instead.
- It covers both highly available (HA) and non-HA cluster deployments.
- For HA deployments, unless stated otherwise, commands are run only on the controller01 node.
1. System tuning
1.1 Configure hosts resolution
Run on all nodes:
cat >> /etc/hosts <<-'EOF'
192.168.109.118 apiserver-lb
192.168.109.111 k8s-controller01
192.168.109.112 k8s-controller02
192.168.109.113 k8s-controller03
192.168.109.114 k8s-worker01
192.168.109.115 k8s-worker02
EOF
If this is not an HA cluster, use the following instead:
cat >> /etc/hosts <<-'EOF'
192.168.109.111 k8s-controller
192.168.109.114 k8s-worker01
192.168.109.115 k8s-worker02
EOF
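A quick way to confirm that every entry resolves (a small sketch using the HA host names above; trim the list for the non-HA layout):
for h in apiserver-lb k8s-controller01 k8s-controller02 k8s-controller03 k8s-worker01 k8s-worker02; do
  getent hosts "$h" > /dev/null || echo "unresolved: $h"
done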
1.2 Configure passwordless SSH and distribute files
Upload the package, extract it, and install the common tools:
tar xf software-kubeadm.tar.gz
mv software / && cd /software
tar xf dependence.tar.gz && cd dependence/
./install.sh
Run the script that sets up passwordless SSH and distributes the files:
cd /software/ && ./setup_ssh_and_distribute-ha.sh
If this is not an HA cluster, use the following instead:
cd /software/ && ./setup_ssh_and_distribute.sh
1.3 Install common packages
Install on all nodes except controller01 (for a non-HA install, this means the worker nodes):
cd /software
tar xf dependence.tar.gz && cd dependence/
./install.sh
Online installation command (all nodes):
apt -y install bind9-utils expect rsync jq psmisc net-tools lvm2 vim unzip rename
1.4 Disable ufw
Run on all nodes:
systemctl disable --now ufw
1.5 Disable the swap partition
Run on all nodes:
swapoff -a && sysctl -w vm.swappiness=0
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
free -h
1.6 Set the time zone
Run on all nodes:
ln -svf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
date -R
1.7 Configure resource limits
Run on all nodes:
ulimit -SHn 65535
cat >> /etc/security/limits.conf <<-'EOF'
* soft nofile 655360
* hard nofile 655360
* soft nproc 655350
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
EOF
1.8 Tune the sshd service
Run on all nodes:
sed -i 's@#UseDNS yes@UseDNS no@g' /etc/ssh/sshd_config
sed -i 's@^GSSAPIAuthentication yes@GSSAPIAuthentication no@g' /etc/ssh/sshd_config
1.9 Linux kernel tuning
Run on all nodes:
cat > /etc/sysctl.d/k8s.conf <<-'EOF'
# the next three parameters are required by containerd
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv6.conf.all.disable_ipv6 = 1
# fs.may_detach_mounts only exists on RHEL/CentOS 7 kernels and is not available on Ubuntu 24.04, so it is omitted here
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
EOF
sysctl --system
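Note that the two net.bridge.* keys only exist once the br_netfilter module is loaded (it is loaded persistently in section 2.2), so sysctl --system may report them as unknown on this first run. A quick way to load the module now and spot-check the applied values:
modprobe br_netfilter
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables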
1.10 Install and configure ipvsadm
Run on all nodes:
cd /software
tar xf ipvsadm.tar.gz && cd ipvsadm/
./install.sh
Online installation command:
apt -y install ipvsadm ipset sysstat conntrack libseccomp2
Run on all nodes:
cat > /etc/modules-load.d/ipvs.conf <<-'EOF'
ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
EOF
Reboot all nodes, then verify that the modules are loaded:
lsmod | grep --color=auto -e ip_vs -e nf_conntrack
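If you prefer not to reboot immediately, the modules listed above can also be loaded in place via systemd (a later reboot picks them up the same way):
systemctl restart systemd-modules-load.service
lsmod | grep --color=auto -e ip_vs -e nf_conntrack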
2. Deploy containerd
2.1 Install the software
Run on all nodes:
cd /software
tar xf containerd.tar.gz && cd containerd/
./install.sh
Online installation commands:
# Step 1: install the required system tools
apt-get update
apt-get install ca-certificates curl gnupg
# Step 2: trust Docker's GPG key
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
# Step 3: add the apt repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://mirrors.aliyun.com/docker-ce/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Step 4: update the package index and install containerd
apt-get update
apt-get -y install containerd.io
2.2 Load the kernel modules
Run on all nodes:
modprobe -- overlay
modprobe -- br_netfilter
cat > /etc/modules-load.d/containerd.conf <<-'EOF'
overlay
br_netfilter
EOF
2.3 Adjust the configuration file
Run on all nodes:
containerd config default | tee /etc/containerd/config.toml
sed -ri 's#(SystemdCgroup = )false#\1true#' /etc/containerd/config.toml
grep SystemdCgroup /etc/containerd/config.toml
sed -i 's#registry.k8s.io#registry.cn-hangzhou.aliyuncs.com/google_containers#' /etc/containerd/config.toml
grep sandbox_image /etc/containerd/config.toml
2.4 Start the service
Run on all nodes:
systemctl daemon-reload
systemctl enable --now containerd
systemctl status containerd
2.5 Configure the CRI client endpoint
Run on all nodes:
cat > /etc/crictl.yaml <<-'EOF'
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
systemctl restart containerd
ctr version
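As a quick sanity check that crictl reaches containerd through the endpoint configured above (both commands should answer without connection errors; the container list is empty on a fresh node):
crictl version
crictl ps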
2.6 Test pulling an image
Run on all nodes:
ctr -n k8s.io i pull docker.m.daocloud.io/alpine:latest
ctr -n k8s.io i ls|grep alpine
ctr -n k8s.io i rm docker.m.daocloud.io/alpine:latest
ctr -n k8s.io i ls|grep alpine
3. Install the HA components
Skip this section if you are not deploying an HA cluster.
3.1 Install the software
Run on all controller nodes:
cd /software/
tar xf ha.tar.gz && cd ha
./install.sh
Online installation command:
apt -y install keepalived haproxy
3.2 Configure haproxy
Run on all controller nodes:
cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak
cat >/etc/haproxy/haproxy.cfg<<"EOF"
global
maxconn 2000
ulimit-n 16384
log 127.0.0.1 local0 err
stats timeout 30s
defaults
log global
mode http
option httplog
timeout connect 5000
timeout client 50000
timeout server 50000
timeout http-request 15s
timeout http-keep-alive 15s
frontend monitor-in
bind *:33305
mode http
option httplog
monitor-uri /monitor
frontend k8s-controller
bind 0.0.0.0:8443
bind 127.0.0.1:8443
mode tcp
option tcplog
tcp-request inspect-delay 5s
default_backend k8s-controller
backend k8s-controller
mode tcp
option tcplog
option tcp-check
balance roundrobin
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
server k8s-controller01 192.168.109.111:6443 check
server k8s-controller02 192.168.109.112:6443 check
server k8s-controller03 192.168.109.113:6443 check
EOF
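Before starting the service, it is worth letting haproxy validate the file (a quick syntax check; it should report the configuration as valid):
haproxy -c -f /etc/haproxy/haproxy.cfg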
3.3 Configure keepalived
On the controller01 node:
cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state MASTER
# adjust the interface name to match your NIC
interface ens33
mcast_src_ip 192.168.109.111
virtual_router_id 51
priority 100
nopreempt
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.109.118
}
track_script {
chk_apiserver
} }
EOF
On the controller02 node:
cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP
# adjust the interface name to match your NIC
interface ens33
mcast_src_ip 192.168.109.112
virtual_router_id 51
priority 80
nopreempt
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.109.118
}
track_script {
chk_apiserver
} }
EOF
On the controller03 node:
cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP
# adjust the interface name to match your NIC
interface ens33
mcast_src_ip 192.168.109.113
virtual_router_id 51
priority 50
nopreempt
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.109.118
}
track_script {
chk_apiserver
} }
EOF
3.4 Health-check script
Run on all controller nodes:
cat > /etc/keepalived/check_apiserver.sh << EOF
#!/bin/bash
err=0
for k in \$(seq 1 3)
do
check_code=\$(pgrep haproxy)
if [[ \$check_code == "" ]]; then
err=\$(expr \$err + 1)
sleep 1
continue
else
err=0
break
fi
done
if [[ \$err != "0" ]]; then
echo "systemctl stop keepalived"
/usr/bin/systemctl stop keepalived
exit 1
else
exit 0
fi
EOF
chmod +x /etc/keepalived/check_apiserver.sh
3.5 Start the haproxy and keepalived services
Run on all controller nodes:
systemctl daemon-reload
systemctl enable --now haproxy.service
systemctl enable --now keepalived.service
systemctl restart haproxy.service
systemctl status haproxy.service
systemctl status keepalived.service
3.6 Verify the HA setup
ping -c 4 192.168.109.118
telnet 192.168.109.118 8443
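Two further quick checks based on the configuration above (the monitor frontend answers on port 33305 at /monitor, and the VIP should be bound to ens33 on exactly one controller at a time):
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.109.118:33305/monitor
ip addr show ens33 | grep 192.168.109.118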
4. Deploy the cluster
4.1 Install the Kubernetes packages
Install kubeadm, kubelet, and kubectl on all nodes:
cd /software
tar xf k8s-software.tar.gz && cd k8s-software
./install.sh
systemctl enable --now kubelet.service
Online installation commands (using the Tsinghua mirror):
# import the GPG key
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.33/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# create the repository file
cat > /etc/apt/sources.list.d/kubernetes.list <<EOF
deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://mirrors.tuna.tsinghua.edu.cn/kubernetes/core:/stable:/v1.33/deb/ /
# deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://mirrors.tuna.tsinghua.edu.cn/kubernetes/addons:/cri-o:/stable:/v1.28/deb/ /
EOF
# update the package index and install the packages
apt update
apt install -y kubeadm kubectl kubelet
Check the versions on all nodes:
kubeadm version
kubectl version
kubelet --version
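For online installations it is usually worth pinning the packages so that a routine apt upgrade does not unexpectedly move the cluster to a newer version (optional):
apt-mark hold kubelet kubeadm kubectl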
4.2 Import the required images
If you are installing online with good connectivity you can skip pre-loading the images (with poor connectivity, point the cluster initialization at a domestic image registry instead; see 4.3.1).
Create the distribution script:
mkdir /script
cat > /script/copy_file.sh <<-'EOF'
#!/bin/bash
# Generic file distribution script
# Usage: /script/copy_file.sh [w|c|a] /path/to/file

# Node groups
CONTROLLERS=("k8s-controller02" "k8s-controller03")
WORKERS=("k8s-worker01" "k8s-worker02")
ALL_NODES=("${CONTROLLERS[@]}" "${WORKERS[@]}")

# Argument check
if [[ $# -ne 2 ]]; then
  echo "Usage: $0 [w|c|a] /path/to/file"
  exit 1
fi

TARGET_GROUP=$1
SRC_FILE=$2

if [[ ! -f "$SRC_FILE" ]]; then
  echo "❌ local file not found: $SRC_FILE"
  exit 1
fi

# Select the target nodes
case "$TARGET_GROUP" in
  w)
    NODES=("${WORKERS[@]}")
    ;;
  c)
    NODES=("${CONTROLLERS[@]}")
    ;;
  a)
    NODES=("${ALL_NODES[@]}")
    ;;
  *)
    echo "❌ invalid argument: $TARGET_GROUP (must be w|c|a)"
    exit 1
    ;;
esac

# Extract directory and file name
DST_DIR=$(dirname "$SRC_FILE")
FILENAME=$(basename "$SRC_FILE")

# Counters
SUCCESS_COUNT=0
FAIL_COUNT=0
FAILED_NODES=()

echo "Starting distribution of file: $SRC_FILE"
echo "Target node group: $TARGET_GROUP"

# Loop over the nodes
for node in "${NODES[@]}"; do
  echo ">>> Processing node: $node"
  NODE_SUCCESS=true

  # Make sure the destination directory exists
  ssh "$node" "mkdir -p $DST_DIR"
  if [[ $? -ne 0 ]]; then
    echo "  [ERROR] could not create directory $DST_DIR on $node"
    NODE_SUCCESS=false
  else
    # Transfer the file
    scp -q "$SRC_FILE" "$node:$DST_DIR/"
    if [[ $? -eq 0 ]]; then
      echo "  [OK] $FILENAME copied to $node:$DST_DIR"
    else
      echo "  [ERROR] failed to copy $FILENAME to $node"
      NODE_SUCCESS=false
    fi
  fi

  # Per-node bookkeeping
  if $NODE_SUCCESS; then
    ((SUCCESS_COUNT++))
  else
    ((FAIL_COUNT++))
    FAILED_NODES+=("$node")
  fi
done

# === Summary ===
echo "======================"
echo "Distribution finished"
echo "Successful nodes: $SUCCESS_COUNT"
echo "Failed nodes: $FAIL_COUNT"
if [[ $FAIL_COUNT -gt 0 ]]; then
  echo "Failed node list: ${FAILED_NODES[*]}"
fi
echo "======================"
EOF
chmod +x /script/copy_file.sh
Copy the image bundles to the other nodes:
/script/copy_file.sh c /software/controller_images.tar && \
/script/copy_file.sh w /software/worker_images.tar
If this is not an HA cluster, use the following instead:
/script/copy_file.sh w /software/worker_images.tar
Import the images on all controller nodes:
ctr -n k8s.io i import /software/controller_images.tar
Import the images on all worker nodes:
ctr -n k8s.io i import /software/worker_images.tar
4.3 Initialize the cluster and join the nodes
4.3.1 Initialize the cluster
Print the default initialization configuration with:
kubeadm config print init-defaults
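Before running the actual initialization, the configuration file can optionally be checked for schema problems (a sketch using the kubeadm-ha.yaml / kubeadm.yaml files shipped in /software and shown below):
cd /software
kubeadm config validate --config kubeadm-ha.yaml
# non-HA: kubeadm config validate --config kubeadm.yaml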
Initialize the cluster with the configuration file (for an online deployment with poor connectivity, change the imageRepository field to a domestic registry such as registry.aliyuncs.com/google_containers):
# HA cluster initialization
cd /software
kubeadm init --config=kubeadm-ha.yaml --upload-certs
# non-HA cluster initialization
cd /software
kubeadm init --config=kubeadm.yaml --upload-certs
Contents of kubeadm-ha.yaml:
apiVersion: kubeadm.k8s.io/v1beta4
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: 690f18.ec60b9557b7da447
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 192.168.109.111
bindPort: 6443
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
imagePullSerial: true
kubeletExtraArgs:
- name: "node-ip"
value: "192.168.109.111"
name: k8s-controller01
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
timeouts:
controlPlaneComponentHealthCheck: 4m0s
discovery: 5m0s
etcdAPICall: 2m0s
kubeletHealthCheck: 4m0s
kubernetesAPICall: 1m0s
tlsBootstrap: 5m0s
upgradeManifests: 5m0s
---
apiServer:
certSANs:
- k8s-controller01
- k8s-controller02
- k8s-controller03
- 192.168.109.111
- 192.168.109.112
- 192.168.109.113
- 192.168.109.114
- 192.168.109.115
- 192.168.109.116
- 192.168.109.117
- 127.0.0.1
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 87600h0m0s
certificateValidityPeriod: 8760h0m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
encryptionAlgorithm: RSA-2048
etcd:
local:
dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.33.4
networking:
dnsDomain: cluster.local
podSubnet: 172.16.0.0/12
serviceSubnet: 10.96.0.0/16
proxy: {}
scheduler: {}
controlPlaneEndpoint: "192.168.109.118:8443"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
anonymous:
enabled: false
webhook:
cacheTTL: 0s
enabled: true
x509:
clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
mode: Webhook
webhook:
cacheAuthorizedTTL: 0s
cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
cgroupDriver: systemd
logging: {}
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
Contents of kubeadm.yaml:
apiVersion: kubeadm.k8s.io/v1beta4
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: 690f18.ec60b9557b7da447
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 192.168.109.111
bindPort: 6443
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
imagePullSerial: true
kubeletExtraArgs:
- name: "node-ip"
value: "192.168.109.111"
name: k8s-controller
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
timeouts:
controlPlaneComponentHealthCheck: 4m0s
discovery: 5m0s
etcdAPICall: 2m0s
kubeletHealthCheck: 4m0s
kubernetesAPICall: 1m0s
tlsBootstrap: 5m0s
upgradeManifests: 5m0s
---
apiServer:
certSANs:
- k8s-controller
- 192.168.109.111
- 192.168.109.114
- 192.168.109.115
- 192.168.109.116
- 192.168.109.117
- 127.0.0.1
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 87600h0m0s
certificateValidityPeriod: 8760h0m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
encryptionAlgorithm: RSA-2048
etcd:
local:
dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.33.4
networking:
dnsDomain: cluster.local
podSubnet: 172.16.0.0/12
serviceSubnet: 10.96.0.0/16
proxy: {}
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
anonymous:
enabled: false
webhook:
cacheTTL: 0s
enabled: true
x509:
clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
mode: Webhook
webhook:
cacheAuthorizedTTL: 0s
cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
cgroupDriver: systemd
logging: {}
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
Flag notes:
- --config: path to the kubeadm configuration file.
- --upload-certs: uploads the control-plane certificates to the kubeadm-certs Secret so that additional control-plane nodes can download them automatically when joining. If this flag is omitted, the certificates must be copied to the other control-plane nodes manually after initialization:
cat > /script/copy_certificate.sh <<-'EOF'
#!/bin/bash
# Kubernetes certificate distribution script (copies only the shared CA/SA material)

# Target control-plane nodes
NODES=("k8s-controller02" "k8s-controller03")

# Certificate directories
SRC_CERT_DIR="/etc/kubernetes/pki"
DST_CERT_DIR="/etc/kubernetes/pki"

CERT_FILES=(
  "$SRC_CERT_DIR/ca.crt"
  "$SRC_CERT_DIR/ca.key"
  "$SRC_CERT_DIR/sa.key"
  "$SRC_CERT_DIR/sa.pub"
  "$SRC_CERT_DIR/front-proxy-ca.crt"
  "$SRC_CERT_DIR/front-proxy-ca.key"
  "$SRC_CERT_DIR/etcd/ca.crt"
  "$SRC_CERT_DIR/etcd/ca.key"
)

# Counters
SUCCESS_COUNT=0
FAIL_COUNT=0
FAILED_NODES=()

echo "Starting certificate distribution..."

# === Distribute to control-plane nodes ===
for node in "${NODES[@]}"; do
  echo ">>> Processing control-plane node: $node"
  NODE_SUCCESS=true

  # Make sure the destination directories exist
  ssh "$node" "mkdir -p $DST_CERT_DIR/etcd"

  for file in "${CERT_FILES[@]}"; do
    if [[ -f "$file" ]]; then
      # Files under etcd/ go to /etc/kubernetes/pki/etcd, the rest to /etc/kubernetes/pki
      if [[ "$file" == *"/etcd/"* ]]; then
        DST="$DST_CERT_DIR/etcd/$(basename "$file")"
      else
        DST="$DST_CERT_DIR/$(basename "$file")"
      fi
      if scp -q "$file" "$node:$DST"; then
        echo "  [OK] $(basename "$file") copied to $node:$DST"
      else
        echo "  [ERROR] failed to copy $(basename "$file") to $node"
        NODE_SUCCESS=false
      fi
    else
      echo "  [WARN] local file missing: $file"
      NODE_SUCCESS=false
    fi
  done

  # Per-node bookkeeping
  if $NODE_SUCCESS; then
    ((SUCCESS_COUNT++))
  else
    ((FAIL_COUNT++))
    FAILED_NODES+=("$node")
  fi
done

# === Summary ===
echo "======================"
echo "Distribution finished"
echo "Successful nodes: $SUCCESS_COUNT"
echo "Failed nodes: $FAIL_COUNT"
if [[ $FAIL_COUNT -gt 0 ]]; then
  echo "Failed node list: ${FAILED_NODES[*]}"
fi
echo "======================"
EOF
chmod +x /script/copy_certificate.sh
/script/copy_certificate.sh
Output from a successful initialization:
# HA cluster output
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes running the following command on each as root:
kubeadm join 192.168.109.118:8443 --token 690f18.ec60b9557b7da447 \
--discovery-token-ca-cert-hash sha256:39a70eb8a4fa00f25311b4786b5b09cc2f9814d9eb787386f10fc3ca52abc755 \
--control-plane --certificate-key 2f3df0f8ad1163cb5b16ad32d89548e49c1967094cc8ee0a0a590424f05968f2
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.109.118:8443 --token 690f18.ec60b9557b7da447 \
--discovery-token-ca-cert-hash sha256:39a70eb8a4fa00f25311b4786b5b09cc2f9814d9eb787386f10fc3ca52abc755
# non-HA cluster output
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.109.111:6443 --token 690f18.ec60b9557b7da447 \
--discovery-token-ca-cert-hash sha256:ff7e0992102cb0c5ef1036f4bdff0032ed181723d29b3d0d55712e77e13cd2c4
To re-initialize, reset the node first:
kubeadm reset
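kubeadm reset does not clean up everything. If you need a completely clean slate, the following additional cleanup is commonly run afterwards (a sketch; review it before running, as it flushes firewall and IPVS state):
rm -rf /etc/cni/net.d $HOME/.kube/config
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear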
4.3.2 Join the cluster from the command line (recommended)
Use the values from your own initialization output.
Join controller02 and controller03 to the cluster:
kubeadm join 192.168.109.118:8443 --token 690f18.ec60b9557b7da447 \
--discovery-token-ca-cert-hash sha256:39a70eb8a4fa00f25311b4786b5b09cc2f9814d9eb787386f10fc3ca52abc755 \
--control-plane --certificate-key 2f3df0f8ad1163cb5b16ad32d89548e49c1967094cc8ee0a0a590424f05968f2
Join worker01 and worker02 to the cluster:
kubeadm join 192.168.109.118:8443 --token 690f18.ec60b9557b7da447 \
--discovery-token-ca-cert-hash sha256:39a70eb8a4fa00f25311b4786b5b09cc2f9814d9eb787386f10fc3ca52abc755
If this is not an HA cluster, only the worker nodes join:
kubeadm join 192.168.109.111:6443 --token 690f18.ec60b9557b7da447 \
--discovery-token-ca-cert-hash sha256:ff7e0992102cb0c5ef1036f4bdff0032ed181723d29b3d0d55712e77e13cd2c4
4.3.3 Join the cluster with a configuration file
1. Print the default join configuration with:
kubeadm config print join-defaults
2. For both HA and non-HA deployments, the caCertHashes and certificateKey values in the join configuration files must match the --discovery-token-ca-cert-hash and --certificate-key values printed when the cluster was initialized (see the sketch below if you need to recover them).
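If the original initialization output is no longer at hand, the required values can be regenerated on controller01 (a sketch: the first command prints a fresh worker join command, the second recomputes the CA certificate hash, and the third issues a new certificate key for control-plane joins):
kubeadm token create --print-join-command
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
kubeadm init phase upload-certs --upload-certs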
Join controller02 (make sure caCertHashes and certificateKey match your initialization output):
cd /software
cat > kubeadm-join-controller02.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta4
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
bootstrapToken:
apiServerEndpoint: 192.168.109.118:8443
token: 690f18.ec60b9557b7da447
caCertHashes:
- "sha256:9bbc6899fbab9e6f22a11e01b9ba4c57b469fc96b468c72faab92f3b145bd86e"
unsafeSkipCAVerification: true
tlsBootstrapToken: 690f18.ec60b9557b7da447
kind: JoinConfiguration
controlPlane:
localAPIEndpoint:
advertiseAddress: "192.168.109.112"
bindPort: 6443
certificateKey: "1ada31aa513fdbd7116637880cc24a2bd299b1ec9be0d41b4bda4eb5fe099f84"
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
imagePullSerial: true
name: k8s-controller02
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
kubeletExtraArgs:
- name: "node-ip"
value: "192.168.109.112"
timeouts:
controlPlaneComponentHealthCheck: 4m0s
discovery: 5m0s
etcdAPICall: 2m0s
kubeletHealthCheck: 4m0s
kubernetesAPICall: 1m0s
tlsBootstrap: 5m0s
upgradeManifests: 5m0s
EOF
kubeadm join --config=kubeadm-join-controller02.yaml
Join controller03 (make sure caCertHashes and certificateKey match your initialization output):
cd /software
cat > kubeadm-join-controller03.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta4
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
bootstrapToken:
apiServerEndpoint: 192.168.109.118:8443
token: 690f18.ec60b9557b7da447
caCertHashes:
- "sha256:9bbc6899fbab9e6f22a11e01b9ba4c57b469fc96b468c72faab92f3b145bd86e"
unsafeSkipCAVerification: true
tlsBootstrapToken: 690f18.ec60b9557b7da447
kind: JoinConfiguration
controlPlane:
localAPIEndpoint:
advertiseAddress: "192.168.109.113"
bindPort: 6443
certificateKey: "1ada31aa513fdbd7116637880cc24a2bd299b1ec9be0d41b4bda4eb5fe099f84"
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
imagePullSerial: true
name: k8s-controller03
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
kubeletExtraArgs:
- name: "node-ip"
value: "192.168.109.113"
timeouts:
controlPlaneComponentHealthCheck: 4m0s
discovery: 5m0s
etcdAPICall: 2m0s
kubeletHealthCheck: 4m0s
kubernetesAPICall: 1m0s
tlsBootstrap: 5m0s
upgradeManifests: 5m0s
EOF
kubeadm join --config=kubeadm-join-controller03.yaml
Join worker01 (make sure caCertHashes matches your initialization output):
cd /software
cat > kubeadm-join-worker01.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta4
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
bootstrapToken:
apiServerEndpoint: 192.168.109.118:8443
token: 690f18.ec60b9557b7da447
caCertHashes:
- "sha256:39a70eb8a4fa00f25311b4786b5b09cc2f9814d9eb787386f10fc3ca52abc755"
unsafeSkipCAVerification: true
tlsBootstrapToken: 690f18.ec60b9557b7da447
kind: JoinConfiguration
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
imagePullSerial: true
name: k8s-worker01
taints: null
kubeletExtraArgs:
- name: "node-ip"
value: "192.168.109.114"
timeouts:
controlPlaneComponentHealthCheck: 4m0s
discovery: 5m0s
etcdAPICall: 2m0s
kubeletHealthCheck: 4m0s
kubernetesAPICall: 1m0s
tlsBootstrap: 5m0s
upgradeManifests: 5m0s
EOF
kubeadm join --config=kubeadm-join-worker01.yaml
If this is not an HA cluster, use the following instead (again, caCertHashes must match your initialization output):
cd /software
cat > kubeadm-join-worker01.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta4
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
bootstrapToken:
apiServerEndpoint: 192.168.109.111:6443
token: 690f18.ec60b9557b7da447
caCertHashes:
- "sha256:ff7e0992102cb0c5ef1036f4bdff0032ed181723d29b3d0d55712e77e13cd2c4"
unsafeSkipCAVerification: true
tlsBootstrapToken: 690f18.ec60b9557b7da447
kind: JoinConfiguration
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
imagePullSerial: true
name: k8s-worker01
taints: null
kubeletExtraArgs:
- name: "node-ip"
value: "192.168.109.114"
timeouts:
controlPlaneComponentHealthCheck: 4m0s
discovery: 5m0s
etcdAPICall: 2m0s
kubeletHealthCheck: 4m0s
kubernetesAPICall: 1m0s
tlsBootstrap: 5m0s
upgradeManifests: 5m0s
EOF
kubeadm join --config=kubeadm-join-worker01.yaml
Join worker02 (make sure caCertHashes matches your initialization output):
cd /software
cat > kubeadm-join-worker02.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta4
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
bootstrapToken:
apiServerEndpoint: 192.168.109.118:8443
token: 690f18.ec60b9557b7da447
caCertHashes:
- "sha256:39a70eb8a4fa00f25311b4786b5b09cc2f9814d9eb787386f10fc3ca52abc755"
unsafeSkipCAVerification: true
tlsBootstrapToken: 690f18.ec60b9557b7da447
kind: JoinConfiguration
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
imagePullSerial: true
name: k8s-worker02
taints: null
kubeletExtraArgs:
- name: "node-ip"
value: "192.168.109.115"
timeouts:
controlPlaneComponentHealthCheck: 4m0s
discovery: 5m0s
etcdAPICall: 2m0s
kubeletHealthCheck: 4m0s
kubernetesAPICall: 1m0s
tlsBootstrap: 5m0s
upgradeManifests: 5m0s
EOF
kubeadm join --config=kubeadm-join-worker02.yaml
If this is not an HA cluster, use the following instead (again, caCertHashes must match your initialization output):
cd /software
cat > kubeadm-join-worker02.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta4
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
bootstrapToken:
apiServerEndpoint: 192.168.109.111:6443
token: 690f18.ec60b9557b7da447
caCertHashes:
- "sha256:ff7e0992102cb0c5ef1036f4bdff0032ed181723d29b3d0d55712e77e13cd2c4"
unsafeSkipCAVerification: true
tlsBootstrapToken: 690f18.ec60b9557b7da447
kind: JoinConfiguration
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
imagePullSerial: true
name: k8s-worker02
taints: null
kubeletExtraArgs:
- name: "node-ip"
value: "192.168.109.115"
timeouts:
controlPlaneComponentHealthCheck: 4m0s
discovery: 5m0s
etcdAPICall: 2m0s
kubeletHealthCheck: 4m0s
kubernetesAPICall: 1m0s
tlsBootstrap: 5m0s
upgradeManifests: 5m0s
EOF
kubeadm join --config=kubeadm-join-worker02.yaml
4.4 Configure kubectl
On all controller nodes:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
5. Configure command completion
Run on all controller nodes:
kubectl completion bash > ~/.kube/completion.bash.inc
echo "source '$HOME/.kube/completion.bash.inc'" >> $HOME/.bashrc
source $HOME/.bashrc
6. Install the network plugin
6.1 Install the Calico plugin
Official installation docs: https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart
6.1.1 Prepare the images
If you are installing online with good connectivity you can skip pre-loading the images.
The /script/copy_file.sh distribution script created in section 4.2 is reused here.
Copy the images to the other nodes:
cd /software
tar xf calico-v3.30.3.tar.gz && /script/copy_file.sh a /software/calico/images.tar
If this is not an HA cluster, use the following instead:
cd /software
tar xf calico-v3.30.3.tar.gz && /script/copy_file.sh w /software/calico/images.tar
Import the images on all nodes:
ctr -n k8s.io i import /software/calico/images.tar
6.1.2 Install the software
Apply the manifests:
cd /software/calico/ && \
kubectl create -f tigera-operator.yaml
# you can wait about 10 seconds before running the next command
kubectl create -f custom-resources.yaml
Online deployment commands:
# download the manifests
wget https://raw.githubusercontent.com/projectcalico/calico/v3.30.4/manifests/tigera-operator.yaml
wget https://raw.githubusercontent.com/projectcalico/calico/v3.30.4/manifests/custom-resources.yaml
# adjust the Pod CIDR to match your cluster
sed -i 's#cidr: 192.168.0.0/16#cidr: 172.16.0.0/12#g' custom-resources.yaml
grep cidr custom-resources.yaml
# apply the manifests
kubectl create -f tigera-operator.yaml
# you can wait about 10 seconds before running the next command
kubectl create -f custom-resources.yaml
Watch until all pods are running:
kubectl get po -A -owide -w
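One way to follow the operator's progress while waiting (a sketch; the tigera-operator creates the calico-system namespace, and the tigerastatus resources report overall health once the CRDs exist):
kubectl get pods -n tigera-operator
kubectl get pods -n calico-system
kubectl get tigerastatus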
7. Install the Metrics Server component
Official GitHub repository: https://github.com/kubernetes-sigs/metrics-server
7.1 Import the image
If you are installing online with good connectivity you can skip pre-loading the image.
Copy the image to the worker nodes:
cd /software
tar xf metrics-server.tar.gz
/script/copy_file.sh w /software/metrics-server/metrics-server-v0.8.0.tar
Import the image on all worker nodes:
ctr -n k8s.io i import /software/metrics-server/metrics-server-v0.8.0.tar
7.2 Install and verify
Apply the manifest:
cd /software/metrics-server && kubectl apply -f components.yaml
Online deployment commands:
# download the manifest
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.8.0/components.yaml
# edit the manifest
vim components.yaml
# add --kubelet-insecure-tls under spec.template.spec.containers.args in the Deployment
...
spec:
containers:
- args:
- --kubelet-insecure-tls
- --cert-dir=/tmp
- --secure-port=10250
...
# switch the image to a domestic mirror (if connectivity is poor)
sed -i "s#registry.k8s.io/#k8s.m.daocloud.io/#g" components.yaml
grep image components.yaml
# apply the manifest
kubectl apply -f components.yaml
Check the pod status:
kubectl -n kube-system get pods -owide -l k8s-app=metrics-server -w
Once the pods are running, the metrics become available:
kubectl top node
kubectl top po -A
If the pods are running but the top commands still return no data, check the logs:
kubectl logs -f -n kube-system -l k8s-app=metrics-server
8. Cluster validation
8.1 Check that the nodes and Pods are healthy
kubectl get nodes
kubectl get po -A
8.2 Workload deployment validation
8.2.1 Deploy a Pod
Create a Pod:
cat<<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: busybox
namespace: default
spec:
containers:
- name: busybox
image: docker.m.daocloud.io/library/busybox:1.28
command:
- sleep
- "3600"
imagePullPolicy: IfNotPresent
restartPolicy: Always
EOF
Watch the Pod status:
kubectl get pod -w
8.2.2 Deploy a Deployment
Create a Deployment:
cat<<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: docker.m.daocloud.io/nginx
ports:
- containerPort: 80
EOF
Check the Pod and Deployment status:
kubectl get po,deployments.apps -l app=nginx -owide
# sample output
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-deployment-6bd444cb7-9czdh 1/1 Running 0 4m53s 172.20.79.68 k8s-worker01 <none> <none>
pod/nginx-deployment-6bd444cb7-r65wq 1/1 Running 0 4m53s 172.28.206.4 k8s-controller01 <none> <none>
pod/nginx-deployment-6bd444cb7-vdlwk 1/1 Running 0 4m53s 172.25.45.71 k8s-worker02 <none> <none>
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/nginx-deployment 3/3 3 3 4m53s nginx docker.m.daocloud.io/nginx app=nginx
8.3 DNS resolution validation
Resolve a Service in the default namespace:
kubectl exec busybox -- nslookup `kubectl get svc|awk 'NR == 2{print $1}'`
# output
Server: 10.96.0.10
Address 1: 10.96.0.10 coredns.kube-system.svc.cluster.local
Name: kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
Resolve Services in another namespace (kube-system):
for name in $(kubectl get svc -n kube-system|awk 'NR != 1{print $1}'); do
echo -------------"$name".kube-system:-------------
kubectl exec busybox -- nslookup "$name".kube-system
echo
echo
done
# output
-------------coredns.kube-system:-------------
Server: 10.96.0.10
Address 1: 10.96.0.10 coredns.kube-system.svc.cluster.local
Name: coredns.kube-system
Address 1: 10.96.0.10 coredns.kube-system.svc.cluster.local
-------------metrics-server.kube-system:-------------
Server: 10.96.0.10
Address 1: 10.96.0.10 coredns.kube-system.svc.cluster.local
Name: metrics-server.kube-system
Address 1: 10.96.23.63 metrics-server.kube-system.svc.cluster.local
8.4 Verify the Service (443) and DNS (53) ports
Verify on all nodes:
telnet 10.96.0.1 443
telnet 10.96.0.10 53
curl 10.96.0.10:53
# output
Trying 10.96.0.1...
Connected to 10.96.0.1.
Escape character is '^]'.
Connection closed by foreign host.
Trying 10.96.0.10...
Connected to 10.96.0.10.
Escape character is '^]'.
Connection closed by foreign host.
curl: (52) Empty reply from server
8.5 Network validation
Test Pod-to-Pod connectivity across nodes within the Pod network:
cat > /script/network_test.sh <<-'EOF'
#!/bin/bash
# Colour definitions
GREEN=$'\033[32m'
RED=$'\033[31m'
BLUE=$'\033[34m'
YELLOW=$'\033[33m'
RESET=$'\033[0m'
MAX_NAME_LEN=30

# Argument: "a" tests all namespaces, otherwise only the default namespace
ALL_NS=false
if [ "$1" == "a" ]; then
  ALL_NS=true
fi

# Locate the busybox pod
if $ALL_NS; then
  busybox_info=$(kubectl get po -A -o json | jq -r '
    .items[] | select(.metadata.name=="busybox") |
    "\(.metadata.namespace)\t\(.status.podIP)\t\(.spec.nodeName)"' | shuf -n1)
else
  busybox_info=$(kubectl get po -n default -o json | jq -r '
    .items[] | select(.metadata.name=="busybox") |
    "\(.metadata.namespace)\t\(.status.podIP)\t\(.spec.nodeName)"' | shuf -n1)
fi

if [ -z "$busybox_info" ]; then
  echo "busybox pod not found"
  exit 1
fi

IFS=$'\t' read -r ns busybox_ip busybox_node <<< "$busybox_info"

echo ""
echo "busybox namespace: $ns"
echo -e "busybox IP: ${BLUE}$busybox_ip${RESET}"
echo -e "busybox node: ${YELLOW}$busybox_node${RESET}"
echo ""

# Collect target Pods (Pod-network IPs on other nodes)
if $ALL_NS; then
  targets=$(kubectl get po -A -o json | jq -r --arg bn "$busybox_node" '
    .items[] | select(.status.podIP != null) |
    select(.status.podIP | test("^172\\.")) |
    select(.spec.nodeName != $bn) |
    "\(.metadata.namespace)\t\(.metadata.name)\t\(.status.podIP)\t\(.spec.nodeName)"')
else
  targets=$(kubectl get po -n default -o json | jq -r --arg bn "$busybox_node" '
    .items[] | select(.status.podIP != null) |
    select(.status.podIP | test("^172\\.")) |
    select(.spec.nodeName != $bn) |
    "\(.metadata.namespace)\t\(.metadata.name)\t\(.status.podIP)\t\(.spec.nodeName)"')
fi

if [ -z "$targets" ]; then
  echo "No Pods on other nodes within the Pod network were found"
  exit 0
fi

total=0
reachable=0
unreachable_list=()

# Ping every target Pod from busybox
while IFS=$'\t' read -r target_ns target_name target_ip target_node; do
  [ -z "$target_ip" ] && continue
  total=$((total+1))
  [ ${#target_name} -gt $MAX_NAME_LEN ] && target_name="${target_name:0:$MAX_NAME_LEN}..."
  if kubectl exec -n "$ns" busybox -- ping -c 2 -W 1 "$target_ip" &> /dev/null; then
    status="${GREEN}[OK]${RESET}"
    reachable=$((reachable+1))
  else
    status="${RED}[NG]${RESET}"
    unreachable_list+=("$target_ns/$target_name/$target_ip")
  fi
  # One line per Pod
  echo -e "Namespace: $target_ns | Pod: $target_name | IP: $target_ip | Node: $target_node | Status: $status"
done <<< "$targets"

# Summary
echo ""
echo "Pods found: $total"
echo "Reachable: $reachable"
if [ ${#unreachable_list[@]} -gt 0 ]; then
  echo "Unreachable Pods:"
  for pod_info in "${unreachable_list[@]}"; do
    echo "  $pod_info"
  done
fi
exit 0
EOF
chmod +x /script/network_test.sh
/script/network_test.sh
Sample output:
busybox namespace: default
busybox IP: 172.20.79.67
busybox node: k8s-worker01
Namespace: default | Pod: nginx-deployment-6bd444cb7-r65... | IP: 172.28.206.4 | Node: k8s-controller01 | Status: [OK]
Namespace: default | Pod: nginx-deployment-6bd444cb7-vdl... | IP: 172.25.45.71 | Node: k8s-worker02 | Status: [OK]
Pods found: 2
Reachable: 2
By default only Pods in the default namespace are tested; to test Pods in all namespaces run:
/script/network_test.sh a
8.6 Clean up the test resources
Delete the Pod and Deployment created above:
kubectl delete deployments.apps nginx-deployment && kubectl delete pod busybox
kubectl get po,deployments.apps
9. Install the dashboard
Official GitHub releases: https://github.com/kubernetes/dashboard/releases
9.1 Install helm
Official download page: https://github.com/helm/helm/releases
Extract the binary and verify:
cd /software
tar xf helm-v3.18.6-linux-amd64.tar.gz -C /usr/local/bin/ linux-amd64/helm --strip-components=1
helm version
Online installation commands:
wget https://get.helm.sh/helm-v3.18.6-linux-amd64.tar.gz
tar xf helm-v3.18.6-linux-amd64.tar.gz -C /usr/local/bin/ linux-amd64/helm --strip-components=1
helm version
Configure command completion:
source <(helm completion bash)
helm completion bash > /etc/bash_completion.d/helm
9.2 Import the images
If you are installing online with good connectivity you can skip pre-loading the images.
Copy the images to all worker nodes:
cd /software
tar xf dashboard.tar.gz
/script/copy_file.sh w /software/dashboard/all_images.tar
Import the images on all worker nodes:
ctr -n k8s.io i import /software/dashboard/all_images.tar
9.3 Install the software
Offline installation:
cd /software/dashboard
tar xf kubernetes-dashboard-7.13.0.tgz && \
helm install kubernetes-dashboard ./kubernetes-dashboard/ --create-namespace --namespace kube-system
Online deployment commands:
# download the chart
wget https://github.com/kubernetes/dashboard/releases/download/kubernetes-dashboard-7.13.0/kubernetes-dashboard-7.13.0.tgz
# switch the image registry to a domestic mirror (if connectivity is poor)
tar xf kubernetes-dashboard-7.13.0.tgz
sed -i 's#repository: docker.io#repository: docker.m.daocloud.io#g' kubernetes-dashboard/values.yaml
# install the dashboard
helm install kubernetes-dashboard ./kubernetes-dashboard/ --create-namespace --namespace kube-system
Check that the Pods are running:
kubectl get pod -A -owide |grep dashboard
9.4 Change the Service type
Change the type to NodePort:
kubectl edit svc -n kube-system kubernetes-dashboard-kong-proxy
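If you prefer a non-interactive change, the same edit can be applied with a patch (same Service name as above):
kubectl -n kube-system patch svc kubernetes-dashboard-kong-proxy -p '{"spec":{"type":"NodePort"}}'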
9.5 Create a login token
9.5.1 Create the user
Apply the manifest:
cd /software/dashboard && kubectl apply -f dashboard-user.yaml
Contents of dashboard-user.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
name: admin-user
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: admin-user
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: admin-user
namespace: kube-system
9.5.2 Create a temporary token
Create the token:
kubectl -n kube-system create token admin-user
9.5.3 Create a permanent token
Apply the manifest:
cd /software/dashboard && kubectl apply -f dashboard-user-token.yaml
Contents of dashboard-user-token.yaml:
apiVersion: v1
kind: Secret
metadata:
name: admin-user
namespace: kube-system
annotations:
kubernetes.io/service-account.name: "admin-user"
type: kubernetes.io/service-account-token
Read the token:
kubectl get secret admin-user -n kube-system -o jsonpath={".data.token"} | base64 -d
9.6 Find the port and log in
Check the NodePort:
kubectl get svc kubernetes-dashboard-kong-proxy -n kube-system
# output
kubernetes-dashboard-kong-proxy NodePort 10.96.167.65 <none> 443:30755/TCP 87s
Open the page in a browser and log in:
https://192.168.109.111:30755
Paste the generated token into the Bearer token field and click Sign in.