
Kubernetes etcd Backup, Restore, and Cluster Upgrade Guide


Cluster Management Commands

etcdctl is a command-line client for etcd. It provides a set of commands that make it easy to test the service or to manually inspect and modify the database contents. The basic usage of etcdctl is:

etcdctl [global options] command [command options] [args...]

The options and arguments of each command can be looked up with etcdctl <command> --help.
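For example, a quick write/read smoke test against the cluster (the key and value below are purely illustrative):

# write a test key
etcdctl put /test/hello "world"
# read it back
etcdctl get /test/hello
# clean up
etcdctl del /test/hello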

Environment variables

Get the etcd client access URLs

[root@k8s-master ~]# kubectl -n kube-system get pods etcd-k8s-master -o yaml | grep -A10 "containers:" | grep "https://"
    - --advertise-client-urls=https://192.168.158.15:2379
    - --initial-advertise-peer-urls=https://192.168.158.15:2380
    - --initial-cluster=k8s-master=https://192.168.158.15:2380

If the cluster uses TLS, every command normally has to specify the certificate paths and the etcd endpoint address. These parameters can be put into environment variables and a shell alias by adding the following to ~/.bashrc:

[root@tiaoban etcd]# cat ~/.bashrc
HOST_1=https://192.168.166.3:2379
ENDPOINTS=${HOST_1}
# To bypass the alias and run the original command, prefix it with a backslash, e.g. \etcdctl <command>
alias etcdctl="etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.158.6:2379 --insecure-skip-tls-verify"
[root@tiaoban etcd]# source ~/.bashrc
Check the etcd version
[root@tiaoban etcd]# etcdctl version
etcdctl version: 3.4.23
API version: 3.4
View etcd cluster member information
etcdctl member list -w table
+------------------+---------+------------+----------------------------+----------------------------+------------+
|        ID        | STATUS  |    NAME    |         PEER ADDRS         |        CLIENT ADDRS        | IS LEARNER |
+------------------+---------+------------+----------------------------+----------------------------+------------+
| eba84a8571780cea | started | k8s-master | https://192.168.166.3:2380 | https://192.168.166.3:2379 |      false |
+------------------+---------+------------+----------------------------+----------------------------+------------+
Check the cluster health status
etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.166.3:2379 | eba84a8571780cea |  3.5.15 |  7.1 MB |      true |      false |         4 |      15658 |              15658 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Table column explanation:

ENDPOINT: the node's address and port, here https://192.168.166.3:2379.
ID: the node's unique identifier, here eba84a8571780cea.
VERSION: the etcd version running on the node, here 3.5.15.
DB SIZE: the database size, here 7.1 MB.
IS LEADER: whether this node is the cluster leader; true means it is.
IS LEARNER: whether this node is a learner; false means it is not.
RAFT TERM: the Raft term number, here 4. Raft is the consensus protocol used by etcd; the term distinguishes different election cycles.
RAFT INDEX: the Raft log index, here 15658, i.e. the latest position in the log.
RAFT APPLIED INDEX: the Raft log index that has already been applied to the state machine, here 15658.
ERRORS: error messages; empty here, meaning no errors.
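Besides endpoint status, the health of the members can also be probed directly. A minimal example, assuming the same TLS flags/alias configured above:

# probe the configured endpoint
etcdctl endpoint health
# probe every member discovered from the cluster
etcdctl endpoint health --cluster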
View alarm events

If a problem occurs inside etcd, an alarm is raised. The cause of the alarm can be inspected with the following command:

etcdctl alarm <subcommand> [flags]

There are two commonly used subcommands:

# list all alarms
etcdctl alarm list
# clear all alarms
etcdctl alarm disarm
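The most common alarm is NOSPACE, raised when the backend quota is exhausted. A rough recovery sketch, assuming the old revisions can be discarded (the revision extraction follows the pattern from the etcd maintenance documentation):

# find the current revision
rev=$(etcdctl endpoint status --write-out=json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]+')
# compact away all old revisions
etcdctl compaction "$rev"
# defragment to reclaim the freed space
etcdctl defrag
# finally clear the alarm
etcdctl alarm disarm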
Adding a member (a single-node etcd deployment cannot be scaled out directly; this step is optional here)

After the cluster has been deployed, you may later need to scale its members in or out; the member command is used for this. First, check the current cluster state:

[root@tiaoban etcd]# etcdctl endpoint status --cluster -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.10.100:2379 | 2e0eda3ad6bc6e1e |  3.4.23 |   20 kB |      true |      false |         8 |         16 |                 16 |        |
| http://192.168.10.12:2379  | 5d2c1bd3b22f796f |  3.4.23 |   20 kB |     false |      false |         8 |         16 |                 16 |        |
| http://192.168.10.11:2379  | bc34c6bd673bdf9f |  3.4.23 |   20 kB |     false |      false |         8 |         16 |                 16 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Before starting the new etcd node, declare the new member's name and peer-urls to the cluster:

[root@tiaoban etcd]# etcdctl member add etcd4 --peer-urls=http://192.168.158.9:2380
Member b112a60ec305e42a added to cluster cd30cff36981306b

ETCD_NAME="etcd4"
ETCD_INITIAL_CLUSTER="etcd1=http://192.168.10.100:2380,etcd3=http://192.168.10.12:2380,etcd4=http://192.168.10.100:12380,etcd2=http://192.168.10.11:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.10.100:12380"
ETCD_INITIAL_CLUSTER_STATE="existing"

Next, use docker to create an etcd node running version 3.4.23 on 192.168.10.100, using host network mode, with client endpoint http://192.168.10.100:12379 and node name etcd4.

[root@tiaoban etcd]# mkdir -p /opt/docker/etcd/{conf,data}
[root@tiaoban etcd]# chown -R 1001:1001 /opt/docker/etcd/data/
[root@tiaoban etcd]# cat /opt/docker/etcd/conf/etcd.conf
# node name
name: 'etcd4'
# data directory for this node
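With the directories prepared, the container itself can then be started. The command below is only a sketch based on the parameters stated above: the quay.io/coreos/etcd image and the flag-based configuration are assumptions, whereas the article drives etcd through the mounted etcd.conf file instead.

docker run -d --name etcd4 --network host \
  -v /opt/docker/etcd/data:/etcd-data \
  quay.io/coreos/etcd:v3.4.23 \
  /usr/local/bin/etcd \
    --name etcd4 \
    --data-dir /etcd-data \
    --listen-client-urls http://192.168.10.100:12379 \
    --advertise-client-urls http://192.168.10.100:12379 \
    --listen-peer-urls http://192.168.10.100:12380 \
    --initial-advertise-peer-urls http://192.168.10.100:12380 \
    --initial-cluster etcd1=http://192.168.10.100:2380,etcd2=http://192.168.10.11:2380,etcd3=http://192.168.10.12:2380,etcd4=http://192.168.10.100:12380 \
    --initial-cluster-state existing

Because this node shares the host 192.168.10.100 with etcd1, it listens on the alternative ports 12379/12380 declared during member add.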

1. Move /var/lib/etcd aside

[root@k8s-01 kubernetes]# mv /var/lib/etcd /var/lib/etcd.bak
[root@k8s-01 kubernetes]#
2. Restore the etcd data

[root@k8s-01 lib]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" \
  --cert="/etc/kubernetes/pki/etcd/server.crt" \
  --key="/etc/kubernetes/pki/etcd/server.key" \
  --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
  snapshot restore /opt/etcd-back/snap.db --data-dir=/var/lib/etcd/
3. Start etcd and the apiserver again and check the pods

[root@k8s-01 lib]# cd /etc/kubernetes/
[root@k8s-01 kubernetes]# mv manifests-backup manifests
[root@k8s-01 kubernetes]# kubectl get pods
NAME                                      READY   STATUS    RESTARTS         AGE
nfs-client-provisioner-69b76b8dc6-6l8xs   1/1     Running   12 (2m25s ago)   4h48m
[root@k8s-01 ~]# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS       AGE
calico-kube-controllers-65898446b5-t2mqq   1/1     Running   11 (16h ago)   21h
calico-node-8md6b                          1/1     Running   0              21h
calico-node-9457b                          1/1     Running   0              21h
calico-node-nxs2w                          1/1     Running   0              21h
calico-node-p7d52                          1/1     Running   0              21h
coredns-7f6cbbb7b8-g84gl                   1/1     Running   0              22h
coredns-7f6cbbb7b8-j9q4q                   1/1     Running   0              22h
etcd-k8s-01                                1/1     Running   0              22h
kube-apiserver-k8s-01                      1/1     Running   0              22h
kube-controller-manager-k8s-01             1/1     Running   0              22h
kube-proxy-49b8g                           1/1     Running   0              22h
kube-proxy-8wh5l                           1/1     Running   0              22h
kube-proxy-b6lqq                           1/1     Running   0              22h
kube-proxy-tldpv                           1/1     Running   0              22h
kube-scheduler-k8s-01                      1/1     Running   0              22h
[root@k8s-01 ~]#

Since the 3 nginx pods were created after the backup was taken, they no longer exist after the restore.

Multi-master cluster

Environment: a kubeadm-installed cluster with two masters and two workers.

[root@k8s-01 ~]# kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
k8s-01   Ready    control-plane,master   16h   v1.22.3
k8s-02   Ready    control-plane,master   16h   v1.22.3
k8s-03   Ready    <none>                 16h   v1.22.3
k8s-04   Ready    <none>                 16h   v1.22.3
[root@k8s-01 etcd-v3.5.4-linux-amd64]# ETCDCTL_API=3 etcdctl --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key \
  member list
58915ab47aed1957, started, k8s-02, https://192.168.1.124:2380, https://192.168.1.124:2379, false
c48307bcc0ac155e, started, k8s-01, https://192.168.1.123:2380, https://192.168.1.123:2379, false
[root@k8s-01 etcd-v3.5.4-linux-amd64]#
1. Back up etcd on both masters:

[root@k8s-01 ~]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key \
  snapshot save /snap-$(date +%Y%m%d%H%M).db
[root@k8s-02 ~]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key \
  snapshot save /snap-$(date +%Y%m%d%H%M).db
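Before relying on a snapshot, it is worth checking its integrity. A minimal example, using the file name that is restored later in this walkthrough:

# print hash, revision, total keys and size of the snapshot
ETCDCTL_API=3 etcdctl snapshot status /snap-202207182330.db -w table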

2. Create 3 test pods

[root@k8s-01 ~]# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
nginx-6799fc88d8-2x6gw    1/1     Running   0          4m22s
nginx-6799fc88d8-82mjz    1/1     Running   0          4m22s
nginx-6799fc88d8-sbb6n    1/1     Running   0          4m22s
tomcat-7d987c7694-552v2   1/1     Running   0          2m8s
[root@k8s-01 ~]#

3. Stop kube-apiserver and etcd on the master machines

[root@k8s-01 kubernetes]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
[root@k8s-02 kubernetes]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/

4. Move /var/lib/etcd aside

[root@k8s-01 kubernetes]# mv /var/lib/etcd /var/lib/etcd.bak
[root@k8s-02 kubernetes]# mv /var/lib/etcd /var/lib/etcd.bak

5. Restore the etcd data; both etcd members are restored from the same snapshot.

[root@k8s-01 /]# ETCDCTL_API=3 etcdctl snapshot restore /snap-202207182330.db \
  --endpoints=192.168.1.123:2379 --name=k8s-01 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key \
  --initial-advertise-peer-urls=https://192.168.1.123:2380 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=k8s-01=https://192.168.1.123:2380,k8s-02=https://192.168.1.124:2380 \
  --data-dir=/var/lib/etcd
[root@k8s-01 /]# scp snap-202207182330.db root@192.168.1.124:/
root@192.168.1.124's password:
snap-202207182330.db                         100% 4780KB  45.8MB/s   00:00
[root@k8s-02 /]# ETCDCTL_API=3 etcdctl snapshot restore /snap-202207182330.db \
  --endpoints=192.168.1.124:2379 --name=k8s-02 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key \
  --initial-advertise-peer-urls=https://192.168.1.124:2380 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=k8s-01=https://192.168.1.123:2380,k8s-02=https://192.168.1.124:2380 \
  --data-dir=/var/lib/etcd

6. Start etcd and the apiserver again on the master nodes and check the pods

[root@k8s-01 lib]# cd /etc/kubernetes/
[root@k8s-01 kubernetes]# mv manifests-backup manifests
[root@k8s-02 lib]# cd /etc/kubernetes/
[root@k8s-02 kubernetes]# mv manifests-backup manifests
[root@k8s-01 lib]# kubectl get pods    ### the pods created after the backup are no longer visible
[root@k8s-01 ~]# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS       AGE
calico-kube-controllers-65898446b5-drjjj   1/1     Running   10 (16h ago)   16h
calico-node-9s7p2                          1/1     Running   0              16h
calico-node-fnbj4                          1/1     Running   0              16h
calico-node-nx6q6                          1/1     Running   0              16h
calico-node-qcffj                          1/1     Running   0              16h
coredns-7f6cbbb7b8-mn9hj                   1/1     Running   0              16h
coredns-7f6cbbb7b8-nrwbf                   1/1     Running   0              16h
etcd-k8s-01                                1/1     Running   1              16h
etcd-k8s-02                                1/1     Running   0              16h
kube-apiserver-k8s-01                      1/1     Running   2 (16h ago)    16h
kube-apiserver-k8s-02                      1/1     Running   0              16h
kube-controller-manager-k8s-01             1/1     Running   2              16h
kube-controller-manager-k8s-02             1/1     Running   0              16h
kube-proxy-d824j                           1/1     Running   0              16h
kube-proxy-k5gw4                           1/1     Running   0              16h
kube-proxy-mxmhp                           1/1     Running   0              16h
kube-proxy-nvpf4                           1/1     Running   0              16h
kube-scheduler-k8s-01                      1/1     Running   1              16h
kube-scheduler-k8s-02                      1/1     Running   0              16h
[root@k8s-01 ~]#

Kubernetes Cluster Upgrade Guide

Preface

This article demonstrates upgrading a Kubernetes cluster from v1.28.15 to v1.29.15.

1. Helper commands used during the upgrade

(1) View the pods running on a given node.

kubectl get pod -o wide |grep <nodename>

(2) View the cluster configuration.

kubectl -n kube-system get cm kubeadm-config -o yaml

(3) View the current cluster nodes.

kubectl get node

2. Upgrading the master node

2.1 Upgrade kubeadm

# refresh the package index
yum update
# list the available versions
apt-cache madison kubeadm     # Debian/Ubuntu
yum list | grep kubeadm       # RHEL/CentOS
# upgrade kubeadm
yum update -y kubeadm

# verify the version
kubeadm version
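yum can only offer 1.29.x packages if the repository points at the v1.29 minor version. A sketch of the corresponding pkgs.k8s.io repo definition and a version-pinned install (the file path and pinning follow the upstream packaging convention; adjust to your environment):

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
# install the exact kubeadm version, bypassing the exclude list
yum install -y kubeadm-'1.29.15-*' --disableexcludes=kubernetes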

2.2 Verify the upgrade plan

(1) Check which versions you can upgrade to and verify that the current cluster is upgradeable.

kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.28.15
[upgrade/versions] kubeadm version: v1.29.15
I0327 11:28:43.151508 1125701 version.go:256] remote version is much newer: v1.32.3; falling back to: stable-1.29
[upgrade/versions] Target version: v1.29.15
[upgrade/versions] Latest version in the v1.28 series: v1.28.15

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT        TARGET
kubelet     3 x v1.28.15   v1.29.15

Upgrade to the latest stable version:

COMPONENT                 CURRENT    TARGET
kube-apiserver            v1.28.15   v1.29.15
kube-controller-manager   v1.28.15   v1.29.15
kube-scheduler            v1.28.15   v1.29.15
kube-proxy                v1.28.15   v1.29.15
CoreDNS                   v1.10.1    v1.11.1
etcd                      3.5.15-0   3.5.16-0

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.29.15

_____________________________________________________________________


The table below shows the current state of component configs as understood
by this version of kubeadm. Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column
require manual config upgrade or resetting to kubeadm defaults before a successful upgrade can be performed.
The version to manually upgrade to is denoted in the "PREFERRED VERSION" column.

API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
kubelet.config.k8s.io     v1beta1           v1beta1             no
_____________________________________________________________________

Note the MANUAL UPGRADE REQUIRED column below:

The table below shows the current state of component configs as understood
by this version of kubeadm. Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column
require manual config upgrade or resetting to kubeadm defaults before a successful upgrade can be performed.
The version to manually upgrade to is denoted in the "PREFERRED VERSION" column.

API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
kubelet.config.k8s.io     v1beta1           v1beta1             no
_____________________________________________________________________

This indicates which component configs need to be upgraded manually; if a row shows yes, that config must be upgraded by hand.

(2) Show the diff that would be applied to the existing static pod manifests.

kubeadm upgrade diff 1.29.15
[upgrade/diff] Reading configuration from the cluster...
[upgrade/diff] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
--- /etc/kubernetes/manifests/kube-apiserver.yaml
+++ new manifest
@@ -40,7 +40,7 @@
     - --service-cluster-ip-range=10.96.0.0/12
     - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
     - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
-    image: registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.15
+    image: registry.aliyuncs.com/google_containers/kube-apiserver:1.29.15
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8
--- /etc/kubernetes/manifests/kube-controller-manager.yaml
+++ new manifest
@@ -28,7 +28,7 @@
     - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
     - --service-cluster-ip-range=10.96.0.0/12
     - --use-service-account-credentials=true
-    image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.28.15
+    image: registry.aliyuncs.com/google_containers/kube-controller-manager:1.29.15
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8
--- /etc/kubernetes/manifests/kube-scheduler.yaml
+++ new manifest
@@ -16,7 +16,7 @@
     - --bind-address=127.0.0.1
     - --kubeconfig=/etc/kubernetes/scheduler.conf
     - --leader-elect=true
-    image: registry.aliyuncs.com/google_containers/kube-scheduler:v1.28.15
+    image: registry.aliyuncs.com/google_containers/kube-scheduler:1.29.15
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8

2.3 Upgrade the master node

(1) Upgrade to v1.29.15; this command only upgrades the control plane (master) node.

kubeadm upgrade apply v1.29.15
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.29.15"
[upgrade/versions] Cluster version: v1.28.15
[upgrade/versions] kubeadm version: v1.29.15
[upgrade] Are you sure you want to proceed? [y/N]: y
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.29.15" (timeout: 5m0s)...
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/staticpods] Preparing for "etcd" upgrade
[upgrade/staticpods] Renewing etcd-server certificate
[upgrade/staticpods] Renewing etcd-peer certificate
[upgrade/staticpods] Renewing etcd-healthcheck-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-03-27-11-32-38/etcd.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 1 Pods for label selector component=etcd
[upgrade/staticpods] Component "etcd" upgraded successfully!
[upgrade/etcd] Waiting for etcd to become available
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests2230279311"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Renewing apiserver-etcd-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-03-27-11-32-38/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 1 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-03-27-11-32-38/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 1 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-03-27-11-32-38/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 1 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upgrade] Backing up kubelet config file to /etc/kubernetes/tmp/kubeadm-kubelet-config3777955110/config.yaml
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.29.15". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

(2) Drain the node: evict everything except DaemonSet-managed pods onto other nodes and cordon the node so that nothing new gets scheduled onto it.

kubectl drain <nodename> --ignore-daemonsets
$ kubectl drain k8s-master1 --ignore-daemonsets
node/k8s-master1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-nxz4d, kube-system/kube-proxy-pbnk4
evicting pod kube-system/coredns-c676cc86f-twm96
evicting pod kube-system/coredns-c676cc86f-mdgbn
pod/coredns-c676cc86f-mdgbn evicted
pod/coredns-c676cc86f-twm96 evicted
node/k8s-master1 drained

$ kubectl get pod -A
NAMESPACE      NAME                                  READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-nxz4d                 1/1     Running   0          136m
kube-system    coredns-c676cc86f-7stvs               0/1     Pending   0          60s
kube-system    coredns-c676cc86f-vmkgv               0/1     Pending   0          60s
kube-system    etcd-k8s-master1                      1/1     Running   0          11m
kube-system    kube-apiserver-k8s-master1            1/1     Running   0          10m
kube-system    kube-controller-manager-k8s-master1   1/1     Running   0          10m
kube-system    kube-proxy-pbnk4                      1/1     Running   0          9m44s
kube-system    kube-scheduler-k8s-master1            1/1     Running   0          9m58s

$ kubectl get node
NAME          STATUS                     ROLES           AGE    VERSION
k8s-master1   Ready,SchedulingDisabled   control-plane   162m   v1.24.1

(3) Upgrade the kubelet and kubectl components.

yum update -y kubelet kubectl
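If the repository excludes the kube packages by default (as in the repo sketch shown earlier), pin the exact target version instead of running a blanket update:

yum install -y kubelet-'1.29.15-*' kubectl-'1.29.15-*' --disableexcludes=kubernetes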

(4) Restart kubelet.

systemctl daemon-reload
systemctl restart kubelet

(5) Uncordon the node to make it schedulable again.

kubectl uncordon <nodename>
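After uncordoning, the node should return to Ready and report the new kubelet version:

kubectl get node <nodename>
# the VERSION column should now show v1.29.15 and STATUS should no longer contain SchedulingDisabled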

3. Upgrading the worker nodes

(1) Upgrade the node's kubelet configuration.

kubeadm upgrade node
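As on the master, kubeadm on the worker has to be at the target version before this command is run; a sketch that mirrors the master-side package steps (the pinned install is an assumption about this environment):

# on the worker node
yum install -y kubeadm-'1.29.15-*' --disableexcludes=kubernetes
kubeadm version
# then upgrade the local kubelet configuration
kubeadm upgrade node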

(2) Drain and cordon the node; run this command on the master node.

kubectl drain <nodename> --ignore-daemonsets

(3) Upgrade the kubelet and kubectl components.

yum update -y kubelet kubectl

(4) Restart kubelet.

systemctl daemon-reload
systemctl restart kubelet

(5) Uncordon the node; run this command on the master node.

kubectl uncordon <nodename>

Summary

Every version upgrade is slightly different, so adapt the steps to the specific version; this is not a universal guide. The overall upgrade flow is:

  1. Upgrade the master (control plane) components.

  2. Upgrade the worker node components: cordon and drain the node, upgrade the worker components, then uncordon it.

A Kubernetes cluster upgrade can be broken down into the following steps:

  1. Back up the data. Before upgrading, back up the cluster's data, including access control settings, configuration files, and data volumes.

  2. Choose an upgrade strategy. There are two approaches: a rolling upgrade, which upgrades the nodes one by one until all of them are done, and a forced replacement, which replaces all nodes at once, swapping the old nodes directly for new ones.

  3. Prepare the new version. The upgrade needs the new version's binaries and images. They can be downloaded from the official Kubernetes site and distributed to the cluster nodes.

  4. Upgrade the master nodes. Upgrade the masters first: replace the old binaries with the new ones and start the new versions of the Kubernetes API Server, Controller Manager, and Scheduler.

  5. Upgrade the worker nodes. Next, stop the kubelet and kube-proxy services on each node, replace the old binaries with the new ones, then start the new kubelet and kube-proxy.

  6. Verify the result. After the upgrade, verify that the cluster is running normally. Use kubectl to check the state of the cluster and its resource objects and make sure all services are reachable.

  7. Roll back if necessary. If the upgrade fails or causes problems, you can roll back to the previous version. The rollback procedure is the same as the upgrade, only using the old binaries and images.

A Kubernetes cluster upgrade needs careful planning and preparation, and the steps must be followed in order. Only after backing up the data, choosing a suitable upgrade strategy, preparing the new version, upgrading the master and worker nodes, and verifying the result can the upgrade be considered successful.

Kubernetes is a fast-moving open source project, and keeping the cluster upgraded is necessary for both functionality and security:

  • Read the upgrade documentation: start with the official upgrade notes to learn what needs attention during the upgrade.

  • Back up the data: back up the current data before the upgrade, in case something unexpected causes data loss.

  • Verify the backup: make sure the backup is usable, so that the data can be restored if needed.

  • Test the upgrade beforehand: rehearse the upgrade in a test environment to make sure both the upgrade process and the upgraded cluster work correctly.

  • Upgrade the control plane: upgrade the control-plane components, including kube-apiserver, kube-controller-manager, and kube-scheduler.

  • Upgrade the nodes: upgrade the Kubernetes components on every node, including kubelet and kube-proxy.

  • Upgrade the Kubernetes objects: after the control plane is upgraded, update Kubernetes objects such as Deployments and StatefulSets where needed.

Checks after the upgrade:

  • Verify the cluster state, including the state of the nodes, Pods, and Services.

  • Verify the applications and make sure they are running normally.

  • Watch the logs; if problems appear, the logs help track down the cause.

A Kubernetes cluster upgrade has to be handled carefully, with thorough preparation and testing, to keep the process smooth and the cluster stable. Pay attention to backups and their usability, to the upgrade order, and to the post-upgrade checks, so that the cluster keeps running and the applications stay stable.
