[Cloud Computing] Kubernetes Basics and Practice: From Deployment to Operations - 平芜编程栈


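Introduction

Kubernetes (K8s for short) is the benchmark technology for container orchestration and has become the de facto standard for deploying modern cloud-native applications. It draws on Google's internal Borg system, which had been proven over years of production use. Google open-sourced Kubernetes in 2014 and, with the 1.0 release in 2015, donated it to the newly founded CNCF (Cloud Native Computing Foundation). This article covers Kubernetes' core concepts, architecture, core resource objects, scheduling mechanisms, and operational practices, to help readers go from no prior experience to deploying and operating production environments on their own.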

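1. Kubernetes Overview

1.1 What Is Kubernetes

Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. Its core features include:

Self-healing: automatically restarts failed containers, and replaces and reschedules Pods when a node becomes unavailable
Horizontal scaling: scale a Deployment with a single command, or automatically based on CPU utilization
Service discovery and load balancing: gives containers a stable network identity and distributes traffic across them
Automatic bin packing: places containers on suitable nodes according to their resource requirements
Configuration and secret management: manages configuration and sensitive data separately so they do not leak into images
Storage orchestration: automatically mounts storage systems such as local storage, NFS, and cloud storage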

1.2 Kubernetes Architecture

┌───────────────────────────────────────────────────────────────────┐
│                        Kubernetes Cluster                         │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                        Control Plane                        │  │
│  │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │  │
│  │ │ API Server │ │ Scheduler  │ │ Controller │ │    etcd    │ │  │
│  │ │            │ │            │ │  Manager   │ │            │ │  │
│  │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                 │                                 │
│  ┌──────────────────────────────┴──────────────────────────────┐  │
│  │                         Data Plane                          │  │
│  │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │  │
│  │ │     Node 1      │ │     Node 2      │ │     Node 3      │ │  │
│  │ │ ┌─────────────┐ │ │ ┌─────────────┐ │ │ ┌─────────────┐ │ │  │
│  │ │ │   kubelet   │ │ │ │   kubelet   │ │ │ │   kubelet   │ │ │  │
│  │ │ │ kube-proxy  │ │ │ │ kube-proxy  │ │ │ │ kube-proxy  │ │ │  │
│  │ │ └──────┬──────┘ │ │ └──────┬──────┘ │ │ └──────┬──────┘ │ │  │
│  │ │        │        │ │        │        │ │        │        │ │  │
│  │ │ ┌──────▼──────┐ │ │ ┌──────▼──────┐ │ │ ┌──────▼──────┐ │ │  │
│  │ │ │  Container  │ │ │ │  Container  │ │ │ │  Container  │ │ │  │
│  │ │ │   Runtime   │ │ │ │   Runtime   │ │ │ │   Runtime   │ │ │  │
│  │ │ └─────────────┘ │ │ └─────────────┘ │ │ └─────────────┘ │ │  │
│  │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘

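1.3 Core Components in Detail

Control Plane components:

kube-apiserver: the single entry point to the cluster; handles all RESTful API requests
etcd: a highly available key-value store that holds all cluster state
kube-scheduler: schedules Pods by assigning each one to a suitable node
kube-controller-manager: runs the controllers that continuously drive the cluster toward its desired state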
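The "desired state" idea behind the controllers can be made concrete with a toy reconcile step. The following is a minimal sketch, not Kubernetes source code: the function name `reconcile`, the Pod naming scheme, and the choice to delete surplus Pods from the end of the list are all assumptions made for illustration.

```python
# Toy model of a controller's reconcile step. Names are illustrative only.

def reconcile(desired_replicas, running_pods):
    """Return the (action, pod_name) steps that close the gap between
    the desired replica count and the Pods currently running."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few Pods: create the missing ones.
        start = len(running_pods)
        return [("create", f"pod-{start + i}") for i in range(diff)]
    if diff < 0:
        # Too many Pods: delete the surplus (taken from the end of the list here).
        return [("delete", name) for name in running_pods[diff:]]
    return []  # Observed state already matches desired state.


if __name__ == "__main__":
    # One Pod survived a node failure; three are desired.
    print(reconcile(3, ["pod-0"]))
```

A real controller runs this comparison repeatedly against state read from the API server, which is why a cluster converges back to the declared configuration after a failure.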

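Node (worker node) components:

kubelet: the node agent; manages the lifecycle of the containers on its node
kube-proxy: a network proxy that maintains the node's network rules for Services
Container Runtime: the container runtime (e.g. containerd or CRI-O; since dockershim was removed in v1.24, Docker Engine requires the cri-dockerd adapter)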

2. Core Resource Objects

2.1 Pod - The Smallest Schedulable Unit

# Basic Pod definition
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
    environment: production
spec:
  containers:
    - name: nginx
      image: nginx:1.24
      ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
      resources:
        requests:
          memory: "128Mi"
          cpu: "250m"
        limits:
          memory: "256Mi"
          cpu: "500m"
      # Assumes the nginx configuration serves /healthz and /ready;
      # the stock image does not define these paths.
      livenessProbe:
        httpGet:
          path: /healthz
          port: 80
        initialDelaySeconds: 15
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /ready
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 5
      env:
        - name: NGINX_HOST
          value: "localhost"
        - name: NGINX_PORT
          value: "80"
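The `cpu: "250m"` and `memory: "128Mi"` values in the Pod definition use Kubernetes quantity notation: `m` means thousandths (so `250m` is a quarter of a CPU core) and `Mi` is a power-of-two mebibyte. The sketch below converts such strings to numbers. The `parse_quantity` helper is hypothetical and covers only a subset of suffixes; exponent notation and larger suffixes are left out.

```python
from decimal import Decimal

# Subset of the suffixes Kubernetes accepts in resource quantities.
SUFFIXES = {
    "Ki": Decimal(2) ** 10,   # binary (power-of-two) suffixes
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "m": Decimal("0.001"),    # milli, typically used for CPU
    "k": Decimal(10) ** 3,    # decimal suffixes
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
}


def parse_quantity(text):
    """Convert a quantity string such as "250m" or "128Mi" to a Decimal
    (CPU cores or bytes, depending on the resource)."""
    # Check the longer suffixes first, then the single-letter ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * SUFFIXES[suffix]
    return Decimal(text)  # Plain number without a suffix.


if __name__ == "__main__":
    for q in ("250m", "500m", "128Mi", "256Mi"):
        print(q, "=", parse_quantity(q))
```

Suffixes are case-sensitive: `m` (milli) and `M` (mega) differ by a factor of a billion, which is a common source of mistakes in memory settings.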

# Multi-container Pod (sidecar pattern)
apiVersion: v1
kind: Pod
metadata:
  name: web-app-with-log-collector
  labels:
    app: web-app
spec:
  containers:
    # Main application container
    - name: web-app
      image: myapp:latest
      ports:
        - containerPort: 8080
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log/app
    # Sidecar: log collector
    # (Fluentd image, matching the FLUENTD_CONF variable and /fluentd/etc mount)
    - name: log-collector
      image: fluent/fluentd:latest
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log/app
        - name: fluentd-config
          mountPath: /fluentd/etc
      env:
        - name: FLUENTD_CONF
          value: "app.conf"
    # Sidecar: proxy
    - name: envoy-proxy
      image: envoyproxy/envoy:v1.20
      ports:
        - containerPort: 15001
      env:
        - name: ENVOY_EDGE_STATS
          value: "true"
  volumes:
    - name: shared-logs
      emptyDir: {}
    - name: fluentd-config
      configMap:
        name: fluentd-config

2.2 ReplicaSet and Deployment

# ReplicaSet definition
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-replicaset
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.24
          ports:
            - containerPort: 80

# Deployment definition (recommended for production)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
  labels:
    app: web-application
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web-application
  template:
    metadata:
      labels:
        app: web-application
        version: v1.0.0
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: web-app
          image: myorg/web-app:v1.0.0
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 8443
              name: https
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
            - name: REDIS_HOST
              value: "redis-service"
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: log-level
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          lifecycle:
            # Brief pause before shutdown so in-flight requests can drain
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]

2.3 Service and Ingress

# ClusterIP Service: in-cluster access
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  labels:
    app: backend
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
    - name: grpc
      port: 50051
      targetPort: 50051
      protocol: TCP

# NodePort Service: access through a port on every node
apiVersion: v1
kind: Service
metadata:
  name: frontend-service
spec:
  type: NodePort
  selector:
    app: frontend
  ports:
    - name: http
      port: 80
      targetPort: 3000
      nodePort: 30080   # must fall in the NodePort range (30000-32767 by default)
    - name: https
      port: 443
      targetPort: 3001
      nodePort: 30443

# LoadBalancer Service: cloud provider load balancer
apiVersion: v1
kind: Service
metadata:
  name: web-service
  annotations:
    # AWS NLB annotations
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:xxx"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - name: https
      port: 443
      targetPort: 8080

# Ingress: HTTP/HTTPS entry point
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
spec:
  # Replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  tls:
    - hosts:
        - www.example.com
        - api.example.com
      secretName: tls-secret
  rules:
    - host: www.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-frontend
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-gateway
                port:
                  number: 8080
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 8080
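The Ingress for www.example.com lists both `/` and `/api` with `pathType: Prefix`, so it helps to see how a request is routed when more than one path matches. The sketch below models the documented rule: prefixes are compared element by element after splitting on `/`, and the longest matching path wins. The helper names are hypothetical, and individual Ingress controllers may differ in edge cases.

```python
def _elements(path):
    """Split a URL path into its non-empty elements: "/api/v1/" -> ["api", "v1"]."""
    return [part for part in path.split("/") if part]


def prefix_matches(rule_path, request_path):
    """pathType: Prefix compares whole path elements, not raw string prefixes."""
    rule = _elements(rule_path)
    request = _elements(request_path)
    return request[: len(rule)] == rule


def pick_backend(rules, request_path):
    """rules is a list of (path, backend) pairs for one host.
    Among the matching paths, the one with the most elements wins."""
    matching = [(path, backend) for path, backend in rules
                if prefix_matches(path, request_path)]
    if not matching:
        return None
    return max(matching, key=lambda item: len(_elements(item[0])))[1]


if __name__ == "__main__":
    # The two paths configured for www.example.com.
    rules = [("/", "web-frontend:80"), ("/api", "api-gateway:8080")]
    for path in ("/index.html", "/api/v1/users", "/apidocs"):
        print(path, "->", pick_backend(rules, path))
```

Because matching is per element, `/apidocs` is not covered by the `/api` rule and falls through to `/`.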

2.4 ConfigMap and Secret

# ConfigMap: application configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  # Simple key-value entry (the key read via configMapKeyRef in web-deployment)
  log-level: "info"
  # Properties format
  database.properties: |
    db.host=postgres-service
    db.port=5432
    db.name=appdb
    db.pool.size=20
  # JSON format
  config.json: |
    {
      "logLevel": "info",
      "features": {
        "newUI": true,
        "betaAPI": false
      },
      "rateLimit": {
        "requests": 100,
        "window": "1m"
      }
    }

# Secret: sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
data:
  # Base64-encoded values
  # echo -n "password123" | base64
  db-password: cGFzc3dvcmQxMjM=
  api-key: c29tZS1hcGkta2V5LWJhc2U2NC1lbmNvZGVk
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...
  tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0t...
stringData:
  # Plain text; Base64-encoded automatically when the Secret is stored
  username: admin
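The values under `data` in a Secret are Base64-encoded, which is an encoding rather than encryption: anyone who can read the manifest can recover the plaintext. A minimal sketch using Python's standard library (the two helper names are illustrative):

```python
import base64


def encode_secret_value(plain):
    """Encode a string the way values under a Secret's `data` field are stored."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded):
    """Reverse the encoding; no key or password is involved."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    print(encode_secret_value("password123"))
    print(decode_secret_value("c29tZS1hcGkta2V5LWJhc2U2NC1lbmNvZGVk"))
```

For this reason, Secret manifests are usually kept out of version control, and access to Secrets in the cluster is restricted through RBAC.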

2.5 PersistentVolume and PersistentVolumeClaim

# PersistentVolume backed by NFS
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
  labels:
    type: nfs
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    server: nfs-server.example.com
    path: /exported/path
---
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-storage
spec:
  # Empty class name: bind to a pre-provisioned PV rather than triggering
  # dynamic provisioning through the cluster's default StorageClass
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  selector:
    matchLabels:
      type: nfs
---
# Pod that uses the PVC
apiVersion: v1
kind: Pod
metadata:
  name: app-with-storage
spec:
  containers:
    - name: app
      image: myapp:latest
      volumeMounts:
        - name: app-data
          mountPath: /data
  volumes:
    - name: app-data
      persistentVolumeClaim:
        claimName: app-storage

3. Core Concepts in Detail

3.1 Namespaces

# Namespace definition
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
    team: platform
---
# A resource placed in the namespace
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myorg/web:v1

# Namespace operations
kubectl get namespaces
kubectl create namespace staging
kubectl delete namespace unused-namespace

# List resources in a specific namespace
kubectl get pods -n production
kubectl get all -n production

# Set the default namespace for the current context
kubectl config set-context --current --namespace=production

3.2 Labels and Selectors

# Example resource labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
  labels:
    app: api
    version: v2.1.0
    tier: backend
    environment: production
    team: backend
    managed-by: kubectl
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        version: v2.1.0
        tier: backend
        environment: production
        # Containers have no labels field, so this label lives on the Pod
        framework: spring-boot
    spec:
      containers:
        - name: api
          image: myorg/api:v2.1.0

# Label selectors
kubectl get pods -l "app=api"
kubectl get pods -l "app=api,version=v2"
kubectl get pods -l "app in (api,web)"
kubectl get pods -l "app notin (api,web)"
kubectl get pods -l "environment=production,tier=backend"
kubectl get deployments -l "!release"

# Modify labels
kubectl label pods nginx-pod environment=production
kubectl label pods nginx-pod version=v2 --overwrite
kubectl label pods -l "app=api" team=backend --overwrite
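To make the selector semantics of the `-l` expressions explicit, here is a small sketch of how equality-based and set-based requirements are evaluated against a resource's labels. The tuple representation and the operator names `exists` and `!exists` are assumptions chosen for this illustration; kubectl itself parses the string syntax shown in the commands.

```python
def matches(labels, requirements):
    """labels: dict of a resource's labels.
    requirements: list of (key, operator, values); all of them must hold,
    just as comma-separated selector terms are ANDed together."""
    for key, op, values in requirements:
        value = labels.get(key)
        if op == "=":
            ok = value == values[0]
        elif op == "!=":
            ok = value != values[0]          # also true when the key is absent
        elif op == "in":
            ok = key in labels and value in values
        elif op == "notin":
            ok = value not in values         # also true when the key is absent
        elif op == "exists":
            ok = key in labels
        elif op == "!exists":
            ok = key not in labels
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


if __name__ == "__main__":
    pod = {"app": "api", "version": "v2.1.0", "tier": "backend"}
    print(matches(pod, [("app", "in", ["api", "web"])]))
    print(matches(pod, [("app", "=", ["api"]), ("version", "=", ["v2"])]))
```

Equality is an exact string comparison, so a selector such as `version=v2` does not match a Pod labeled `version: v2.1.0`.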

3.3 Annotations

# Annotations store non-identifying metadata
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
  annotations:
    # Build information
    kubernetes.io/change-cause: "Deployment updated to v2.1.0"
    last-modified-by: "devops-team"
    # Ownership information
    config.example.com/owner: "backend-team"
    config.example.com/support-email: "backend@example.com"
    # Monitoring information
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  # ...

4. Resource Scheduling and Scaling

4.1 HPA - Horizontal Pod Autoscaling

# HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Based on CPU utilization (relative to the Pods' CPU requests)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Based on memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Based on a custom metric (requires a custom metrics API adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max

# Check HPA status
kubectl get hpa
kubectl describe hpa web-hpa

# Scale manually
kubectl scale deployment web-deployment --replicas=5
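The replica count an HPA arrives at follows a documented formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). The sketch below applies it with the default 10% tolerance and the rule that, with several metrics, the largest proposal wins. Function names are illustrative, and the real controller's handling of missing metrics, unready Pods, and the `behavior` policies is deliberately left out.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Replica count proposed by a single metric."""
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def hpa_recommendation(current_replicas, metrics, min_replicas, max_replicas):
    """metrics is a list of (current_value, target_value) pairs.
    The largest per-metric proposal is used, clamped to [min, max]."""
    proposals = [desired_replicas(current_replicas, cur, tgt) for cur, tgt in metrics]
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # 4 replicas, CPU at 90% against a 70% target, memory at 40% against 80%.
    print(hpa_recommendation(4, [(90, 70), (40, 80)], min_replicas=2, max_replicas=10))
```

Taking the maximum across metrics means a single hot metric is enough to scale out, while scaling in requires every metric to agree.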

4.2 VPA - Vertical Pod Autoscaling

# VerticalPodAutoscaler
# (a separately installed add-on; not part of a default cluster)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  updatePolicy:
    updateMode: "Auto"   # Off, Initial, Recreate, or Auto
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 2Gi
        controlledResources: ["cpu", "memory"]

4.3 Resource Quotas and Limits

# ResourceQuota: namespace-level resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    services: "10"
    persistentvolumeclaims: "20"
---
# LimitRange: per-container resource bounds and defaults
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      max:
        cpu: "2"
        memory: 2Gi
      min:
        cpu: 50m
        memory: 64Mi
      default:           # default limits
        cpu: 500m
        memory: 512Mi
      defaultRequest:    # default requests
        cpu: 200m
        memory: 256Mi

5. Operations in Practice

5.1 Rolling Updates and Rollbacks

# Rolling update (format: <container-name>=<image>)
kubectl set image deployment/web-deployment web=myorg/web:v2.0.0
kubectl rollout status deployment/web-deployment

# View rollout history
kubectl rollout history deployment/web-deployment
kubectl rollout history deployment/web-deployment --revision=3

# Roll back to the previous revision
kubectl rollout undo deployment/web-deployment

# Roll back to a specific revision
kubectl rollout undo deployment/web-deployment --to-revision=2

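# Deployment update strategy in detail
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # max Pods above the desired replica count
      maxUnavailable: 0   # max unavailable Pods (0 recommended for service continuity)
  # Settings that interact with probes during a rollout
  minReadySeconds: 30            # how long a new Pod must stay Ready before it counts as available
  progressDeadlineSeconds: 600   # rollout timeout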
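The effect of `maxSurge` and `maxUnavailable` is easiest to see as two bounds on the Pod count during a rollout. The sketch below computes them for absolute and percentage values, assuming the documented rounding (surge rounds up, unavailable rounds down). The function names are illustrative, and the fallback for the case where both resolve to zero reflects the author's understanding of the Deployment controller rather than a quoted specification.

```python
import math


def resolve(value, replicas, round_up):
    """Turn an absolute number or a percentage string into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return (max total Pods, min available Pods) during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Both zero would stall the rollout, so one unavailable Pod is allowed.
        unavailable = 1
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    # Values from the web-deployment example: 5 replicas, maxSurge 1, maxUnavailable 0.
    print(rollout_bounds(5, 1, 0))
    # The Kubernetes defaults of 25% / 25% on 10 replicas.
    print(rollout_bounds(10, "25%", "25%"))
```

With `maxUnavailable: 0`, capacity never drops below the desired count, at the cost of needing spare room on the nodes for the surge Pods.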

5.2 Taints and Tolerations

# Taint nodes
kubectl taint nodes node1 dedicated=gpu:NoSchedule
kubectl taint nodes node1 special=true:PreferNoSchedule
kubectl taint nodes node1 maintenance=true:NoExecute --overwrite

# View taints
kubectl describe node node1 | grep -A5 Taints

# Taint targeted by the Pod tolerations (same as the first command; --overwrite makes it repeatable)
kubectl taint nodes node1 dedicated=gpu:NoSchedule --overwrite

# Configuring tolerations on a Pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ml-training
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      tolerations:
        # Matches the taint dedicated=gpu with effect NoSchedule
        - key: "dedicated"
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"
        # Matches any value of the "dedicated" key with effect NoSchedule
        - key: "dedicated"
          operator: "Exists"
          effect: "NoSchedule"
        # Matches the "special" key with any effect
        - key: "special"
          operator: "Exists"
      nodeSelector:
        gpu: "true"
      containers:
        - name: training
          image: ml-training:latest
          resources:
            requests:
              nvidia.com/gpu: 1
            limits:
              nvidia.com/gpu: 1
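Whether a toleration covers a taint depends on three fields: key, operator/value, and effect. The following sketch encodes those matching rules and checks the tolerations from the manifest against the three taints placed on node1 earlier. The function names and the dictionary representation are assumptions for illustration; the scheduler's actual implementation is not reproduced here.

```python
def tolerates(toleration, taint):
    """Return True if this toleration matches this taint."""
    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")
    effect = toleration.get("effect", "")
    # An empty effect matches every effect.
    if effect and effect != taint["effect"]:
        return False
    # An empty key is only meaningful with Exists, and then matches every key.
    if key == "":
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def untolerated(taints, tolerations):
    """Taints that none of the Pod's tolerations cover."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


TAINTS = [
    {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"},
    {"key": "special", "value": "true", "effect": "PreferNoSchedule"},
    {"key": "maintenance", "value": "true", "effect": "NoExecute"},
]
TOLERATIONS = [
    {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"},
    {"key": "dedicated", "operator": "Exists", "effect": "NoSchedule"},
    {"key": "special", "operator": "Exists"},
]

if __name__ == "__main__":
    print(untolerated(TAINTS, TOLERATIONS))
```

In this setup the `maintenance=true:NoExecute` taint stays untolerated, so the ml-training Pod would not run on node1 until that taint is removed or a matching toleration is added.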

5.3 Affinity and Anti-Affinity

# Pod anti-affinity: spread replicas across nodes
# (a required rule on hostname needs at least as many nodes as replicas)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cluster
spec:
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values: ["redis"]
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379

# Pod affinity: co-locate with another workload on the same node
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logging-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: logging-agent
  template:
    metadata:
      labels:
        app: logging-agent
    spec:
      affinity:
        # Must land on a node that already runs a web-app Pod
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values: ["web-app"]
              topologyKey: "kubernetes.io/hostname"
        # Prefer not to share a node with another logging-agent Pod
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: "app"
                      operator: In
                      values: ["logging-agent"]
                topologyKey: "kubernetes.io/hostname"
      containers:
        - name: fluentd
          image: fluent/fluentd:latest

5.4 Scheduler Configuration

# Pod priority and preemption
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
globalDefault: false
description: "High priority for production workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 100
globalDefault: true
description: "Default priority for batch jobs"
---
# Using a priority class
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-service
spec:
  selector:
    matchLabels:
      app: critical-service
  template:
    metadata:
      labels:
        app: critical-service
    spec:
      priorityClassName: high-priority
      containers:
        - name: app
          image: myapp:latest
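How the two PriorityClass objects affect scheduling order can be sketched as a simple sort: Pods without a `priorityClassName` receive the value of the class marked `globalDefault: true`, and higher values are considered first. This is a simplified model with illustrative names; preemption of already running Pods is not modeled.

```python
# Values taken from the two PriorityClass manifests.
PRIORITY_CLASSES = {"high-priority": 1000, "low-priority": 100}
GLOBAL_DEFAULT = "low-priority"  # the class marked globalDefault: true


def effective_priority(pod):
    """Priority value a Pod ends up with after defaulting."""
    class_name = pod.get("priorityClassName") or GLOBAL_DEFAULT
    return PRIORITY_CLASSES[class_name]


def scheduling_order(pending_pods):
    """Pending Pods ordered by descending priority.
    sorted() is stable, so Pods of equal priority keep their arrival order."""
    return sorted(pending_pods, key=effective_priority, reverse=True)


if __name__ == "__main__":
    pending = [
        {"name": "batch-1"},
        {"name": "critical", "priorityClassName": "high-priority"},
        {"name": "batch-2", "priorityClassName": "low-priority"},
    ]
    print([pod["name"] for pod in scheduling_order(pending)])
```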

# Scheduler configuration
kube-scheduler --config=/etc/kubernetes/scheduler-config.yaml

# Node labels that give the scheduler placement choices
kubectl label nodes node1 zone=primary
kubectl label nodes node2 zone=secondary
kubectl label nodes node1 disk-type=ssd
kubectl label nodes node2 disk-type=hdd

5.5 Cluster Monitoring and Logging

# Prometheus monitoring configuration (requires the Prometheus Operator CRDs)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-app-monitor
  labels:
    team: platform
spec:
  selector:
    matchLabels:
      app: web
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s
  namespaceSelector:
    matchNames:
      - production
---
# Log collection: Fluentd configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  # Assumes Docker-style JSON log lines; CRI runtimes such as containerd
  # write a different line format and need a different parser
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      <parse>
        @type json
        time_key time
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>
    <filter kubernetes.**>
      @type kubernetes_metadata
      @id kubernetes_metadata
    </filter>
    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch.logging.svc
      port 9200
      logstash_format true
      logstash_prefix kubernetes
    </match>

6. Production Best Practices

6.1 High-Availability Architecture

# High-availability Deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ha-web-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        # Anti-affinity keeps the Pods on different nodes
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname
        # Node affinity prefers nodes in the listed availability zones
        # (it does not by itself balance Pods evenly across them)
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - zone-a
                      - zone-b
                      - zone-c
      containers:
        - name: web
          image: myorg/web:v1
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
      terminationGracePeriodSeconds: 60

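6.2 Disaster Recovery

# Backup strategy
# 1. etcd snapshot
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# 2. Restore the cluster (writes a new data directory; etcd must then be pointed at it)
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd/restore

# 3. Export resources
# "get all" covers only a subset of resource types, so ConfigMaps and Secrets are exported separately
kubectl get all --all-namespaces -o yaml > all-resources.yaml
kubectl get configmaps -n production -o yaml > configmaps.yaml
# Secret values are only Base64-encoded; store this file securely
kubectl get secrets -n production -o yaml > secrets.yaml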

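Summary

As core infrastructure of the cloud-native era, Kubernetes provides powerful container orchestration. Starting from the core concepts, this article walked through the core resource objects (Pod, Deployment, Service, ConfigMap, Secret, and others), along with scheduling mechanisms, operational practices, and recommended configurations.

Mastering Kubernetes takes both theory and practice. Readers are advised to:

Practice hands-on: set up a local cluster (Minikube/kind) and experiment
Understand the internals: learn the design philosophy and architecture of Kubernetes
Focus on production: learn operational skills such as high-availability deployment, monitoring, and alerting
Keep learning: follow the CNCF ecosystem and Kubernetes release updates

Hopefully this article offers systematic guidance for your Kubernetes learning journey.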
