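Kubernetes Production Deployment Guide

Contents:
- Cluster planning
  - Node planning
  - Network planning
- Cluster installation
  - Using kubeadm
  - Initializing the control plane
  - Joining the other nodes
- High-availability configuration
  - API Server load balancing
  - etcd cluster
- Storage configuration
  - StorageClass
  - PV/PVC
- Security configuration
  - RBAC
  - Pod security
  - Network policies
- Monitoring and logging
  - Prometheus monitoring
  - Log collection
- Backup and recovery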
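Kubernetes is the core platform of the cloud-native stack. This article is a complete guide to running it in production, from planning through deployment.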
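Cluster planning

Node planning

```yaml
# Recommended production sizing (a planning sheet, not a Kubernetes manifest)
control-plane:
  replicas: 3  # high availability
  resources:
    cpu: "4"
    memory: "8Gi"
    storage: "100Gi"

worker-nodes:
  min-replicas: 3
  resources:
    cpu: "16"
    memory: "64Gi"
    storage: "500Gi"

# Manage worker nodes in groups
node-groups:
  - name: general
    count: 6
    labels:
      workload: general
  - name: gpu
    count: 2
    labels:
      workload: gpu
    resources:
      nvidia.com/gpu: "4"  # GPUs per node
```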
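The `replicas: 3` recommendation comes from etcd's quorum rule. kubeadm's default is to run etcd "stacked" on the control-plane nodes. An etcd cluster of n members stays writable only while a majority, floor(n/2)+1, is healthy. A small sketch of that arithmetic (`etcd_quorum` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Quorum arithmetic for an etcd cluster of n members.
# etcd needs a majority (floor(n/2) + 1) of members to accept writes.
etcd_quorum() {
  local n=$1
  local quorum=$(( n / 2 + 1 ))
  local tolerated=$(( n - quorum ))   # members that may fail without losing quorum
  echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
}

for n in 1 2 3 4 5; do
  etcd_quorum "$n"
done
```

The output shows why control-plane counts are odd. Four members tolerate one failure, exactly like three, while five tolerate two.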
Network planning

```yaml
# Pod network (provided by a CNI plugin)
pod-network:
  plugin: calico  # or flannel/weave
  cidr: 10.244.0.0/16

# Service network
service-network:
  cidr: 10.96.0.0/12

# Node network
node-network:
  cidr: 192.168.1.0/24
```
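The pod, Service and node ranges must not overlap, or routing between them conflicts. Below is a minimal pure-Bash check, a sketch under the assumption that only IPv4 CIDRs are involved. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that the planned pod, Service and node CIDRs do not overlap.

# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) + (10#$b << 16) + (10#$c << 8) + 10#$d ))
}

# Print the first and last address of a CIDR block as integers.
cidr_bounds() {
  local ip=${1%/*} len=${1#*/}
  local size=$(( 1 << (32 - len) ))
  local start=$(( $(ip_to_int "$ip") & ~(size - 1) ))
  echo "$start $(( start + size - 1 ))"
}

# Succeed (exit status 0) if two CIDR blocks share at least one address.
cidrs_overlap() {
  local s1 e1 s2 e2
  read -r s1 e1 <<< "$(cidr_bounds "$1")"
  read -r s2 e2 <<< "$(cidr_bounds "$2")"
  (( s1 <= e2 && s2 <= e1 ))
}

# Compare every pair of CIDRs; report overlaps and fail if any exist.
check_plan() {
  local -a cidrs=("$@")
  local i j rc=0
  for (( i = 0; i < ${#cidrs[@]}; i++ )); do
    for (( j = i + 1; j < ${#cidrs[@]}; j++ )); do
      if cidrs_overlap "${cidrs[i]}" "${cidrs[j]}"; then
        echo "overlap: ${cidrs[i]} and ${cidrs[j]}"
        rc=1
      fi
    done
  done
  return "$rc"
}

# Pod, Service and node networks from the plan
check_plan 10.244.0.0/16 10.96.0.0/12 192.168.1.0/24 && echo "no overlaps"
```

The same check flags Calico's default IP pool, 192.168.0.0/16, against the node network 192.168.1.0/24. That conflict is one reason to set the pod CIDR explicitly.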
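Cluster installation

Using kubeadm

```shell
# On every node (as root): load the kernel modules the container runtime needs
cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# Kernel parameters required by pod networking, persisted across reboots
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system

# By default the kubelet refuses to start while swap is enabled
swapoff -a  # also remove swap entries from /etc/fstab so this survives a reboot

# Install containerd
apt-get update
apt-get install -y containerd

# Configure containerd
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
# Use the systemd cgroup driver (kubeadm configures the kubelet with it by default since v1.22)
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl restart containerd

# Install kubeadm, kubelet and kubectl
apt-get install -y apt-transport-https ca-certificates curl gpg
mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /" | tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
```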
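Before running kubeadm, it can help to confirm that the kernel settings it expects have taken effect on every node: IP forwarding, bridged traffic passing through iptables, and swap disabled. A sketch of such a check follows. `k8s_preflight` is a hypothetical name. Its optional root-directory argument exists only so the check can be exercised against a fake `/proc` tree.

```shell
#!/usr/bin/env bash
# Sketch of a node preflight check for the kernel settings kubeadm expects.
# The optional first argument is a root prefix (default: the real "/"); it only
# exists so the check can be exercised against a fake /proc tree.
k8s_preflight() {
  local root=${1:-}
  local rc=0 key val
  for key in net/bridge/bridge-nf-call-iptables net/ipv4/ip_forward; do
    val=$(cat "$root/proc/sys/$key" 2>/dev/null || echo missing)
    if [ "$val" != "1" ]; then
      echo "FAIL: ${key//\//.} = $val (want 1)"
      rc=1
    fi
  done
  # /proc/swaps always has a header line; each further line is an active swap device.
  local swaps="$root/proc/swaps"
  if [ -r "$swaps" ] && [ "$(wc -l < "$swaps")" -gt 1 ]; then
    echo "FAIL: swap is enabled"
    rc=1
  fi
  return "$rc"
}

# Usage on a real node: k8s_preflight && echo "node is ready for kubeadm"
```

kubeadm runs its own preflight checks during `init` and `join`. A script like this surfaces the same class of problems across all nodes before kubeadm is run.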
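Initializing the control plane

```shell
# On the first control-plane node.
# kube-apiserver.example.com must resolve to the API server load balancer.
kubeadm init \
  --control-plane-endpoint "kube-apiserver.example.com:6443" \
  --upload-certs \
  --pod-network-cidr "10.244.0.0/16" \
  --service-cidr "10.96.0.0/12"

# Configure kubectl for the current user
mkdir -p "$HOME/.kube"
cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
chown "$(id -u):$(id -g)" "$HOME/.kube/config"

# Install the CNI plugin (Calico), pinned to a specific release
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
```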
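Joining the other nodes

```shell
# On each additional control-plane node
kubeadm join kube-apiserver.example.com:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <cert-key>

# On each worker node
kubeadm join kube-apiserver.example.com:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

# The bootstrap token expires after 24 hours and the uploaded certificates after 2 hours.
# To regenerate them on an existing control-plane node:
#   kubeadm token create --print-join-command
#   kubeadm init phase upload-certs --upload-certs   # prints a new <cert-key>
```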
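The `sha256:<hash>` value printed by `kubeadm init` can be recomputed if it is lost. It is the SHA-256 digest of the cluster CA certificate's DER-encoded public key (SubjectPublicKeyInfo). A sketch using the openssl CLI, assuming kubeadm's default CA path. `ca_cert_hash` is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Compute the value for --discovery-token-ca-cert-hash (without the "sha256:" prefix)
# from a CA certificate; defaults to kubeadm's CA location.
ca_cert_hash() {
  local ca=${1:-/etc/kubernetes/pki/ca.crt}
  openssl x509 -pubkey -noout -in "$ca" \
    | openssl pkey -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'   # keep only the hex digest from "...(stdin)= <digest>"
}

# Usage on a control-plane node:
#   kubeadm join ... --discovery-token-ca-cert-hash "sha256:$(ca_cert_hash)"
```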
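High-availability configuration

API Server load balancing

The load balancer in front of the API servers must be reachable before `kubeadm init` runs, because `--control-plane-endpoint` points at it. It therefore runs on dedicated hosts outside the cluster (or as static pods), not as a regular in-cluster workload. HAProxy or Nginx both work; an HAProxy configuration is shown. If HAProxy shares a host with an API server, bind it to a port other than 6443.

```text
# /etc/haproxy/haproxy.cfg on the load-balancer host
defaults
    mode tcp                 # API traffic is TLS, so balance at the TCP level
    timeout connect 5s
    timeout client  1h       # example values; long enough for kubectl watch connections
    timeout server  1h

frontend kube-api
    bind *:6443
    default_backend kube-api-backend

backend kube-api-backend
    balance roundrobin
    option tcp-check
    server cp1 192.168.1.10:6443 check
    server cp2 192.168.1.11:6443 check
    server cp3 192.168.1.12:6443 check
```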
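When control-plane nodes are added or replaced, the backend server list has to change with them. The hypothetical helper `render_backend` renders the backend section from name=IP pairs. Port 6443 and the health-check options are assumptions that mirror the HAProxy configuration for the API servers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render the HAProxy backend for the API servers from
# name=IP pairs, so the server list can be regenerated when nodes change.
render_backend() {
  printf 'backend kube-api-backend\n'
  printf '    mode tcp\n'
  printf '    balance roundrobin\n'
  printf '    option tcp-check\n'
  local pair
  for pair in "$@"; do
    # "cp1=192.168.1.10" -> name "cp1", address "192.168.1.10"
    printf '    server %s %s:6443 check\n' "${pair%%=*}" "${pair#*=}"
  done
}

render_backend cp1=192.168.1.10 cp2=192.168.1.11 cp3=192.168.1.12
```

Validate the assembled file with `haproxy -c -f <file>` before reloading HAProxy.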
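etcd cluster

```shell
# etcd is Kubernetes' key-value store. kubeadm runs it "stacked" on the
# control-plane nodes by default; an external etcd cluster is also supported.
# The commands below assume kubeadm's stacked layout and certificate paths.

# List the cluster members and check that all of them are started
ETCDCTL_API=3 etcdctl member list -w table \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Back up etcd
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Restore etcd into a new data directory (etcd 3.5+ provides `etcdutl snapshot restore`,
# which replaces this deprecated etcdctl command), then point the etcd-data hostPath
# in /etc/kubernetes/manifests/etcd.yaml at the restored directory
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --data-dir=/var/lib/etcd-restored
```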
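Snapshots only help if they are taken regularly and old ones are rotated out. Below is a sketch of a rotation helper, built on several assumptions. The function names, the `snapshot-YYYYmmdd-HHMMSS.db` naming scheme and the default retention of 7 are all hypothetical. The etcdctl flags match kubeadm's stacked etcd.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for rotating etcd snapshots.
# Assumes files are named snapshot-YYYYmmdd-HHMMSS.db, so lexical order is chronological.

# Delete all but the newest KEEP snapshots in DIR.
prune_snapshots() {
  local dir=$1 keep=$2
  local -a files
  shopt -s nullglob
  files=("$dir"/snapshot-*.db)   # glob results are sorted: oldest first
  shopt -u nullglob
  local excess=$(( ${#files[@]} - keep ))
  local i
  for (( i = 0; i < excess; i++ )); do
    rm -f -- "${files[i]}"
  done
}

# Take a timestamped snapshot, then prune only if the snapshot succeeded.
backup_etcd() {
  local dir=$1 keep=${2:-7}
  local file
  file="$dir/snapshot-$(date +%Y%m%d-%H%M%S).db"
  mkdir -p "$dir"
  ETCDCTL_API=3 etcdctl snapshot save "$file" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    && prune_snapshots "$dir" "$keep"
}
```

`backup_etcd` can be run from cron or a systemd timer on a control-plane node. The snapshots should also be copied off the node, since a snapshot stored only on the failed machine is lost with it.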
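Storage configuration

StorageClass

```yaml
# NFS StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
# Must match the provisioner name configured in the NFS provisioner you deployed
provisioner: nfs-provisioner
parameters:
  archiveOnDelete: "false"  # delete the backing directory instead of archiving it
volumeBindingMode: Immediate
---
# Ceph RBD StorageClass (ceph-csi)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <cluster-id>
  pool: kube
  imageFormat: "2"
  imageFeatures: layering
  # ceph-csi needs a Secret holding Ceph credentials to create and map images
  csi.storage.k8s.io/provisioner-secret-name: <rbd-secret>
  csi.storage.k8s.io/provisioner-secret-namespace: <secret-namespace>
  csi.storage.k8s.io/node-stage-secret-name: <rbd-secret>
  csi.storage.k8s.io/node-stage-secret-namespace: <secret-namespace>
volumeBindingMode: Immediate
```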
PV/PVC

```yaml
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-rbd
  resources:
    requests:
      storage: 100Gi
```
Security configuration

RBAC

```yaml
# Namespace admin: full control over every resource in the development namespace.
# A RoleBinding to the built-in "admin" ClusterRole is a narrower alternative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: namespace-admin
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: namespace-admin-binding
  namespace: development
subjects:
  - kind: User
    name: alice
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io
```
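The same Role and RoleBinding are usually repeated for every team namespace. The hypothetical templating helper `namespace_admin_rbac` prints both objects for a given namespace and user. Appending the user name to the binding name is an assumption, so that several users can be bound in one namespace. User names must therefore be valid in a Kubernetes object name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the namespace-admin Role and a RoleBinding for one user.
namespace_admin_rbac() {
  local ns=$1 user=$2
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ${ns}
  name: namespace-admin
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: namespace-admin-binding-${user}
  namespace: ${ns}
subjects:
  - kind: User
    name: ${user}
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io
EOF
}

# Usage: namespace_admin_rbac staging bob | kubectl apply -f -
```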
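Pod security

```yaml
# PodSecurityPolicy (policy/v1beta1) was removed in Kubernetes 1.25, so it cannot be
# used with the v1.29 packages installed earlier. Its built-in replacement is Pod
# Security Admission, configured with namespace labels. The "restricted" level closely
# matches the intent of a restricted PSP: no privilege escalation, non-root users,
# all capabilities dropped, and a limited set of volume types.
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.29
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```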
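Network policies

```yaml
# Restrict pod-to-pod traffic. NetworkPolicies are enforced only by CNI plugins
# that support them (Calico does; Flannel on its own does not).

# Default deny: block all ingress and egress for every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow specific traffic: frontend pods may reach web pods on TCP 80.
# Because deny-all also blocks egress, the frontend pods additionally need
# an egress rule that allows traffic to app=web.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 80
---
# deny-all also blocks DNS; let every pod query the cluster DNS service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```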
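Monitoring and logging

Prometheus monitoring

```yaml
# Prometheus scrape configuration. The Prometheus server needs RBAC permission
# to list and watch pods for kubernetes_sd_configs with role: pod.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: "kubernetes-pods"
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Keep only pods annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
```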
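The relabel rule depends on two Prometheus behaviors. First, each pod annotation is exposed as a `__meta_kubernetes_pod_annotation_<name>` label, with characters not allowed in label names replaced by underscores. Second, relabel regexes are fully anchored. A Bash sketch that mimics both rules follows. The function names are hypothetical, and this is an illustration, not Prometheus's own code.

```shell
#!/usr/bin/env bash
# Illustration only: mimic how Prometheus names the meta label for a pod
# annotation and how an anchored "keep" regex decides which targets survive.

# Characters outside [a-zA-Z0-9_] become "_" in the label name.
annotation_to_meta_label() {
  echo "__meta_kubernetes_pod_annotation_${1//[^a-zA-Z0-9_]/_}"
}

# Relabel regexes are anchored as ^(?:REGEX)$, so only an exact "true" matches.
scrape_kept() {
  [[ $1 =~ ^(true)$ ]]
}

annotation_to_meta_label "prometheus.io/scrape"
for v in true True yes ""; do
  if scrape_kept "$v"; then echo "'$v' -> kept"; else echo "'$v' -> dropped"; fi
done
```

In practice, pods must carry the annotation `prometheus.io/scrape: "true"` exactly. Values such as `True` or `yes`, or a missing annotation, cause the target to be dropped.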
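Log collection

```yaml
# Fluent Bit configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
data:
  # containerd writes CRI-format container logs, not Docker JSON, so the tail input
  # uses the built-in docker and cri multiline parsers; the Elasticsearch output plugin is named "es"
  fluent-bit.conf: |
    [SERVICE]
        Flush        5
        Daemon       off
        Log_Level    info

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        multiline.parser  docker, cri
        Tag               kube.*
        Refresh_Interval  5

    [OUTPUT]
        Name             es
        Match            *
        Host             elasticsearch
        Port             9200
        Logstash_Format  on
```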
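The parser choice matters because, with containerd, the files under /var/log/containers use the CRI line format `<timestamp> <stream> <P|F> <message>` rather than Docker's JSON lines. `P` marks a partial line continued in the next record and `F` a full one. The Bash sketch shows how such a line splits into fields. `parse_cri_line` is a hypothetical name. Fluent Bit's `cri` parser does this internally and also reassembles partial lines, which this sketch does not.

```shell
#!/usr/bin/env bash
# Illustration only: split one CRI-format log line into its fields.
parse_cri_line() {
  local ts stream tag msg
  # read gives the remainder of the line, spaces included, to the last variable
  read -r ts stream tag msg <<< "$1"
  printf 'time=%s stream=%s tag=%s log=%s\n' "$ts" "$stream" "$tag" "$msg"
}

parse_cri_line '2024-01-15T10:00:00.123456789Z stdout F GET /healthz 200'
```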
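Backup and recovery

```shell
# Install Velero with the AWS plugin (S3 object storage plus EBS volume snapshots).
# Choose a plugin version that is compatible with your Velero release.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket velero \
  --secret-file ./credentials-velero \
  --backup-location-config region=<region> \
  --snapshot-location-config region=<region> \
  --use-volume-snapshots=true

# Back up the whole cluster, including cluster-scoped resources
velero backup create full-cluster-backup --include-cluster-resources

# Schedule a daily backup at 02:00 and keep each one for 30 days
velero schedule create daily-cluster-backup --schedule="0 2 * * *" --ttl 720h0m0s

# Restore
velero restore create --from-backup full-cluster-backup
```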
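With a systematic approach to deployment and configuration, you can build a Kubernetes production environment that is highly available, secure, and reliable.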