Problem Description

  • When deploying a Kubernetes cluster with kubekey, kube-vip is one of the available options for control-plane high availability.

  • It works much like keepalived + LVS: a kube-vip pod runs on every master node, and leader election determines which node binds the VIP and handles failover.

  • In addition, similar to how kube-proxy implements Services, it uses IPVS to load-balance traffic across the apiservers.

  • After one deployment, however, the IPVS load balancing for the API server turned out not to be working.

Reproducing the Problem

  1. Install Kubernetes with kubekey 3.0.6 (3a6a0b060299b584a0ca5bf36a19704284cbb5a2), using kube-vip for control-plane high availability. The cluster configuration file is shown below

     apiVersion: kubekey.kubesphere.io/v1alpha2
     kind: Cluster
     metadata:
       name: lab
     spec:
       hosts:
         - {
             name: master1,
             address: 10.0.0.161,
             internalAddress: 10.0.0.161,
             user: root,
             privateKeyPath: "~/.ssh/id_ed25519",
           }
         - {
             name: master2,
             address: 10.0.0.149,
             internalAddress: 10.0.0.149,
             user: root,
             privateKeyPath: "~/.ssh/id_ed25519",
           }
         - {
             name: master3,
             address: 10.0.0.216,
             internalAddress: 10.0.0.216,
             user: root,
             privateKeyPath: "~/.ssh/id_ed25519",
           }
       roleGroups:
         etcd:
           - master1
           - master2
           - master3
         control-plane:
           - master1
           - master2
           - master3
         worker:
           - master1
           - master2
           - master3
       controlPlaneEndpoint:
         internalLoadbalancer: kube-vip
         domain: lb.kubesphere.local
         address: "10.0.0.200"
         port: 6443
       kubernetes:
         type: kubernetes
         version: v1.21.14
         imageRepo: kubesphere
         containerManager: docker
         clusterName: cluster.local
       network:
         plugin: calico
         kubePodsCIDR: "10.233.64.0/18"
         kubeServiceCIDR: "10.233.0.0/18"
  2. Once the cluster is installed, the kube-vip manifest shows that load balancing is enabled

       containers:
         - name: kube-vip
           image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-vip:v0.5.0
           args:
             - manager
           env:
              # enable the IPVS (LVS)-based load balancer
             - name: lb_enable
               value: 'true'
             - name: lb_port
               value: '6443'
  3. Following the kube-vip documentation, check the IPVS rules on the node holding the VIP with ipvsadm (ipvsadm -Ln). Only the IPVS rules maintained by Kubernetes (kube-proxy) are there; kube-vip's rules are nowhere to be seen.

  4. The kube-vip log shows it keeps trying to create the IPVS rules, yet they never seem to persist. The recurring "Error querying backends file does not exist" line means the virtual service kube-vip previously created can no longer be found, so it recreates it on every reconcile

     time="2023-01-12T02:54:15Z" level=error msg="Error querying backends file does not exist"
     time="2023-01-12T02:54:15Z" level=info msg="Created Load-Balancer services on [10.0.0.200:6443]"
     time="2023-01-12T02:54:15Z" level=info msg="Added backend for [10.0.0.200:6443] on [10.0.0.216:6443]"
     time="2023-01-12T02:55:15Z" level=error msg="Error querying backends file does not exist"
     time="2023-01-12T02:55:15Z" level=info msg="Created Load-Balancer services on [10.0.0.200:6443]"
     time="2023-01-12T02:55:15Z" level=info msg="Added backend for [10.0.0.200:6443] on [10.0.0.149:6443]"
     time="2023-01-12T02:56:37Z" level=error msg="Error querying backends file does not exist"
     time="2023-01-12T02:56:37Z" level=info msg="Created Load-Balancer services on [10.0.0.200:6443]"
     time="2023-01-12T02:56:37Z" level=info msg="Added backend for [10.0.0.200:6443] on [10.0.0.161:6443]"
     time="2023-01-12T02:59:16Z" level=error msg="Error querying backends file does not exist"
     time="2023-01-12T02:59:16Z" level=info msg="Created Load-Balancer services on [10.0.0.200:6443]"

Root Cause Analysis

  1. The relevant kube-vip code is shown below

     // pkg/cluster/clusterLeaderElection.go:276
     // in nodeWatcher: add an IPVS backend for each node's internal IP
     if node.Status.Addresses[x].Type == v1.NodeInternalIP {
         err = lb.AddBackend(node.Status.Addresses[x].Address, port)
         if err != nil {
             log.Errorf("add IPVS backend [%v]", err)
         }
     }

     // pkg/loadbalancer/ipvs.go:101
     func (lb *IPVSLoadBalancer) AddBackend(address string, port int) error {

         // Check if this is the first backend
         backends, err := lb.client.Destinations(lb.loadBalancerService)
         if err != nil && strings.Contains(err.Error(), "file does not exist") {
             log.Errorf("Error querying backends %s", err)
         }
         // If this is our first backend, then we can create the load-balancer service and add a backend
         if len(backends) == 0 {
             // (re)create the IPVS virtual service
             err = lb.client.CreateService(lb.loadBalancerService)
             ...
         }

         dst := ipvs.Destination{
             Address:   ipvs.NewIP(net.ParseIP(address)),
             Port:      uint16(port),
             Family:    ipvs.INET,
             Weight:    1,
             FwdMethod: lb.forwardingMethod,
         }
         // add the backend (real server) to the virtual service
         err = lb.client.CreateDestination(lb.loadBalancerService, dst)
  2. The code above shows that kube-vip uses Cloudflare's ipvs library to create, query, and delete IPVS rules in the OS. Nothing looks obviously wrong here.

  3. Write a quick test to verify, directly on a master node, whether creating an IPVS service through Cloudflare's ipvs library actually works

     package main

     import (
         "net"
         "testing"

         "github.com/cloudflare/ipvs"
     )

     // ROUNDROBIN is the same scheduler value kube-vip's loadbalancer package uses.
     const ROUNDROBIN = "rr"

     // Run directly on a master node: go test -run TestCreateIPVS -v
     func TestCreateIPVS(t *testing.T) {
         c, err := ipvs.New()
         if err != nil {
             t.Fatalf("err create client %v", err)
         }

         newService := ipvs.Service{
             Address:   ipvs.NewIP(net.ParseIP("123.0.0.1")),
             Netmask:   ipvs.IPMask{},
             Scheduler: ROUNDROBIN,
             Port:      uint16(23456),
             Family:    ipvs.INET,
             Protocol:  ipvs.TCP,
         }
         if err = c.CreateService(newService); err != nil {
             t.Fatalf("err create service %v", err)
         }

         newDestination := ipvs.Destination{
             Address:   ipvs.NewIP(net.ParseIP("123.0.0.2")),
             FwdMethod: ipvs.Masquarade,
             Weight:    1,
             Port:      uint16(345),
             Family:    ipvs.INET,
         }
         if err = c.CreateDestination(newService, newDestination); err != nil {
             t.Fatalf("err create destination %v", err)
         }
     }

     Running this on a master node showed that the IPVS rules it creates are only visible some of the time: ipvsadm sometimes listed them, and moments later they were gone (see the watcher sketch after this list).

  4. Recalling how kube-proxy maintains its iptables rules, a likely explanation is that Kubernetes also cleans up IPVS rules it does not recognize, so the next step is to read the kube-proxy code.

  5. A blog post on the overall kube-proxy code structure does indeed mention a field related to rule cleanup. That post is based on 1.19, though, and this area has changed in 1.21 (the CleanupIPVS field has been removed from the ProxyServer struct).

  6. The relevant kube-proxy code is below (2fef630dd216ddefd051ef5a2dda3fe1fdf7439a). It deletes stale ("legacy") IPVS rules, but skips any rule whose address falls inside Proxier.excludeCIDRs (a simplified sketch of this cleanup path follows this list)

     type Proxier struct {
         ...
         // Values are CIDR's to exclude when cleaning up IPVS rules.
         excludeCIDRs []*net.IPNet
         ...
     }

     // pkg/proxy/ipvs/proxier.go:1030
     func (proxier *Proxier) syncProxyRules() {
         ...
         // Clean up legacy IPVS services and unbind addresses
         appliedSvcs, err := proxier.ipvs.GetVirtualServers()
         if err == nil {
             for _, appliedSvc := range appliedSvcs {
                 currentIPVSServices[appliedSvc.String()] = appliedSvc
             }
         } else {
             klog.Errorf("Failed to get ipvs service, err: %v", err)
         }
         proxier.cleanLegacyService(activeIPVSServices, currentIPVSServices, legacyBindAddrs)
         ...
     }
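
To observe the flapping directly, the virtual service created by the test above can be polled in a loop. The watcher below is a minimal sketch of ours (not part of kube-vip or the kubekey tooling); it assumes the 123.0.0.1:23456 service from TestCreateIPVS and relies only on the same ipvs.New / Destinations calls already shown in the kube-vip code.

     // ipvswatch.go -- minimal sketch (our own, not kube-vip's): poll the virtual
     // service created by TestCreateIPVS and report whether it still exists.
     package main

     import (
         "log"
         "net"
         "strings"
         "time"

         "github.com/cloudflare/ipvs"
     )

     func main() {
         c, err := ipvs.New()
         if err != nil {
             log.Fatalf("create ipvs client: %v", err)
         }

         // Mirror the fields of the service created in TestCreateIPVS.
         svc := ipvs.Service{
             Address:   ipvs.NewIP(net.ParseIP("123.0.0.1")),
             Netmask:   ipvs.IPMask{},
             Scheduler: "rr",
             Port:      uint16(23456),
             Family:    ipvs.INET,
             Protocol:  ipvs.TCP,
         }

         for {
             dests, err := c.Destinations(svc)
             switch {
             case err != nil && strings.Contains(err.Error(), "file does not exist"):
                 // The same error kube-vip logs: the virtual service has been removed.
                 log.Printf("service gone: %v", err)
             case err != nil:
                 log.Printf("query error: %v", err)
             default:
                 log.Printf("service present with %d backend(s)", len(dests))
             }
             time.Sleep(5 * time.Second)
         }
     }

If the cleanup hypothesis is right, its output should alternate between "service present" and the same "file does not exist" error seen in the kube-vip log, roughly once per kube-proxy sync period.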
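
And to make the cleanup behaviour itself concrete, here is a simplified sketch of what cleanLegacyService amounts to. This is not the upstream kube-proxy code: the helper names, map types, and the small main are ours, written only to show how excludeCIDRs decides which kernel entries survive a sync.

     // exclude_sketch.go -- illustrative only; not the kube-proxy implementation.
     package main

     import (
         "fmt"
         "net"
     )

     // inExcludeCIDRs reports whether vip falls inside any of the configured excludeCIDRs.
     func inExcludeCIDRs(excludeCIDRs []*net.IPNet, vip net.IP) bool {
         for _, cidr := range excludeCIDRs {
             if cidr.Contains(vip) {
                 return true
             }
         }
         return false
     }

     // cleanLegacySketch mimics the shape of cleanLegacyService: every virtual server
     // programmed in the kernel (current) that was not derived from a Kubernetes
     // Service (active) gets deleted -- unless its VIP is covered by excludeCIDRs.
     func cleanLegacySketch(active map[string]bool, current map[string]net.IP,
         excludeCIDRs []*net.IPNet, deleteVS func(key string)) {
         for key, vip := range current {
             if active[key] {
                 continue // still owned by a Kubernetes Service, keep it
             }
             if inExcludeCIDRs(excludeCIDRs, vip) {
                 continue // excluded subnet: kube-vip's entry survives the sync
             }
             deleteVS(key) // "legacy" entry: wiped on every syncProxyRules run
         }
     }

     func main() {
         _, nodeNet, _ := net.ParseCIDR("10.0.0.0/24")

         current := map[string]net.IP{
             "TCP/10.0.0.200:6443":  net.ParseIP("10.0.0.200"),  // created by kube-vip
             "TCP/10.233.0.99:8080": net.ParseIP("10.233.0.99"), // a stale Service entry
         }
         active := map[string]bool{} // neither entry maps to a live Kubernetes Service

         // Without any excludeCIDRs both entries would be deleted; with 10.0.0.0/24
         // excluded, only the stale Service entry goes and the kube-vip rule is kept.
         cleanLegacySketch(active, current, []*net.IPNet{nodeNet}, func(key string) {
             fmt.Println("deleting", key)
         })
     }

This is why, with the default configuration, the 10.0.0.200:6443 entry that kube-vip adds is removed on every kube-proxy sync, and why adding the node subnet to excludeCIDRs (next section) lets it persist.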

Attempted Fix

Putting the code together gives a working hypothesis: kube-vip periodically adds its IPVS rule, and kube-proxy deletes it again on every sync; one side adds while the other removes, so the rule never sticks.

So, in the kube-proxy configuration, find the excludeCIDRs field and add the subnet the nodes live in to it


ipvs:
  excludeCIDRs:
    - 10.0.0.0/24
  minSyncPeriod: 0s
  scheduler: ""

After restarting kube-proxy, the kube-vip log no longer loops trying to re-add the IPVS rules, and the rules are now visible on the node holding the VIP

Every 2.0s: ipvsadm -Ln                                                            master3: Thu Jan 12 14:04:05 2023

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.200:6443 rr
  -> 10.0.0.149:6443              Local   1      0          1
  -> 10.0.0.161:6443              Local   1      0          2
  -> 10.0.0.216:6443              Local   1      0          2
// kube-vip log after the fix
time="2023-01-12T02:59:16Z" level=info msg="Added backend for [10.0.0.200:6443] on [10.0.0.216:6443]"
time="2023-01-12T03:00:16Z" level=info msg="Added backend for [10.0.0.200:6443] on [10.0.0.149:6443]"
time="2023-01-12T03:01:38Z" level=info msg="Added backend for [10.0.0.200:6443] on [10.0.0.161:6443]"

Final Solution

To fix the problem at its source:

  1. One option is to improve the kube-proxy configuration logic that kubekey applies when it deploys a cluster with kube-vip.
  2. Another is to add an explicit note to the kube-vip documentation, telling users that enabling this load-balancing feature must be paired with a change to the kube-proxy configuration.
  3. After discussing with the community maintainers, the second option was chosen: the configuration docs now state that when kube-vip is enabled together with kube-proxy's IPVS mode, the CIDR of the subnet the hosts live in has to be added to excludeCIDRs in kubeProxyConfiguration. See the related community issue and PR for details.
