Feature Overview

To build a highly available Kubernetes cluster, the control plane components must run as multiple replicas across multiple nodes. Stateful components coordinate their replicas through leader election, while the stateless API server relies on a VIP or some other load-balancing mechanism to achieve high availability.

kubekey supports using kube-vip to provide high availability for the apiserver.

Flow Analysis

kubekey bootstraps the Kubernetes cluster with kubeadm. Combined with kube-vip, a specific sequence of steps is needed to deploy a highly available cluster:

  1. Install the container runtime and etcd on the nodes and make sure they start correctly

  2. Sync the Kubernetes binaries to the nodes, make kubelet executable, configure the kubelet systemd unit and enable it on boot, and set the kubelet startup parameters

  3. Place the kube-vip manifest in the static pod directory on the first master

  4. Generate the kubeadm configuration. The HA-relevant setting is controlPlaneEndpoint: lb.kubesphere.local:6443, which sets the apiserver address to a domain name; the domain is mapped via /etc/hosts to the VIP that kube-vip is responsible for bringing up (see the sketch after this list)

  5. Run kubeadm init to initialize the first master. Once kubelet is up, it launches the kube-vip static pod, which makes the VIP available, so the first master can be initialized successfully

  6. Generate the kubeadm configuration on the other nodes and run kubeadm join to join the cluster created by the first master. The VIP is likewise used as the address for reaching the cluster apiserver throughout this step

  7. Generate the kube-vip manifest again, this time not limited to the first master: every master node gets a kube-vip pod deployed as a static pod, making kube-vip itself highly available

  8. How the multiple kube-vip pods achieve high availability: per the kube-vip documentation, setting vip_leaderelection=true in the pod's env enables leader election

The actual deployment flow is of course much more involved than this; the steps above only pick out the key parts related to the control-plane endpoint and kube-vip.
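To make the role of the endpoint concrete, here is a minimal standalone sketch (not part of kubekey) of the property the flow relies on: lb.kubesphere.local resolves, via the generated /etc/hosts entry, to the VIP, and port 6443 behind that VIP keeps answering as long as kube-vip holds the address on any surviving master.

// Illustrative sketch only; meant to be run on a cluster node where kubekey
// has written the lb.kubesphere.local hosts entry.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// On a kubekey node this resolves through the /etc/hosts entry
	// pointing at the VIP managed by kube-vip.
	ips, err := net.LookupHost("lb.kubesphere.local")
	if err != nil {
		fmt.Println("resolve failed:", err)
		return
	}
	fmt.Println("lb.kubesphere.local resolves to:", ips)

	// A plain TCP dial is enough to confirm that some apiserver answers
	// behind the VIP; TLS and authentication are out of scope here.
	conn, err := net.DialTimeout("tcp", "lb.kubesphere.local:6443", 3*time.Second)
	if err != nil {
		fmt.Println("control-plane endpoint unreachable:", err)
		return
	}
	defer conn.Close()
	fmt.Println("control-plane endpoint reachable via", conn.RemoteAddr())
}

Running a check like this before and after taking a master down is a quick way to confirm the endpoint itself stays reachable, independently of kubectl.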

HA Verification

  1. Shut down one of the three masters (not the kube-vip leader) to simulate a master outage. The control plane keeps working normally, and the kube-vip pod logs show that the original leader stays unchanged; it only evicts the affected services on the downed master

     root@master1:~# kubectl get nodes
     NAME      STATUS     ROLES                         AGE    VERSION
     master1   Ready      control-plane,master,worker   168m   v1.21.14
     master2   NotReady   control-plane,master,worker   168m   v1.21.14
     master3   Ready      control-plane,master,worker   168m   v1.21.14

    Power the machine back on to simulate recovery of the failed host; the node returns to Ready

     root@master1:~# kubectl get nodes
     NAME      STATUS   ROLES                         AGE     VERSION
     master1   Ready    control-plane,master,worker   3h39m   v1.21.14
     master2   Ready    control-plane,master,worker   3h39m   v1.21.14
     master3   Ready    control-plane,master,worker   3h39m   v1.21.14
  2. Shut down one of the three masters again, this time the node hosting the kube-vip leader, to simulate the kube-vip leader going down and to verify leader failover

     root@master2:~# kubectl get nodes
     NAME      STATUS     ROLES                         AGE     VERSION
     master1   NotReady   control-plane,master,worker   3h43m   v1.21.14
     master2   Ready      control-plane,master,worker   3h42m   v1.21.14
     master3   Ready      control-plane,master,worker   3h42m   v1.21.14

    Again the cluster control plane and ipvs both keep working as expected, and the kube-vip pod logs show the leader switching to the pod on a new node

Load Balancing Verification

Expectation: on the node holding the VIP, kube-vip creates an ipvs service (with the VIP as its address) and adds every available master node's ip:6443 to that ipvs service as a backend.

What the verification actually showed:

  1. The expected ipvs rules are not present on the node holding the VIP; only the ipvs rules that kube-proxy maintains for cluster Services are visible
  2. The kube-vip pod logs show that, while maintaining the ipvs rules, the load-balancing module keeps looping through a cycle of failing to detect the ipvs rule and then re-creating it
  3. So there is a bug here; a separate document will analyze it in detail

Code Analysis

The code walkthrough below is based on commit ec903fe13dfed73ffd3f72f4beec3123675ce4d0 on the master branch of kubekey.

Taking the create cluster command as an example, kubekey first selects a pipeline according to the cluster type specified in the configuration (the Kubernetes type and the default case both use NewCreateClusterPipeline):

// cmd/kk/pkg/pipelines/create_cluster.go:279
switch runtime.Cluster.Kubernetes.Type {
    case common.K3s:
        if err := NewK3sCreateClusterPipeline(runtime); err != nil {
            return err
        }
    case common.K8e:
        if err := NewK8eCreateClusterPipeline(runtime); err != nil {
            return err
        }
    case common.Kubernetes:
        if err := NewCreateClusterPipeline(runtime); err != nil {
            return err
        }
    default:
        if err := NewCreateClusterPipeline(runtime); err != nil {
            return err
        }
    }

The Kubernetes cluster pipeline consists of the following modules. Note that KubevipModule appears twice, once before InitKubernetesModule and once after JoinNodesModule, matching steps 3 and 7 of the flow analysis above:

// cmd/kk/pkg/pipelines/create_cluster.go:59
m := []module.Module{
        &precheck.GreetingsModule{},
        &precheck.NodePreCheckModule{},
        &confirm.InstallConfirmModule{},
        &artifact.UnArchiveModule{Skip: noArtifact},
        &os.RepositoryModule{Skip: noArtifact || !runtime.Arg.InstallPackages},
        &binaries.NodeBinariesModule{},
        &os.ConfigureOSModule{},
        &customscripts.CustomScriptsModule{Phase: "PreInstall", Scripts: runtime.Cluster.System.PreInstall},
        &kubernetes.StatusModule{},
        &container.InstallContainerModule{},
        &images.CopyImagesToRegistryModule{Skip: skipPushImages},
        &images.PullModule{Skip: runtime.Arg.SkipPullImages},
        &etcd.PreCheckModule{Skip: runtime.Cluster.Etcd.Type != kubekeyapiv1alpha2.KubeKey},
        &etcd.CertsModule{},
        &etcd.InstallETCDBinaryModule{Skip: runtime.Cluster.Etcd.Type != kubekeyapiv1alpha2.KubeKey},
        &etcd.ConfigureModule{Skip: runtime.Cluster.Etcd.Type != kubekeyapiv1alpha2.KubeKey},
        &etcd.BackupModule{Skip: runtime.Cluster.Etcd.Type != kubekeyapiv1alpha2.KubeKey},
        &kubernetes.InstallKubeBinariesModule{},
        &loadbalancer.KubevipModule{Skip: !runtime.Cluster.ControlPlaneEndpoint.IsInternalLBEnabledVip()},
        &kubernetes.InitKubernetesModule{},
        &dns.ClusterDNSModule{},
        &kubernetes.StatusModule{},
        &kubernetes.JoinNodesModule{},
        &loadbalancer.KubevipModule{Skip: !runtime.Cluster.ControlPlaneEndpoint.IsInternalLBEnabledVip()},
        &loadbalancer.HaproxyModule{Skip: !runtime.Cluster.ControlPlaneEndpoint.IsInternalLBEnabled()},
        &network.DeployNetworkPluginModule{},
        &kubernetes.ConfigureKubernetesModule{},
        &filesystem.ChownModule{},
        &certs.AutoRenewCertsModule{Skip: !runtime.Cluster.Kubernetes.EnableAutoRenewCerts()},
        &kubernetes.SecurityEnhancementModule{Skip: !runtime.Arg.SecurityEnhancement},
        &kubernetes.SaveKubeConfigModule{},
        &plugins.DeployPluginsModule{},
        &addons.AddonsModule{},
        &storage.DeployLocalVolumeModule{Skip: skipLocalStorage},
        &kubesphere.DeployModule{Skip: !runtime.Cluster.KubeSphere.Enabled},
        &kubesphere.CheckResultModule{Skip: !runtime.Cluster.KubeSphere.Enabled},
        &customscripts.CustomScriptsModule{Phase: "PostInstall", Scripts: runtime.Cluster.System.PostInstall},
    }
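Both KubevipModule entries are gated by Skip: !runtime.Cluster.ControlPlaneEndpoint.IsInternalLBEnabledVip(), i.e. the module only runs when the internal load balancer is configured as kube-vip. The real predicate lives in kubekey's API types; the sketch below is only a hypothetical illustration of what such a check amounts to, with the struct fields and the "kube-vip" constant assumed rather than copied from the source.

// Hypothetical sketch, not kubekey's actual types: the point is simply that
// KubevipModule is enabled by comparing the configured internal load
// balancer against a kube-vip constant.
package main

import "fmt"

type ControlPlaneEndpoint struct {
	InternalLoadbalancer string // assumed field: e.g. "haproxy" or "kube-vip"
	Domain               string // e.g. "lb.kubesphere.local"
	Address              string // the VIP later validated by CheckVIPAddress
	Port                 int
}

func (c ControlPlaneEndpoint) IsInternalLBEnabledVip() bool {
	return c.InternalLoadbalancer == "kube-vip"
}

func main() {
	cpe := ControlPlaneEndpoint{
		InternalLoadbalancer: "kube-vip",
		Domain:               "lb.kubesphere.local",
		Address:              "192.168.0.100", // sample VIP
		Port:                 6443,
	}
	fmt.Println("KubevipModule enabled:", cpe.IsInternalLBEnabledVip())
}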

The Init function of KubevipModule sets up the concrete actions this module executes; tasks with Prepare: new(common.OnlyFirstMaster) run only on the first master:

// cmd/kk/pkg/loadbalancer/module.go:149
func (k *KubevipModule) Init() {
  ...
  checkVIPAddress := &task.RemoteTask{
        Name:     "CheckVIPAddress",
        Desc:     "Check VIP Address",
        Hosts:    k.Runtime.GetHostsByRole(common.Master),
        Prepare:  new(common.OnlyFirstMaster),
        Action:   new(CheckVIPAddress),
        Parallel: true,
    }

    getInterface := &task.RemoteTask{
        Name:     "GetNodeInterface",
        Desc:     "Get Node Interface",
        Hosts:    k.Runtime.GetHostsByRole(common.Master),
        Action:   new(GetInterfaceName),
        Parallel: true,
    }

    kubevipManifestOnlyFirstMaster := &task.RemoteTask{
        Name:     "GenerateKubevipManifest",
        Desc:     "Generate kubevip manifest at first master",
        Hosts:    k.Runtime.GetHostsByRole(common.Master),
        Prepare:  new(common.OnlyFirstMaster),
        Action:   new(GenerateKubevipManifest),
        Parallel: true,
    }
  ...
}

Going through the actions one by one, the first is CheckVIPAddress. Its logic is simply that, when kube-vip is enabled, the VIP address in the configuration file must not be empty:

// cmd/kk/pkg/loadbalancer/tasks.go:148
func (c *CheckVIPAddress) Execute(runtime connector.Runtime) error {
    if c.KubeConf.Cluster.ControlPlaneEndpoint.Address == "" {
        return errors.New("VIP address is empty")
    } else {
        return nil
    }
}

Next is the GetInterfaceName action, which stores the node's network interface name into the host cache, handling BGP mode and ARP mode differently. In BGP mode the loopback interface is used; otherwise the ip route command is used to look up the interface that carries the node IP, and the result is written into the host cache:

// cmd/kk/pkg/loadbalancer/tasks.go:160
func (g *GetInterfaceName) Execute(runtime connector.Runtime) error {
    host := runtime.RemoteHost()
    if g.KubeConf.Cluster.ControlPlaneEndpoint.KubeVip.Mode == "BGP" {
        host.GetCache().Set("interface", "lo")
        return nil
    }
    cmd := fmt.Sprintf("ip route "+
        "| grep ' %s ' "+
        "| sed -e \"s/^.*dev.//\" -e \"s/.proto.*//\"", host.GetAddress())
    interfaceName, err := runtime.GetRunner().SudoCmd(cmd, false)
    if err != nil {
        return err
    }
    if interfaceName == "" {
        return errors.New("get interface failed")
    }
    // type: string
    host.GetCache().Set("interface", interfaceName)
    return nil
}
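To make the shell pipeline concrete, the sketch below applies the same extraction to a sample ip route line (the sample line is made up; real output depends on the node's routing table): grep keeps the line containing the node IP, and the two sed expressions strip everything up to "dev " and everything from " proto" onward, leaving the interface name.

// Illustrative only: mimics the sed expressions above on a sample line.
package main

import (
	"fmt"
	"strings"
)

func main() {
	// A made-up ip route line for a node whose address is 192.168.0.2.
	line := "192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2 metric 100"

	// s/^.*dev.// : drop everything up to and including "dev ".
	idx := strings.Index(line, "dev ")
	if idx < 0 {
		fmt.Println("no interface found")
		return
	}
	after := line[idx+len("dev "):]

	// s/.proto.*// : drop the rest of the line starting at " proto".
	iface := strings.SplitN(after, " proto", 2)[0]

	fmt.Println(iface) // eth0
}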

The last action is GenerateKubevipManifest, which renders the kube-vip YAML template, filling the variables into the template:

// cmd/kk/pkg/loadbalancer/tasks.go:185
templateAction := action.Template{
        Template: templates.KubevipManifest,
        Dst:      filepath.Join(common.KubeManifestDir, templates.KubevipManifest.Name()),
        Data: util.Data{
            "BGPMode":      BGPMode,
            "VipInterface": interfaceName,
            "BGPRouterID":  host.GetAddress(),
            "BGPPeers":     BGPPeers,
            "KubeVip":      g.KubeConf.Cluster.ControlPlaneEndpoint.Address,
            "KubevipImage": images.GetImage(runtime, g.KubeConf, "kubevip").ImageName(),
        },
    }
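At its core, action.Template performs Go template rendering: the Data map above is fed into the templates.KubevipManifest template and the result is written to the kubelet manifest directory (common.KubeManifestDir). As a minimal illustration of that rendering step, the sketch below uses a made-up fragment in place of the real manifest template; the env names are assumptions, apart from vip_leaderelection, which the kube-vip documentation describes (see step 8 of the flow analysis).

// Illustrative sketch only; the template body is not kubekey's real
// templates.KubevipManifest, just a fragment in the same spirit.
package main

import (
	"os"
	"text/template"
)

func main() {
	const fragment = `    env:
    - name: vip_interface
      value: "{{ .VipInterface }}"
    - name: address
      value: "{{ .KubeVip }}"
    - name: vip_leaderelection
      value: "true"
    image: {{ .KubevipImage }}
`

	// Mirrors the util.Data map passed to action.Template above.
	data := map[string]string{
		"VipInterface": "eth0",
		"KubeVip":      "192.168.0.100",            // sample VIP
		"KubevipImage": "example.io/kube-vip:v0.x", // placeholder image
	}

	tmpl := template.Must(template.New("kube-vip.yaml").Parse(fragment))
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}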

In the flow analysis above we noted that the kube-vip pod manifest is generated twice, in steps 3 and 7. Both steps are implemented by the same KubevipModule, so the module needs logic to decide whether kube-vip is deployed only to the first master or to all masters. It keys off the ClusterExist flag in the pipeline cache (presumably set by the preceding kubernetes.StatusModule): before the cluster is initialized, only the first master gets the manifest; once the cluster exists, the remaining masters get it as well. The corresponding code is:

// cmd/kk/pkg/loadbalancer/module.go:188
if exist, _ := k.BaseModule.PipelineCache.GetMustBool(common.ClusterExist); exist {
        k.Tasks = []task.Interface{
            checkVIPAddress,
            getInterface,
            kubevipManifestNotFirstMaster,
        }
    } else {
        k.Tasks = []task.Interface{
            checkVIPAddress,
            getInterface,
            kubevipManifestOnlyFirstMaster,
        }
    }
