
loud-helmet-97067

02/15/2023, 12:01 PM
Hi Team, we have provisioned 2 master nodes (with the embedded etcd datastore - https://docs.k3s.io/installation/ha-embedded) and 4 worker nodes for our cluster, and k3s was provisioned using Rancher UI > Custom cluster settings. At the moment all the servers are up and running, but when we test HA by shutting down one master node, kubectl commands fail with the following errors:
• Error from server (ServiceUnavailable): apiserver not ready
• Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)
Our requirement is to set up HA for the control plane (master nodes), i.e. if one control plane server goes down or becomes unavailable, Kubernetes API/CLI commands should still work from the other (available) node(s). To achieve this:
• With the embedded etcd datastore, do we need to set up 3 master nodes, enabling control plane and etcd on all 3?
• Or should we use an external datastore (https://docs.k3s.io/installation/ha)? If we enable an external DB, is it sufficient to pass --datastore-endpoint when provisioning the master nodes via Rancher?
Below are the k3s configurations for the master nodes.
Master Node 0 [Sample ip: x.x.x.0]:
cat /etc/rancher/k3s/config.yaml.d/50-rancher.yaml
{
  "agent-token": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "disable-apiserver": false,
  "disable-cloud-controller": false,
  "disable-controller-manager": false,
  "disable-etcd": false,
  "disable-kube-proxy": false,
  "disable-network-policy": false,
  "disable-scheduler": false,
  "docker": false,
  "etcd-expose-metrics": false,
  "etcd-snapshot-retention": 5,
  "etcd-snapshot-schedule-cron": "0 */5 * * *",
  "kube-controller-manager-arg": [
    "cert-dir=/var/lib/rancher/k3s/server/tls/kube-controller-manager",
    "secure-port=10257"
  ],
  "kube-scheduler-arg": [
    "cert-dir=/var/lib/rancher/k3s/server/tls/kube-scheduler",
    "secure-port=10259"
  ],
  "node-label": [
    "<http://cattle.io/os=linux|cattle.io/os=linux>",
    "<http://rke.cattle.io/machine=b89290bb-5f82-47e7-96bc-9cc16f126a5c|rke.cattle.io/machine=b89290bb-5f82-47e7-96bc-9cc16f126a5c>"
  ],
  "node-taint": [
    "<http://node-role.kubernetes.io/control-plane:NoSchedule|node-role.kubernetes.io/control-plane:NoSchedule>",
    "<http://node-role.kubernetes.io/etcd:NoExecute|node-role.kubernetes.io/etcd:NoExecute>"
  ],
  "private-registry": "/etc/rancher/k3s/registries.yaml",
  "protect-kernel-defaults": false,
  "secrets-encryption": false,
  "selinux": false,
  "server": "<https://x.x.x.1:6443>",
  "token": "YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY"
Master Node 1 [Sample ip: x.x.x.1]:
cat /etc/rancher/k3s/config.yaml.d/50-rancher.yaml
{
  "agent-token": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "cluster-init": true,
  "disable-apiserver": false,
  "disable-cloud-controller": false,
  "disable-controller-manager": false,
  "disable-etcd": false,
  "disable-kube-proxy": false,
  "disable-network-policy": false,
  "disable-scheduler": false,
  "docker": false,
  "etcd-expose-metrics": false,
  "etcd-snapshot-retention": 5,
  "etcd-snapshot-schedule-cron": "0 */5 * * *",
  "kube-controller-manager-arg": [
    "cert-dir=/var/lib/rancher/k3s/server/tls/kube-controller-manager",
    "secure-port=10257"
  ],
  "kube-scheduler-arg": [
    "cert-dir=/var/lib/rancher/k3s/server/tls/kube-scheduler",
    "secure-port=10259"
  ],
  "node-label": [
    "<http://cattle.io/os=linux|cattle.io/os=linux>",
    "<http://rke.cattle.io/machine=77f5f3c6-a380-48b0-8b74-c7c3da330ff6|rke.cattle.io/machine=77f5f3c6-a380-48b0-8b74-c7c3da330ff6>"
  ],
  "node-taint": [
    "<http://node-role.kubernetes.io/control-plane:NoSchedule|node-role.kubernetes.io/control-plane:NoSchedule>",
    "<http://node-role.kubernetes.io/etcd:NoExecute|node-role.kubernetes.io/etcd:NoExecute>"
  ],
  "private-registry": "/etc/rancher/k3s/registries.yaml",
  "protect-kernel-defaults": false,
  "secrets-encryption": false,
  "selinux": false,
  "token": "YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY"
}
We see the issue when Master Node 1 goes down; when Master Node 0 goes down, kubectl sometimes still works (but not 100% of the time). We assumed that kube-apiserver and related pods are distributed across both master nodes, along with etcd datastore sync, when we provision from the Rancher UI. Any insights/feedback on how to correctly achieve HA for a Rancher UI-provisioned k3s cluster when one of the master nodes goes down would be highly appreciated.
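(A minimal diagnostic sketch, assuming the sample IPs above and the default k3s kubeconfig path /etc/rancher/k3s/k3s.yaml: point kubectl at each server's apiserver endpoint to see which one is actually answering. With only two etcd members, shutting either one down loses quorum, so both endpoints are expected to fail until the stopped member comes back.)

# run from one of the server nodes; --server overrides the endpoint in the kubeconfig
kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml --server https://x.x.x.0:6443 get nodes
kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml --server https://x.x.x.1:6443 get nodes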

rich-cartoon-70161

02/15/2023, 1:29 PM
To run K3s in this mode, you must have an odd number of server nodes. We recommend starting with three nodes.
Because a majority of the etcd members still has to be available; otherwise it'll do exactly what it's doing in your setup.
With 3 control-plane nodes you can have 1 offline
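(For reference, a minimal sketch of what a third server node's k3s config could boil down to, reusing the sample values from the configs above; in a Rancher custom cluster you would normally just run the registration command on the new node with the etcd and control plane roles selected, and Rancher generates an equivalent file under /etc/rancher/k3s/config.yaml.d/.)

{
  "server": "https://x.x.x.1:6443",
  "token": "YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY",
  "node-taint": [
    "node-role.kubernetes.io/control-plane:NoSchedule",
    "node-role.kubernetes.io/etcd:NoExecute"
  ]
}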

loud-helmet-97067

02/15/2023, 2:20 PM
Thanks @rich-cartoon-70161 for the insights. So I assume that adding a 3rd master node with --etcd and --controlplane is enough to enable the HA behaviour and keep the service up when 1 master node goes down. Will check and confirm on this.

rich-cartoon-70161

02/15/2023, 5:26 PM
Yes that should work.

loud-helmet-97067

02/16/2023, 1:17 AM
Noted with thanks. If we are to use an external datastore (https://docs.k3s.io/installation/ha), is appending --etcd --controlplane along with --datastore-endpoint enough, or does the external datastore need to be set up separately?
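(For context, at the k3s level the external datastore is selected with datastore-endpoint, and when it is set k3s does not run embedded etcd at all, so there would be no etcd members to keep quorum over; whether the Rancher UI passes this option through for custom clusters is exactly the open question here. A hypothetical config sketch, with an illustrative MySQL connection string:)

{
  "datastore-endpoint": "mysql://user:password@tcp(db.example.com:3306)/k3s",
  "token": "YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY"
}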

handsome-jewelry-34280

03/18/2023, 3:46 AM
Hi @loud-helmet-97067, were you able to work out the correct way to pass the external datastore to a Rancher-created K3s cluster?