wonderful-rain-13345 (03/24/2023, 12:01 AM)
I feel like I'm missing something: I can't reliably bring up a cluster.
Using k3s on Ubuntu, nothing special

hundreds-evening-84071 (03/24/2023, 12:09 AM)
Is the 1st node coming up? If you have not already, look at this:
journalctl -u k3s.service
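For reference, the service-log check suggested above looks like this (a sketch, assuming a standard systemd-based k3s install):

```shell
# Show the most recent k3s server logs (systemd install assumed)
journalctl -u k3s.service --no-pager -n 200

# On agent-only (worker) nodes the unit is named k3s-agent instead
journalctl -u k3s-agent.service --no-pager -n 200
```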

wonderful-rain-13345 (03/24/2023, 12:10 AM)
👀
rancher says the cluster is waiting for: "non-ready bootstrap machine(s) production-master-86-5fc57bd6c-c88m2 and join url to be available on bootstrap node"
the cluster is "explorable" in rancher

creamy-pencil-82913 (03/24/2023, 12:12 AM)
check the logs on the node?

wonderful-rain-13345 (03/24/2023, 12:12 AM)
yeah nothing of note

creamy-pencil-82913 (03/24/2023, 12:12 AM)
are all of the pods running?
are there any errors in the cluster agent pod?

wonderful-rain-13345 (03/24/2023, 12:13 AM)
only 6 pods?

creamy-pencil-82913 (03/24/2023, 12:14 AM)
you’re gonna have to give me a little more to work with

wonderful-rain-13345 (03/24/2023, 12:14 AM)
no i know
:))
cattle-cluster-agent-644ddf96b9-nj9vv_cluster-register.log
this is the service log
i'm going to upgrade rancher too

creamy-pencil-82913 (03/24/2023, 12:16 AM)
is that the cattle-cluster-agent log?
it looks like it's still starting up

wonderful-rain-13345 (03/24/2023, 12:17 AM)
the first one is the pod

creamy-pencil-82913 (03/24/2023, 12:17 AM)
is the pod failing and being restarted? you might check the --previous logs
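A sketch of that check, using the cluster-agent pod name pasted later in this thread:

```shell
# Current logs from the cluster agent deployment
kubectl -n cattle-system logs deploy/cattle-cluster-agent

# If the pod had restarted, the previous container's logs often hold the error
kubectl -n cattle-system logs cattle-cluster-agent-644ddf96b9-nj9vv --previous
```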

wonderful-rain-13345 (03/24/2023, 12:17 AM)
no restarts
i'll let it sit for a bit
i'm using kine with postgres
i'll let it simmer for a bit
Thanks for checking, appreciated

creamy-pencil-82913 (03/24/2023, 12:22 AM)
ohhh hmm, is this an imported or provisioned cluster?

wonderful-rain-13345 (03/24/2023, 12:22 AM)
nope
sorry i was unclear

creamy-pencil-82913 (03/24/2023, 12:22 AM)
Did you provision it via rancher, or just import it?

wonderful-rain-13345 (03/24/2023, 12:23 AM)
I have an old k3os cluster i started that is running rancher (via helm), running rancher v2.7.1. I used that to deploy a basically plain ubuntu image on vmware via a template. using v1.24.10+k3s1

creamy-pencil-82913 (03/24/2023, 12:25 AM)
ok. so the cluster running rancher is on kine with postgres, and the provisioned cluster (the one that is stuck waiting) is a single-node cluster with etcd
is that correct?

wonderful-rain-13345 (03/24/2023, 12:25 AM)
yes, it's got etcd, worker, controller roles.
ran with
INSTALL_K3S_EXEC=--disable-cloud-controller

creamy-pencil-82913 (03/24/2023, 12:26 AM)
ah well that might do it
wait you ran with that on the downstream cluster?

wonderful-rain-13345 (03/24/2023, 12:27 AM)
in Cluster Config, Agent Env Vars

creamy-pencil-82913 (03/24/2023, 12:27 AM)
why though
if you do
kubectl get node -o wide
on the downstream cluster, is the node NotReady?

wonderful-rain-13345 (03/24/2023, 12:27 AM)
How do you install vsphere cpi/csi on k3s?

creamy-pencil-82913 (03/24/2023, 12:28 AM)
Manually, since we don’t include any packaged cloud provider charts or the in-tree cloud providers

wonderful-rain-13345 (03/24/2023, 12:28 AM)
node is ready

creamy-pencil-82913 (03/24/2023, 12:28 AM)
I think that your disable flag didn’t take
which is probably good

wonderful-rain-13345 (03/24/2023, 12:28 AM)
yeah, my understanding was that i had to disable the CC so the CPI can be installed
because of conflicts
a conflicting port iirc

creamy-pencil-82913 (03/24/2023, 12:29 AM)
hmm so it’s showing as ready, and all the pods are ready, but the UI still shows it as waiting?

wonderful-rain-13345 (03/24/2023, 12:29 AM)
yep

creamy-pencil-82913 (03/24/2023, 12:29 AM)
can you show the output of
kubectl get pod -A -o wide
and
kubectl get node -o wide

wonderful-rain-13345 (03/24/2023, 12:30 AM)
packerbuilt@production-master-86-649b8f71-5nqln:~$ kubectl get pod -A -o wide
NAMESPACE       NAME                                    READY   STATUS      RESTARTS   AGE   IP          NODE                                  NOMINATED NODE   READINESS GATES
kube-system     coredns-7b5bbc6644-qqpp8                1/1     Running     0          24m   10.42.0.4   production-master-86-649b8f71-5nqln   <none>           <none>
kube-system     metrics-server-667586758d-7gl5r         1/1     Running     0          24m   10.42.0.5   production-master-86-649b8f71-5nqln   <none>           <none>
cattle-system   cattle-cluster-agent-644ddf96b9-nj9vv   1/1     Running     0          24m   10.42.0.6   production-master-86-649b8f71-5nqln   <none>           <none>
kube-system     helm-install-traefik-crd-snd26          0/1     Completed   0          24m   10.42.0.3   production-master-86-649b8f71-5nqln   <none>           <none>
kube-system     helm-install-traefik-w5w5t              0/1     Completed   1          24m   10.42.0.2   production-master-86-649b8f71-5nqln   <none>           <none>
kube-system     traefik-64b96ccbcd-j5qdd                1/1     Running     0          23m   10.42.0.7   production-master-86-649b8f71-5nqln   <none>           <none>
packerbuilt@production-master-86-649b8f71-5nqln:~$ kubectl get node -o wide
NAME                                  STATUS   ROLES                  AGE   VERSION         INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
production-master-86-649b8f71-5nqln   Ready    control-plane,master   25m   v1.24.10+k3s1   172.16.1.252   <none>        Ubuntu 22.04.2 LTS   5.15.0-60-generic   <containerd://1.6.15-k3s1>

creamy-pencil-82913 (03/24/2023, 12:31 AM)
You’re missing the etcd role. Did you tweak anything else in your agent env vars?
Rancher only knows how to manage clusters that use embedded etcd. If you do something to tweak the K3s config so that it doesn’t use etcd, it will be very confused.
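One way to confirm the missing role from the downstream node (the role label shown is the standard k3s one; on the kine-backed node above this grep would come back empty):

```shell
# Nodes running embedded etcd carry an etcd role label
kubectl get node --show-labels | grep 'node-role.kubernetes.io/etcd=true'
```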

wonderful-rain-13345 (03/24/2023, 12:32 AM)
production-master-86-5fc57bd6c-c88m2.yaml

creamy-pencil-82913 (03/24/2023, 12:32 AM)
that doesn’t have the info I’m looking for, I think it’s in the cluster object

wonderful-rain-13345 (03/24/2023, 12:33 AM)
i unchecked klipper lb
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: production
  annotations:
    field.cattle.io/creatorId: user-ks4kc
  creationTimestamp: '2023-03-23T05:12:52Z'
  finalizers:
    - wrangler.cattle.io/provisioning-cluster-remove
    - wrangler.cattle.io/rke-cluster-remove
    - wrangler.cattle.io/cloud-config-secret-remover
  generation: 8
  labels:
    {}
  namespace: fleet-default
  resourceVersion: '28260918'
  uid: 8213564a-d541-4aea-9966-4c829c5f5e44
  fields:
    - production
    - 'true'
    - production-kubeconfig
spec:
  agentEnvVars:
    - name: K3S_DATASTORE_ENDPOINT
      value: postgres://k3s:k3s@172.16.1.60:32768/k3s_production?sslmode=disable
    - name: INSTALL_K3S_EXEC
      value: '--disable-cloud-controller'
    - name: K3S_KUBECONFIG_MODE
      value: '644'
  cloudCredentialSecretName: cattle-global-data:cc-ps8zn
  defaultPodSecurityPolicyTemplateName: ''
  kubernetesVersion: v1.24.10+k3s1
  localClusterAuthEndpoint:
    caCerts: ''
    enabled: false
    fqdn: ''
  rkeConfig:
    chartValues:
      {}
    etcd:
      disableSnapshots: false
      s3:
        bucket: nrc-rancher
        cloudCredentialName: cattle-global-data:cc-g868h
        endpoint: nyc3.digitaloceanspaces.com
        endpointCA: ''
        folder: production
        region: nyc3
        skipSSLVerify: false
      snapshotRetention: 5
      snapshotScheduleCron: 0 */5 * * *
    etcdSnapshotCreate:
      generation: 1
    machineGlobalConfig:
      disable:
        - servicelb
        - local-storage
      disable-apiserver: false
      disable-cloud-controller: false
      disable-controller-manager: false
      disable-etcd: false
      disable-kube-proxy: false
      disable-network-policy: false
      disable-scheduler: false
      etcd-expose-metrics: false
      secrets-encryption: false
    machinePools:
      - controlPlaneRole: true
        etcdRole: true
        machineConfigRef:
          kind: VmwarevsphereConfig
          name: nc-production-master-86-rhpfw
        machineOS: linux
        name: master-86
        quantity: 1
        unhealthyNodeTimeout: 0s
        workerRole: true
      - controlPlaneRole: true
        etcdRole: true
        machineConfigRef:
          kind: VmwarevsphereConfig
          name: nc-production-master-93-hs8s4
        machineOS: linux
        name: master-93
        quantity: 0
        unhealthyNodeTimeout: 0s
        workerRole: true
      - machineConfigRef:
          kind: VmwarevsphereConfig
          name: nc-production-worker-86-b4s76
        machineOS: linux
        name: worker-86
        quantity: 0
        unhealthyNodeTimeout: 0s
        workerRole: true
      - machineConfigRef:
          kind: VmwarevsphereConfig
          name: nc-production-worker-93-sx4p6
        machineOS: linux
        name: worker-93
        quantity: 0
        unhealthyNodeTimeout: 0s
        workerRole: true
    machineSelectorConfig:
      - config:
          docker: false
          protect-kernel-defaults: false
          selinux: false
    registries:
      configs:
        {}
      mirrors:
        {}
    upgradeStrategy:
      controlPlaneConcurrency: '1'
      controlPlaneDrainOptions:
        deleteEmptyDirData: true
        disableEviction: false
        enabled: false
        force: false
        gracePeriod: -1
        ignoreDaemonSets: true
        skipWaitForDeleteTimeoutSeconds: 0
        timeout: 120
      workerConcurrency: '1'
      workerDrainOptions:
        deleteEmptyDirData: true
        disableEviction: false
        enabled: false
        force: false
        gracePeriod: -1
        ignoreDaemonSets: true
        skipWaitForDeleteTimeoutSeconds: 0
        timeout: 120
  machineSelectorConfig:
    - config: {}
__clone: true

creamy-pencil-82913 (03/24/2023, 12:34 AM)
ohhhhhh you got it to pass through the K3S_DATASTORE_ENDPOINT variable to the agent
Yeah, that’s not supported. As I said above, Rancher only supports using embedded etcd

wonderful-rain-13345 (03/24/2023, 12:34 AM)
yea
heh how'd it work before lol

creamy-pencil-82913 (03/24/2023, 12:34 AM)
if you’re OK with seeing the warning you can use it as-is but it will probably be very confused.

wonderful-rain-13345 (03/24/2023, 12:34 AM)
yeah it refuses to join workers

creamy-pencil-82913 (03/24/2023, 12:35 AM)
yep
All of the provisioning stuff expects for there to be etcd roles and join info available

wonderful-rain-13345 (03/24/2023, 12:35 AM)
so what's the story with kine? in my mental model i thought it'd save me from when nodes die. but apparently i need the join token + the DB

creamy-pencil-82913 (03/24/2023, 12:36 AM)
It works well if you have a highly available external DB. It does make your server nodes essentially disposable as long as you still have the DB and token.
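A sketch of rebuilding a lost server under that model, reusing the datastore endpoint from this thread (the token path is the k3s default; treat the exact values as examples):

```shell
# Back up the node token while the original server is still alive
sudo cat /var/lib/rancher/k3s/server/token   # save this somewhere safe

# On a replacement machine, point a fresh server at the same external DB
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint='postgres://k3s:k3s@172.16.1.60:32768/k3s_production?sslmode=disable' \
  --token='<saved token>'
```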

wonderful-rain-13345 (03/24/2023, 12:36 AM)
that's good 😅

creamy-pencil-82913 (03/24/2023, 12:36 AM)
But Rancher doesn’t support it, because we didn’t want to have to teach it how to do that

wonderful-rain-13345 (03/24/2023, 12:36 AM)
hmm so do i just not use the etcd role?

creamy-pencil-82913 (03/24/2023, 12:36 AM)
RKE1 and RKE2 both support only etcd, so the support for K3s also only supports embedded etcd
You just can’t point it at an external DB.
Blow away that node/cluster and build a new one without trying to point it at an external DB

wonderful-rain-13345 (03/24/2023, 12:37 AM)
embedded etcd is real etcd? or the one i read about with SQLite that is the default in k3s?

creamy-pencil-82913 (03/24/2023, 12:38 AM)
kine without an external db is sqlite
embedded etcd is real etcd
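For contrast, a minimal sketch of standing up embedded etcd instead (no external DB involved; hostnames and token are placeholders):

```shell
# First server: --cluster-init switches the datastore from sqlite to embedded etcd
curl -sfL https://get.k3s.io | sh -s - server --cluster-init

# Additional servers join the same etcd cluster
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://<first-server>:6443 --token '<cluster token>'
```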

wonderful-rain-13345 (03/24/2023, 12:38 AM)
bundled into k3s?

creamy-pencil-82913 (03/24/2023, 12:38 AM)
yes

wonderful-rain-13345 (03/24/2023, 12:38 AM)
that env var switches k3s to use kine, right?

creamy-pencil-82913 (03/24/2023, 12:38 AM)
that env var tells it to use kine with an external database, yeah
you did the right thing and it is a great hack that I didn’t think would work
but Rancher just doesn’t expect it

wonderful-rain-13345 (03/24/2023, 12:39 AM)
(i realize how complicated this machinery is and I'm aware i'm super simplifying it and asking questions that are in "it depends" / "it's complicated" territory)
ok
so i guess my move here is don't use kine, and just snapshot that cluster's etcd.
and if things go horribly wrong, just restore from snapshot?
I really like rancher and k3s, but i will say i've been frustrated by a seeming lack of guidance around how to prep images (i.e. what is needed for k8s vs k3s).
It seems like a DR plan would include backing up rancher's etcd + my cluster's etcd? (gitops ci/cd aside)

creamy-pencil-82913 (03/24/2023, 12:42 AM)
for DR I would just recommend backing up the token and setting up etcd backups to S3
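A sketch of that DR setup, borrowing the S3 details already present in the cluster spec above (embedded etcd only):

```shell
# 1. Back up the join token
sudo cat /var/lib/rancher/k3s/server/token

# 2. Take an on-demand etcd snapshot to S3 (scheduled snapshots can be
#    configured the same way in the cluster's etcd settings)
k3s etcd-snapshot save \
  --s3 --s3-bucket=nrc-rancher --s3-region=nyc3 \
  --s3-endpoint=nyc3.digitaloceanspaces.com

# 3. To recover, reset the cluster from a snapshot
k3s server --cluster-reset --cluster-reset-restore-path=<snapshot file>
```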

wonderful-rain-13345 (03/24/2023, 12:42 AM)
i seem to always lose my etcds on all my clusters 😄

creamy-pencil-82913 (03/24/2023, 12:43 AM)
from that you should be able to restore the cluster

wonderful-rain-13345 (03/24/2023, 12:43 AM)
ok
i wonder how it worked before

creamy-pencil-82913 (03/24/2023, 12:44 AM)
if you didn’t try to add more workers maybe it didn’t care that the join URL wasn’t available?

wonderful-rain-13345 (03/24/2023, 12:44 AM)
in the last cluster i had 3 masters, and was churning the workers hard

creamy-pencil-82913 (03/24/2023, 12:45 AM)
I am honestly not sure what specifically it is looking for, I just know that the provisioning code only works with etcd. We talked about supporting external SQL DBs for provisioned clusters but it was removed from scope.

wonderful-rain-13345 (03/24/2023, 12:45 AM)
The problem occurred when i accidentally scaled the masters down to 0. rancher didn't kill the last master (for safety?), but it wouldn't let more join, or rather it did but they wouldn't reflect in the rancher UI. Then workers wouldn't join. Could only add masters
heh, i guess this works out
can i still pass
--disable-servicelb
or
--disable servicelb

creamy-pencil-82913 (03/24/2023, 12:51 AM)
yeah that should be fine

wonderful-rain-13345 (03/24/2023, 12:52 AM)
iirc i think the former is deprecated
Thanks a lot Brandon, much appreciated!

creamy-pencil-82913 (03/24/2023, 12:54 AM)
yeah sorry, you want --disable=servicelb not --disable-servicelb
we have some --disable-x flags and also --disable=x,y

wonderful-rain-13345 (03/24/2023, 12:55 AM)
i'll try with the UI check box 😄

creamy-pencil-82913 (03/24/2023, 12:55 AM)
depending on whether you want to disable a packaged manifest, or disable a core controller
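Side by side, the two flag families being described (a sketch):

```shell
# Packaged manifests (servicelb, traefik, local-storage, ...) use --disable=<list>
k3s server --disable=servicelb,local-storage

# Core controllers each get their own dedicated boolean flag
k3s server --disable-cloud-controller --disable-network-policy
```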

wonderful-rain-13345 (03/24/2023, 12:56 AM)
ahh yeah, i want to use metallb instead of klipper. And when you install longhorn on k3s with local-storage enabled, every time a new node starts up (maybe only masters?) it sets the local-storage class to default, even if Longhorn is the default and local-storage's default flag had been cleared, which breaks stuff

creamy-pencil-82913 (03/24/2023, 12:58 AM)
yep 😕
gotta disable local-storage as well
or be explicit about the StorageClassName on your PVCs, which I personally prefer
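An example of pinning the class explicitly (claim name and size are made up):

```shell
# A PVC that names its StorageClass, so a re-defaulted local-path class
# can never capture it
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data          # hypothetical claim name
spec:
  storageClassName: longhorn  # explicit, regardless of cluster default
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
EOF
```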

wonderful-rain-13345 (03/24/2023, 12:59 AM)
yeah, it gets tricky with helm charts. They aren't all well written
I saw there was nfs support in the k3s tree? Does that mean I could mount an NFS pvc to a pod without a separate CSI driver?

polite-piano-74233 (03/24/2023, 1:22 AM)
that's native in kubernetes so yeah, you can just point the container volume at an nfs share directly
also i learned a lot from this thread 😄
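A minimal sketch of that native NFS volume (server address and export path are examples):

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nfs-example
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: share
          mountPath: /data
  volumes:
    - name: share
      nfs:                      # in-tree NFS volume plugin, no CSI driver
        server: 172.16.1.10     # example NFS server
        path: /exports/share    # example export
EOF
```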

wonderful-rain-13345 (03/24/2023, 1:22 AM)
nice thanks @polite-piano-74233

creamy-pencil-82913 (03/24/2023, 1:29 AM)
I like the NFS subdir provisioner too, it'll give you CSI PVs that are just a subdirectory off a base export. Handles cleanup and everything.
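The provisioner being recommended installs roughly like this (server and path values are examples):

```shell
# NFS subdir provisioner: dynamic PVs as subdirectories of one base export
helm repo add nfs-subdir-external-provisioner \
  https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner \
  nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=172.16.1.10 \
  --set nfs.path=/exports/k8s
```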

wonderful-rain-13345 (03/24/2023, 1:30 AM)
yeah i've used it previously
i'm on esx 6.7 and can't really go to 7, so the vsphere csi is kinda limited for me. Longhorn seems like a silver bullet.