# harvester
g
how many healthy master nodes do you have left when you are trying to add the new node?
q
2
it's also not promoting my 3rd node to master
@great-bear-19718 ^
@great-bear-19718 any chance you can give me a hand on this? i'd really like to get my cluster back to health w/ 3 masters.
g
Copy code
(⎈|default:N/A)➜  ~ k get nodes
NAME           STATUS   ROLES                       AGE    VERSION
harvester-01   Ready    control-plane,etcd,master   374d   v1.24.7+rke2r1
harvester-03   Ready    control-plane,etcd,master   219d   v1.24.7+rke2r1
harvester-07   Ready    <none>                      200d   v1.24.7+rke2r1
is this the node you are trying to add?
harvester-07
?
q
no
7 is the node that was already present, but it wont promote it to a master for some reason
so right now, i have a very-scary ha setup, w/ only 2 masters and 1 worker.
g
so node got added to the cluster?
q
usually it just auto-promotes the 3rd machine when you delete a master
g
just not getting promoted.. yeah.. it should
let me check what is going on
q
lmk if you want a new bundle πŸ™‚
the node i added was harvester-02-r or something like that (old one i lost was harvester-02)
g
so what is
harvester-07
is that node marked ready?
i dont see
harvester-02-r
in the support bundle
q
yeah. it's happy
g
ideally 07 should have been promoted to master
q
02-r hasnt actually joined yet. it just sits at not ready
yeah, that's what i experienced before, but for some reason it wont now. and i've reset the whole cluster and what-not too. so it's not a "try rebooting it" kinda fix
g
on the node that is still not ready
can you please check the output of
journalctl -fu rancherd
?
q
the one that wont join?
g
yeah the one that wont join
q
k. give me a few, i have to spin it up. we just moved all our gear to a new suite in the building over the weekend.
brb
g
the support bundle has no reference to it, which makes sense since it has obviously not joined this cluster yet
no rush
q
so yeah, that other node is fubar right now, i had to salvage a part after the move to get one of the others up.
any chance we can figure out why 7 is not promoting?
i have another machine i can try to bring in too if needed though
g
ok let me check why 7 did not promote
q
thanks. i dont like running an ha w/ only 2 masters... i lose one, and it's a bad day.
g
other 2 nodes have a topology setup
harvester-07 does not
i assume you defined the topology
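you can compare with something like this and copy whatever zone/region labels the other two carry onto 07 (the exact key/value depends on how you defined it, so treat this as a sketch):
Copy code
kubectl get nodes --show-labels | grep topology
kubectl label node harvester-07 topology.kubernetes.io/zone=<same-zone-as-the-other-masters>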
q
ah. is that why?
i need to add a topology?
look at that. it's promoting! lol
geeze.
g
πŸ‘
q
okay, i'll stand up another node really fast and let you know if it has issues joining again
can you help me with these:
i really want to upgrade to 1.1.2 but i dont want to till these are fixed.
g
is that from the last upgrade?
q
yeah
i upped from 1.0 to 1.1.1
it's been lingering since i think
g
Copy code
(⎈|default:harvester-system)➜  v1 k get deploy -n cattle-monitoring-system
NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
rancher-monitoring-grafana              0/1     1            0           375d
grafana is not ready
q
also, looks like 7 is stuck promoting. it's still showing cordoned 😞
g
there will be a promote job in the
harvester-system
ns that should have more info
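something along these lines should surface it (the job name here is a guess based on the node name):
Copy code
kubectl get jobs -n harvester-system
kubectl logs -n harvester-system job/harvester-promote-harvester-07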
also output of
kubectl describe pod rancher-monitoring-grafana-787b587b6d-nqccz -n cattle-monitoring-system
it is stuck
i'd like to know why its stuck
q
looks like it cant mount a pvc
should i try deleting it really fast so it can retry? or want to see the describe?
g
yeah sure
cant make it worse
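something like this; the deployment will just recreate it:
Copy code
kubectl delete pod rancher-monitoring-grafana-787b587b6d-nqccz -n cattle-monitoring-system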
q
for promote:
Copy code
Normal  Scheduled  31m   default-scheduler  Successfully assigned harvester-system/harvester-promote-harvester-07-qqnlp to harvester-07
  Normal  Pulled     31m   kubelet            Container image "busybox:1.32.0" already present on machine
  Normal  Created    31m   kubelet            Created container promote
  Normal  Started    31m   kubelet            Started container promote
want logs?
g
yep
q
Copy code
E1002 16:57:58.119302    6624 memcache.go:255] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
deployment "rancher-webhook" successfully rolled out
machine.cluster.x-k8s.io/custom-eb75b22fefcf labeled
secret/custom-eb75b22fefcf-machine-plan labeled
rkebootstrap.rke.cattle.io/custom-eb75b22fefcf labeled
Waiting for promotion...
Waiting for promotion...
Waiting for promotion...
then it's waiting for promotion... A LOT
g
kubectl get pods -n kube-system
q
everything is running, want output?
g
yes please
q
Copy code
NAME                                                    READY   STATUS    RESTARTS         AGE
cloud-controller-manager-harvester-01                   1/1     Running   1661 (41h ago)   282d
cloud-controller-manager-harvester-03                   1/1     Running   24 (41h ago)     30d
etcd-harvester-01                                       1/1     Running   37 (41h ago)     282d
etcd-harvester-03                                       1/1     Running   6 (41h ago)      30d
harvester-whereabouts-h6mvg                             1/1     Running   21 (41h ago)     219d
harvester-whereabouts-mnx4v                             1/1     Running   36 (41h ago)     282d
harvester-whereabouts-rrwkg                             1/1     Running   17 (41h ago)     200d
kube-apiserver-harvester-01                             1/1     Running   1702 (41h ago)   282d
kube-apiserver-harvester-03                             1/1     Running   24 (41h ago)     30d
kube-controller-manager-harvester-01                    1/1     Running   1543 (41h ago)   282d
kube-controller-manager-harvester-03                    1/1     Running   38 (41h ago)     30d
kube-proxy-harvester-01                                 1/1     Running   37 (41h ago)     282d
kube-proxy-harvester-03                                 1/1     Running   23 (41h ago)     219d
kube-proxy-harvester-07                                 1/1     Running   18 (41h ago)     200d
kube-scheduler-harvester-01                             1/1     Running   56 (41h ago)     282d
kube-scheduler-harvester-03                             1/1     Running   11 (41h ago)     30d
rke2-canal-bzxwf                                        2/2     Running   34 (41h ago)     200d
rke2-canal-fkpnw                                        2/2     Running   99 (41h ago)     282d
rke2-canal-vx8m7                                        2/2     Running   43 (41h ago)     219d
rke2-coredns-rke2-coredns-58fd75f64b-jqgn6              1/1     Running   5 (41h ago)      14d
rke2-coredns-rke2-coredns-58fd75f64b-sjtnx              1/1     Running   5 (41h ago)      14d
rke2-coredns-rke2-coredns-autoscaler-768bfc5985-sw7kd   1/1     Running   5 (41h ago)      14d
rke2-ingress-nginx-controller-6mx6k                     1/1     Running   22 (41h ago)     219d
rke2-ingress-nginx-controller-dn5wj                     1/1     Running   36 (41h ago)     282d
rke2-ingress-nginx-controller-hht5t                     1/1     Running   22 (41h ago)     200d
rke2-metrics-server-5df44dfc84-28tx9                    1/1     Running   5 (41h ago)      14d
rke2-multus-ds-86zvx                                    1/1     Running   35 (41h ago)     282d
rke2-multus-ds-dxdwd                                    1/1     Running   21 (41h ago)     219d
rke2-multus-ds-kfl99                                    1/1     Running   17 (41h ago)     200d
snapshot-controller-7c4887cf-5rv67                      1/1     Running   7 (41h ago)      14d
snapshot-controller-7c4887cf-pggp6                      1/1     Running   14 (41h ago)     14d
g
also
kubectl get machine -n fleet-local
q
ty for the help btw...
Copy code
NAME                  CLUSTER   NODENAME       PROVIDERID            PHASE          AGE    VERSION
custom-716cb3ba930e   local     harvester-01   rke2://harvester-01   Running        374d
custom-955c2bcfc429   local                                          Provisioning   7d5h
custom-98b7fe6bc0be   local     harvester-03   rke2://harvester-03   Running        219d
custom-eb75b22fefcf   local     harvester-07   rke2://harvester-07   Running        200d
custom-f4c99741d9b4   local                                          Provisioning   7d6h
g
are you able to delete the two stuck in
Provisioning
? they are likely leftover references from your old machine that failed to join
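roughly like this, using the names from your get machine output:
Copy code
kubectl delete machine custom-955c2bcfc429 custom-f4c99741d9b4 -n fleet-local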
or we can do it later.. if you shell into
harvester-07
there should be a
rancher-system-agent
running i need to see its logs
journalctl -fu rancher-system-agent
q
machines deleted. here are logs from rancher-system-agent:
Copy code
rancher@harvester-07:~> journalctl -fu rancher-system-agent
-- Logs begin at Thu 2023-03-16 05:38:44 UTC. --
Oct 01 06:40:39 harvester-07 rancher-system-agent[11989]: time="2023-10-01T06:40:39Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + echo EnvironmentFile=-/var/lib/rancher/rke2/system-agent-installer/rke2-sa.env"
Oct 01 06:40:39 harvester-07 rancher-system-agent[11989]: time="2023-10-01T06:40:39Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + '[' -n ffb03c631d25480057e7bdad200aaf8835233029b9b271d0490e49198dd0b2aa ']'"
Oct 01 06:40:39 harvester-07 rancher-system-agent[11989]: time="2023-10-01T06:40:39Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + echo ffb03c631d25480057e7bdad200aaf8835233029b9b271d0490e49198dd0b2aa"
Oct 01 06:40:39 harvester-07 rancher-system-agent[11989]: time="2023-10-01T06:40:39Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + systemctl daemon-reload"
Oct 01 06:40:40 harvester-07 rancher-system-agent[11989]: time="2023-10-01T06:40:40Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + '[' '' = true ']'"
Oct 01 06:40:40 harvester-07 rancher-system-agent[11989]: time="2023-10-01T06:40:40Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + '[' agent = server ']'"
Oct 01 06:40:40 harvester-07 rancher-system-agent[11989]: time="2023-10-01T06:40:40Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + systemctl enable rke2-agent"
Oct 01 06:40:40 harvester-07 rancher-system-agent[11989]: time="2023-10-01T06:40:40Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + '[' '' = true ']'"
Oct 01 06:40:40 harvester-07 rancher-system-agent[11989]: time="2023-10-01T06:40:40Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + '[' false = true ']'"
Oct 01 06:40:40 harvester-07 rancher-system-agent[11989]: time="2023-10-01T06:40:40Z" level=info msg="[Applyinator] Command sh [-c run.sh] finished with err: <nil> and exit code: 0"
g
that is not today's date?
q
you are not wrong...
Copy code
rancher@harvester-07:~> sudo timedatectl
               Local time: Tue 2023-10-03 00:06:13 UTC
           Universal time: Tue 2023-10-03 00:06:13 UTC
                 RTC time: Tue 2023-10-03 00:06:13
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
g
it is the 3rd where i am.. so i figured it should not be the 1st October anywhere now 🀣
q
true story. lol
g
can you restart that?
systemctl restart rancher-system-agent
?
might need to check the logs once this is done
q
Copy code
-- Logs begin at Thu 2023-03-16 05:38:44 UTC. --
Oct 03 00:07:15 harvester-07 rancher-system-agent[16702]: time="2023-10-03T00:07:15Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + echo EnvironmentFile=-/var/lib/rancher/rke2/system-agent-installer/rke2-sa.env"
Oct 03 00:07:15 harvester-07 rancher-system-agent[16702]: time="2023-10-03T00:07:15Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + '[' -n ffb03c631d25480057e7bdad200aaf8835233029b9b271d0490e49198dd0b2aa ']'"
Oct 03 00:07:15 harvester-07 rancher-system-agent[16702]: time="2023-10-03T00:07:15Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + echo ffb03c631d25480057e7bdad200aaf8835233029b9b271d0490e49198dd0b2aa"
Oct 03 00:07:15 harvester-07 rancher-system-agent[16702]: time="2023-10-03T00:07:15Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + systemctl daemon-reload"
Oct 03 00:07:15 harvester-07 rancher-system-agent[16702]: time="2023-10-03T00:07:15Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + '[' '' = true ']'"
Oct 03 00:07:15 harvester-07 rancher-system-agent[16702]: time="2023-10-03T00:07:15Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + '[' agent = server ']'"
Oct 03 00:07:15 harvester-07 rancher-system-agent[16702]: time="2023-10-03T00:07:15Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + systemctl enable rke2-agent"
Oct 03 00:07:16 harvester-07 rancher-system-agent[16702]: time="2023-10-03T00:07:16Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + '[' '' = true ']'"
Oct 03 00:07:16 harvester-07 rancher-system-agent[16702]: time="2023-10-03T00:07:16Z" level=info msg="[02acbe9892ef3951f3ce1a95cecc909d312aae52712baaca9b97c35c2099bfdf_0:stderr]: + '[' false = true ']'"
Oct 03 00:07:16 harvester-07 rancher-system-agent[16702]: time="2023-10-03T00:07:16Z" level=info msg="[Applyinator] Command sh [-c run.sh] finished with err: <nil> and exit code: 0"
g
any chance i could please see this..
Copy code
kubectl get cluster.provisioning -n fleet-local -o yaml
q
Copy code
apiVersion: v1
items:
- apiVersion: provisioning.cattle.io/v1
  kind: Cluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"provisioning.cattle.io/v1","kind":"Cluster","metadata":{"annotations":{},"labels":{"rke.cattle.io/init-node-machine-id":"42hqkq5728cv59wl99hmwjglvq6hv4pnw4ps9x2d6nfchstmtb82jp"},"name":"local","namespace":"fleet-local"},"spec":{"kubernetesVersion":"v1.22.12+rke2r1","rkeConfig":{"controlPlaneConfig":null}}}
      objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4yQzU7DMBCEXwXt2Slt079Y4oAQ4sCVF9jYS2Ow15G9CYfK746SVqJC4udo78xovjlBIEGLgqBPgMxRUFzkPD1j+0ZGMskiubgwKOJp4eKts6ChT3F02UV2fKyMH7JQqkwiFAL1ozV+MKXqOL6DhoCMRwrEciUYa3Xz7NjePZwj/8xiDAQafDTo/yXOPZrJAUXB3NdFfnGBsmDoQfPgvQKPLflfR+gwd6Bhu9ztt3XdUGNwc7Crdr9u6jW1y/pg91vb2LXdbHarA6jzYpbSVwho6DCNNIMWBd9Yrtu+eiKpzpeiIPdkpnbzx2Wq+0G6R7Z9dCygT2WSCcpwwciURrJPxJRmZtDLUj4DAAD//5CVWGcAAgAA
      objectset.rio.cattle.io/id: provisioning-cluster-create
      objectset.rio.cattle.io/owner-gvk: management.cattle.io/v3, Kind=Cluster
      objectset.rio.cattle.io/owner-name: local
      objectset.rio.cattle.io/owner-namespace: ""
    creationTimestamp: "2022-09-22T17:45:55Z"
    finalizers:
    - wrangler.cattle.io/provisioning-cluster-remove
    - wrangler.cattle.io/rke-cluster-remove
    generation: 4
    labels:
      objectset.rio.cattle.io/hash: 50675339e9ca48d1b72932eb038d75d9d2d44618
      provider.cattle.io: harvester
    name: local
    namespace: fleet-local
    resourceVersion: "624758340"
    uid: ee63516e-be79-43b0-a331-59e7e18c264b
  spec:
    kubernetesVersion: v1.24.7+rke2r1
    localClusterAuthEndpoint: {}
    rkeConfig:
      chartValues: null
      machineGlobalConfig: null
      provisionGeneration: 1
      upgradeStrategy:
        controlPlaneDrainOptions:
          timeout: 0
        workerDrainOptions:
          timeout: 0
  status:
    clientSecretName: local-kubeconfig
    clusterName: local
    conditions:
    - status: "True"
      type: Ready
    - lastUpdateTime: "2022-09-22T17:45:55Z"
      status: "False"
      type: Reconciling
    - lastUpdateTime: "2022-09-22T17:45:55Z"
      status: "False"
      type: Stalled
    - lastUpdateTime: "2023-02-24T21:16:06Z"
      status: "True"
      type: Created
    - lastUpdateTime: "2023-09-30T09:06:07Z"
      status: "True"
      type: RKECluster
    - status: Unknown
      type: DefaultProjectCreated
    - status: Unknown
      type: SystemProjectCreated
    - lastUpdateTime: "2022-12-24T02:22:27Z"
      status: "True"
      type: Provisioned
    - lastUpdateTime: "2023-09-30T09:06:07Z"
      message: 'configuring bootstrap node(s) custom-716cb3ba930e: waiting for probes:
        kube-controller-manager, kube-scheduler'
      reason: Waiting
      status: Unknown
      type: Updated
    - lastUpdateTime: "2022-12-24T03:43:59Z"
      status: "True"
      type: Connected
    observedGeneration: 4
    ready: true
kind: List
metadata:
  resourceVersion: ""
g
yeah its still waiting on the node probes to report back
any chance i could have a new support-bundle?
or tail the logs of rancher pods in
cattle-system
namespace.. based on the scripts run.. everything has been done to get rancher to trigger the reconcile of this node
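something like this should do it (the selector may differ slightly on your setup):
Copy code
kubectl logs -n cattle-system -l app=rancher --tail=100 -f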
q
sure, one sec.
sorry, ac techs from the move needed something.
g
no rush
you are missing a dns record..
Copy code
dial tcp: lookup rancher.mgt.natimark.com on 10.53.0.10:53: no such host
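you can confirm from inside the cluster with something like:
Copy code
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.32.0 -- nslookup rancher.mgt.natimark.com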
q
thats the rancher manager. it's off right now, is that why?
g
the cluster agent and rancher pods are being spammed by it
q
i can bring it back online. give me a few
okay, rancherd is back online
should i restart that service again? or you want something else?
g
can you please check the logs for rancher now?
kubectl logs -f rancher-bbf4bdf96-6gl7w -n cattle-system
this should be running the promotion..
q
Copy code
2023/10/03 01:02:13 [ERROR] Failed to dial steve aggregation server: dial tcp: lookup rancher.mgt.natimark.com on 10.53.0.10:53: no such host
2023/10/03 01:02:18 [ERROR] Failed to dial steve aggregation server: dial tcp: lookup rancher.mgt.natimark.com on 10.53.0.10:53: no such host
2023/10/03 01:02:23 [ERROR] Failed to dial steve aggregation server: websocket: bad handshake
2023/10/03 01:02:28 [ERROR] Failed to dial steve aggregation server: dial tcp: lookup rancher.mgt.natimark.com on 10.53.0.10:53: no such host
2023/10/03 01:02:33 [ERROR] Failed to dial steve aggregation server: dial tcp: lookup rancher.mgt.natimark.com on 10.53.0.10:53: no such host
2023/10/03 01:04:35 [INFO] Downloading repo index from https://releases.rancher.com/server-charts/stable/index.yaml
2023/10/03 01:05:18 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-716cb3ba930e: waiting for probes: kube-controller-manager, kube-scheduler
looks like its moving now
Copy code
2023/10/03 01:04:35 [INFO] Downloading repo index from https://releases.rancher.com/server-charts/stable/index.yaml
2023/10/03 01:05:18 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-716cb3ba930e: waiting for probes: kube-controller-manager, kube-scheduler
2023/10/03 01:05:32 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2023/10/03 01:05:51 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-716cb3ba930e: waiting for probes: kube-controller-manager, kube-scheduler
2023/10/03 01:06:34 [ERROR] Error during subscribe websocket: close sent
g
are you able to check on
harvester-07
node if anything got triggered?
q
still showing promoting, and same error on the journalctl -fu rancher-system-agent
g
k get machines.cluster custom-eb75b22fefcf -n fleet-local -o yaml
q
Copy code
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/5yTwW7bPBCEX+XHniXHlixREvBfWvQUtAe36H21XNqsKdIgV06LwO9eSHHcuEHSNkeJnMU3M8t7GFhQoyB094DeB0GxwafpM/TfmCSxLKINC0IRxwsbbqyGDkY/oMct63xA2lnPkL0oCHeeY7497qGDm+Mq++/Wev3/Z6bI8keZx4GhAxqThCHnXlV9URg2ZP5Kmg5Ik94FQgenDCjybPGLHTgJDgfo/OhcBg57drNxcmMSjovv+b5J07jzj0eWh1kZ7DAeeTrYkZ2unROBDiSOrwWyw7SDDpalMqQ1NTWu123F67pmXlFd9kvTlqYxpaqU5goyiHt+on+B5/rSuZd8rksva1UqomrVUFG0Rq2JG1Sq4LZet31ZVKopmrLqm74tWa206TUutTJFRbUyz4bfhbjnmMfg+NHuKYNXu3rahnHMkl86SQemKfg+BEkS8TC3ELyx2w2beTUP9ivHZIOH7jeU4woy2Fs/2dzcfnh3mfFWnmlJHgL+dJ2v9SZikjiSjJH/jez9zPDx8lbejJYEZUxXaW0Y9Q/oDLrEzyl/nZ1OPwMAAP//RguWUfADAAA
    objectset.rio.cattle.io/id: unmanaged-machine
    objectset.rio.cattle.io/owner-gvk: /v1, Kind=Secret
    objectset.rio.cattle.io/owner-name: custom-eb75b22fefcf
    objectset.rio.cattle.io/owner-namespace: local
  creationTimestamp: "2023-03-16T05:39:03Z"
  finalizers:
  - machine.cluster.x-k8s.io
  generation: 3
  labels:
    cluster.x-k8s.io/cluster-name: local
    harvesterhci.io/managed: "true"
    objectset.rio.cattle.io/hash: 037fcddc86a4495e466ee1c63b0f93f8f3757de5
    rke.cattle.io/cluster-name: local
    rke.cattle.io/control-plane-role: "true"
    rke.cattle.io/etcd-role: "true"
    rke.cattle.io/machine-id: d06737cc518c229f74ce8a772e9649b325782835b8b93e71dfbda0d7f25c67f
    rke.cattle.io/worker-role: "true"
  name: custom-eb75b22fefcf
  namespace: fleet-local
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: local
    uid: 736f82bc-3db3-4252-84f2-4ef8a4c46846
  resourceVersion: "629257036"
  uid: 41236e43-79aa-40fd-8cc7-03122e34ff42
spec:
  bootstrap:
    configRef:
      apiVersion: rke.cattle.io/v1
      kind: RKEBootstrap
      name: custom-eb75b22fefcf
      namespace: fleet-local
    dataSecretName: custom-eb75b22fefcf-machine-bootstrap
  clusterName: local
  infrastructureRef:
    apiVersion: rke.cattle.io/v1
    kind: CustomMachine
    name: custom-eb75b22fefcf
    namespace: fleet-local
  providerID: rke2://harvester-07
status:
  addresses:
  - address: 192.168.5.170
    type: InternalIP
  - address: harvester-07
    type: Hostname
  bootstrapReady: true
  conditions:
  - lastTransitionTime: "2023-03-16T05:39:04Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-03-16T05:39:04Z"
    status: "True"
    type: BootstrapReady
  - lastTransitionTime: "2023-03-16T05:39:03Z"
    status: "True"
    type: InfrastructureReady
  - lastTransitionTime: "2023-10-01T06:40:43Z"
    status: "True"
    type: NodeHealthy
  - lastTransitionTime: "2023-03-16T05:39:03Z"
    status: "True"
    type: PlanApplied
  - lastTransitionTime: "2023-06-28T06:55:43Z"
    status: "True"
    type: Reconciled
  infrastructureReady: true
  lastUpdated: "2023-03-16T05:39:27Z"
  nodeInfo:
    architecture: amd64
    bootID: e120d133-b9c9-45ad-bb97-d1a42a3b261d
    containerRuntimeVersion: containerd://1.6.8-k3s1
    kernelVersion: 5.3.18-150300.59.101-default
    kubeProxyVersion: v1.24.7+rke2r1
    kubeletVersion: v1.24.7+rke2r1
    machineID: f9b21763a25ec86d013eafc56412ab6b
    operatingSystem: linux
    osImage: Harvester v1.1.1
    systemUUID: 00000000-0000-0000-0000-309c23e612c0
  nodeRef:
    apiVersion: v1
    kind: Node
    name: harvester-07
    uid: 80bc540c-30c0-46a5-b863-953187a2e314
  observedGeneration: 3
  phase: Running
g
q
what node should i do this on?
Copy code
echo "Rotating kube-controller-manager certificate"
does it have to be a master? or on har-7?
g
harv-7
i dont think this will be the case
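fwiw that rotation is basically just removing the cert pair on whichever control-plane node actually has that tls dir and letting rke2 regenerate it, roughly (a sketch, not the exact script you're following):
Copy code
sudo rm /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.{crt,key}
sudo crictl rm -f $(sudo crictl ps -q --name kube-controller-manager)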
q
so i dont have the tls folder for:
Copy code
sudo rm /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.{crt,key}
should i keep going?
i do on a master, but not on har-7
and i've had this cluster for 375 days... so maybe
g
well issue would have happened on harv-7 i'd think
crictl ps
are you able to please check this on harv-7
q
Copy code
CONTAINER           IMAGE               CREATED             STATE               NAME                            ATTEMPT             POD ID              POD
d3f7b46617e71       219ee5171f800       2 hours ago         Running             promote                         0                   ec8543dca05e9       harvester-promote-harvester-07-qqnlp
529c10d7636d7       9b80adc8eaa31       41 hours ago        Running             compute                         0                   57bca8824ce13       virt-launcher-addrbbox-dhnbf
5accad7b5047d       9b80adc8eaa31       42 hours ago        Running             compute                         0                   ce0d629e94526       virt-launcher-accuzip-rvwnd
cd88f91d3224f       1a7095f7e9bc9       43 hours ago        Running             backing-image-manager           0                   faabb0153f36c       backing-image-manager-d7ad-8fd6
881d47a92dcba       de93f80351954       43 hours ago        Running             longhorn-csi-plugin             27                  018a77a10bf01       longhorn-csi-plugin-qxl9n
703120bf37f67       77c44d54b1211       43 hours ago        Running             virt-handler                    210                 3bd6db0b6bc2e       virt-handler-v86pv
72d8301278723       36b11648e019a       43 hours ago        Running             replica-manager                 0                   4290477270099       instance-manager-r-17fd168b
485914591d6f5       36b11648e019a       43 hours ago        Running             engine-manager                  0                   0796d85ade51a       instance-manager-e-ce98fd9a
bd92b9dd18961       28003e667aa9f       43 hours ago        Running             rke2-ingress-nginx-controller   22                  245e088876dda       rke2-ingress-nginx-controller-hht5t
bbcdda3ad8019       8318b9b61b32b       43 hours ago        Running             rancher                         4                   1c5a89a77e057       rancher-bbf4bdf96-5qqzz
5ff221b27fbf3       de93f80351954       43 hours ago        Running             longhorn-manager                22                  5c4df84d9f4f3       longhorn-manager-czr8c
366d62b2a4bee       cb03930a2bd42       43 hours ago        Running             node-driver-registrar           17                  018a77a10bf01       longhorn-csi-plugin-qxl9n
55a83adb228af       5131c4e1af289       43 hours ago        Running             fluent-bit                      17                  d56fb75e55ee6       rancher-logging-kube-audit-fluentbit-8xsvn
51a4170c2db93       5131c4e1af289       43 hours ago        Running             fluent-bit                      17                  3e2411765caea       rancher-logging-root-fluentbit-stzfj
f316b187484ae       8681890ac02c0       43 hours ago        Running             engine-image-ei-a5371358        17                  0175ce5479730       engine-image-ei-a5371358-2pj9f
ec0d39076e4b3       5131c4e1af289       43 hours ago        Running             fluentbit                       17                  8c53d37ef207b       rancher-logging-rke2-journald-aggregator-4zfdz
d0e625f5cb9c4       8203b8fd46399       43 hours ago        Running             apiserver                       7                   bd3db7e8593fb       harvester-7794f4b7c4-4b7n4
0754a0c885c59       803347fbe5a24       43 hours ago        Running             harvester-webhook               3                   d632f61ef9586       harvester-webhook-5b88c99f5d-jfxw5
e9b21c061a2ab       39dd4d1e9ee87       43 hours ago        Running             node-manager                    17                  66389f94b96e6       harvester-node-manager-gv9l2
7cba533871094       ab979157630fc       43 hours ago        Running             kube-flannel                    17                  f0f4fd67fa1ec       rke2-canal-bzxwf
061d519275735       5dddbb6d554c6       43 hours ago        Running             calico-node                     17                  f0f4fd67fa1ec       rke2-canal-bzxwf
cf5802af468b3       12f4ea63839f6       43 hours ago        Running             harvester-network               22                  55f3e1c0299de       harvester-network-controller-j8wgg
4b142c0e05db7       347508c544b98       43 hours ago        Running             harvester-node-disk-manager     24                  fdcf00e9d2a02       harvester-node-disk-manager-ttwx9
c467b19be9789       38df782a74380       43 hours ago        Running             kube-proxy                      18                  8251deb85f40c       kube-proxy-harvester-07
2cafc8ed23e17       a49a7ca14bb9d       43 hours ago        Running             whereabouts                     17                  b55955860656a       harvester-whereabouts-rrwkg
2cb89b4f9433b       0482afd7c6409       43 hours ago        Running             longhorn-loop-device-cleaner    17                  0ec3bcc0b8f7a       longhorn-loop-device-cleaner-vrvgc
47393a3a21fbe       9ef244af5338c       43 hours ago        Running             kube-rke2-multus                17                  745584a10ed80       rke2-multus-ds-kfl99
36c1f32678c9e       0fafea1498594       43 hours ago        Running             node-exporter                   25                  3ced0fb0ad0e8       rancher-monitoring-prometheus-node-exporter-9qgjw
g
we could try restarting rancher pods
the embedded rancher in harvester else i will need to ask in our rancher team since i am not across the logic its trying to run
q
i can try the delete. any specific command? or type of pod i wanna kill?
g
just the rancher pods..
kubectl delete pod rancher-bbf4bdf96-5qqzz rancher-bbf4bdf96-6gl7w rancher-bbf4bdf96-hkhpf -n cattle-system
it should spin up new ones and we will know if anything happens
q
🀞
its not spinning up on 7. but it's cordoned because of the promote... should i uncordon?
g
leave it as such
there are other pods.. only 1 is actually doing the processing anyways
q
got yah.
Copy code
I1003 01:44:04.652321      33 leaderelection.go:258] successfully acquired lease kube-system/cattle-controllers
2023/10/03 01:44:04 [INFO] Steve auth startup complete
2023/10/03 01:44:05 [INFO] Starting /v1, Kind=Node controller
2023/10/03 01:44:05 [INFO] Starting management.cattle.io/v3, Kind=User controller
2023/10/03 01:44:05 [INFO] Starting /v1, Kind=Namespace controller
2023/10/03 01:44:05 [INFO] Starting /v1, Kind=Pod controller
2023/10/03 01:44:05 [INFO] Starting rke.cattle.io/v1, Kind=RKECluster controller
2023/10/03 01:44:05 [INFO] Starting apps/v1, Kind=Deployment controller
2023/10/03 01:44:05 [INFO] Starting rke.cattle.io/v1, Kind=RKEControlPlane controller
2023/10/03 01:44:05 [INFO] Starting admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration controller
2023/10/03 01:44:05 [INFO] Starting rke.cattle.io/v1, Kind=ETCDSnapshot controller
2023/10/03 01:44:05 [INFO] Starting cluster.x-k8s.io/v1beta1, Kind=Cluster controller
2023/10/03 01:44:05 [INFO] Starting rke.cattle.io/v1, Kind=RKEBootstrapTemplate controller
2023/10/03 01:44:05 [INFO] Starting catalog.cattle.io/v1, Kind=Operation controller
2023/10/03 01:44:05 [INFO] Starting admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration controller
2023/10/03 01:44:05 [INFO] Starting catalog.cattle.io/v1, Kind=App controller
2023/10/03 01:44:05 [INFO] Starting fleet.cattle.io/v1alpha1, Kind=Bundle controller
2023/10/03 01:44:05 [INFO] Starting fleet.cattle.io/v1alpha1, Kind=Cluster controller
2023/10/03 01:44:05 [INFO] Starting management.cattle.io/v3, Kind=FleetWorkspace controller
2023/10/03 01:44:05 [INFO] Starting /v1, Kind=Service controller
2023/10/03 01:44:05 [INFO] Starting apps/v1, Kind=DaemonSet controller
2023/10/03 01:44:05 [INFO] Starting rke.cattle.io/v1, Kind=CustomMachine controller
2023/10/03 01:44:05 [INFO] Starting management.cattle.io/v3, Kind=ManagedChart controller
2023/10/03 01:44:05 [INFO] Starting cluster.x-k8s.io/v1beta1, Kind=MachineDeployment controller
2023/10/03 01:44:08 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-716cb3ba930e: waiting for probes: kube-controller-manager, kube-scheduler
2023/10/03 01:44:08 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-716cb3ba930e: waiting for probes: kube-controller-manager, kube-scheduler
2023/10/03 01:45:22 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-716cb3ba930e: waiting for probes: kube-controller-manager, kube-scheduler
2023/10/03 01:45:55 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-716cb3ba930e: waiting for probes: kube-controller-manager, kube-scheduler
2023/10/03 01:46:10 [ERROR] Error during subscribe websocket: close sent
2023/10/03 01:46:57 [ERROR] Error during subscribe websocket: close sent
2023/10/03 01:48:18 [INFO] Downloading repo index from https://releases.rancher.com/server-charts/stable/index.yaml
2023/10/03 01:48:18 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
g
do you see anything in
rancher-system-agent
logs?
q
nothing.
bounce that service?
g
you could try.. but i doubt it will do anything
q
nope. same thing.
g
i will need to ask someone in the rancher team about this
q
is there a way to kill the promo if i try and join another node?
g
not sure.. i think it will fail eventually
and then it will try another node if there is one
q
okay. well if you wanna reach out to the rancherd team, i'll try and join another machine again. i'll let you know if it doesnt join like i was observing before.
for the grafana issue, it doesnt seem to want to attach the pvc, despite it being healthy and detached.
any reason why i shouldnt shutdown?
@great-bear-19718 anything on this stuck promotion?
g
i have not heard back yet.. i will chase up again
q
@great-bear-19718 no worries. thanks. also i have another node up, still not registering either. for that server, here's the log from rancherd:
Copy code
-- Logs begin at Wed 2023-10-04 21:22:22 UTC. --
Oct 04 21:22:54 harvester-02-r2 rancherd[2368]: time="2023-10-04T21:22:54Z" level=info msg="[stdout]: [INFO]  Successfully downloaded Rancher connection information"
Oct 04 21:22:54 harvester-02-r2 rancherd[2368]: time="2023-10-04T21:22:54Z" level=info msg="[stdout]: [INFO]  systemd: Creating service file"
Oct 04 21:22:54 harvester-02-r2 rancherd[2368]: time="2023-10-04T21:22:54Z" level=info msg="[stdout]: [INFO]  Creating environment file /etc/systemd/system/rancher-system-agent.env"
Oct 04 21:22:54 harvester-02-r2 rancherd[2368]: time="2023-10-04T21:22:54Z" level=info msg="[stdout]: [INFO]  Enabling rancher-system-agent.service"
Oct 04 21:22:54 harvester-02-r2 rancherd[2368]: time="2023-10-04T21:22:54Z" level=info msg="[stderr]: Created symlink /etc/systemd/system/multi-user.target.wants/rancher-system-agent.service β†’ /etc/systemd/system/rancher-system-agent.service."
Oct 04 21:22:54 harvester-02-r2 rancherd[2368]: time="2023-10-04T21:22:54Z" level=info msg="[stdout]: [INFO]  Starting/restarting rancher-system-agent.service"
Oct 04 21:22:54 harvester-02-r2 rancherd[2368]: time="2023-10-04T21:22:54Z" level=info msg="No image provided, creating empty working directory /var/lib/rancher/rancherd/plan/work/20231004-212252-applied.plan/_1"
Oct 04 21:22:54 harvester-02-r2 rancherd[2368]: time="2023-10-04T21:22:54Z" level=info msg="Running command: /usr/bin/rancherd [probe]"
Oct 04 21:22:54 harvester-02-r2 rancherd[2368]: time="2023-10-04T21:22:54Z" level=info msg="[stderr]: time=\"2023-10-04T21:22:54Z\" level=info msg=\"Running probes defined in /var/lib/rancher/rancherd/plan/plan.json\""
Oct 04 21:22:55 harvester-02-r2 rancherd[2368]: time="2023-10-04T21:22:55Z" level=info msg="[stderr]: time=\"2023-10-04T21:22:55Z\" level=info msg=\"Probe [kubelet] is unhealthy\""
here's the status for that machine too:
Copy code
Status:
  Bootstrap Ready:  true
  Conditions:
    Last Transition Time:  2023-10-04T21:22:55Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2023-10-04T21:22:55Z
    Status:                True
    Type:                  BootstrapReady
    Last Transition Time:  2023-10-04T21:22:54Z
    Status:                True
    Type:                  InfrastructureReady
    Last Transition Time:  2023-10-04T21:22:54Z
    Reason:                WaitingForNodeRef
    Severity:              Info
    Status:                False
    Type:                  NodeHealthy
  Last Updated:            2023-10-04T21:22:55Z
  Observed Generation:     2
  Phase:                   Provisioning
g
i assume
rancher-system-agent
is running on both nodes?
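a quick check on each would be:
Copy code
systemctl status rancher-system-agent
journalctl -u rancher-system-agent -n 50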
q
so two things: 1) i removed the rancher management server from harvester and the cluster in case it was part of the issue. 2) no, i do not see anything for rancher-system-agent running
g
i dont think running rancher is the issue..
q
@great-bear-19718 i wasnt sure it was either, just wanted to take it out of the equation just in case. posted a bundle i pulled last night
@great-bear-19718 any ideas on why i cant join a node or get the 3rd node to promote?
g
i dont have an answer yet.. do you see anything else in
rancher-system-agent
logs?
also are you able to zip up this folder?
/var/lib/rancher/rancherd/plan/
on the node which is waiting to be bootstrapped?
might be best to DM me since it may have some sensitive info about the cluster
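something like this on that node should grab it:
Copy code
sudo tar czf rancherd-plan.tgz -C /var/lib/rancher/rancherd plan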
q
for rancher-system-agent. is that a service on the machine? and if yes, which machine? the node i'm trying to join? or a different one?
i can send you that folder tomorrow, the server is currently offline
@great-bear-19718 i see it's a service, which node do you want it off of?
g
the one which is not joining
there should be a plan folder which should have some info which might help me figure out what is going on
q
okay, i can check it tomorrow then. it's offline right now 😞
g
πŸ‘