# kubernetes
h
On the cluster that is not connected, what is the output of:
kubectl get po -n cattle-system
Then look at the logs of those pods:
kubectl logs -n cattle-system <pod name>
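For example, if the agent pods carry the usual labels (that is an assumption on my part), something like:
```sh
# List everything in cattle-system, then tail the cluster-agent logs
kubectl get pods -n cattle-system -o wide
kubectl logs -n cattle-system -l app=cattle-cluster-agent --tail=100
```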
a
Thanks. So part of my problem right now is that I can't authenticate against or query those nodes or the cluster. It just hangs trying to reach the Rancher URL in the kubeconfig, or the direct IP of the control-plane node. Is there a way to access a cluster in this state that maybe I don't know of?
It just hangs with no output - almost as if the token in the downloaded kubeconfig is no longer able to authenticate because of the cluster's status.
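Even forcing a short timeout just errors out instead of connecting, e.g. (the kubeconfig filename here is just what I saved the download as):
```sh
# Times out rather than returning nodes
kubectl --kubeconfig ./downstream-cluster.yml get nodes --request-timeout=10s
```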
Hi @hundreds-evening-84071 - a bit more to update. This is the exact issue I am hitting: https://github.com/rancher/rancher/issues/41292. I don't know if/how to upgrade via the rke CLI manually - is that something you have some experience with, perhaps? I cross-posted to the vsphere channel as well in case there is anyone there with experience - no hits thus far. Thanks for any help!
h
I do not have any downstream RKE clusters... so I do not have that experience (of upgrading RKE from the Rancher UI). My Rancher cluster, however, is RKE, and this is the method I have used (over the last 4 years) to upgrade RKE via the CLI: https://rke.docs.rancher.com/upgrades#listing-supported-kubernetes-versions
Run:
rke config --list-version --all
Then update
rancher-cluster.yml
Add
kubernetes_version: <version from --list-version above>
run
rke up --config ./rancher-cluster.yml
I download the latest rke binary from here: https://github.com/rancher/rke/releases/
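Roughly like this (the version and arch in the URL are just examples - grab whichever release supports your target kubernetes version):
```sh
# Fetch an rke release binary and put it on the PATH (version here is an example)
curl -LO https://github.com/rancher/rke/releases/download/v1.3.24/rke_linux-amd64
chmod +x rke_linux-amd64
sudo mv rke_linux-amd64 /usr/local/bin/rke
rke --version
```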
a
Thank you. That is the page I am landing on at the moment, and I downloaded rke CLI 1.3.24 since it lists both my existing unsupported version and the one I want to get to (1.23) as supported with Rancher 2.7.5. For the rancher-cluster.yml file - is that located on each node, or is it something you built? I am not finding it, so that is where I am stuck at the moment.
h
that file I built when I deployed the RKE cluster...
this is what I have in it:
```yaml
nodes:
  - address: rancher1
    user: rancher
    role: [controlplane,worker,etcd]
  - address: rancher2
    user: rancher
    role: [controlplane,worker,etcd]
  - address: rancher3
    user: rancher
    role: [controlplane,worker,etcd]
 
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
 
kubernetes_version: "v1.15.9-rancher1-2"
```
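When it comes time to upgrade, the only edit is that version line, then rerun rke up - something like this (the 1.23 version string below is hypothetical; use the exact value from --list-version):
```sh
# After editing rancher-cluster.yml, e.g.:
#   kubernetes_version: "v1.23.16-rancher2-3"   # hypothetical - take it from --list-version
rke up --config ./rancher-cluster.yml
```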
a
Ok, I built through the UI initially, so I am guessing that is why I don't have it anywhere. Is it OK to assume I can take the example above, or the one on their site, and use that?
saw an example on their site earlier
h
So I have a 3-node RKE cluster in HA that runs Rancher... ignore the kubernetes version in there - that was from when I wrote my documentation many years ago.
possibly - but again I have not done what you are doing so just guessing
a
ok... I am in a bit of a pickle so I will try anything
h
good luck
a
You have confirmed some of what I was looking for. I think I will try that process above and hopefully that PSP error won't bite me. Even the UI seems to think it can update, but it complains about the PSP in that GitHub issue.
h
At least create the file and see if a backup works:
rke etcd snapshot-save --name <filename> --config /root/rancher-cluster.yml
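And if anything goes sideways mid-upgrade, the same tooling can restore that snapshot (same config path assumed):
```sh
# Restore a previously saved snapshot (same filename/config as the save command)
rke etcd snapshot-restore --name <filename> --config /root/rancher-cluster.yml
```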
a
ok, thanks, I will try that - I will let you know how I make out. I appreciate you getting back to me - this is helpful. I'll try not to be a pest; I'll dig into this and try a few things now. Thank you!
h
you are welcome - good luck and happy friday
a
thanks! right back at ya - let's hope it is a happy friday...haha
Hi @hundreds-evening-84071 - thanks for your help last week. We have gotten further with the "rke up" command. The backup command you gave me now works; however, when we try the actual upgrade, we get a TLS cert error. Initially we had to update/copy the SSH keys for the docker user to all the nodes. I was curious if you might have seen this before or have any thoughts on where to fix it. I know you haven't done this upgrade with a cluster that was created in the UI, but figured you might have some thoughts I haven't found/looked at yet:
```
INFO[0034] [etcd] Successfully started [rke-log-linker] container on host [10.227.227.71]
INFO[0034] Removing container [rke-log-linker] on host [10.227.227.71], try #1
INFO[0034] [remove/rke-log-linker] Successfully removed container on host [10.227.227.71]
INFO[0034] Image [rancher/rke-tools:v0.1.88] exists on host [10.227.227.71]
INFO[0035] Starting container [rke-log-linker] on host [10.227.227.71], try #1
INFO[0035] [etcd] Successfully started [rke-log-linker] container on host [10.227.227.71]
INFO[0035] Removing container [rke-log-linker] on host [10.227.227.71], try #1
INFO[0036] [remove/rke-log-linker] Successfully removed container on host [10.227.227.71]
INFO[0036] [etcd] Successfully started etcd plane.. Checking etcd cluster health
WARN[0139] [etcd] host [10.227.227.71] failed to check etcd health: failed to get /health for host [10.227.227.71]: Get "https://10.227.227.71:2379/health": remote error: tls: bad certificate
FATA[0139] [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [10.227.227.71] failed to report healthy. Check etcd container logs on each host for more information
```
a
Hi, I didn't run across that one yet. I have tried a few others... thank you for this. Let me review it, run through it, and see what it does. I'll let you know. Thank you!
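I will probably start by checking the cert etcd is actually serving on that node, plus the container logs - something like this (the cert path assumes RKE's default /etc/kubernetes/ssl layout, so treat it as an assumption):
```sh
# On the failing node: check the etcd cert's expiry and subject
# (path/naming assume RKE defaults; adjust to what is actually on disk)
openssl x509 -in /etc/kubernetes/ssl/kube-etcd-10-227-227-71.pem -noout -dates -subject
# RKE runs etcd as a container named "etcd"
docker logs etcd --tail 50
```
I also see rke has a cert rotate subcommand, which might be relevant if these turn out to be expired.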