# general
h
So was it an in-place OS upgrade from CentOS 7.8 to Rocky 8.7? Did you guys take a cluster backup before the upgrade? 1) What Kubernetes distribution is Rancher running on?
kubectl get nodes
2) I would have created a new cluster on the newer OS and used the Rancher backup operator: https://ranchermanager.docs.rancher.com/reference-guides/backup-restore-configuration/examples#backup 3) Follow the upgrade documentation: https://rke.docs.rancher.com/upgrades
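(For reference, a one-off backup with the rancher-backup operator can be declared roughly as below; this is a sketch, the resource name `pre-upgrade-backup` is made up, and `resourceSetName: rancher-resource-set` plus the default storage location follow the linked docs.)

```
# Minimal sketch of a rancher-backup Backup resource (name is hypothetical).
# Storage falls back to whatever location the operator was installed with (PV or S3).
cat <<EOF | kubectl apply -f -
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: pre-upgrade-backup
spec:
  resourceSetName: rancher-resource-set
EOF

# Verify the backup finished
kubectl get backups.resources.cattle.io
```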
r
oww, I would have just created new nodes and phased out the old ones completely instead of upgrading the cluster in place. Less downtime, easier to go back if something fails along the way, and less baggage from the old OS. EDIT: essentially what dc.901 said 🙂
do you still have a non-LDAP/AD login for the rancher web ui?
d
@hundreds-evening-84071, yes, the upgrade was in place. I did a backup from the UI before I first attempted to upgrade Rancher. And, of course, there’s a nightly backup running for the whole machine. 1. The kubectl command did not return (ServerTimeout). 2. We will create a new cluster for sure, but for now my task is to make that cluster available again 😞 @red-waitress-37932 I have credentials for the local admin, but as said, they don't work either. And in the logs of the nginx containers I see messages that indicate a timeout:
upstream timed out (110: Operation timed out) while connecting to upstream
So my guess would be that one or more containers are in an error state, or that not all of the containers that are needed are up and running. Is there a way to see what containers SHOULD be running?
BTW, I just saw that none of the running containers has an IP address 🤔
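(As a rough baseline, RKE runs the Kubernetes components as plain Docker containers with fixed names; a sketch of comparing what is up against that list, assuming the default names and a node that holds all three roles:)

```
# Sketch: compare running containers against what RKE normally launches.
# On a node with the etcd/controlplane/worker roles you would expect all of these;
# nginx-proxy is additionally present on nodes without the controlplane role.
expected="etcd kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy"
docker ps --format '{{.Names}}\t{{.Status}}'   # what is actually up
for name in $expected; do
  docker ps --format '{{.Names}}' | grep -qx "$name" || echo "missing: $name"
done
```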
h
I am a bit confused... So, please help me understand: you are not able to log in to the Rancher UI, and kubectl returns ServerTimeout. So how can you see that some containers are running and some are not, and whether they have IPs or not?
d
Yes, the UI is returning 504 and the kubectl command returns ServerTimeout. On the command line with
docker ps
I see the running containers (but I have no idea whether those are all the containers that should run or whether more containers need to be up and running). And with `docker inspect <CONTAINER_ID>` I can see that there’s no IP address set in the output.
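(One note on the missing IPs: RKE's system containers typically run with Docker host networking, and host-networked containers show an empty IPAddress in docker inspect, so that on its own may be normal. A quick way to check, with <CONTAINER_ID> as a placeholder:)

```
# Sketch: host-networked containers legitimately show no container IP.
docker inspect -f '{{.Name}}  network={{.HostConfig.NetworkMode}}  ip={{.NetworkSettings.IPAddress}}' <CONTAINER_ID>

# Same check across everything that is running
docker ps -q | xargs docker inspect -f '{{.Name}}  network={{.HostConfig.NetworkMode}}  ip={{.NetworkSettings.IPAddress}}'
```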
h
oh I see - you are looking at docker... so that means it's an RKE cluster... One thing you can do is spin up another RKE cluster, deploy Rancher there, and compare the 2 setups. Sorry, I do not have anything better to tell you at this point
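(A throwaway comparison cluster can be brought up with the rke CLI from a minimal cluster.yml; this is a sketch, and the node address `rocky-test-01` and SSH user `rocky` are placeholders:)

```
# Sketch: single-node RKE cluster for comparison only (address/user are made up).
cat > cluster.yml <<EOF
nodes:
  - address: rocky-test-01
    user: rocky
    role: [controlplane, etcd, worker]
EOF

rke up --config cluster.yml
# rke writes the kubeconfig next to it as kube_config_cluster.yml
```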
d
a friend of mine just gave me the hint that, due to the OS upgrade, some certificates might have changed and that this might be causing the issues. Could that be a possible cause?
h
it is possible... although I do not know what would be the easiest path to find (and resolve) the issue. Personally, I would spin up a fresh environment and compare which docker containers are up and which are not. Then look at
docker logs <container-name>
hopefully that will shed some light?
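(A sketch of that log comparison, plus a quick certificate-expiry check for the earlier theory; the container names and the /etc/kubernetes/ssl path are the usual RKE defaults but may differ on your nodes:)

```
# Sketch: tail the logs of the core RKE containers
for c in etcd kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy; do
  echo "=== $c ==="
  docker logs --tail 20 "$c" 2>&1
done

# Check whether any cluster certificate has expired
# (RKE normally keeps them under /etc/kubernetes/ssl on each node)
for crt in /etc/kubernetes/ssl/*.pem; do
  case "$crt" in *-key.pem) continue ;; esac   # skip private keys
  printf '%s  ' "$crt"
  openssl x509 -in "$crt" -noout -enddate 2>/dev/null || echo "not a certificate"
done
```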
d
thanks, will give it a try
r
Something I'd try is to go through the pre-req setup instructions for Rancher & the downstream cluster to make sure that you still have firewalld turned off, still have the NetworkManager exceptions, SELinux is still configured as you expect, kernel settings (sysctl) are still as expected, & all that jazz.
If none of that works I believe there's an #rke channel and they might know more? I've used RKE2 & K3S (through K3D mostly), but not RKE, so I don't know much about it.
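(A quick pass over those pre-reqs could look like this; exact expected values depend on how the nodes were originally set up, so treat it as a sketch:)

```
# Sketch: spot-check the usual Rancher/RKE node prerequisites
systemctl is-active firewalld          # expect "inactive" if it was disabled before
getenforce                             # expect the SELinux mode you had planned for
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables   # expect both = 1
# NetworkManager should still be told to ignore the CNI-managed interfaces
grep -r "unmanaged-devices" /etc/NetworkManager/conf.d/ 2>/dev/null
```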
d
I finally fired up a 2nd Rancher cluster using version 2.7.x with 3 nodes, following the documentation on the Rancher web page, and the UI is working there. The cluster looks good and I can work with it. Of course, the authentication stuff still needs to be done so that users can use their AD credentials. But I am still facing an issue with the imported K8S cluster that is connected to the old Rancher installation. Is it possible to move it to the new installation somehow? I went into the Rancher UI and tried to add the other cluster there. After hitting “Create” I got a few
kubectl
commands, and I tried to execute the first one on the existing K8S cluster, but it only gave me the message that I need to be logged in. So my guess would be that /root/.kube/config needs to be adjusted, because this file contains a certificate and key which probably belong to the old cluster. Can I simply replace these values with the ones from the new cluster and then have the cluster imported? If yes, where do I find them in Rancher?
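(Before swapping anything in /root/.kube/config, it may help to see exactly which API server and credentials it currently points at; a sketch of how to inspect that:)

```
# Sketch: inspect which cluster and user the current kubeconfig targets
kubectl config view --minify                       # server URL, context, user (secrets redacted)
kubectl config view --minify --raw \
  | grep -E 'server:|certificate-authority-data:|client-certificate-data:' \
  | cut -c1-60                                     # enough to compare, without dumping full keys

# Verbose request logging shows which endpoint times out or rejects the credentials
kubectl get nodes -v=6
```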