# rke2
a
sooo i went to upgrade my kubernetes version today, and i think it just fubared my entire cluster.... any suggestions?
b
Do you have etcd backups?
a
i thought i did but they appear to be completely gone now
b
Cause I'm pretty sure you're going to need to restore a backup from at least one of the nodes. To simplify it, I'd probably scale down to 1 (since they're just VMs), get a backup, restore it, then scale up to three again.
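For context, the single-node restore being described generally looks something like the sketch below, assuming a local snapshot still exists under the default RKE2 snapshot directory (the snapshot name is a placeholder):

```bash
# stop the rke2 server service before resetting etcd
systemctl stop rke2-server

# list whatever local snapshots are still on disk (default location)
ls /var/lib/rancher/rke2/server/db/snapshots/

# reset the cluster to a single etcd member and restore from one snapshot
# (<snapshot-name> is a placeholder)
rke2 server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<snapshot-name>

# bring the service back up once the reset completes
systemctl start rke2-server
```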
You weren't backing up to S3 or nfs or something?
a
it looks like it deleted all the old nodes, and no i was not, that's my screw up
b
Do you have vmware backups of your old nodes?
a
No 😕
b
Well.
I don't see much else you can do.
a
we do have snapshots from the nimble of the entire datastore
but thats it
b
Unless the snapshots have the etcd backups, or disks from the VMs that used to be your nodes, then you have nothing to restore.
At least you identified some issues with your backup and DR strategies.
a
sounds like i'm making a new cluster then
that sucks
b
Sounds like it. If you have paid support through SUSE, it might be worth reaching out to them.
a
we dont sadly
b
I'm sorry, that sucks. This is the kind of event where that support is worth its weight in gold.
a
deploying a new cluster now, definitely going to start backing up etcd externally.... hard lesson learned. luckily i have a database dump of our mysql db from last week
almost got everything back up.....
and we are back
definitely setting up s3 backup for etcd now
luckily i have an awesome ci/cd workflow which made redeploying slick
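A rough sketch of what scheduled S3 etcd snapshots can look like in the RKE2 server config; the schedule, endpoint, bucket, and credentials below are placeholders, not values taken from this conversation:

```yaml
# /etc/rancher/rke2/config.yaml on the server nodes (placeholder values)
etcd-snapshot-schedule-cron: "0 */6 * * *"   # take a snapshot every 6 hours
etcd-snapshot-retention: 20                  # keep the last 20 snapshots
etcd-s3: true
etcd-s3-endpoint: "<s3-endpoint>:<port>"
etcd-s3-bucket: "rke2-snapshots"
etcd-s3-access-key: "<access-key>"
etcd-s3-secret-key: "<secret-key>"
```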
b
And you tested a worst-case scenario!
a
and we're back! now to identify an s3 compatible location to back up etcd......
dumb question: we just deployed an s3 compatible resource in our environment. for the endpoint, if it's not going to a dns name, can we just put the ip and port, or do we also need to include the protocol?
b
let me look at how we have ours set up
We do have a dns name to our external ceph cluster, but from what it looks like it can just be
<ip>:<port>
ie:
10.10.1.150:8888
there's no
s3://
or other stuff in my configs
region is empty
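In config terms that format would look roughly like this, mirroring the description above rather than anyone's exact config (the IP and port are just the example values given):

```yaml
# endpoint is just host:port, no scheme, region left empty
etcd-s3-endpoint: "10.10.1.150:8888"
etcd-s3-region: ""
```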
a
awesome! so i just plopped in 192.168.70.66:9000 and deployed a quick test cluster to do some testing, and we shall see. i just deployed an HA minio system with a couple of ubuntu servers. not perfect, but it'll do (this is all non-critical data also)
b
You could potentially do a local backup and have some sort of cron script to go grab the files and stash them somewhere. We did that for a while too. Much hackier, but if it works, it works.
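The hacky cron approach being described might look something like this; the destination path is made up for illustration, only the source path is the default RKE2 snapshot directory:

```bash
#!/bin/sh
# /etc/cron.daily/copy-etcd-snapshots (illustrative only, destination is a placeholder)
# copy local rke2 etcd snapshots to an NFS mount or any off-node location
SRC=/var/lib/rancher/rke2/server/db/snapshots
DEST=/mnt/backups/etcd-snapshots/$(hostname)

mkdir -p "$DEST"
# rsync only copies new or changed snapshot files
rsync -a "$SRC/" "$DEST/"
```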
The bucket is probably better. 🙂
a
yea definitely going for s3
it seems it did not make a snapshot... time to debug a bit
okay, think i solved it: endpoint=http://192.168.70.66:9000 to communicate with s3. this is my load balancer ip for minio outside of kubernetes. it seems to work when using awscli and boto3 in python, so now to test in rancher on my dev cluster
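A quick sanity check against that kind of endpoint from the CLI can look like the following; the bucket is assumed to already exist in MinIO, and awscli picks up credentials from its usual config or environment variables:

```bash
# point awscli at the MinIO load balancer instead of AWS
# (assumes AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are already exported)
aws --endpoint-url http://192.168.70.66:9000 s3 ls
aws --endpoint-url http://192.168.70.66:9000 s3 cp ./test.txt s3://rke2-snapshots/test.txt
```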
once satisfied, i guess i can wrap it into s3.domain.com using nginx to map to the ip:port, save the headaches, and add ssl
weird issue: i can write files to s3 with python all day long but etcd backups never hit
```
root@test-pool1-n2z98-4n6hn:/etc/rancher/rke2# rke2 etcd-snapshot save --config /dev/null --s3 --s3-endpoint 192.168.70.66:9000 --s3-skip-ssl-verify --s3-insecure --s3-bucket rke2-snapshots
INFO[0033] Snapshot on-demand-test-pool1-n2z98-4n6hn-1744227134 saved.
```
works when i test on a public bucket, will add auth once i get this working. but when i use the gui it never seems to work
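For reference, adding credentials to the same on-demand snapshot command would look roughly like this (the key values are placeholders):

```bash
# same on-demand snapshot, but with explicit S3 credentials (placeholder values)
rke2 etcd-snapshot save \
  --s3 \
  --s3-endpoint 192.168.70.66:9000 \
  --s3-insecure \
  --s3-bucket rke2-snapshots \
  --s3-access-key "<access-key>" \
  --s3-secret-key "<secret-key>"
```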