# rke2
p
check on the manager side
w
what is the manager?
p
rancher manager
w
oh btw, i'm recovering from an infrastructure failure. I have a backup of rancher + etcd. I probably have to put etcd in single-node mode?
p
etcd should recover by itself i think, but do you have any ip change?
w
no i only restored one of the etcd nodes
p
what do the rke2 logs say? waiting to apply plan?
w
yep
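A minimal sketch of tailing those logs on the node, assuming a standard systemd-managed rke2 install:
# server (control plane / etcd) nodes
journalctl -u rke2-server -f
# agent-only (worker) nodes
journalctl -u rke2-agent -f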
p
And on the provision log of the cluster?
w
give me a few, not by my pc. Gonna paste shortly
thank you
ok, i restored all 5 etcd instances. That was the problem: 1 restored node, and it was looking for its friends 😄 i have etcd + api on one VM for this reason. so now it creates VMs and shuts them down within a minute
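A sketch of confirming which peers a restored member still expects, assuming default rke2 cert paths (etcdctl is not shipped with rke2, so it has to be installed or run from the etcd container):
export ETCDCTL_API=3
etcdctl member list -w table \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key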
yeah it's trying to do old machine sets
Deleted the old ones so the new ones would get picked up
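Roughly what that cleanup looks like against the Rancher manager cluster; the machineset name below is a placeholder:
# list the CAPI machine sets for the downstream cluster
kubectl -n fleet-default get machinesets.cluster.x-k8s.io
# delete a stale one so the controller recreates machines from the current spec
kubectl -n fleet-default delete machinesets.cluster.x-k8s.io <old-machineset>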
so i guess that is what it was doing before: it was waiting for the etcd nodes. I suspect it could have been resolved by ssh'ing into the single etcd node and resetting it to a single-node cluster
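For reference, a sketch of that single-node reset on the surviving etcd node, per the rke2 cluster-reset flow (run with the service stopped):
systemctl stop rke2-server
# rewrites the etcd member list so this node comes up as a one-member cluster
rke2 server --cluster-reset
systemctl start rke2-server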
I have ninja'd my way out of soooo many DR scenarios with rancher. I could probably offer my Rancher-Spec-Ops skillz to others 😄
i owe it to a "robust" backup strategy.
gotta see if they come online now. They're pulling images
heh, ok, why didn't they fail previously though? strange. but ok, I use Let's Encrypt but have privateCA: true, so setting it to false should fix it, because /var/lib/rancher/agent/rancher2_connection_info.json has a self-signed CA in it.
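A sketch of flipping that setting, assuming Rancher was installed via its Helm chart (release name and repo may differ):
helm upgrade rancher rancher-stable/rancher \
  --namespace cattle-system \
  --reuse-values \
  --set privateCA=false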
now it just sits there
from capi-controller-manager
I0503 22:11:16.314592       1 machine_controller_noderef.go:54] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/prod-api-only-74dfd4db77xbjjqd-x8nsp" namespace="fleet-default" name="prod-api-only-74dfd4db77xbjjqd-x8nsp" reconcileID=20c8d7d6-d476-42bd-8a32-bc2d8055d3e0 MachineSet="fleet-default/prod-api-only-74dfd4db77xbjjqd" MachineDeployment="fleet-default/prod-api-only" Cluster="fleet-default/prod" VmwarevsphereMachine="fleet-default/prod-api-only-c6a53d6d-5kfxh"
over and over from diff machines
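A sketch of checking which Machines are stuck without a providerID, run against the manager cluster:
kubectl -n fleet-default get machines.cluster.x-k8s.io \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,PROVIDERID:.spec.providerID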
doesn't make sense. why would it not be able to deploy new nodes? It's a literal VM backup of rancher + db + etcds
delivering planSecret prod-bootstrap-template-xwblk-machine-plan with token secret fleet-default/prod-bootstrap-template-xwblk-machine-plan-token-cpfz4 to system-agent
few of those
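A sketch of inspecting those plan secrets on the manager cluster; the secret name is taken from the log line above:
kubectl -n fleet-default get secrets | grep machine-plan
kubectl -n fleet-default get secret prod-bootstrap-template-xwblk-machine-plan -o yaml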
ok, 2 of the 5 etcd nodes were unavailable for whatever reason, so i scaled those pools down.
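A sketch of doing that scale-down from the CLI instead of the UI, assuming the cluster object is fleet-default/prod as in the logs:
# lower spec.rkeConfig.machinePools[].quantity for the affected etcd pool
kubectl -n fleet-default edit clusters.provisioning.cattle.io prod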
2024/05/03 22:22:01 [INFO] [planner] rkecluster fleet-default/prod: configuring control plane node(s) prod-api-only-74dfd4db77xbjjqd-sggw4,prod-api-only-74dfd4db77xbjjqd-vdhsm,prod-api-only-74dfd4db77xbjjqd-x8nsp
Fri, May 3 2024 6:22:02 pm
So that's happening. Workers are waiting i guess for API servers.
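A sketch of watching the same provisioning state from the CLI, again assuming the fleet-default/prod cluster object:
kubectl -n fleet-default get clusters.provisioning.cattle.io prod \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'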
I feel like there should be some UI messaging around "be patient, don't change the node pools"