adamant-kite-43734
08/03/2024, 1:29 PM
worried-state-78253
08/04/2024, 1:42 PM
Failed deleting server [fleet-default/web-engine-1-pool1-213e303e-swtv9] of kind (HarvesterMachine) for machine web-engine-1-pool1-5898cb4dcdxbfnxd-xkhrf in infrastructure provider: DeleteError: Downloading driver from https://rancher.web-engineer/assets/docker-machine-driver-harvester Doing /etc/rancher/ssl docker-machine-driver-harvester docker-machine-driver-harvester: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped About to remove web-engine-1-pool1-213e303e-swtv9 WARNING: This action will delete both local reference and remote instance. Error removing host "web-engine-1-pool1-213e303e-swtv9": the server has asked for the client to provide credentials (get virtualmachines.kubevirt.io web-engine-1-pool1-213e303e-swtv9)
Upgrade still stalled…
Hoping to resolve this before the week starts - thinking of rebooting the Harvester nodes next, but going to try looking at these logs more closely first - not sure of the best way forward.
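(One possible way to narrow this down before rebooting is to look at the stuck machine objects and the Rancher logs. This is only a sketch: the fleet-default namespace and object kind are taken from the error above, and the app=rancher label selector for the Rancher pods is an assumption.)
# Inspect the stuck CAPI machine and the HarvesterMachine object named in the error
kubectl get machines.cluster.x-k8s.io -n fleet-default
kubectl get harvestermachines.rke-machine.cattle.io -n fleet-default
# Rancher's own logs often show why the Harvester node-driver call was rejected
kubectl -n cattle-system logs -l app=rancher --tail=200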
worried-state-78253
08/04/2024, 1:47 PM
worried-state-78253
08/04/2024, 2:02 PM
Still in the "Upgrading System Service" state; from that doc, there is no node showing a certificate issue:
➜ Documents kubectl get clusters.provisioning.cattle.io local -n fleet-local -o yaml
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"provisioning.cattle.io/v1","kind":"Cluster","metadata":{"annotations":{},"name":"local","namespace":"fleet-local"},"spec":{"kubernetesVersion":"v1.25.9+rke2r1","rkeConfig":{"controlPlaneConfig":{"disable":["rke2-snapshot-controller","rke2-snapshot-controller-crd","rke2-snapshot-validation-webhook"]}}}}
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4yQzU7DMBCEXwXt2Slt079Y4oAQ4sCVF9jYS2Ow15G9CYfK746SVqJC4udo78xovjlBIEGLgqBPgMxRUFzkPD1j+0ZGMskiubgwKOJp4eKts6ChT3F02UV2fKyMH7JQqkwiFAL1ozV+MKXqOL6DhoCMRwrEciUYa3Xz7NjePZwj/8xiDAQafDTo/yXOPZrJAUXB3NdFfnGBsmDoQfPgvQKPLflfR+gwd6Bhu9ztt3XdUGNwc7Crdr9u6jW1y/pg91vb2LXdbHarA6jzYpbSVwho6DCNNIMWBd9Yrtu+eiKpzpeiIPdkpnbzx2Wq+0G6R7Z9dCygT2WSCcpwwciURrJPxJRmZtDLUj4DAAD//5CVWGcAAgAA
    objectset.rio.cattle.io/id: provisioning-cluster-create
    objectset.rio.cattle.io/owner-gvk: management.cattle.io/v3, Kind=Cluster
    objectset.rio.cattle.io/owner-name: local
    objectset.rio.cattle.io/owner-namespace: ""
  creationTimestamp: "2023-10-27T14:18:54Z"
  finalizers:
  - wrangler.cattle.io/provisioning-cluster-remove
  - wrangler.cattle.io/rke-cluster-remove
  generation: 4
  labels:
    objectset.rio.cattle.io/hash: 50675339e9ca48d1b72932eb038d75d9d2d44618
    provider.cattle.io: harvester
  name: local
  namespace: fleet-local
  resourceVersion: "490972466"
  uid: 7aec5041-e2e1-4469-909e-314a62837976
spec:
  kubernetesVersion: v1.25.9+rke2r1
  localClusterAuthEndpoint: {}
  rkeConfig:
    chartValues: null
    machineGlobalConfig: null
    provisionGeneration: 1
    upgradeStrategy:
      controlPlaneDrainOptions:
        deleteEmptyDirData: false
        disableEviction: false
        enabled: false
        force: false
        gracePeriod: 0
        ignoreDaemonSets: null
        postDrainHooks: null
        preDrainHooks: null
        skipWaitForDeleteTimeoutSeconds: 0
        timeout: 0
      workerDrainOptions:
        deleteEmptyDirData: false
        disableEviction: false
        enabled: false
        force: false
        gracePeriod: 0
        ignoreDaemonSets: null
        postDrainHooks: null
        preDrainHooks: null
        skipWaitForDeleteTimeoutSeconds: 0
        timeout: 0
status:
  clientSecretName: local-kubeconfig
  clusterName: local
  conditions:
  - lastUpdateTime: "2023-10-27T14:20:36Z"
    message: marking control plane as initialized and ready
    reason: Waiting
    status: Unknown
    type: Ready
  - lastUpdateTime: "2023-10-27T14:18:54Z"
    status: "False"
    type: Reconciling
  - lastUpdateTime: "2023-10-27T14:18:54Z"
    status: "False"
    type: Stalled
  - lastUpdateTime: "2023-11-11T11:30:32Z"
    status: "True"
    type: Created
  - lastUpdateTime: "2024-08-02T17:20:28Z"
    status: "True"
    type: RKECluster
  - status: Unknown
    type: DefaultProjectCreated
  - status: Unknown
    type: SystemProjectCreated
  - lastUpdateTime: "2023-10-27T14:19:09Z"
    status: "True"
    type: Connected
  - lastUpdateTime: "2024-08-02T10:09:32Z"
    status: "True"
    type: Updated
  - lastUpdateTime: "2024-08-02T10:09:32Z"
    status: "True"
    type: Provisioned
  observedGeneration: 4
  ready: true
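(The Ready condition above is still Unknown with "marking control plane as initialized and ready", so it can help to compare against the underlying CAPI objects. A minimal check, assuming the standard fleet-local namespace and object names:)
kubectl get clusters.cluster.x-k8s.io -n fleet-local local -o yaml
kubectl get rkecontrolplanes.rke.cattle.io -n fleet-local local -o yaml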
worried-state-78253
08/04/2024, 2:04 PM
Wait for cluster settling down...
Waiting for CAPI cluster fleet-local/local to be provisioned (current phase: Provisioned, current generation: 801608)...
Waiting for CAPI cluster fleet-local/local to be provisioned (current phase: Provisioned, current generation: 801608)...
Waiting for CAPI cluster fleet-local/local to be provisioned (current phase: Provisioned, current generation: 801608)...
Waiting for CAPI cluster fleet-local/local to be provisioned (current phase: Provisioned, current generation: 801608)...
Waiting for CAPI cluster fleet-local/local to be provisioned (current phase: Provisioned, current generation: 801608)...
Waiting for CAPI cluster fleet-local/local to be provisioned (current phase: Provisioned, current generation: 801608)...
Waiting for CAPI cluster fleet-local/local to be provisioned (current phase: Provisioned, current generation: 801608)...
Waiting for CAPI cluster fleet-local/local to be provisioned (current phase: Provisioned, current generation: 801608)...
CAPI cluster fleet-local/local is provisioned (current generation: 801610).
cluster.fleet.cattle.io/local patched
waiting for fleet-agent creation timestamp to be updated
waiting for fleet-agent creation timestamp to be updated
waiting for fleet-agent creation timestamp to be updated
waiting for fleet-agent creation timestamp to be updated
waiting for fleet-agent creation timestamp to be updated
The "waiting for fleet-agent" message repeats indefinitely.
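(Since the upgrade script is waiting for the fleet-agent to be recreated, its state can be checked directly. A rough check, assuming the local-cluster namespace cattle-fleet-local-system; other setups may use cattle-fleet-system:)
kubectl get pods -n cattle-fleet-local-system
kubectl get clusters.fleet.cattle.io -n fleet-local local -o yaml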
worried-state-78253
08/04/2024, 2:12 PM
n1:/ # (
> curl --cacert /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt \
> https://127.0.0.1:10257/healthz >/dev/null 2>&1 \
> && echo "[OK] Kube Controller probe" \
> || echo "[FAIL] Kube Controller probe";
>
> curl --cacert /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt \
> https://127.0.0.1:10259/healthz >/dev/null 2>&1 \
> && echo "[OK] Scheduler probe" \
> || echo "[FAIL] Scheduler probe";
> )
[OK] Kube Controller probe
[OK] Scheduler probe
However n4/n5 do fail!
n5:/ # (
> curl --cacert /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt \
> https://127.0.0.1:10257/healthz >/dev/null 2>&1 \
> && echo "[OK] Kube Controller probe" \
> || echo "[FAIL] Kube Controller probe";
>
> curl --cacert /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt \
> https://127.0.0.1:10259/healthz >/dev/null 2>&1 \
> && echo "[OK] Scheduler probe" \
> || echo "[FAIL] Scheduler probe";
> )
[FAIL] Kube Controller probe
[FAIL] Scheduler probe
So it looks like this is the issue…
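(For context, the Harvester upgrade troubleshooting doc covers this symptom: the kube-controller-manager / kube-scheduler health probes fail because their self-signed certificates have expired, and the workaround it describes is to remove the stale certificate and recreate the container so RKE2 regenerates it. The sketch below assumes that is the situation on n4/n5 and should be checked against the doc for your Harvester version before running anything:)
# On an affected node, as root: confirm the certificate has actually expired
openssl x509 -enddate -noout -in /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt
# If expired, remove the cert/key pair and force-recreate the container so it is regenerated
rm /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.{crt,key}
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
crictl rm -f $(crictl ps -q --name kube-controller-manager)
# Repeat with the kube-scheduler paths and container name if that probe also fails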