adamant-kite-43734
06/12/2023, 1:24 PM
big-judge-33880
06/12/2023, 1:37 PM
hvst-upgrade-mrs26-post-drain-har-01   1/1   64s   170m
hvst-upgrade-mrs26-post-drain-har-02   1/1   74s   91m
hvst-upgrade-mrs26-post-drain-har-04   0/1   46m   47m
(1.1.1 to 1.1.2 upgrade)
big-judge-33880
06/12/2023, 2:19 PM
[rke2-ingress-nginx-controller-2sb97] 2023/06/12 14:18:05 [error] 452#452: *2850162 upstream prematurely closed connection while sending to client, client: 10.52.3.0, server: _, request: "GET /v1/harvester/supportbundles/bundle-it7ia/download HTTP/2.0", upstream: "http://10.52.5.169:80/v1/harvester/supportbundles/bundle-it7ia/download"
big-judge-33880
06/12/2023, 2:26 PM
Jun 12 12:50:21 har-04 rke2[106121]: time="2023-06-12T12:50:21Z" level=info msg="Labels and annotations have been set successfully on node: har-04"
Jun 12 12:50:45 har-04 rke2[108737]: time="2023-06-12T12:50:45Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1c70cd2_b314_492c_9552_275313f6cc38.slice/cri-containerd-a9374d4010b19827f941dd9ca10b20322b75483c8627ec2c4af57b1d7f8e1ea6.scope: device or resource busy"
Jun 12 12:50:45 har-04 rke2[108737]: time="2023-06-12T12:50:45Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/blkio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1c70cd2_b314_492c_9552_275313f6cc38.slice/cri-containerd-a9374d4010b19827f941dd9ca10b20322b75483c8627ec2c4af57b1d7f8e1ea6.scope: device or resource busy"
Jun 12 12:50:45 har-04 rke2[108737]: time="2023-06-12T12:50:45Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/perf_event/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1c70cd2_b314_492c_9552_275313f6cc38.slice/cri-containerd-a9374d4010b19827f941dd9ca10b20322b75483c8627ec2c4af57b1d7f8e1ea6.scope: device or resource busy"
job startTime: "2023-06-12T12:50:01Z"
big-judge-33880
06/12/2023, 2:44 PM
big-judge-33880
06/12/2023, 2:49 PM
ERRO[2023-06-12T14:43:14Z] Failed to move /run/initramfs/cos-state/cOS/active.img to /run/initramfs/cos-state/cOS/passive.img: exit status 1
big-judge-33880
06/12/2023, 3:14 PM
INFO[2023-06-12T15:13:45Z] Moving /run/initramfs/cos-state/cOS/active.img to /run/initramfs/cos-state/cOS/passive.img
INFO[2023-06-12T15:13:46Z] Finished moving /run/initramfs/cos-state/cOS/active.img to /run/initramfs/cos-state/cOS/passive.img
INFO[2023-06-12T15:13:46Z] Moving /run/initramfs/cos-state/cOS/transition.img to /run/initramfs/cos-state/cOS/active.img
INFO[2023-06-12T15:13:46Z] Finished moving /run/initramfs/cos-state/cOS/transition.img to /run/initramfs/cos-state/cOS/active.img
INFO[2023-06-12T15:13:46Z] Applying 'after-upgrade' hook
INFO[2023-06-12T15:13:46Z] Running after-upgrade hook
INFO[2023-06-12T15:13:46Z] Upgrade completed
ERRO[2023-06-12T15:13:46Z] Failed mounting device /dev/sda3 with label COS_STATE
big-judge-33880
06/12/2023, 3:37 PM
Created /tmp/skip-retry-with-succeed in the pod, since it seemed everything that needed to be done was already done.
Then I ran these commands on the host (with HARVESTER_UPGRADE_POD_NAME set to the pod I forced into success above):
HARVESTER_UPGRADE_POD_NAME=hvst-upgrade-mrs26-post-drain-har-04-ckl2t
cat > /tmp/upgrade-reboot.sh << EOF
#!/bin/bash -ex
HARVESTER_UPGRADE_POD_NAME=$HARVESTER_UPGRADE_POD_NAME
EOF
cat >> /tmp/upgrade-reboot.sh << 'EOF'
source /etc/bash.bashrc.local
pod_id=$(crictl pods --name $HARVESTER_UPGRADE_POD_NAME --namespace harvester-system -o json | jq -er '.items[0].id')
# get the `apply` container ID
container_id=$(crictl ps --pod $pod_id --name apply -o json -a | jq -er '.containers[0].id')
container_state=$(crictl inspect $container_id | jq -er '.status.state')
if [ "$container_state" = "CONTAINER_EXITED" ]; then
  container_exit_code=$(crictl inspect $container_id | jq -r '.status.exitCode')
  if [ "$container_exit_code" = "0" ]; then
    sleep 10
    # workaround for https://github.com/harvester/harvester/issues/2865
    # kubelet could start from old manifest first and generate a new manifest later.
    rm -f /var/lib/rancher/rke2/agent/pod-manifests/*
    reboot
    exit 0
  fi
fi
exit 1
EOF
chmod +x /tmp/upgrade-reboot.sh
cat > /run/systemd/system/upgrade-reboot.service << 'EOF'
[Unit]
Description=Upgrade reboot
[Service]
Type=simple
ExecStart=/tmp/upgrade-reboot.sh
Restart=always
RestartSec=10
EOF
systemctl daemon-reload
systemctl start upgrade-reboot
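The retry behaviour of the unit above can be sketched without touching the host. This is only an illustration of the decision made in /tmp/upgrade-reboot.sh: the function name `should_reboot` is hypothetical, and the sketch takes the container state and exit code as arguments instead of calling crictl or reboot.

```shell
#!/bin/bash
# Hypothetical helper (not part of Harvester): mirrors the condition in
# /tmp/upgrade-reboot.sh. Returns 0 when the script would reboot, and 1 when
# it would exit 1 and be restarted by systemd (Restart=always, RestartSec=10).
should_reboot() {
  local state="$1" exit_code="$2"
  [ "$state" = "CONTAINER_EXITED" ] && [ "$exit_code" = "0" ]
}

# Walk through a few state/exit-code pairs and print what the script would do.
for pair in "CONTAINER_RUNNING:" "CONTAINER_EXITED:1" "CONTAINER_EXITED:0"; do
  state="${pair%%:*}"
  code="${pair#*:}"
  if should_reboot "$state" "$code"; then
    echo "$state exit=$code -> reboot"
  else
    echo "$state exit=$code -> retry after 10s"
  fi
done
```

Only the last pair leads to a reboot, which is why the `apply` container had to be forced to exit with code 0 first.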
big-judge-33880
06/12/2023, 3:55 PM
big-judge-33880
06/12/2023, 3:56 PM
big-judge-33880
06/12/2023, 3:58 PM
nodeStatuses:
  har-01:
    state: Succeeded
  har-02:
    state: Succeeded
  har-03:
    state: Images preloaded
  har-04:
    state: Succeeded
  har-05:
    state: Images preloaded
  har-06:
    state: Images preloaded
previousVersion: v1.1.1
repoInfo: |
  release:
    harvester: v1.1.2
    harvesterChart: 1.1.2
    os: Harvester v1.1.2
    kubernetes: v1.24.11+rke2r1
    rancher: v2.6.11
    monitoringChart: 100.1.0+up19.0.3
    minUpgradableVersion: v1.1.0
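A quick way to see which nodes still have to be upgraded is to filter that status for anything not yet `Succeeded`. The sketch below is an assumption-laden helper, not a Harvester tool: `pending_nodes` is a hypothetical name, the input is a copy of the nodeStatuses excerpt pasted into a variable, and the awk script simply pairs each `state:` line with the key line before it.

```shell
#!/bin/bash
# Sample input: the nodeStatuses excerpt from the upgrade status.
status_yaml='nodeStatuses:
  har-01:
    state: Succeeded
  har-02:
    state: Succeeded
  har-03:
    state: Images preloaded
  har-04:
    state: Succeeded
  har-05:
    state: Images preloaded
  har-06:
    state: Images preloaded'

# Hypothetical helper: prints "node: state" for every node whose state is
# not "Succeeded". It remembers the last line ending in ":" as the node name
# and does not depend on the indentation depth.
pending_nodes() {
  awk '
    /:[ \t]*$/ { key = $1; sub(/:$/, "", key); next }
    $1 == "state:" {
      state = $0
      sub(/^[ \t]*state:[ \t]*/, "", state)
      if (state != "Succeeded") print key ": " state
    }
  '
}

printf '%s\n' "$status_yaml" | pending_nodes
```

For the status shown here it lists har-03, har-05 and har-06, all still at `Images preloaded`.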
the node in question being har-03
big-judge-33880
06/12/2023, 4:01 PM
big-judge-33880
06/12/2023, 4:09 PM
big-judge-33880
06/12/2023, 7:34 PM
great-bear-19718
06/12/2023, 11:23 PM
big-judge-33880
06/13/2023, 7:38 AM
big-judge-33880
06/13/2023, 7:45 AM
big-judge-33880
06/13/2023, 8:53 PM
rke.cattle.io/pre-drain: '{"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"ignoreErrors":false,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}'
, then post-drain job by annotating using rke.cattle.io/post-drain: '{"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"ignoreErrors":false,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}'
, which brings the node to Successful status, rebooted to 1.1.2 as far as the upgrade CRD is concerned
big-judge-33880
06/13/2023, 8:59 PM
rke.cattle.io/drain-done: '{"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"ignoreErrors":false,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}'
rke.cattle.io/drain-options: '{"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"ignoreErrors":false,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}'
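All four annotations carry the same drain-options JSON, so the annotate commands can be generated from one value. This is a rough sketch only: the thread does not say which object was annotated, so `RESOURCE_TYPE/RESOURCE_NAME` is a placeholder, `annotate_cmd` is a hypothetical helper, and the commands are printed rather than executed (add `-n <namespace>` as needed).

```shell
#!/bin/bash
# The drain-options value quoted in the messages above.
drain_options='{"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"ignoreErrors":false,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}'

# Hypothetical helper: prints (does not run) a `kubectl annotate` command
# that sets the given annotation key to the drain-options JSON.
annotate_cmd() {
  local target="$1" key="$2"
  printf "kubectl annotate --overwrite %s %s='%s'\n" "$target" "$key" "$drain_options"
}

# Placeholder target; replace with the object that actually holds the annotations.
target="RESOURCE_TYPE/RESOURCE_NAME"

# Same order as in the thread: pre-drain first, then post-drain.
annotate_cmd "$target" "rke.cattle.io/pre-drain"
annotate_cmd "$target" "rke.cattle.io/post-drain"
```

Printing the commands first makes it easy to review the exact JSON before applying anything to a cluster that is mid-upgrade.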