d

damp-crayon-64796

10/26/2022, 12:19 PM
Tried upgrading from 1.0.3 to 1.1.0 but it failed. I am seeing these log messages:
+ virtctl start upgrade-repo-hvst-upgrade-xpf52 -n harvester-system
Try to bring up the upgrade repo VM...
Error starting VirtualMachine virtualmachine.kubevirt.io "upgrade-repo-hvst-upgrade-xpf52" not found
State is: system service upgrade succeeded but the first node failed. I also don't see the VM upgrade-repo-hvst-upgrade-xpf52 at all. Any idea?
a

ancient-pizza-13099

10/26/2022, 12:37 PM
@damp-crayon-64796 What are the resources of each of your NODEs (CPU cores, memory and storage)? If a node has fewer than 12 CPU cores, it may not be enough to handle the peak resource requirements of v1.0.3 plus the upgrade.
cc @prehistoric-balloon-31801 @red-king-19196 @ancient-pilot-51731
@damp-crayon-64796 could you help log an issue on GitHub https://github.com/harvester/harvester/issues , and attach the support-bundle file (https://docs.harvesterhci.io/v1.0/troubleshooting/harvester/#generate-a-support-bundle)? Thanks.
d

damp-crayon-64796

10/26/2022, 12:40 PM
2 CPUs with 10 cores each
harvester1:/usr/local # lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              40
On-line CPU(s) list: 0-39
Thread(s) per core:  2
Core(s) per socket:  10
Socket(s):           2
a

ancient-pizza-13099

10/26/2022, 12:43 PM
uptime
in harvester1 please
d

damp-crayon-64796

10/26/2022, 12:44 PM
harvester1:/usr/local # uptime
 12:43:59  up 21 days  6:23,  1 user,  load average: 1.10, 1.42, 1.20
a

ancient-pizza-13099

10/26/2022, 12:48 PM
harvester1 is not rebooted yet.
In the log, there should be many occurrences of virtctl start upgrade-repo-hvst-upgrade-xpf52 -n harvester-system. Is the "not found" returned from the very beginning, or only after a certain time?
Maybe the pod upgrade-repo-hvst-upgrade-xpf52 was removed after enough retries.
d

damp-crayon-64796

10/26/2022, 12:51 PM
directly at the beginning the messages appeared
k get pod
NAME                                                     READY   STATUS    RESTARTS        AGE
harvester-859d59d7c4-9nlk8                               0/1     Pending   0               145m
harvester-859d59d7c4-csrdf                               1/1     Running   0               153m
harvester-859d59d7c4-dlw2q                               1/1     Running   0               153m
harvester-load-balancer-686957bdfc-pktm6                 1/1     Running   2 (21d ago)     35d
harvester-network-controller-22hnv                       1/1     Running   0               153m
harvester-network-controller-4h8xc                       1/1     Running   1 (143m ago)    153m
harvester-network-controller-cpskp                       1/1     Running   0               153m
harvester-network-controller-manager-7f56fd5d45-f2l45    1/1     Running   0               153m
harvester-network-controller-manager-7f56fd5d45-lq8vn    1/1     Running   0               145m
harvester-network-webhook-57f74f7568-lm4wr               1/1     Running   0               153m
harvester-node-disk-manager-5kl9p                        1/1     Running   0               153m
harvester-node-disk-manager-kdfkx                        1/1     Running   0               153m
harvester-node-disk-manager-kwt94                        1/1     Running   1 (143m ago)    153m
harvester-node-manager-ng4fm                             1/1     Running   0               153m
harvester-node-manager-q74g5                             1/1     Running   0               153m
harvester-node-manager-rhhxq                             1/1     Running   1 (143m ago)    153m
harvester-webhook-ff874d44-8z748                         1/1     Running   0               153m
harvester-webhook-ff874d44-mzn5t                         1/1     Running   0               153m
harvester-webhook-ff874d44-v589t                         0/1     Pending   0               145m
hvst-upgrade-xpf52-post-drain-harvester1-527kb           0/1     Error     0               143m
hvst-upgrade-xpf52-post-drain-harvester1-7qj7f           0/1     Error     0               141m
hvst-upgrade-xpf52-post-drain-harvester1-cht6v           0/1     Error     0               141m
hvst-upgrade-xpf52-post-drain-harvester1-czfbw           0/1     Error     0               141m
hvst-upgrade-xpf52-post-drain-harvester1-lgptq           1/1     Running   0               141m
hvst-upgrade-xpf52-post-drain-harvester1-tngv7           0/1     Error     0               141m
hvst-upgrade-xpf52-post-drain-harvester1-v4t89           0/1     Error     0               141m
hvst-upgrade-xpf52-post-drain-harvester1-w9hsk           0/1     Error     0               141m
kube-vip-4spln                                           1/1     Running   17 (143m ago)   72d
kube-vip-5g2w7                                           1/1     Running   8 (21d ago)     72d
kube-vip-cloud-provider-0                                1/1     Running   7 (21d ago)     71d
kube-vip-vsbhd                                           1/1     Running   11 (21d ago)    72d
virt-api-77cdfbf56f-x7hhf                                1/1     Running   0               146m
virt-api-77cdfbf56f-zvvsc                                1/1     Running   0               150m
virt-controller-657f55f68c-586kp                         1/1     Running   0               151m
virt-controller-657f55f68c-fbkdx                         1/1     Running   0               145m
virt-handler-g89tl                                       1/1     Running   1 (143m ago)    152m
virt-handler-nzvjj                                       1/1     Running   0               151m
virt-handler-x5mns                                       1/1     Running   0               152m
virt-operator-c6ff785d7-924gn                            1/1     Running   0               153m
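(For reference, a minimal way to pull the logs of the failed post-drain pods listed above, assuming kubectl access to the cluster; the pod names come from the listing itself.)
# print the logs of every post-drain pod of this upgrade
for p in $(kubectl get pods -n harvester-system -o name | grep hvst-upgrade-xpf52-post-drain); do
  echo "===== $p ====="
  kubectl logs -n harvester-system "$p"
done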
a

ancient-pizza-13099

10/26/2022, 12:52 PM
are there any error logs in hvst-upgrade-xpf52-post-drain-harvester1-lgptq ?
d

damp-crayon-64796

10/26/2022, 12:53 PM
+++ dirname /usr/local/bin/upgrade_node.sh
++ cd /usr/local/bin
++ pwd
+ SCRIPT_DIR=/usr/local/bin
+ source /usr/local/bin/lib.sh
++ UPGRADE_NAMESPACE=harvester-system
++ UPGRADE_REPO_URL=http://upgrade-repo-hvst-upgrade-xpf52.harvester-system/harvester-iso
++ UPGRADE_REPO_VM_NAME=upgrade-repo-hvst-upgrade-xpf52
++ UPGRADE_REPO_RELEASE_FILE=http://upgrade-repo-hvst-upgrade-xpf52.harvester-system/harvester-iso/harvester-release.yaml
++ UPGRADE_REPO_SQUASHFS_IMAGE=http://upgrade-repo-hvst-upgrade-xpf52.harvester-system/harvester-iso/rootfs.squashfs
++ UPGRADE_REPO_BUNDLE_ROOT=http://upgrade-repo-hvst-upgrade-xpf52.harvester-system/harvester-iso/bundle
++ UPGRADE_REPO_BUNDLE_METADATA=http://upgrade-repo-hvst-upgrade-xpf52.harvester-system/harvester-iso/bundle/metadata.yaml
++ CACHED_BUNDLE_METADATA=
++ HOST_DIR=/host
+ UPGRADE_TMP_DIR=/host/usr/local/upgrade_tmp
+ mkdir -p /host/usr/local/upgrade_tmp
+ case $1 in
+ command_post_drain
+ wait_repo
++ get_repo_vm_status
++ kubectl get virtualmachines.kubevirt.io upgrade-repo-hvst-upgrade-xpf52 -n harvester-system '-o=jsonpath={.status.printableStatus}'
Error from server (NotFound): virtualmachines.kubevirt.io "upgrade-repo-hvst-upgrade-xpf52" not found
+ [[ '' == \R\u\n\n\i\n\g ]]
+ echo 'Try to bring up the upgrade repo VM...'
+ virtctl start upgrade-repo-hvst-upgrade-xpf52 -n harvester-system
Try to bring up the upgrade repo VM...
Error starting VirtualMachine virtualmachine.kubevirt.io "upgrade-repo-hvst-upgrade-xpf52" not found
+ true
+ sleep 10
... and this will loop
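(For context, the trace above is the wait_repo step looping. A rough sketch of the retry loop it implies, reconstructed from the trace rather than taken from the Harvester source:)
# keep retrying until the upgrade-repo VM reports Running
while true; do
  status=$(kubectl get virtualmachines.kubevirt.io upgrade-repo-hvst-upgrade-xpf52 \
    -n harvester-system -o=jsonpath='{.status.printableStatus}')
  [[ "$status" == "Running" ]] && break
  echo "Try to bring up the upgrade repo VM..."
  virtctl start upgrade-repo-hvst-upgrade-xpf52 -n harvester-system || true
  sleep 10
done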
a

ancient-pizza-13099

10/26/2022, 12:55 PM
kubectl get vm -A
kubectl get vmi -A
d

damp-crayon-64796

10/26/2022, 12:56 PM
[skoch@mgmt-sk ~]$ kubectl get vm -A
NAMESPACE   NAME                                       AGE   STATUS    READY
default     hci-k3s-cluster-k3s-179bbfac-ln7jz         21d   Stopped   False
default     hci-k3s-cluster-k3s-179bbfac-nrkwz         21d   Stopped   False
default     hci-k3s-cluster-k3s-179bbfac-pd6bx         21d   Stopped   False
default     hci-rke2-cluster-control-258c3ef0-g5cmh    21d   Stopped   False
default     hci-rke2-cluster-control-258c3ef0-qmd6t    21d   Stopped   False
default     hci-rke2-cluster-control-258c3ef0-qqrc2    21d   Stopped   False
default     hci-rke2-cluster-worker-c76869e8-b9t9s     21d   Stopped   False
default     hci-rke2-cluster-worker-c76869e8-csrqj     21d   Stopped   False
default     hci-rke2-cluster-worker-c76869e8-kn8pp     21d   Stopped   False
default     hci-rke2-cluster-worker-c76869e8-tr24g     21d   Stopped   False
default     hci-rke2-cluster-worker-c76869e8-xfcbj     21d   Stopped   False
default     rocky9-installed                           72d   Stopped   False
default     test-efi                                   48d   Stopped   False
default     test-rocky8                                72d   Stopped   False
[skoch@mgmt-sk ~]$ kubectl get vmi -A
No resources found
a

ancient-pizza-13099

10/26/2022, 12:57 PM
the upgrade VM upgrade-repo-hvst-upgrade-xpf52 is totally gone, tricky
✔️ 1
d

damp-crayon-64796

10/26/2022, 12:57 PM
Support bundle .... too big for github
a

ancient-pizza-13099

10/26/2022, 1:04 PM
I will spend some time looking into the support bundle file.
The most important message is:
logs/harvester-system/harvester-859d59d7c4-dlw2q/apiserver.log:2022-10-26T10:29:18.422203593Z time="2022-10-26T10:29:18Z" level=info msg="Delete upgrade repo VM harvester-system/upgrade-repo-hvst-upgrade-xpf52"
The upgrade repo is deleted.
Seems the upgrade timed out, and the repo was finally actively deleted.
yamls/namespaced/harvester-system/harvesterhci.io/v1beta1/upgrades.yaml
    creationTimestamp: "2022-10-26T09:41:08Z"
..
    - lastUpdateTime: "2022-10-26T10:29:18Z"
      message: Job has reached the specified backoff limit
      reason: BackoffLimitExceeded
      status: "False"
      type: NodesUpgraded
@damp-crayon-64796 how about the hardware of your cluster NODEs (CPU cores, memory, storage (SSD, NVMe or HDD?))
status:
    conditions:
    - lastUpdateTime: "2022-10-26T10:29:18Z"
      status: "False"
      type: Completed
    - lastUpdateTime: "2022-10-26T09:46:47Z"
      status: "True"
      type: ImageReady
    - lastUpdateTime: "2022-10-26T09:49:32Z"
      status: "True"
      type: RepoReady
    - lastUpdateTime: "2022-10-26T10:13:28Z"
      status: "True"
      type: NodesPrepared
    - lastUpdateTime: "2022-10-26T10:23:42Z"
      status: "True"
      type: SystemServicesUpgraded
    - lastUpdateTime: "2022-10-26T10:29:18Z"
      message: Job has reached the specified backoff limit
      reason: BackoffLimitExceeded
      status: "False"
      type: NodesUpgraded
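(For reference, the same condition list can be read live from the cluster; a minimal sketch, assuming the Upgrade object is named hvst-upgrade-xpf52, matching the job names above:)
# list the Upgrade CR's conditions as TYPE / STATUS / MESSAGE
kubectl get upgrades.harvesterhci.io hvst-upgrade-xpf52 -n harvester-system \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'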
d

damp-crayon-64796

10/26/2022, 2:11 PM
First of all: many thanks for your investigations!! The upgrade job timed out, I agree, but I also did not see the VM during the earlier retries. So I think it was just never created/downloaded?? I suspect some firewall/network topic in our environment, because we do not allow all traffic to go outside. My hardware (it is for demo only) is 3 blades with 2x10 cores each, 256G RAM and fast FC storage attached. I don't think it is the HW... I saw in the status page I posted above "Download Upgrade Image = 0%" ... is this correct?
p

prehistoric-balloon-31801

10/26/2022, 2:12 PM
The VM and image are deleted after an upgrade finish (either succeed or fail).
a

ancient-pizza-13099

10/26/2022, 2:14 PM
@damp-crayon-64796 If the ISO download fails, the upgrade won't continue.
@prehistoric-balloon-31801, when the post-drain job tried to bring up the VM at 10:29, the repo VM had already been deleted due to "Job has reached the specified backoff limit"; the previous jobs are all in error.
And only at 10:23 was the virt-api POD restarted with the upgraded version.
p

prehistoric-balloon-31801

10/26/2022, 2:17 PM
yes, the controller is buggy and marks the upgrade as failed when the job fails the first time. (So the VM is gone)
@damp-crayon-64796 what's the disk size? I saw it's FC storage. Could you also check /var/log/containers and see if you can find the first error job:
sudo ls /var/log/containers | grep post-drain
d

damp-crayon-64796

10/26/2022, 2:27 PM
yes, it is FC storage, not really supported, I know. The disk is nearly 6TB in size for the VMs and 120G for the OS.
The /var/log/containers directory shows a log file hvst-upgrade-xpf52-post-drain-harvester1-lgptq_harvester-system_apply-a9f297ed721dff961079244c4b814e61600ac30152ad3bb78218d217869cbb89.log with the content I posted above, where it loops the messages:
2022-10-26T14:29:36.063803761Z stderr F Error from server (NotFound): virtualmachines.kubevirt.io "upgrade-repo-hvst-upgrade-xpf52" not found
2022-10-26T14:29:36.068506651Z stderr F + [[ '' == \R\u\n\n\i\n\g ]]
2022-10-26T14:29:36.068556416Z stderr F + echo 'Try to bring up the upgrade repo VM...'
2022-10-26T14:29:36.068567017Z stderr F + virtctl start upgrade-repo-hvst-upgrade-xpf52 -n harvester-system
2022-10-26T14:29:36.06851447Z stdout F Try to bring up the upgrade repo VM...
2022-10-26T14:29:36.123610906Z stderr F Error starting VirtualMachine virtualmachine.kubevirt.io "upgrade-repo-hvst-upgrade-xpf52" not found
2022-10-26T14:29:36.125633106Z stderr F + true
2022-10-26T14:29:36.12567131Z stderr F + sleep 10
👌 1
p

prehistoric-balloon-31801

10/26/2022, 2:30 PM
Could you check the free space?
df -h
d

damp-crayon-64796

10/26/2022, 2:31 PM
harvester1:/usr/local # df -h|grep -v ^overlay
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs           126G     0  126G   0% /dev/shm
tmpfs            51G   15M   51G   1% /run
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/sdg3        15G  4.4G  9.7G  31% /run/initramfs/cos-state
/dev/loop0      3.0G  1.3G  1.6G  45% /
tmpfs            63G   12M   63G   1% /run/overlay
/dev/sdg2        58M  1.5M   53M   3% /oem
/dev/sdg5        95G   55G   36G  61% /usr/local
tmpfs           126G  4.0K  126G   1% /tmp
/dev/sdh        5.9T  272G  5.3T   5% /var/lib/harvester/defaultdisk
tmpfs           1.0G   12K  1.0G   1% /var/lib/kubelet/pods/77b23e02-d9b4-4d5d-9aad-0c638d3e6253/volumes/kubernetes.io~projected/kube-api-access-sllbw
tmpfs           252G   12K  252G   1% /var/lib/kubelet/pods/a9014a20-9f2a-45ff-981e-4a5c790cffec/volumes/kubernetes.io~projected/kube-api-access-qcnvx
tmpfs           252G   12K  252G   1% /var/lib/kubelet/pods/c00bcf3e-a35d-4035-8f1d-0cf7a6d32c95/volumes/kubernetes.io~projected/kube-api-access-wxpjw
...
p

prehistoric-balloon-31801

10/26/2022, 2:34 PM
I saw some OOM message in dmesg, not quite sure if that’s related.
a

ancient-pizza-13099

10/26/2022, 2:53 PM
The OOM looks to be related to ``rancher-logging-root-fluentd-0``, per the keywords in the kernel OOM message:
task_memcg=/kubepods/burstable/pod6e088e0d-9998-430f-b332-d8679b11d825
and ruby, and
kubelet.log:I1026 10:22:49.113438    2922 reconciler.go:225] "operationExecutor.VerifyControllerAttachedVolume started for volume \"app-config\" (UniqueName: \"kubernetes.io/secret/6e088e0d-9998-430f-b332-d8679b11d825-app-config\") pod \"rancher-logging-root-fluentd-0\" (UID: \"6e088e0d-9998-430f-b332-d8679b11d825\") "
I did not find a way the POD could go into Error in the following code.
@prehistoric-balloon-31801 failure of the first POD
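(For reference, a quick way to confirm which pod the kernel OOM killer hit, assuming the events are still in the kernel ring buffer or journal:)
# search kernel messages for OOM kill events and the memcg/pod they name
journalctl -k | grep -iE 'out of memory|oom|task_memcg'
# or, if the journal is not available:
dmesg | grep -iE 'out of memory|oom'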
p

prehistoric-balloon-31801

10/26/2022, 3:07 PM
I’m also checking this. Any suspicious thing?
a

ancient-pizza-13099

10/26/2022, 3:08 PM
But it does not report again after
No retries permitted until 2022-10-26 10:27:04.815090863 +0000 UTC m=+32.013847439 (durationBeforeRetry 4s)
at 2022-10-26 10:27:04.815090863.
That is the running POD; for the other failed ones: maybe the first POD was marked as failed by kubelet due to an occasional failure.
All others are after hvst-upgrade-xpf52-post-drain-harvester1-527kb, which is the first one that failed.
Pod hvst-upgrade-xpf52-post-drain-harvester1-527kb failed within 2 minutes, which caused the first job try-out failure.
status:
    conditions:
    - lastProbeTime: "null"
      lastTransitionTime: "2022-10-26T10:26:35Z"
      status: "True"
      type: Initialized
    - lastProbeTime: "null"
      lastTransitionTime: "2022-10-26T10:28:34Z"
      reason: PodFailed
      status: "False"
      type: Ready
    - lastProbeTime: "null"
      lastTransitionTime: "2022-10-26T10:28:34Z"
      reason: PodFailed
      status: "False"
      type: ContainersReady
    - lastProbeTime: "null"
      lastTransitionTime: "2022-10-26T10:26:35Z"
      status: "True"
      type: PodScheduled
p

prehistoric-balloon-31801

10/26/2022, 3:29 PM
@damp-crayon-64796 Could you list ls /var/lib/rancher/rke2/agent/images/ on harvester1? Thanks
d

damp-crayon-64796

10/26/2022, 3:29 PM
harvester1:~ # ls /var/lib/rancher/rke2/agent/images/
cloud-controller-manager-image.txt  etcd-image.txt  kube-apiserver-image.txt  kube-controller-manager-image.txt  kube-proxy-image.txt  kube-scheduler-image.txt
👍 1
p

prehistoric-balloon-31801

10/26/2022, 4:19 PM
@ancient-pizza-13099 With this output, we can confirm the clean_rke2_archives function is complete.
a

ancient-pizza-13099

10/26/2022, 6:52 PM
@prehistoric-balloon-31801 do you mean the first post-drain POD did run, and failed somewhere else?
/yamls/cluster/v1/nodes.yaml

harvester1

    taints:
    - effect: NoSchedule
      key: node.kubernetes.io/unschedulable
      timeAdded: "2022-10-26T10:23:42Z"
    unschedulable: true
While the pod was running, there is a log of a network interruption.
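(For reference, the taint above is the standard cordon marker left by the node drain. A minimal sketch of how to inspect it and, once the failed upgrade has been dealt with, lift it again; the node name is taken from the yaml above:)
# show whether the node is still cordoned and which taints it carries
kubectl get node harvester1 -o jsonpath='{.spec.unschedulable}{"\n"}{.spec.taints}{"\n"}'
# re-enable scheduling only after the upgrade state has been sorted out
kubectl uncordon harvester1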
d

damp-crayon-64796

10/27/2022, 10:26 AM
my network configuration is very basic, just 2 NICs in an active/backup config for everything. The mgmt network is untagged and the VM networks use some tagged VLANs.
a

ancient-pizza-13099

10/27/2022, 10:27 AM
no worry, this flannel network may affect the internal communication between pods, and the post-drain pod may fail due to this interruption; we are checking.
d

damp-crayon-64796

10/27/2022, 10:54 AM
had a look into the log files at
harvester1:/var/log/pods/kube-system_rke2-canal-vqct4_5a7cdf6a-11b7-4539-b6b4-93d7541d57cb/kube-flannel:
here is an excerpt
2022-10-05T06:21:48.58830961Z stderr F I1005 06:21:48.588233       1 iptables.go:243] Adding iptables rule: ! -s 10.52.0.0/16 -d 10.52.0.0/16 -m comment --comment flanneld masq -j MASQUERADE --random-fully
2022-10-26T10:26:35.376490188Z stderr F I1026 10:26:35.376360       1 watch.go:39] context canceled, close receiver chan
2022-10-26T10:26:35.376521477Z stderr F I1026 10:26:35.376406       1 vxlan_network.go:75] evts chan closed
2022-10-26T10:26:35.376533744Z stderr F I1026 10:26:35.376449       1 main.go:438] shutdownHandler sent cancel signal...
2022-10-26T10:26:35.376620616Z stderr F W1026 10:26:35.376562       1 reflector.go:436] github.com/flannel-io/flannel/subnet/kube/kube.go:379: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
2022-10-26T10:26:35.376648476Z stderr F I1026 10:26:35.376622       1 main.go:394] Exiting cleanly...
a

ancient-pizza-13099

10/27/2022, 11:04 AM
please try in your harvester1:
ls /var/log/pods/harvester-system_hvst-upgrade-
and check if hvst-upgrade-xpf52-post-drain-harvester1-527kb is there.
If it exists, please attach all the log files under it.
d

damp-crayon-64796

10/27/2022, 11:12 AM
harvester1:/var/log/pods # ls
cattle-logging-system_rancher-logging-kube-audit-fluentbit-fh8sm_17e69f4b-95ce-4a4e-b7db-4feb8863757d
cattle-logging-system_rancher-logging-rke2-journald-aggregator-w5g52_9b83dd80-78b0-4336-bf92-ab05460b63a9
cattle-logging-system_rancher-logging-root-fluentbit-plkl8_e837a608-82b4-4642-bdec-7dd0d3cc3106
cattle-monitoring-system_rancher-monitoring-prometheus-node-exporter-zclfn_5ee9d172-73f4-433d-9858-81160d8985d1
cattle-system_system-upgrade-controller-7b8d94c7f5-pss5c_813190f4-6aeb-4b89-aa2d-81a1931f2ddc
harvester-system_harvester-network-controller-4h8xc_6f212eb7-8f62-406c-b113-cc8cfb2fa568
harvester-system_harvester-node-disk-manager-kwt94_115d0c21-edf0-4447-bd41-a78a8345c1ba
harvester-system_harvester-node-manager-rhhxq_40182858-4341-49be-a310-978810512c0f
harvester-system_hvst-upgrade-xpf52-post-drain-harvester1-lgptq_87ece495-1d23-47e6-886f-493ed16bf7c5
harvester-system_kube-vip-4spln_c00bcf3e-a35d-4035-8f1d-0cf7a6d32c95
harvester-system_virt-handler-g89tl_10792870-90c3-4655-8c52-8d754ac78148
kube-system_cloud-controller-manager-harvester1_1a84611ed06607ed8a51e65d936a6ff0
kube-system_etcd-harvester1_e18aa5e5b83a5a3c56d78e4054612394
kube-system_harvester-whereabouts-hqrcz_86497db2-cc43-4ff2-9801-f193e063b713
kube-system_kube-apiserver-harvester1_4874a08227e8932676b83ca998a390f3
kube-system_kube-controller-manager-harvester1_57585a0305e4e46df816ebab263926f3
kube-system_kube-proxy-harvester1_ce051fc91f9e463593a1d45efa60be52
kube-system_kube-scheduler-harvester1_2495d4d1888db1561e78ccbc2ff8677c
kube-system_rke2-canal-vqct4_5a7cdf6a-11b7-4539-b6b4-93d7541d57cb
kube-system_rke2-ingress-nginx-controller-ftfrf_3b272fd8-3e35-415c-9dbe-f585a7664341
kube-system_rke2-multus-ds-dh54w_77b23e02-d9b4-4d5d-9aad-0c638d3e6253
longhorn-system_backing-image-manager-d7ad-1dd5_93309371-8572-49ac-83eb-5fe4c7ef8466
longhorn-system_engine-image-ei-a5371358-ln9kp_6649d1c4-4ee4-4ad5-a27f-496420038bc1
longhorn-system_longhorn-csi-plugin-rcp7s_ec1c39c5-1f76-4571-8b9f-1bbaf85ce7ef
longhorn-system_longhorn-loop-device-cleaner-l2dsb_a9014a20-9f2a-45ff-981e-4a5c790cffec
longhorn-system_longhorn-manager-2clvr_f11f95c1-aa23-42d3-b6cd-927fcefd2b1a
harvester1:/var/log/pods # cd *527*
-bash: cd: *527*: No such file or directory
harvester1:/var/log/pods # cd /var/log/pods/harvester-system_hvst-upgrade-xpf52-post-drain-harvester1-lgptq_87ece495-1d23-47e6-886f-493ed16bf7c5
harvester1:/var/log/pods/harvester-system_hvst-upgrade-xpf52-post-drain-harvester1-lgptq_87ece495-1d23-47e6-886f-493ed16bf7c5 # ls
apply
harvester1:/var/log/pods/harvester-system_hvst-upgrade-xpf52-post-drain-harvester1-lgptq_87ece495-1d23-47e6-886f-493ed16bf7c5 # cd apply/
harvester1:/var/log/pods/harvester-system_hvst-upgrade-xpf52-post-drain-harvester1-lgptq_87ece495-1d23-47e6-886f-493ed16bf7c5/apply # ls -l
total 7964
-rw-r----- 1 root root 8146988 Oct 27 11:07 0.log
harvester1:/var/log/pods/harvester-system_hvst-upgrade-xpf52-post-drain-harvester1-lgptq_87ece495-1d23-47e6-886f-493ed16bf7c5/apply #
and the log file ...
a

ancient-pizza-13099

10/27/2022, 11:25 AM
thanks, let me check it now
it is not from hvst-upgrade-xpf52-post-drain-harvester1-527kb, which was deleted by kubelet after a certain time.
it is from hvst-upgrade-xpf52-post-drain-harvester1-lgptq, which reports
Error starting VirtualMachine virtualmachine.kubevirt.io "upgrade-repo-hvst-upgrade-xpf52" not found
d

damp-crayon-64796

10/27/2022, 11:30 AM
yes, sorry, I should have mentioned it earlier, but the *527kb directory is not there
a

ancient-pizza-13099

10/27/2022, 12:13 PM
no worry, the pod *527kb did exist at some point but was deleted by kubelet after some time.
@damp-crayon-64796 could you help run the following in harvester1:
systemctl status upgrade-reboot
ls /tmp -alth && ls /tmp/upgrade-reboot.sh -alth
d

damp-crayon-64796

10/28/2022, 9:47 AM
sure here it is
harvester1:~ # systemctl status upgrade-reboot
Unit upgrade-reboot.service could not be found.
harvester1:~ # ls /tmp -alth   &&  ls /tmp/upgrade-reboot.sh -alth
total 7.8M
drwxrwxrwt  23 root root  500 Oct 28 09:46 .
-rw-rw-rw-   1 root root 7.8M Oct 27 11:09 0.log
drwx------   2 root root   40 Oct 26 10:29 cachepkgs2350779609
drwx------   2 root root   40 Oct 26 10:29 tmp.Nm8lk73Xg1
drwx------   2 root root   40 Oct 26 10:29 cachepkgs1058812872
drwx------   2 root root   40 Oct 26 10:29 tmp.2uPxVXnnpx
drwx------   2 root root   40 Oct 26 10:29 cachepkgs967314055
drwx------   2 root root   40 Oct 26 10:29 tmp.7kdhisOixl
drwx------   2 root root   40 Oct 26 10:28 cachepkgs3212023921
drwx------   2 root root   40 Oct 26 10:28 tmp.UqPedlw8GW
drwx------   2 root root   40 Oct 26 10:28 cachepkgs2819240199
drwx------   2 root root   40 Oct 26 10:28 tmp.rvjd0nWTRf
drwx------   2 root root   40 Oct 26 10:28 cachepkgs75555247
drwx------   2 root root   40 Oct 26 10:28 tmp.OHVPN6zY1D
drwx------   2 root root   40 Oct 26 10:28 cachepkgs2296860756
drwx------   2 root root   40 Oct 26 10:28 tmp.sSSPqEeQ9y
-rw-------   1 root root  635 Oct 26 10:15 tmp.Fuqfa2nmkW
drwx------   3 root root   60 Oct  5 06:21 systemd-private-a20bc825d3c141dcbde9d255416be891-systemd-logind.service-QRxVQg
drwx------   3 root root   60 Oct  5 06:20 systemd-private-a20bc825d3c141dcbde9d255416be891-systemd-timesyncd.service-w7uxPh
drwxrwxrwt   2 root root   40 Oct  5 06:20 .ICE-unix
drwxrwxrwt   2 root root   40 Oct  5 06:20 .Test-unix
drwxrwxrwt   2 root root   40 Oct  5 06:20 .X11-unix
drwxrwxrwt   2 root root   40 Oct  5 06:20 .XIM-unix
drwxrwxrwt   2 root root   40 Oct  5 06:20 .font-unix
drwxr-xr-x. 22 root root 4.0K Aug 14 17:55 ..
a

ancient-pizza-13099

10/28/2022, 9:49 AM
could you help ls -alt those tmp.* directories? They were created at 10.26 10:28 and 10:29, from tmp.sSSPqEeQ9y to cachepkgs2350779609.
d

damp-crayon-64796

10/28/2022, 9:52 AM
nothing in there:
harvester1:/tmp # ls -alt tmp.*
-rw------- 1 root root 635 Oct 26 10:15 tmp.Fuqfa2nmkW

tmp.Nm8lk73Xg1:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:51 ..
drwx------  2 root root  40 Oct 26 10:29 .

tmp.2uPxVXnnpx:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:51 ..
drwx------  2 root root  40 Oct 26 10:29 .

tmp.7kdhisOixl:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:51 ..
drwx------  2 root root  40 Oct 26 10:29 .

tmp.UqPedlw8GW:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:51 ..
drwx------  2 root root  40 Oct 26 10:28 .

tmp.rvjd0nWTRf:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:51 ..
drwx------  2 root root  40 Oct 26 10:28 .

tmp.OHVPN6zY1D:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:51 ..
drwx------  2 root root  40 Oct 26 10:28 .

tmp.sSSPqEeQ9y:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:51 ..
drwx------  2 root root  40 Oct 26 10:28 .
harvester1:/tmp #
a

ancient-pizza-13099

10/28/2022, 9:54 AM
ls -alth cache*
and: cat /etc/mtab
d

damp-crayon-64796

10/28/2022, 9:55 AM
harvester1:/tmp # ls -alth cache*
cachepkgs2350779609:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:55 ..
drwx------  2 root root  40 Oct 26 10:29 .

cachepkgs1058812872:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:55 ..
drwx------  2 root root  40 Oct 26 10:29 .

cachepkgs967314055:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:55 ..
drwx------  2 root root  40 Oct 26 10:29 .

cachepkgs3212023921:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:55 ..
drwx------  2 root root  40 Oct 26 10:28 .

cachepkgs2819240199:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:55 ..
drwx------  2 root root  40 Oct 26 10:28 .

cachepkgs75555247:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:55 ..
drwx------  2 root root  40 Oct 26 10:28 .

cachepkgs2296860756:
total 0
drwxrwxrwt 23 root root 500 Oct 28 09:55 ..
drwx------  2 root root  40 Oct 26 10:28 .
harvester1:/tmp #
cat mtab:
a

ancient-pizza-13099

10/28/2022, 10:00 AM
truncated, please upload it as a file, thanks
d

damp-crayon-64796

10/28/2022, 10:01 AM
i think it was a file ... at the bottom is a link "see it in full". Or what am I doing wrong?
a

ancient-pizza-13099

10/28/2022, 10:01 AM
between Oct 26 10:28 and 10:29, some shell command in the post-drain script returned 1 and caused the failure; we are checking which cmd.
OK, clicked and got the full content, thanks.
cat /etc/os-release
d

damp-crayon-64796

10/28/2022, 10:13 AM
harvester1:/tmp # cat /etc/os-release
NAME="SLE Micro"
VERSION="5.2"
VERSION_ID="5.2"
PRETTY_NAME="Harvester v1.0.3"
ID="sle-micro-rancher"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sle-micro-rancher:5.2"
VARIANT="Harvester"
VARIANT_ID="Harvester-20220802"
GRUB_ENTRY_NAME="Harvester v1.0.3"
a

ancient-pizza-13099

10/28/2022, 10:14 AM
ok, still v1.0.3
please:
ls -alth /usr/local/upgrade_tmp
d

damp-crayon-64796

10/28/2022, 10:15 AM
harvester1:/tmp # ls -alth /usr/local/upgrade_tmp/
total 3.4G
drwxr-xr-x 10 root root 4.0K Oct 26 10:29 ..
-rw-------  1 root root 492M Oct 26 10:29 tmp.9gJwps92DB
drwxr-xr-x  2 root root 4.0K Oct 26 10:29 .
-rw-------  1 root root 492M Oct 26 10:29 tmp.SG1NQ0bSVk
-rw-------  1 root root 492M Oct 26 10:29 tmp.W8Ou1bUAuE
-rw-------  1 root root 492M Oct 26 10:28 tmp.M4QyyFvaoR
-rw-------  1 root root 492M Oct 26 10:28 tmp.GlhwlNnHcS
-rw-------  1 root root 492M Oct 26 10:28 tmp.nLxl4leBp1
-rw-------  1 root root 492M Oct 26 10:28 tmp.cGa78ygm56
a

ancient-pizza-13099

10/28/2022, 10:20 AM
@prehistoric-balloon-31801 The failure happens between mount and rm: the tmp_rootfs_squashfs file is still present in /usr/local/upgrade_tmp, with a size of 492M.
Maybe umount $tmp_rootfs_mount failed with 1,
or rm -rf $tmp_rootfs_squashfs failed, but that seems not possible?
Or chroot $HOST_DIR elemental upgrade --directory ${tmp_rootfs_mount#"$HOST_DIR"} failed with 1.
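(For context, a minimal sketch of the code block being discussed here; it is quoted in full later in the thread. Assuming the post-drain script runs with set -e, which the "|| true" after virtctl in the earlier trace suggests, any one of these commands returning non-zero fails the whole job:)
# example squashfs path; the real one is a mktemp file under /host/usr/local/upgrade_tmp
tmp_rootfs_squashfs=/host/usr/local/upgrade_tmp/tmp.SG1NQ0bSVk
set -e                                   # first non-zero exit aborts the job
tmp_rootfs_mount=$(mktemp -d -p /host/tmp)
mount "$tmp_rootfs_squashfs" "$tmp_rootfs_mount"
chroot /host elemental upgrade --directory "${tmp_rootfs_mount#/host}"
umount "$tmp_rootfs_mount"
rm -rf "$tmp_rootfs_squashfs"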
d

damp-crayon-64796

10/28/2022, 10:25 AM
I also tried to do some research and am wondering if there shouldn't be a VlanConfig from the convert routine ... but I may be wrong
[skoch@mgmt-sk ~]$ k get VlanConfig
No resources found
a

ancient-pizza-13099

10/28/2022, 10:27 AM
or it failed with mount $tmp_rootfs_squashfs $tmp_rootfs_mount;
/etc/mtab has no record of those tmp.* mounts.
@damp-crayon-64796 your case seems not to be related to VlanConfig
✔️ 1
Or your tmpfs is full. Please run:
df -H | grep tmpfs
d

damp-crayon-64796

10/28/2022, 10:29 AM
harvester1:/tmp # df -H | grep tmpfs
devtmpfs        4.2M     0  4.2M   0% /dev
tmpfs           136G     0  136G   0% /dev/shm
tmpfs            55G   17M   55G   1% /run
tmpfs           4.2M     0  4.2M   0% /sys/fs/cgroup
tmpfs            68G   13M   68G   1% /run/overlay
tmpfs           136G  8.2M  136G   1% /tmp
tmpfs           1.1G   13k  1.1G   1% /var/lib/kubelet/pods/77b23e02-d9b4-4d5d-9aad-0c638d3e6253/volumes/kubernetes.io~projected/kube-api-access-sllbw
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/a9014a20-9f2a-45ff-981e-4a5c790cffec/volumes/kubernetes.io~projected/kube-api-access-qcnvx
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/c00bcf3e-a35d-4035-8f1d-0cf7a6d32c95/volumes/kubernetes.io~projected/kube-api-access-wxpjw
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/5a7cdf6a-11b7-4539-b6b4-93d7541d57cb/volumes/kubernetes.io~projected/kube-api-access-lht29
tmpfs           135M   13k  135M   1% /var/lib/kubelet/pods/40182858-4341-49be-a310-978810512c0f/volumes/kubernetes.io~projected/kube-api-access-58kdr
tmpfs           210M   13k  210M   1% /var/lib/kubelet/pods/86497db2-cc43-4ff2-9801-f193e063b713/volumes/kubernetes.io~projected/kube-api-access-dvrlb
tmpfs           135M   13k  135M   1% /var/lib/kubelet/pods/6f212eb7-8f62-406c-b113-cc8cfb2fa568/volumes/kubernetes.io~projected/kube-api-access-bl4pv
tmpfs           271G     0  271G   0% /var/lib/kubelet/pods/f11f95c1-aa23-42d3-b6cd-927fcefd2b1a/volumes/kubernetes.io~secret/longhorn-grpc-tls
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/f11f95c1-aa23-42d3-b6cd-927fcefd2b1a/volumes/kubernetes.io~projected/kube-api-access-hjfpw
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/115d0c21-edf0-4447-bd41-a78a8345c1ba/volumes/kubernetes.io~projected/kube-api-access-4jbh6
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/6649d1c4-4ee4-4ad5-a27f-496420038bc1/volumes/kubernetes.io~projected/kube-api-access-bhhj4
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/ec1c39c5-1f76-4571-8b9f-1bbaf85ce7ef/volumes/kubernetes.io~projected/kube-api-access-cdk7p
tmpfs           271G  8.2k  271G   1% /var/lib/kubelet/pods/10792870-90c3-4655-8c52-8d754ac78148/volumes/kubernetes.io~secret/kubevirt-virt-handler-server-certs
tmpfs           271G  8.2k  271G   1% /var/lib/kubelet/pods/10792870-90c3-4655-8c52-8d754ac78148/volumes/kubernetes.io~secret/kubevirt-virt-handler-certs
tmpfs           271G  4.1k  271G   1% /var/lib/kubelet/pods/10792870-90c3-4655-8c52-8d754ac78148/volumes/kubernetes.io~downward-api/podinfo
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/10792870-90c3-4655-8c52-8d754ac78148/volumes/kubernetes.io~projected/kube-api-access-zx4px
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/9b83dd80-78b0-4336-bf92-ab05460b63a9/volumes/kubernetes.io~projected/kube-api-access-q2t4t
tmpfs           210M  4.1k  210M   1% /var/lib/kubelet/pods/17e69f4b-95ce-4a4e-b7db-4feb8863757d/volumes/kubernetes.io~secret/config
tmpfs           210M  4.1k  210M   1% /var/lib/kubelet/pods/e837a608-82b4-4642-bdec-7dd0d3cc3106/volumes/kubernetes.io~secret/config
tmpfs           210M   13k  210M   1% /var/lib/kubelet/pods/17e69f4b-95ce-4a4e-b7db-4feb8863757d/volumes/kubernetes.io~projected/kube-api-access-xdlvm
tmpfs           210M   13k  210M   1% /var/lib/kubelet/pods/e837a608-82b4-4642-bdec-7dd0d3cc3106/volumes/kubernetes.io~projected/kube-api-access-wksts
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/813190f4-6aeb-4b89-aa2d-81a1931f2ddc/volumes/kubernetes.io~projected/kube-api-access-zdtlt
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/93309371-8572-49ac-83eb-5fe4c7ef8466/volumes/kubernetes.io~projected/kube-api-access-4tlcf
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/87ece495-1d23-47e6-886f-493ed16bf7c5/volumes/kubernetes.io~projected/kube-api-access-tkzcm
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/3b272fd8-3e35-415c-9dbe-f585a7664341/volumes/kubernetes.io~secret/webhook-cert
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/3b272fd8-3e35-415c-9dbe-f585a7664341/volumes/kubernetes.io~projected/kube-api-access-472cx
tmpfs           271G     0  271G   0% /var/lib/kubelet/pods/d0d8d47c-69c8-4b48-9702-232e92febcdb/volumes/kubernetes.io~secret/longhorn-grpc-tls
tmpfs           271G     0  271G   0% /var/lib/kubelet/pods/da4c88fc-5c77-4816-a7e4-dd793bf78a3c/volumes/kubernetes.io~secret/longhorn-grpc-tls
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/daec1f31-6c1d-4f9e-aa7b-89ac59a74b0d/volumes/kubernetes.io~projected/kube-api-access-gtmjk
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/bcd59922-8cfb-4ab6-880e-1567b47ac988/volumes/kubernetes.io~projected/kube-api-access-8cvln
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/37a10d03-68a8-42ac-a49c-93966d90ab92/volumes/kubernetes.io~projected/kube-api-access-lsm4q
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/d0d8d47c-69c8-4b48-9702-232e92febcdb/volumes/kubernetes.io~projected/kube-api-access-lhgzs
tmpfs           271G   13k  271G   1% /var/lib/kubelet/pods/da4c88fc-5c77-4816-a7e4-dd793bf78a3c/volumes/kubernetes.io~projected/kube-api-access-qk54g
tmpfs            28G     0   28G   0% /run/user/0
a

ancient-pizza-13099

10/28/2022, 10:31 AM
please try
rm -rf /usr/local/upgrade_tmp/tmp.9gJwps92DB
ls -alth /usr/local/upgrade_tmp/
d

damp-crayon-64796

10/28/2022, 10:31 AM
harvester1:/tmp #  rm -rf /usr/local/upgrade_tmp/tmp.9gJwps92DB
harvester1:/tmp # ls -alth /usr/local/upgrade_tmp/
total 2.9G
drwxr-xr-x  2 root root 4.0K Oct 28 10:31 .
drwxr-xr-x 10 root root 4.0K Oct 26 10:29 ..
-rw-------  1 root root 492M Oct 26 10:29 tmp.SG1NQ0bSVk
-rw-------  1 root root 492M Oct 26 10:29 tmp.W8Ou1bUAuE
-rw-------  1 root root 492M Oct 26 10:28 tmp.M4QyyFvaoR
-rw-------  1 root root 492M Oct 26 10:28 tmp.GlhwlNnHcS
-rw-------  1 root root 492M Oct 26 10:28 tmp.nLxl4leBp1
-rw-------  1 root root 492M Oct 26 10:28 tmp.cGa78ygm56
a

ancient-pizza-13099

10/28/2022, 10:32 AM
mkdir /tmp/tmprootfs_1
mount /usr/local/upgrade_tmp/tmp.SG1NQ0bSVk /tmp/tmprootfs_1
umount /tmp/tmprootfs_1
let's try to mount and umount
d

damp-crayon-64796

10/28/2022, 10:34 AM
I did this without any issue on the host. OR should I do it in the post_drain pod?
a

ancient-pizza-13099

10/28/2022, 10:36 AM
maybe chroot $HOST_DIR elemental upgrade --directory ${tmp_rootfs_mount#"$HOST_DIR"}, the true upgrade itself, left the fs in a state where it could not be umounted, or the rm failed.
Let me figure out how to start a new job to run those commands.
journalctl -k | grep mount
Check the kernel messages; we just did a manual mount.
In the support-bundle, it shows:
Oct 26 10:28:33 harvester1 kernel: EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
d

damp-crayon-64796

10/28/2022, 10:48 AM
Oct 05 06:20:56 harvester1 systemd[1]: sysroot-oem.mount: Succeeded.
Oct 05 06:20:56 harvester1 systemd[1]: sysroot-var.mount: Succeeded.
Oct 05 06:20:56 harvester1 systemd[1]: sysroot.mount: Succeeded.
Oct 05 06:20:58 harvester1 kernel: EXT4-fs (sdh): mounted filesystem with ordered data mode. Opts: (null)
Oct 05 06:25:59 harvester1 kernel: EXT4-fs (sdj): mounted filesystem with ordered data mode. Opts: (null)
Oct 26 10:28:33 harvester1 kernel: EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
Oct 26 10:28:40 harvester1 kernel: EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
Oct 26 10:28:47 harvester1 kernel: EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
Oct 26 10:28:54 harvester1 kernel: EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
Oct 26 10:29:01 harvester1 kernel: EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
Oct 26 10:29:08 harvester1 kernel: EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
Oct 26 10:29:15 harvester1 kernel: EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
a

ancient-pizza-13099

10/28/2022, 10:51 AM
df -H | grep sd
d

damp-crayon-64796

10/28/2022, 10:51 AM
harvester1:/tmp # df -H  | grep sd
/dev/sdg3        16G  4.7G   11G  31% /run/initramfs/cos-state
/dev/sdg2        61M  1.6M   55M   3% /oem
/dev/sdg5       102G   59G   39G  61% /usr/local
/dev/sdh        6.5T  346G  5.8T   6% /var/lib/harvester/defaultdisk
a

ancient-pizza-13099

10/28/2022, 10:54 AM
Then those
Oct 26 10:28:33 harvester1 kernel: EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
logs are caused by this line:
chroot $HOST_DIR elemental upgrade --directory ${tmp_rootfs_mount#"$HOST_DIR"}
p

prehistoric-balloon-31801

10/28/2022, 1:38 PM
@damp-crayon-64796 Could you check if they are all identical files? :
sha256sum /usr/local/upgrade_tmp/*
d

damp-crayon-64796

10/28/2022, 1:40 PM
I could check this later in the evening
p

prehistoric-balloon-31801

10/28/2022, 1:40 PM
@ancient-pizza-13099 If they are identical, it means the jobs were actually retrying the download of those squashfs image files, and we might have non-idempotent code (umount/mount) or elemental upgrade exiting 1. That would prove your suspicion.
It's no hurry, and thanks for being with us!
a

ancient-pizza-13099

10/28/2022, 2:22 PM
I logged https://github.com/harvester/harvester/issues/3070 to track this issue.
❤️ 1
please also post the full text of df -H, thanks
d

damp-crayon-64796

10/28/2022, 4:12 PM
the sha256sums are all the same
rancher@harvester1:/usr/local/upgrade_tmp> sudo sha256sum *
23130ebe608ae968cc1346c630fe5079148aba8e8420ccf82b559ef6a8b72b51  tmp.GlhwlNnHcS
23130ebe608ae968cc1346c630fe5079148aba8e8420ccf82b559ef6a8b72b51  tmp.M4QyyFvaoR
23130ebe608ae968cc1346c630fe5079148aba8e8420ccf82b559ef6a8b72b51  tmp.SG1NQ0bSVk
23130ebe608ae968cc1346c630fe5079148aba8e8420ccf82b559ef6a8b72b51  tmp.W8Ou1bUAuE
23130ebe608ae968cc1346c630fe5079148aba8e8420ccf82b559ef6a8b72b51  tmp.cGa78ygm56
23130ebe608ae968cc1346c630fe5079148aba8e8420ccf82b559ef6a8b72b51  tmp.nLxl4leBp1
df -h
rancher@harvester1:/usr/local/upgrade_tmp> df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs           126G     0  126G   0% /dev/shm
tmpfs            51G   16M   51G   1% /run
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/sdg3        15G  4.4G  9.7G  31% /run/initramfs/cos-state
/dev/loop0      3.0G  1.3G  1.6G  45% /
tmpfs            63G   12M   63G   1% /run/overlay
overlay          63G   12M   63G   1% /boot
overlay          63G   12M   63G   1% /etc
/dev/sdg2        58M  1.5M   53M   3% /oem
overlay          63G   12M   63G   1% /srv
/dev/sdg5        95G   55G   36G  61% /usr/local
overlay          63G   12M   63G   1% /var
tmpfs           126G  7.8M  126G   1% /tmp
/dev/sdh        5.9T  323G  5.3T   6% /var/lib/harvester/defaultdisk
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/b15f579314aa85d0fca77774c99544d1ce16663210246480686bafb6c46efb37/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/43d29b69c82c29c5e9f97053d9488e0b92e86fe9003a71d6fcf0b765ffce176f/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/e8402c94c6d3050431bed960802a740c9018ad58c630bcf80d7adf79ac4da00c/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/5108a0bec9afcfbd8300f247166aa14904df44cba18feee2b7c2e8dd3815a239/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/41d6b24c399d5d9b790f684d4e42226b225b61d90e4e95ec4e6440be2740f5ee/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/0ab5febe5d5d47e2ae647794d3c5f9c90f29b11f52e64c11df8267daac3ccaa5/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/e7646afb2cc8f7bc1ce8d85980eb0f711fb39a4876168b23e896604efe6877d8/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/c2e5a62b3cc7bb107b97a03b420c20e986036a8a73e005fb42c82f9bef82b1bd/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/d725dc2fcd0c2b142a0c05cd9785b1a95d5d6ebbf30f6413565f34cef3964616/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/1eb0d3f179445d3e91940f30bf0cce091e386daa20541bae73b1573ea615a520/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/60e4d4a83aa0aea2be75d23b24ea746a4eac8b178ccde87fe1fb2364b7faf310/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/a619f96c3a2f25a1ec0e7b8338fbf4819be3bd60d7fb6c884a3b3f61db0238da/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/727702241eba838d8025ce1007e68ff5a43504111486d0ea4848eaaa278d5d9e/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/2782eba80cec6102e140a10741e8414e49bc621f6cb8df8fe7a92d0dabc0bc9a/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/ccb164434563f8710c23f7e7a56e2436c26ff6c586bb206cdb3bf7218d58a480/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/e5385bcc3f459e1f2ab2ea6ca808d1f7a51c58cc236180534a5c73fc1e8eebc3/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/4ce8768cce6791a48183bffebf238efe50d51e931c1666d1b83f2135e4c4c69d/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/8cdd58778d8210f875e149e35284f61e3656f393257096dbb5d0a0e4f8e0984f/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/1febf6fb150c0b58492204e392d7efecfc4c659dba6772265f6965775ffd148c/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/ac38a6e848d1fdba34878508e9243af926ed04aee15ca45c5a4da3455d3e82a9/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/aa4161a4fd2d1fdfe531b8fd7a98bc3e538c59d25657c2faf091b10169079f27/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/aa44b0577e238dedc45556072133c7be31aad5e2c50a2e7bae0d84e31b681170/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/8ff3d1e46c98b96014ee73d0733f6494702fbcba2ce75b2c8a176e6662b5a4a6/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/aa07d70417dda4c3661446b31e1e522049194dbf38e88b37ef086e1493b1feb3/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/81687f56475268f15d7cb874f7e44d7b266368e94ef93836a1598a9f5ff373ee/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/228f6bdabb4bbeb79d7957a5da41facdbbf2c9cefb75d4828fd0e0b6d139e712/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/3077eef294e359afbd57246098a9cd0459f2f2ef4c0da9a7627fe23b5b9b0e2c/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/9c21db0deb8cfebe279f051c782bae946f8bf22b27fc4f02158d653aacedb656/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/d8e1a1e9d865de92a5301ccaee6aeeb95797ac824e3615f096174004f541687c/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/6db76a72de018607fd9844ac2dadf8860a4aa5ebcc42b46f0dbc4cd8e49df70b/shm
rancher@harvester1:/usr/local/upgrade_tmp>
for the GitHub issue ... it is a 3-node cluster
[skoch@mgmt-sk ~]$ k get no -o wide
NAME         STATUS   ROLES                       AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE           KERNEL-VERSION                CONTAINER-RUNTIME
harvester1   Ready    control-plane,etcd,master   74d   v1.24.7+rke2r1    10.1.35.91    <none>        Harvester v1.0.3   5.3.18-150300.59.87-default   containerd://1.6.8-k3s1
harvester3   Ready    control-plane,etcd,master   74d   v1.22.12+rke2r1   10.1.35.93    <none>        Harvester v1.0.3   5.3.18-150300.59.87-default   containerd://1.5.13-k3s1
harvester4   Ready    control-plane,etcd,master   74d   v1.22.12+rke2r1   10.1.35.94    <none>        Harvester v1.0.3   5.3.18-150300.59.87-default   containerd://1.5.13-k3s1
i uncordoned the failed node
✔️ 1
saw your last comment in the issue, therefore:
harvester1:~ # lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0    7:0    0    3G  1 loop /
sda      8:0    1 29.9G  0 disk
sdb      8:16   0  120G  0 disk
├─sdb1   8:17   0   64M  0 part
├─sdb2   8:18   0   64M  0 part
├─sdb3   8:19   0   15G  0 part
├─sdb4   8:20   0    8G  0 part
└─sdb5   8:21   0 96.9G  0 part
sdc      8:32   0  5.9T  0 disk
sdd      8:48   0  120G  0 disk
├─sdd1   8:49   0   64M  0 part
├─sdd2   8:50   0   64M  0 part
├─sdd3   8:51   0   15G  0 part
├─sdd4   8:52   0    8G  0 part
└─sdd5   8:53   0 96.9G  0 part
sde      8:64   0  120G  0 disk
├─sde1   8:65   0   64M  0 part
├─sde2   8:66   0   64M  0 part
├─sde3   8:67   0   15G  0 part
├─sde4   8:68   0    8G  0 part
└─sde5   8:69   0 96.9G  0 part
sdf      8:80   0  5.9T  0 disk
sdg      8:96   0  120G  0 disk
├─sdg1   8:97   0   64M  0 part
├─sdg2   8:98   0   64M  0 part /oem
├─sdg3   8:99   0   15G  0 part /run/initramfs/cos-state
├─sdg4   8:100  0    8G  0 part
└─sdg5   8:101  0 96.9G  0 part /usr/local
sdh      8:112  0  5.9T  0 disk /var/lib/harvester/defaultdisk
sdi      8:128  0  5.9T  0 disk
sdk      8:160  0   40G  0 disk
└─sdk1   8:161  0   40G  0 part
what is a bit tricky is that we have FC storage and the disks are seen multiple times; multipath isn't configured on the OS, so you see each disk 4 times
harvester1:~ # lsblk| grep ^sd | grep 5.9
sdc      8:32   0  5.9T  0 disk
sdf      8:80   0  5.9T  0 disk
sdh      8:112  0  5.9T  0 disk /var/lib/harvester/defaultdisk
sdi      8:128  0  5.9T  0 disk
harvester1:~ # lsblk| grep ^sd | grep 120
sdb      8:16   0  120G  0 disk
sdd      8:48   0  120G  0 disk
sde      8:64   0  120G  0 disk
sdg      8:96   0  120G  0 disk
a

ancient-pizza-13099

10/28/2022, 6:08 PM
thanks. the disks are tricky, we are checking if this will cause elemental upgrade to fail.
lsblk -o NAME,LABEL,PARTLABEL
please
d

damp-crayon-64796

10/28/2022, 6:40 PM
harvester1:~ # lsblk -o NAME,LABEL,PARTLABEL
NAME   LABEL           PARTLABEL
loop0  COS_ACTIVE
sda
sdb
├─sdb1 COS_GRUB        p.grub
├─sdb2 COS_OEM         p.oem
├─sdb3 COS_STATE       p.state
├─sdb4 COS_RECOVERY    p.recovery
└─sdb5 COS_PERSISTENT  p.persistent
sdc    HARV_LH_DEFAULT
sdd
├─sdd1 COS_GRUB        p.grub
├─sdd2 COS_OEM         p.oem
├─sdd3 COS_STATE       p.state
├─sdd4 COS_RECOVERY    p.recovery
└─sdd5 COS_PERSISTENT  p.persistent
sde
├─sde1 COS_GRUB        p.grub
├─sde2 COS_OEM         p.oem
├─sde3 COS_STATE       p.state
├─sde4 COS_RECOVERY    p.recovery
└─sde5 COS_PERSISTENT  p.persistent
sdf    HARV_LH_DEFAULT
sdg
├─sdg1 COS_GRUB        p.grub
├─sdg2 COS_OEM         p.oem
├─sdg3 COS_STATE       p.state
├─sdg4 COS_RECOVERY    p.recovery
└─sdg5 COS_PERSISTENT  p.persistent
sdh    HARV_LH_DEFAULT
sdi    HARV_LH_DEFAULT
sdk
└─sdk1
👍 1
a

ancient-pizza-13099

10/28/2022, 6:55 PM
@prehistoric-balloon-31801 Is it possible for us to manually start a job that runs only those few lines of shell code, to trigger the elemental upgrade with those existing tmp files?
tmp_rootfs_mount=$(mktemp -d -p $HOST_DIR/tmp)
  mount $tmp_rootfs_squashfs $tmp_rootfs_mount

  chroot $HOST_DIR elemental upgrade --directory ${tmp_rootfs_mount#"$HOST_DIR"}
  umount $tmp_rootfs_mount
  rm -rf $tmp_rootfs_squashfs

  umount -R /run
d

damp-crayon-64796

10/28/2022, 7:28 PM
if we set
export rootfs_squashfs=/host/usr/local/upgrade_tmp/tmp.nLxl4leBp1
and
HOST_DIR=/host
I think we could execute these commands in the post_drain pod which is still running. Do you think this would work?
a

ancient-pizza-13099

10/28/2022, 7:34 PM
The current post-drain POD will be blocked waiting for the repo VM.
We need to hack the pod to start directly from the desired code block.
We will do some tests to make sure it works, and then try it in your harvester
🙂
p

prehistoric-balloon-31801

10/31/2022, 2:05 AM
We can test that command (by backing up the current os image):
# backup
cp /run/initramfs/cos-state/cOS/active.img /usr/local/active.img.bak

# upgrade
mkdir /tmp/new_root
mount /usr/local/upgrade_tmp/tmp.GlhwlNnHcS /tmp/new_root
elemental upgrade --directory /tmp/new_root
I added another disk with the exact same layout to a running system and did the upgrade, and it indeed causes issues. The elemental command tries to mount the first COS_STATE partition.
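(For reference, a quick way to see why the duplicated labels are ambiguous here: with the same LUN visible on four paths, a lookup by filesystem label can resolve to any of the sdb/sdd/sde/sdg partitions. A minimal check, assuming blkid is available on the host:)
# every path to the same disk advertises the same labels, so a by-label lookup is ambiguous
blkid -t LABEL=COS_STATE -o device
lsblk -o NAME,LABEL | grep COS_STATE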
d

damp-crayon-64796

11/01/2022, 12:54 PM
thanks a lot. So what I could do is disable 3 of the 4 paths to the disk and then run the above commands. Not sure if I would need a reboot, but perhaps it will work. If this works then I just need to know how to proceed with nodes 2+3. What do you think?
p

prehistoric-balloon-31801

11/02/2022, 9:19 AM
Hi Stephan, do you know why there are multiple disks with identical partitions? In fact, this might not be a good idea because the booted system is uncertain, it could run into a wrong system or even mount a wrong persistent partition.
d

damp-crayon-64796

11/02/2022, 9:20 AM
these are not multiple disks, it is one disk which is seen through different paths
p

prehistoric-balloon-31801

11/02/2022, 9:21 AM
Got it, thanks. Is it possible to disable other paths? Harvester can’t support multipath disks at this moment.
d

damp-crayon-64796

11/02/2022, 9:22 AM
yes, that's what I suggested above
p

prehistoric-balloon-31801

11/02/2022, 9:22 AM
(We’ll work on a fix to choose the right partition, but it’s still better not to have so many paths)
d

damp-crayon-64796

11/02/2022, 9:24 AM
yes, understood; if/when you want to support fibre channel disks I think you must allow use of the multipath daemon and use /dev/mapper volumes instead of /dev/sd devices.
Do you think I should disable the paths and then try the elemental upgrade as you suggested above?
p

prehistoric-balloon-31801

11/02/2022, 9:29 AM
You can disable the paths and restart the upgrade again (it can’t be “resumed” in the middle). I’ll write a brief procedure for how to do that.
✔️ 1
d

damp-crayon-64796

11/02/2022, 10:03 AM
I am at step 2 and unsure: you want me to delete the red-marked lines, right? And not in the annotations? And what is meant by the post-drain hooks?
p

prehistoric-balloon-31801

11/02/2022, 10:04 AM
I think your configuration looks good and you don’t need to do anything. What’s the cluster state?
kubectl get clusters.cluster.x-k8s.io local -n fleet-local
d

damp-crayon-64796

11/02/2022, 10:05 AM
[skoch@mgmt-sk ~]$ kubectl get clusters.cluster.x-k8s.io local -n fleet-local
NAME    PHASE          AGE   VERSION
local   Provisioning   79d
p

prehistoric-balloon-31801

11/02/2022, 10:07 AM
Could you run the “./drain-status.sh” script, it can help determine current state.
You can skip “2. Edit cluster and remove pre-drain and post-drain hooks.“, I updated the gist too.
d

damp-crayon-64796

11/02/2022, 10:18 AM
[skoch@mgmt-sk harv_up]$ . ./drain-status.sh

harvester1 (custom-6a3a7673cfe4)
  rke-pre-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"<http://harvesterhci.io/post-hook|harvesterhci.io/post-hook>"}],"preDrainHooks":[{"annotation":"<http://harvesterhci.io/pre-hook|harvesterhci.io/pre-hook>"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  harvester-pre-hook {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"<http://harvesterhci.io/post-hook|harvesterhci.io/post-hook>"}],"preDrainHooks":[{"annotation":"<http://harvesterhci.io/pre-hook|harvesterhci.io/pre-hook>"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  rke-post-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"<http://harvesterhci.io/post-hook|harvesterhci.io/post-hook>"}],"preDrainHooks":[{"annotation":"<http://harvesterhci.io/pre-hook|harvesterhci.io/pre-hook>"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  harvester-post-hook: null

harvester3 (custom-7a17ad8fa75f)
  rke-pre-drain: null
  harvester-pre-hook null
  rke-post-drain: null
  harvester-post-hook: null

harvester4 (custom-ef4fd4a88161)
  rke-pre-drain: null
  harvester-pre-hook null
  rke-post-drain: null
  harvester-post-hook: null
p

prehistoric-balloon-31801

11/02/2022, 10:18 AM
You can just do
./post-drain.sh harvester1
and the cluster should be back to “Provisioned” later
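(For reference, after running the hook script the cluster phase can simply be watched until it flips back; a minimal sketch using the same object queried above:)
# watch the provisioning cluster object until PHASE returns to Provisioned
kubectl get clusters.cluster.x-k8s.io local -n fleet-local -w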
d

damp-crayon-64796

11/02/2022, 10:21 AM
[skoch@mgmt-sk harv_up]$ . ./post-drain.sh harvester1
harvester.cattle.io/post-hook: '{"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}'
secret/custom-6a3a7673cfe4-machine-plan annotated
[skoch@mgmt-sk harv_up]$ kubectl get clusters.cluster.x-k8s.io local -n fleet-local
NAME    PHASE          AGE   VERSION
local   Provisioning   79d
It took some time
[skoch@mgmt-sk harv_up]$ kubectl get clusters.cluster.x-k8s.io local -n fleet-local
NAME    PHASE         AGE   VERSION
local   Provisioned   79d
Just in the GUI the state is still failed
should I now follow the procedure at "#start-over-an-upgrade"?
p

prehistoric-balloon-31801

11/02/2022, 10:42 AM
Exactly
d

damp-crayon-64796

11/02/2022, 10:46 AM
okay, did this, and the Upgrade button in the GUI has disappeared now. I think since the cluster is already on 1.1.0 the upgrade will not be offered anymore (?)
p

prehistoric-balloon-31801

11/02/2022, 10:52 AM
d

damp-crayon-64796

11/02/2022, 10:57 AM
unfortunately it does not appear
[skoch@mgmt-sk harv_up]$ k apply -f https://releases.rancher.com/harvester/v1.1.0/version.yaml
version.harvesterhci.io/v1.1.0 created
[skoch@mgmt-sk harv_up]$ k get version.harvesterhci.io/v1.1.0
NAME     ISO-URL                                                                    RELEASEDATE   MINUPGRADABLEVERSION
v1.1.0   https://releases.rancher.com/harvester/v1.1.0/harvester-v1.1.0-amd64.iso   20221025
sorry, I looked at the wrong place. I see it now
the first host succeeded now, but the second lost its IP during reboot. The node is up and I think upgraded, but it has no network
I had to copy and adjust these files from the first node: ifcfg-mgmt-bo and ifcfg-mgmt-br, and did an ifup mgmt-bo; the files were missing entirely. Then the upgrade succeeded. Could you help me make these files permanent please?
p

prehistoric-balloon-31801

11/02/2022, 2:12 PM
I think you might hit this: https://github.com/harvester/harvester/issues/3045 Do you have bonding with multiple NICs?
d

damp-crayon-64796

11/02/2022, 2:55 PM
yes, indeed, this fixed the issue. THANKS so much to both of you for helping here and for figuring out that the issue was on my side, trying a configuration which isn't supposed to work. Great experience to work with you.
👍 1
💯 1