# harvester
a
Are you able to check the longhorn instance manager logs as described in https://docs.harvesterhci.io/v1.4/upgrade/v1-4-0-to-v1-4-1#1-upgrade-is-stuck-in-the-pre-drained-state to see if that's the source of the problem?
g
None in that state, though I did have a volume in a faulted state (it seems to have come from a failed backup, which I've now removed).
Interestingly (and I'm assuming because I'm in some sort of midway hinterland between 1.4.0 and 1.4.1) I can no longer launch a VM with PCIe passthrough enabled.
Error: failed to generate container "2331f55e1973b307957244af2f5649d1b53ee627830bd9bf7a74a67b476f8706" spec: failed to generate spec: lstat /dev/vfio/1: no such file or directory
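For anyone hitting the same `lstat /dev/vfio/1` error, a rough way to see what the node thinks of that IOMMU group is sketched below. It assumes the standard Linux sysfs layout (`/sys/kernel/iommu_groups/<n>/devices/`) and that the group number in the error (1 here) is the one to inspect; `describe_group` is a hypothetical helper name, not something shipped with Harvester.

```python
import os
from pathlib import Path


def describe_group(group, sys_root="/sys/kernel/iommu_groups", dev_root="/dev/vfio"):
    """Report whether the VFIO node for an IOMMU group exists and which
    driver each device in that group is currently bound to.

    Assumes the usual sysfs layout; the paths are parameters so the helper
    can be pointed at a copy of the tree.
    """
    devices_dir = Path(sys_root) / str(group) / "devices"
    devices = {}
    if devices_dir.is_dir():
        for dev in sorted(devices_dir.iterdir()):
            driver_link = dev / "driver"
            # "driver" is a symlink to the bound driver; it is absent when unbound.
            if driver_link.is_symlink():
                devices[dev.name] = os.path.basename(os.readlink(driver_link))
            else:
                devices[dev.name] = None
    return {
        "vfio_node": (Path(dev_root) / str(group)).exists(),
        "devices": devices,
    }


# Group 1 is taken from the error message above.
print(describe_group(1))
```

If `vfio_node` is false and the devices in the group are bound to something other than `vfio-pci` (or to nothing), that would at least be consistent with the error, though it doesn't say why the binding was lost.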
a
are you able to generate a support bundle?
g
Of course @ambitious-daybreak-95996 - I actually opened a bug on GitHub for this too.
Looks like I'm not the only one - a couple of other folks are reporting the same issue: https://github.com/harvester/harvester/issues/7457
a
Thanks, reviewing (sorry for the delay)
I've put a comment on the GitHub issue, but to elaborate slightly: there's a flag set when nodes are rebooted after upgrade to verify that they reboot into the new OS version successfully. If that doesn't happen (i.e. if the subsequent boot fails somehow), they go into fallback mode (i.e. they boot the previous OS version). Because they come up in the old OS version, this manifests as a "Waiting Reboot" state from the perspective of the upgrade.
If you reboot the stuck node now, on the grub screen, it's probably defaulted to "Harvester v1.4.1 (fallback)". What happens if you instead select "Harvester v1.4.1" and boot that? Does it come up OK, or are there errors? (If it comes up OK, the upgrade should proceed...)
g
Will check that today and confirm - thanks Tim!
Updated the issue on GitHub, but posting here too in case that's easier:
rancher@harvester1:~> cat /proc/cmdline
BOOT_IMAGE=(loop0)/boot/vmlinuz console=tty1 root=LABEL=COS_STATE cos-img/filename=/cOS/passive.img panic=0 net.ifnames=1 rd.cos.oemlabel=COS_OEM rd.cos.mount=LABEL=COS_OEM:/oem rd.cos.mount=LABEL=COS_PERSISTENT:/usr/local rd.cos.oemtimeout=120 audit=1 audit_backlog_limit=8192 intel_iommu=on amd_iommu=on iommu=pt multipath=off upgrade_failure
Fallback is the default boot option. If I select 1.4.1 instead, I get the errors below:
error: ../../grub-core/kern/fs.c:171:invalid file name `'.
error: ../../grub-core/loader/i386/efi/linux.c:207:you need to load the kernel first.

Press any key to continue...
If I press a key, it reverts to loading the fallback option.
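To check the same thing on other nodes without eyeballing the whole line, something like the sketch below could work. It assumes, based only on the output above, that the `upgrade_failure` token and a `cos-img/filename` pointing at `passive.img` are what mark a fallback boot; `describe_boot` is a made-up helper name.

```python
def describe_boot(cmdline):
    """Pull the two fallback hints out of a kernel command line.

    Assumption: 'upgrade_failure' as a bare token and the value of
    'cos-img/filename=' are the relevant markers, as seen in the
    /proc/cmdline output pasted above.
    """
    tokens = cmdline.split()
    image = None
    for token in tokens:
        key, sep, value = token.partition("=")
        if key == "cos-img/filename" and sep:
            image = value
    return {"upgrade_failure": "upgrade_failure" in tokens, "image": image}


# Shortened copy of the command line from the stuck node.
sample = (
    "BOOT_IMAGE=(loop0)/boot/vmlinuz console=tty1 root=LABEL=COS_STATE "
    "cos-img/filename=/cOS/passive.img panic=0 iommu=pt multipath=off upgrade_failure"
)
print(describe_boot(sample))
```

On a live node the input would come from reading `/proc/cmdline` rather than the pasted string.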
a
Thanks @gray-room-77418, appreciate it. Let's stick to the GH issue for more details (I've added some more there today) - it's better for record keeping and searchability in case anyone else hits this 🙂