# harvester
g
there should be a launcher pod associated with the VM; you should be able to access it using kubectl as well
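Something along these lines should find the launcher pod and pull its logs — the namespace and VM name below are placeholders, while the `kubevirt.io=virt-launcher` label and the `compute` container name are the standard KubeVirt ones:
```
# List virt-launcher pods in the VM's namespace (label set by KubeVirt)
kubectl -n <vm-namespace> get pods -l kubevirt.io=virt-launcher

# Pull logs from the compute container of the launcher pod
kubectl -n <vm-namespace> logs virt-launcher-<vm-name>-xxxxx -c compute
```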
l
cool, I'll try to check the logs of those
Yeah, so I guess after the launcher pod is automatically recreated, the logs from the crashed pod are deleted by RKE2. It seems the only way to see them is to be tailing the logs during the crash, or to install a logging operator/collector into the Harvester cluster. Am I missing something more easily accomplished? Can RKE2 be configured to allow the logs to persist in /var/log/containers after a crash?
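The tailing approach I mean is roughly this — keep a follow tail running on the compute container ahead of time and tee it to a file (pod name and namespace are placeholders):
```
# Keep a follow tail running before the crash so the final output is captured
kubectl -n <vm-namespace> logs -f virt-launcher-<vm-name>-xxxxx -c compute | tee launcher-crash.log
```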
I am able to capture the compute container's logs during a crash, but they don't contain anything informative.
```
2022-07-12T18:22:57.218835334Z stderr F {"component":"virt-launcher","level":"error","msg":"internal error: End of file from agent socket","pos":"qemuAgentIO:602","subcomponent":"libvirt","thread":"76","timestamp":"2022-07-12T18:22:57.218000Z"}
2022-07-12T18:22:57.679150987Z stderr F {"component":"virt-launcher","level":"info","msg":"Reaped pid 75 with status 9","pos":"virt-launcher.go:549","timestamp":"2022-07-12T18:22:57.678981Z"}
2022-07-12T18:22:57.708639164Z stderr F {"component":"virt-launcher","level":"info","msg":"Reaped pid 95 with status 0","pos":"virt-launcher.go:549","timestamp":"2022-07-12T18:22:57.708373Z"}
2022-07-12T18:22:58.038900649Z stderr F {"component":"virt-launcher","level":"info","msg":"Process 12743c20-a26d-5ada-adf3-38c121b486e9 and pid 75 is gone!","pos":"monitor.go:148","timestamp":"2022-07-12T18:22:58.038699Z"}
2022-07-12T18:22:58.038947755Z stderr F {"component":"virt-launcher","level":"info","msg":"Waiting on final notifications to be sent to virt-handler.","pos":"virt-launcher.go:277","timestamp":"2022-07-12T18:22:58.038800Z"}
2022-07-12T18:22:58.432530683Z stderr F {"component":"virt-launcher","level":"info","msg":"Reaped pid 97 with status 9","pos":"virt-launcher.go:549","timestamp":"2022-07-12T18:22:58.432332Z"}
2022-07-12T18:22:58.648569173Z stderr F {"component":"virt-launcher","level":"info","msg":"Reaped pid 103 with status 0","pos":"virt-launcher.go:549","timestamp":"2022-07-12T18:22:58.648253Z"}
2022-07-12T18:22:59.280890369Z stderr F {"component":"virt-launcher","level":"info","msg":"Reaped pid 105 with status 9","pos":"virt-launcher.go:549","timestamp":"2022-07-12T18:22:59.280686Z"}
2022-07-12T18:22:59.352756645Z stderr F {"component":"virt-launcher","level":"error","msg":"Unable to write to monitor: Broken pipe","pos":"qemuMonitorIOWrite:453","subcomponent":"libvirt","thread":"76","timestamp":"2022-07-12T18:22:59.352000Z"}
2022-07-12T18:22:59.352872234Z stderr F {"component":"virt-launcher","level":"warning","msg":"cannot parse process status data","pos":"qemuGetProcessInfo:1443","subcomponent":"libvirt","thread":"33","timestamp":"2022-07-12T18:22:59.352000Z"}
2022-07-12T18:22:59.352941143Z stderr F {"component":"virt-launcher","level":"warning","msg":"cannot parse process status data","pos":"qemuGetProcessInfo:1443","subcomponent":"libvirt","thread":"33","timestamp":"2022-07-12T18:22:59.352000Z"}
2022-07-12T18:22:59.356804095Z stderr F {"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 5 with reason 5 received","pos":"client.go:438","timestamp":"2022-07-12T18:22:59.356657Z"}
2022-07-12T18:22:59.362118628Z stderr F {"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Shutoff(5):Crashed(3)","pos":"client.go:288","timestamp":"2022-07-12T18:22:59.361983Z"}
```
I am now trying to intercept qemu-system's stdout/stderr streams to see what's going on. I have seen crashes while booting the ISO installer, crashes in a running system, etc. Any non-Windows VM is totally stable on this cluster. All Windows guests have been tested against the SUSE VMDP 2.5.3 drivers. I am also going to try the Red Hat-distributed virtio drivers.
Still can't get to the root cause. QEMU crashes without a peep. I have seen it crash simply booting the installer from ISO, before any virtio drivers are loaded. Tested Server 2012, 2016, 2019, and 2022, with and without various versions of the virtio drivers shipped by SUSE, Red Hat, and Fedora. Machine type is q35. The CPU is an Intel Atom C3758 / Denverton; however, KubeVirt picks it up as Snowridge and QEMU executes with the definition:
Snowridge,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,mpx=on,md-clear=on,stibp=on,xsaves=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,clwb=off,gfni=off,cldemote=off,movdiri=off,movdir64b=off,core-capability=off,split-lock-detect=off
I thought maybe I'd try specifying
model: Westmere
which does seem to help get through the installer, but installing virtio drivers still results in an unstable VM. Linux and FreeBSD VMs on this cluster are having no issues at all.
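For anyone following along, a merge patch along these lines is one way to set that model on the KubeVirt VirtualMachine (VM name and namespace are placeholders; `spec.template.spec.domain.cpu.model` is the standard KubeVirt field):
```
# Pin the guest CPU model to Westmere on the VirtualMachine object
kubectl -n <vm-namespace> patch vm <vm-name> --type merge \
  -p '{"spec":{"template":{"spec":{"domain":{"cpu":{"model":"Westmere"}}}}}}'
```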
g
any chance you may have a support bundle handy? I'd like to see if I can spot anything
otherwise I will try to replicate something similar in my local setup
could you also please confirm whether the instructions listed here were followed: https://docs.harvesterhci.io/v1.0/vm/create-windows-vm/
l
Sure, will try to get one over in a bit. Honestly I'm not really sure where to focus the investigation; the crashes aren't really coincident with anything I've been changing, and it's not obvious how best to provoke additional logging. Thanks a bunch for the offer to look.
Yes, no problems configuring the VM per the instructions and the shipped template. Sometimes the VM will run for a long time. We have only deviated from the prescribed config during troubleshooting or to make local config changes (for instance, we need to attach an additional volume). The VMs tend to crash much more frequently when we try to use virtio devices for disk and network, but this also feels like it might be a red herring. As I said, we have a Linux and a FreeBSD VM running along happily using virtio for everything, with basically the same machine config. Currently we have a Win2k19 VM up using SCSI for storage and a virtio NIC, but it also crashes. The only drivers loaded are the vioscsi and NetKVM drivers shipped in virtio-win-0.1.185 as compiled by Fedora. We do plan to test on client versions of Windows as well. I'll report more as I am able.
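To double-check which bus each disk is actually configured with, something like this against the VM spec should work (VM name and namespace are placeholders; cdrom devices use `.cdrom.bus` instead, so those print blank here):
```
# Print each disk's name and bus from the KubeVirt VM spec
kubectl -n <vm-namespace> get vm <vm-name> \
  -o jsonpath='{range .spec.template.spec.domain.devices.disks[*]}{.name}{"\t"}{.disk.bus}{"\n"}{end}'
```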
What is the best way to send a bundle along?
Full QEMU command (split by argument for legibility):
```
/usr/bin/qemu-system-x86_64
-name guest=default_houston,debug-threads=on
-S
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-default_houston/master-key.aes
-machine pc-q35-5.2,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram
-cpu Snowridge,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,mpx=on,md-clear=on,stibp=on,xsaves=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,clwb=off,gfni=off,cldemote=off,movdiri=off,movdir64b=off,core-capability=off,split-lock-detect=off
-m 16284
-object memory-backend-ram,id=pc.ram,size=17075011584
-overcommit mem-lock=off
-smp 2,sockets=1,dies=1,cores=2,threads=1
-object iothread,id=iothread1
-uuid cf05ef79-c1d0-54d1-aa7b-5779a0b43f19
-smbios type=1,manufacturer=KubeVirt,product=None,uuid=cf05ef79-c1d0-54d1-aa7b-5779a0b43f19,family=KubeVirt
-no-user-config
-nodefaults
-chardev socket,id=charmonitor,fd=24,server=on,wait=off
-mon chardev=charmonitor,id=monitor,mode=control
-rtc base=utc
-no-shutdown
-boot strict=on
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6
-device qemu-xhci,id=usb,bus=pci.2,addr=0x0
-device virtio-scsi-pci-non-transitional,id=scsi0,bus=pci.3,addr=0x0
-device virtio-serial-pci-non-transitional,id=virtio-serial0,bus=pci.4,addr=0x0
-blockdev {"driver":"host_device","filename":"/dev/cdrom-disk","aio":"native","node-name":"libvirt-4-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
-blockdev {"node-name":"libvirt-4-format","read-only":true,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-4-storage"}
-device ide-cd,bus=ide.0,drive=libvirt-4-format,id=ua-cdrom-disk,bootindex=1,write-cache=on,werror=stop,rerror=stop
-blockdev {"driver":"file","filename":"/var/run/kubevirt/container-disks/disk_2.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
-blockdev {"node-name":"libvirt-3-format","read-only":true,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-3-storage"}
-blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/disk-data/virtio-container-disk/disk.qcow2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
-blockdev {"node-name":"libvirt-2-format","read-only":true,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"}
-device ide-cd,bus=ide.1,drive=libvirt-2-format,id=ua-virtio-container-disk,bootindex=3,write-cache=on,werror=stop,rerror=stop
-blockdev {"driver":"host_device","filename":"/dev/rootdisk","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
-blockdev {"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}
-device virtio-blk-pci-non-transitional,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=ua-rootdisk,bootindex=2,write-cache=on,werror=stop,rerror=stop
-netdev tap,fd=27,id=hostua-default,vhost=on,vhostfd=28
-device virtio-net-pci-non-transitional,host_mtu=1500,netdev=hostua-default,id=ua-default,mac=ea:78:b9:47:8c:a3,bus=pci.1,addr=0x0,romfile=
-chardev socket,id=charserial0,fd=29,server=on,wait=off
-device isa-serial,chardev=charserial0,id=serial0
-chardev socket,id=charchannel0,fd=30,server=on,wait=off
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-device usb-tablet,id=ua-tablet,bus=usb.0,port=1
-vnc vnc=unix:/var/run/kubevirt-private/09174b98-57fe-4397-b792-1542ebad19e2/virt-vnc
-device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1
-device virtio-balloon-pci-non-transitional,id=balloon0,bus=pci.6,addr=0x0
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on
```
g
you can dm me the bundle
Hi John, are you by any chance able to generate a support bundle when this happens?
l
Hello gauravm, very sorry I had neglected to follow up on this; we had to put this project on hold for a few weeks. However, I got back to it yesterday and caught up on some of the notes on the GitLab project. They were helpful in tracing this to the same OOM kill described in https://github.com/harvester/harvester/issues/2419, and the workaround of setting Reserved Memory >= 256MiB resolves the issue. We have tested with Windows Server 2016, 2019, and 2022 successfully without crashing. We are currently running with Reserved Memory set to 512MiB for the Windows VM.
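For anyone hitting the same thing, one quick sanity check is to compare the guest-visible memory on the VMI against the compute container's memory limit; the gap between the two shows how much headroom QEMU actually gets (names below are placeholders):
```
# Guest-visible memory configured on the VirtualMachineInstance
kubectl -n <vm-namespace> get vmi <vm-name> -o jsonpath='{.spec.domain.memory.guest}'

# Memory limit of the launcher pod's compute container
kubectl -n <vm-namespace> get pod virt-launcher-<vm-name>-xxxxx \
  -o jsonpath='{.spec.containers[?(@.name=="compute")].resources.limits.memory}'
```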
It's not clear to me why the QEMU process allocates additional memory when running a Windows VM instead of a Linux VM. However, while we were experiencing this issue, it seemed that using emulated devices instead of virtio devices made the problem occur less frequently. One area of investigation is whether the portion of the virtio stack running in host userspace needs to allocate larger buffers for Windows guests than it does for Linux guests. In any case, it's clear that 100MiB is not enough overhead for KubeVirt in all cases.
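One way to watch how close the compute container actually gets to that limit while the guest is running, assuming pod metrics are available in the cluster (pod name and namespace are placeholders):
```
# Show live memory usage per container of the launcher pod
kubectl -n <vm-namespace> top pod virt-launcher-<vm-name>-xxxxx --containers
```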
g
Thanks John, I saw the update on the issue. Right now I am not entirely sure either, but I will check it out.