hello I am struggling with lima VM disk corruption on MacOS (Rancher Users, #rancher-desktop)

microscopic-sandwich-7442

12/20/2023, 2:47 PM

Hello! I am struggling with Lima VM disk corruption on macOS (now 14.2.1 (23C71)), M1 Max, rancher-desktop 1.11.1, VZ, Rosetta enabled, socket-vmnet enabled, virtiofs enabled, Longhorn 1.5.3.

Workload-wise, I am attempting to run a local Jenkins CI on RD, using the Jenkins k8s plugin for Jenkins agents that use podman (user mode) to build multi-arch containers. That sounds a little daring, I admit, but it works in principle. However, some of the containers I build have very resource-intensive (CPU/RAM/disk) build steps that probably act as a smoke test for this setup. I usually only have to run one pipeline to introduce disk corruption. One way I have found to detect it is to run dive:

$ docker image list -q | xargs -ti dive --ci {} 1> /dev/null
dive --ci 6555fa0c7f55
cannot fetch image
Error response from daemon: file integrity checksum failed for "usr/local/bin/goreleaser"
dive --ci ebbd5cee7012
cannot fetch image
Error response from daemon: file integrity checksum failed for "usr/lib64/az/lib/python3.6/site-packages/azure/mgmt/compute/v2019_12_01/operations/_gallery_applications_operations.py"
dive --ci 97c27a0c1c42
cannot fetch image
Error response from daemon: file integrity checksum failed for "usr/local/bin/golangci-lint-1.55.2_linux_amd64"
dive --ci 82e4d2804e90
dive --ci b6a37db9b2d0
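A possible cross-check that does not depend on dive, as a minimal sketch: it assumes that `docker save` reads every layer of an image and fails with the same integrity error when a layer is damaged, which has not been verified on this setup. The function name `check_images` is made up for illustration.

```shell
#!/bin/sh
# Sketch: export every local image to /dev/null and report the ones that fail.
# Assumption: a corrupted layer makes `docker save` exit non-zero.
check_images() {
    failed=0
    # sort -u drops duplicate IDs (one image can carry several tags)
    for id in $(docker image list -q | sort -u); do
        if docker save "$id" > /dev/null 2>&1; then
            echo "ok      $id"
        else
            echo "CORRUPT $id"
            failed=$((failed + 1))
        fi
    done
    echo "failed: $failed"
}
```

After sourcing the snippet, calling `check_images` prints one line per image and a final count, so two runs (before and after a pipeline) can be compared with `diff`.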

(dive pulls all layers of an image so that it can inspect them; I use it here only to detect corruption.) The errors are not only in the container layers; they are on the Lima disk

~/Library/Application\ Support/rancher-desktop/lima/0/diffdisk

as well. To test that, I used an Alpine VM on UTM so that I could attach that diffdisk as a disk in Alpine 19.x (needed because the new Lima requires the latest e2fsprogs-1.47) and run e2fsck. That check found plenty of corruption at the filesystem level of the disk partition. Sadly, I do not have the log at hand; I will need to repeat the test. (Obviously, I attached the disk while RD was not running.)

Since I initially did not know when (or how) the problem was being introduced (I had updated the Lima VM across the rancher-desktop releases), I decided to start fresh on rancher-desktop 1.11.1, but ended up with corruption again in no time. I understand that Apple SSDs (like all SSDs) age over time, so a hardware problem is possible, though I do not believe it is likely.

Does anyone have similar experiences? Any tips on how I can avoid the FS corruption?
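For repeating the filesystem check with a saved log, something like the following could be run inside the Linux guest. This is only a sketch: the function name `fsck_readonly`, the default log name, and the device path in the usage note are hypothetical, and it assumes the guest ships e2fsprogs 1.47. The `-n` flag opens the filesystem read-only and answers "no" to every prompt, so nothing on the disk is changed; `-f` forces a full check.

```shell
#!/bin/sh
# Sketch: read-only e2fsck run that keeps its output in a log file.
# Usage: fsck_readonly DEVICE [LOGFILE]
fsck_readonly() {
    dev="$1"
    log="${2:-e2fsck.log}"
    rc=0
    # -f: force a check even if the filesystem is marked clean
    # -n: read-only, never modify the filesystem
    e2fsck -f -n "$dev" > "$log" 2>&1 || rc=$?
    # Record the exit status (0 = clean, 4 = errors left uncorrected)
    echo "e2fsck exit status: $rc" >> "$log"
    return "$rc"
}
```

For example, `fsck_readonly /dev/vdb1 diffdisk-fsck.log`, where `/dev/vdb1` stands in for whatever device the attached diffdisk partition appears as in the guest.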
