# rke2
b
When rke2 is loading images, does the order matter? Does rke2 short-circuit and start the kubelet once the images it needs are loaded? On systems with slow disks we are seeing long image load times, so we are thinking that if we change the order in which images are loaded, the cluster's startup time will be reduced. However, if RKE2 just waits for all the images to load before starting the kubelet, the image load order won't matter.
time="2025-03-05T11:12:42-08:00" level=info msg="Imported images from /data/rancher/rke2/agent/images/rke2-runtime.tar.zst in 2.164178596s"
time="2025-03-05T11:21:43-08:00" level=info msg="Imported images from /data/rancher/rke2/agent/images/sensor-images-base-amd64.tar.zst in 9m1.013485723s"
time="2025-03-05T11:21:43-08:00" level=info msg="Importing images from /data/rancher/rke2/agent/images/sensor-images-ml-amd64.tar.zst"
...
time="2025-03-05T11:25:46-08:00" level=fatal msg="Failed to get request handlers from apiserver: context deadline exceeded, failed to get apiserver /readyz status: Get \"https://127.0.0.1:6443/readyz?timeout=15m0s\": dial tcp 127.0.0.1:6443: connect: connection refused"
...
time="2025-03-05T11:27:15-08:00" level=info msg="Imported images from /data/rancher/rke2/agent/images/rke2-runtime.tar.zst in 32.039261802s"
time="2025-03-05T11:33:01-08:00" level=info msg="Imported images from /data/rancher/rke2/agent/images/sensor-images-base-amd64.tar.zst in 5m45.071635076s"
time="2025-03-05T11:35:25-08:00" level=info msg="Imported images from /data/rancher/rke2/agent/images/sensor-images-ml-amd64.tar.zst in 2m24.502709118s"
time="2025-03-05T11:35:48-08:00" level=info msg="Imported images from /data/rancher/rke2/agent/images/sensor-images-rke2-canal-amd64.tar.zst in 23.352564471s"
time="2025-03-05T11:36:58-08:00" level=info msg="Imported images from /data/rancher/rke2/agent/images/sensor-images-rke2-core-amd64.tar.zst in 1m9.243913448s"
time="2025-03-05T11:36:58-08:00" level=info msg="Running kubelet ..."
...
time="2025-03-05T11:37:32-08:00" level=info msg="Kube API server is now running"
c
It imports all the images every time, before starting the kubelet, so the import order won't change startup time. The kubelet is required to run the etcd and apiserver static pods.
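Since every tarball is imported before the kubelet starts, the number that matters is the sum of the import times, not their order. A rough sketch for totalling them from the log, assuming the `Imported images from ... in ...` message format shown above; the function names `go_duration_seconds` and `import_times` are made up for this example and are not part of rke2:

```python
import os
import re

# Matches the "Imported images from <path> in <duration>" message seen in the log above.
IMPORT_RE = re.compile(r"Imported images from (\S+) in ([\w.µ]+)")
# One component of a Go-style duration such as "9m1.013485723s".
PART_RE = re.compile(r"(\d+(?:\.\d+)?)(h|ms|us|µs|ns|m|s)")
UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9,
}


def go_duration_seconds(text):
    """Convert a Go-formatted duration string (e.g. "5m45.07s") to seconds."""
    return sum(float(value) * UNIT_SECONDS[unit]
               for value, unit in PART_RE.findall(text))


def import_times(lines):
    """Return (tarball file name, seconds) for every completed import in lines."""
    results = []
    for line in lines:
        match = IMPORT_RE.search(line)
        if match:
            path, duration = match.groups()
            results.append((os.path.basename(path), go_duration_seconds(duration)))
    return results


def total_import_seconds(lines):
    """Total time spent importing images; the kubelet starts only after all of it."""
    return sum(seconds for _, seconds in import_times(lines))
```

Fed the five import lines from the second startup above, `total_import_seconds` comes to roughly 614 seconds, which matches the ten or so minutes between the start of the first import and "Running kubelet ...".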
b
yeah. that's unfortunate.
c
You’ve definitely got too much stuff in that sensor-images tarball. Airgap images are not intended to be used to load a LOT of stuff, really just the stuff that is required to bootstrap the cluster. You should set up a local registry mirror and put it in there so that the images can be pulled on demand.
Looks like you've probably got tens of gigs of data in those ML images? Definitely not the intended use of airgap images.
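For reference, a minimal sketch of what the registry mirror setup could look like. RKE2 reads mirror configuration from `/etc/rancher/rke2/registries.yaml` on each node; the registry host `registry.example.internal:5000` and the CA file path are placeholders for this example, not values from this thread:

```yaml
# /etc/rancher/rke2/registries.yaml
# Hypothetical example: redirect docker.io pulls to a local mirror.
mirrors:
  docker.io:
    endpoint:
      - "https://registry.example.internal:5000"
configs:
  "registry.example.internal:5000":
    tls:
      # Placeholder path to the CA that signed the mirror's certificate.
      ca_file: /etc/rancher/rke2/registry-ca.crt
```

With the workload images pushed to that registry, they are pulled on demand after the cluster is up instead of being imported before the kubelet starts. rke2 needs a restart to pick up changes to this file.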
b
ok, we will take this info and try a different approach
c
Are you rebuilding your own tarballs? There are runtime, core, and canal tarballs here that we don’t produce, and are also taking a suspiciously long time to import. Either you’ve got too much stuff in those images as well, or your disk is VERY slow.
the runtime and core images should take ~10 seconds to import. Not 30 seconds to a minute.
b
the disk is VERY slow.
this is part of our torture testing lab.
c
that certainly wouldn’t help
the image import is almost 100% I/O bound
b
we need to build our solution to work in the LCD, which is why we have the torture lab.
c
Whatever you import via airgap is pinned so that it can't be removed by image GC. The next time rke2 starts, all the pins are cleared, and anything imported during that startup is pinned fresh. It's intended just for stuff that you absolutely NEED for cluster nodes to work. Not stuff for your workload that can come up later.
b
we actually want that pinning.
but that can be addressed later, as you mentioned