# harvester
b
Just a 🧵 for this GitHub Issue 8856:
supportbundle_85c30c31-20b0-4569-bafd-287b46655f4b_2025-08-21T17-36-49Z.zip
At one point @millions-microphone-3535 you asked if the nodes were under pressure; they definitely aren't. 9 nodes, each with 256 cores (ARM) and a TiB of RAM, and there are only 3 VMs in the entire cluster (it's new). I think I have another cluster with older Intel procs I might be able to reproduce this on as well. It's just more difficult since it doesn't trigger every time (only 1 in 6 this last time), but let me know if that's something your team wants.
m
yeah, i was curious about the "killing the pod manually triggered the drain for the other two nodes" scenario
the latest pdb errors are definitely not related to resource pressure
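(Aside from the thread: a minimal Python sketch, using the kubernetes client, for spotting which PDBs are currently refusing disruptions, which is what blocks a drain. The longhorn-system namespace and kubeconfig access are assumptions based on a default Harvester install.)

```python
# Minimal sketch: list PodDisruptionBudgets in the Longhorn namespace that
# currently allow zero disruptions (i.e. would block a node drain).
# Assumes kubeconfig access and the default "longhorn-system" namespace.
from kubernetes import client, config

config.load_kube_config()
policy = client.PolicyV1Api()

for pdb in policy.list_namespaced_pod_disruption_budget("longhorn-system").items:
    status = pdb.status
    if status and status.disruptions_allowed == 0:
        print(
            f"{pdb.metadata.name}: disruptionsAllowed=0 "
            f"(healthy {status.current_healthy}/{status.desired_healthy})"
        )
```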
b
Best I could figure is that sometimes there are duplicate engines running, so maybe it was blocking the engines somehow, because the stuck PDB also had the mount? idk.
It also could have been the UI just being super laggy
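(Aside: a minimal sketch for checking the "duplicate engines" theory by grouping Longhorn Engine CRs per volume. The "longhorn.io/v1beta2" group/version and the spec.volumeName / status.currentState fields are assumptions based on a current Longhorn install; adjust to match the cluster.)

```python
# Minimal sketch: flag any Longhorn volume that has more than one Engine CR.
from collections import defaultdict
from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()

engines = crd.list_namespaced_custom_object(
    group="longhorn.io", version="v1beta2",
    namespace="longhorn-system", plural="engines",
)["items"]

by_volume = defaultdict(list)
for eng in engines:
    by_volume[eng.get("spec", {}).get("volumeName", "?")].append(eng)

for volume, engs in by_volume.items():
    if len(engs) > 1:
        names = ", ".join(
            f'{e["metadata"]["name"]} ({e.get("status", {}).get("currentState", "unknown")})'
            for e in engs
        )
        print(f"volume {volume} has {len(engs)} engines: {names}")
```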
m
let me take a look at the support bundle
i don't have proof yet, but i suspect that when the backup failed, it somehow left behind stale, limbo resources
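(Aside: a minimal sketch for chasing the "failed backup left stale resources" suspicion by listing Longhorn Backup CRs that never completed, so they can be inspected or cleaned up. The group/version and the status.state / status.error fields are assumptions based on a current Longhorn install.)

```python
# Minimal sketch: print Longhorn Backup CRs whose state is not Completed.
from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()

backups = crd.list_namespaced_custom_object(
    group="longhorn.io", version="v1beta2",
    namespace="longhorn-system", plural="backups",
)["items"]

for backup in backups:
    status = backup.get("status", {})
    state = status.get("state", "unknown")
    if state != "Completed":
        print(f'{backup["metadata"]["name"]}: state={state} {status.get("error", "")}'.rstrip())
```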
b
supportbundle_85c30c31-20b0-4569-bafd-287b46655f4b_2025-09-03T21-57-47Z.zip