# harvester
b
Just a 🧵 for this GitHub Issue 8856:
supportbundle_85c30c31-20b0-4569-bafd-287b46655f4b_2025-08-21T17-36-49Z.zip
At one point @millions-microphone-3535 you asked if the nodes were under pressure; they definitely aren't. 9 nodes, each with 256 cores (ARM) and a TiB of RAM, and there are only 3 VMs in the entire cluster (it's new). I think I have another cluster with older Intel procs I might be able to reproduce this on as well. It's just more difficult since it doesn't trigger every time (only 1 in 6 this last time), but let me know if that's something your team wants.
m
yeah, i was curious about the "killing the pod manually triggered the drain for the other two nodes" scenario
the latest pdb errors are definitely not related to resource pressure
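(Aside from the thread: a minimal Python sketch, using the kubernetes client, for spotting which PDBs are currently refusing disruptions, which is what blocks a drain. The longhorn-system namespace and kubeconfig access are assumptions based on a default Harvester install.)

```python
# Minimal sketch: list PodDisruptionBudgets in the Longhorn namespace that
# currently allow zero disruptions (i.e. would block a node drain).
# Assumes kubeconfig access and the default "longhorn-system" namespace.
from kubernetes import client, config

config.load_kube_config()
policy = client.PolicyV1Api()

for pdb in policy.list_namespaced_pod_disruption_budget("longhorn-system").items:
    status = pdb.status
    if status and status.disruptions_allowed == 0:
        print(
            f"{pdb.metadata.name}: disruptionsAllowed=0 "
            f"(healthy {status.current_healthy}/{status.desired_healthy})"
        )
```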
b
Best I could figure is that sometimes there are duplicate engines running, so maybe it was blocking the engines somehow, because the stuck PDB also had the mount? idk.
It also could have been the UI just being super laggy
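(Aside: a minimal sketch for checking the "duplicate engines" theory by grouping Longhorn Engine CRs per volume. The "longhorn.io/v1beta2" group/version and the spec.volumeName / status.currentState fields are assumptions based on a current Longhorn install; adjust to match the cluster.)

```python
# Minimal sketch: flag any Longhorn volume that has more than one Engine CR.
from collections import defaultdict
from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()

engines = crd.list_namespaced_custom_object(
    group="longhorn.io", version="v1beta2",
    namespace="longhorn-system", plural="engines",
)["items"]

by_volume = defaultdict(list)
for eng in engines:
    by_volume[eng.get("spec", {}).get("volumeName", "?")].append(eng)

for volume, engs in by_volume.items():
    if len(engs) > 1:
        names = ", ".join(
            f'{e["metadata"]["name"]} ({e.get("status", {}).get("currentState", "unknown")})'
            for e in engs
        )
        print(f"volume {volume} has {len(engs)} engines: {names}")
```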
m
let me take a look at the support bundle
i don't have proof yet, but i suspect that when the backup failed, it somehow left behind stale, limbo resources
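(Aside: a minimal sketch for chasing the "failed backup left stale resources" suspicion by listing Longhorn Backup CRs that never completed, so they can be inspected or cleaned up. The group/version and the status.state / status.error fields are assumptions based on a current Longhorn install.)

```python
# Minimal sketch: print Longhorn Backup CRs whose state is not Completed.
from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()

backups = crd.list_namespaced_custom_object(
    group="longhorn.io", version="v1beta2",
    namespace="longhorn-system", plural="backups",
)["items"]

for backup in backups:
    status = backup.get("status", {})
    state = status.get("state", "unknown")
    if state != "Completed":
        print(f'{backup["metadata"]["name"]}: state={state} {status.get("error", "")}'.rstrip())
```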
b
supportbundle_85c30c31-20b0-4569-bafd-287b46655f4b_2025-09-03T21-57-47Z.zip