# longhorn-storage
b
i'm in a state where i can mount volume-head-000.img just fine, but the UI says `State: Detached`, `Health: Faulted`, `Ready for workload: Not Ready`. how do i reset this volume?
c
b
so do i want revision counter or not? in this case (i have a few faults), there is only one img file
c
What do you mean by a few faults and only one image file? Do you mean you have some failed volumes, but can only see one volume-head-xxx.img in the directory `/var/lib/longhorn/replicas/` on the worker node?
b
after a 2h net outage between k8s nodes, i have 3 new faulted vols out of 20
considering the ollama fault, i indeed have only 1 volume-head-*.img and it is mountable (outside of LH)
```
/d4/longhorn/replicas/pvc-ec1d7d67-ab8f-416d-b1b2-f876add59c61-20194b66# head *meta
==> volume-head-000.img.meta <==
{"Name":"volume-head-000.img","Parent":"","Removed":false,"UserCreated":false,"Created":"2025-02-20T01:22:20Z","Labels":null}
==> volume.meta <==
{"Size":53687091200,"Head":"volume-head-000.img","Dirty":true,"Rebuilding":false,"Error":"","Parent":"","SectorSize":512,"BackingFilePath":""}
```
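(Side note for anyone reading along: the `Dirty` flag lives in that `volume.meta` JSON, one object per file. A minimal sketch of reading it programmatically, with the metadata contents copied from above and `python3` assumed available on the node:)

```shell
# Recreate the volume.meta shown above, then read Head and Dirty from it.
cat > volume.meta <<'EOF'
{"Size":53687091200,"Head":"volume-head-000.img","Dirty":true,"Rebuilding":false,"Error":"","Parent":"","SectorSize":512,"BackingFilePath":""}
EOF
python3 - <<'EOF'
import json
with open("volume.meta") as f:
    meta = json.load(f)
# Prints the active head image name and whether Longhorn considers it dirty.
print(meta["Head"], meta["Dirty"])
EOF
```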
i e2fsck'd the img myself, so that `"Dirty":true` field is probably stale
clicking salvage in the UI fails with this: "unable to salvage volume pvc-ec1d7d67-ab8f-416d-b1b2-f876add59c61: disk with UUID 8a3b890b-5111-4821-ba57-cbcc99669852 on node dash is unschedulable for replica pvc-ec1d7d67-ab8f-416d-b1b2-f876add59c61-r-76a68e05"
c
> /d4/longhorn/replicas/pvc-ec1d7d67-ab8f-416d-b1b2-f876add59c61-20194b6

That is the directory for the volume pvc-ec1d7d67-ab8f-416d-b1b2-f876add59c61; there should be other directories for the other volumes in `/d4/longhorn/replicas/`.

> unable to salvage volume pvc-ec1d7d67-ab8f-416d-b1b2-f876add59c61: disk with UUID 8a3b890b-5111-4821-ba57-cbcc99669852 on node dash is unschedulable for replica pvc-ec1d7d67-ab8f-416d-b1b2-f876add59c61-r-76a68e05

Do you have enough storage to schedule the replica? Could you provide the support bundle for investigation?
b
correct dir; diskfree looks ok
c
Is node `dash` available or ready?
b
yes - GH issue coming up
support bundle has been going for many minutes: 33%
๐Ÿ‘ 1
c
Could you provide the information from the command

```
kubectl -n longhorn-system get nodes.longhorn.io dash -oyaml
```

and

```
kubectl -n longhorn-system get lhr pvc-ec1d7d67-ab8f-416d-b1b2-f876add59c61-r-76a68e05 -oyaml
```
b
dashnode.yaml
ollama-pvc.yaml
c
```yaml
d4:
      conditions:
      - lastProbeTime: ""
        lastTransitionTime: "2024-04-22T07:25:14Z"
        message: Disk d4(/d4/longhorn) on node dash is ready
        reason: ""
        status: "True"
        type: Ready
      - lastProbeTime: ""
        lastTransitionTime: "2025-04-06T05:02:41Z"
        message: Disk d4 (/d4/longhorn) on the node dash has 247463936000 available,
          but requires reserved 0, minimal 25% to schedule more replicas
        reason: DiskPressure
        status: "False"
        type: Schedulable
      diskDriver: ""
      diskName: ""
      diskPath: /d4/longhorn
      diskType: filesystem
      diskUUID: 8a3b890b-5111-4821-ba57-cbcc99669852
```
This disk `d4` on the node `dash` cannot be scheduled, so the salvage returned that error.
b
```
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb2       938G  708G  183G  80% /d4
```

> message: Disk d4 (/d4/longhorn) on the node dash has 247463936000 available, but requires reserved 0, minimal 25% to schedule more replicas

^ you're seeing that?
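(For what it's worth, the figures do add up to "unschedulable" under Longhorn's default 25% minimal-available setting. A rough shell check, with byte totals approximated from the df output above, so treat the exact numbers as illustrative:)

```shell
# Rough re-check of Longhorn's schedulability condition for disk d4.
total=1007169830912   # 938 GiB (df "Size") in bytes, approximate
avail=247463936000    # storageAvailable reported in the node condition
min_pct=25            # default storageMinimalAvailablePercentage
need=$(( total * min_pct / 100 ))
if [ "$avail" -ge "$need" ]; then
  echo "schedulable"
else
  echo "unschedulable (need >= $need bytes free)"
fi
```

With these numbers, `need` comes out to about 251.8 GB while only about 247.5 GB is available, which matches the `DiskPressure` condition.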
c
Yes. The `status` needs to be "True":

```yaml
status: "False"
type: Schedulable
```
b
80% used needs to be under 75%?
and would you agree this is a UI bug? 🙂
c
> and would you agree this is a UI bug?

Do you mean the UI should not allow users to do the salvage if the disk cannot be scheduled?
b
the status of the volume should include "my disk is unsched because it is 80% full; need 75% or below"
c
Yes, that could be an improvement.
b
i'll add it to the bug
thx for looking at this - the data was just LLM caches, but we got to make a good bug report 🙂
๐Ÿ‘ 1
"failed to create SupportBundle: admission webhook \"validator.longhorn.io\" denied the request: please try again later. Another support-bundle-2025-04-10t07-19-12z is in Generating phase" <-- this is kind of a mess