quaint-alarm-7893

02/10/2023, 11:18 PM
Question: I have a storage class I set up for HDDs. Since rebooting the node, it seems Longhorn now wants to schedule other volumes onto that disk. Am I missing something? The disk has a tag of hdd, the node has a tag of lff, and the class has both of those set. The volumes being scheduled onto the disk are part of the harvester-longhorn default class. Not sure why they keep getting put onto the HDD, but they build to 99%, then sit there forever...

narrow-egg-98197

02/11/2023, 2:34 PM
Could you share your StorageClass YAML? Are the tags specified in the diskSelector and nodeSelector fields? Ref: https://longhorn.io/docs/1.4.0/volumes-and-nodes/storage-tags/
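For the selectors to take effect, the same tags also have to exist on the Longhorn Node object: a node-level tag for nodeSelector and a per-disk tag for diskSelector. A rough sketch of where those live on the Node CR (node and disk names are placeholders, and the apiVersion may differ in your installation):
apiVersion: longhorn.io/v1beta2   # may differ by Longhorn version
kind: Node
metadata:
  name: your-node                 # placeholder node name
  namespace: longhorn-system
spec:
  tags:
  - lff                           # node tag matched by the StorageClass nodeSelector
  disks:
    default-disk-xxx:             # placeholder disk name
      tags:
      - hdd                       # disk tag matched by the StorageClass diskSelector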
Also, I don't quite get the part where you mention building to 99%. Do you mean the progress of restoring the volume data?

quaint-alarm-7893

02/11/2023, 5:38 PM
Here's an example. This volume has no class set; it's rebuilding and stuck at 99% (it's been stuck there overnight). It seems any volumes the server had to rebuild (I had to reboot the node) are provisioning on these drives, probably because they are 5TB disks with plenty of space. The disk it's being provisioned on has a disk tag and a node tag assigned, matching the disk and node selectors. I removed the node label from the storage page to see if that helped, but it did not. It seems the only way to stop it from spinning up on these disks is to mark the disk unschedulable (rough sketch of what that toggles at the end of this message) and delete all but the volumes I want on it, which is obviously not going to work well 🙂 SC YAML:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: '2023-02-02T13:42:23Z'
  name: hdd
  resourceVersion: '180629798'
  uid: f23599b7-49c6-407e-9cd8-184e0d03b27b
parameters:
  diskSelector: hdd
  migratable: 'true'
  nodeSelector: lff
  numberOfReplicas: '3'
  staleReplicaTimeout: '30'
provisioner: driver.longhorn.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
Pics of the rebuilding volume, node, and disk are attached.
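For context, when I say mark it unschedulable, I mean toggling scheduling off for that one disk; as far as I can tell that roughly corresponds to flipping allowScheduling on the disk entry of the Longhorn Node object, something like this sketch (node and disk names are placeholders, and the apiVersion may differ on my version):
apiVersion: longhorn.io/v1beta2   # may differ by Longhorn version
kind: Node
metadata:
  name: harvester-03
  namespace: longhorn-system
spec:
  disks:
    hdd-5tb-disk:                 # placeholder name for the 5TB HDD
      allowScheduling: false      # stop new replicas from landing on this disk
      tags:
      - hdd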

narrow-egg-98197

02/12/2023, 2:14 PM
It seems like there is something wrong on the node harvester-03. Could you check whether there are any error messages in the instance-manager-r and longhorn-manager pods (on the affected node)? Or send a support bundle so we can investigate.
In addition, could you also try to re-schedule the replica to the node harvester-03 and then check whether it works well or not? (A rough sketch of what step 1 touches on the Node object follows the list.)
1. Disable the node harvester-03 with the Edit Node and Disks option in the Node tab.
2. Delete the problem replica in the Volume tab.
3. Re-enable the node harvester-03.
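Step 1 via kubectl would roughly mean setting allowScheduling on the Longhorn Node object, something like this sketch (the apiVersion may differ for your Longhorn version):
apiVersion: longhorn.io/v1beta2   # may differ by version
kind: Node
metadata:
  name: harvester-03
  namespace: longhorn-system
spec:
  allowScheduling: false          # set back to true in step 3 to re-enable the node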

quaint-alarm-7893

02/13/2023, 10:59 PM
Tried all of it, still no luck. pvc-13787ebc-7d4e-41ad-9206-de9a6adb938a won't detach and shows multiple instance managers. I also get a message:
failed to list snapshot: cannot get client for volume pvc-13787ebc-7d4e-41ad-9206-de9a6adb938a: more than one engine exists
I just don't know how I can detach the PVC without deleting the data. Support bundle attached.
@narrow-egg-98197 the message above is more for the other thread you were messaging me on.
Hello everyone, I have a VM (in Harvester) that got messed up when one of my servers had an issue, and now the VM is off but the PVC is still attached. How can I detach it so I can bring the VM back online?
State: Attached
Health: Degraded (no node redundancy: all the healthy replicas are running on the same node)
Ready for workload: Not Ready
Conditions: restore, scheduled
Frontend: Block Device
Attached Node & Endpoint:
Size: 100 Gi
Actual Size: Unknown
Data Locality: disabled
Access Mode: ReadWriteMany
Backing Image: vdi-image-hzwdh
Backing Image Size: 50 Gi
Engine Image: longhornio/longhorn-engine:v1.3.2
Created: a month ago
Encrypted: False
Node Tags:
Disk Tags:
Last Backup:
Last Backup At:
Replicas Auto Balance: ignored
Instance Manager: instance-manager-e-c2f010c0, instance-manager-e-50d28d2d
Last time used by Pod: 10 hours ago
Namespace: vdi
PVC Name: vmname-rootdisk-ossjw
PV Name: pvc-13787ebc-7d4e-41ad-9206-de9a6adb938a
PV Status: Bound
Revision Counter Disabled: False
Last Pod Name: virt-launcher-vmname-mpbn2
Last Workload Name: vmname
Last Workload Type: VirtualMachineInstance
Note: it shows it's attached to two instance managers. I've tried rebooting the nodes and deleting the instance-manager pods, no luck...
Sorry for the confusion. I have two issues: the disk-tag issue, and this volume not unmounting. I can do the disk-tag issue next if we can work this unmount issue out.