Rancher Users #longhorn-storage

# longhorn-storage

adamant-kite-43734

12/25/2023, 11:57 AM

This message was deleted.

clever-dinner-28781

12/25/2023, 11:58 AM

Longhorn v2.8.0 installed via Rancher UI.

clever-dinner-28781

12/25/2023, 12:25 PM

kubectl -n longhorn-system logs -f -l app=longhorn-manager

clever-dinner-28781

12/25/2023, 12:26 PM

time="2023-12-25T12:23:15Z" level=error msg="Failed to sync Longhorn volume" func=controller.handleReconcileErrorLogging file="utils.go:72" Volume=longhorn-system/pvc-ac713d43-a4ec-4e9c-b257-75ca0788b875 controller=longhorn-volume error="failed to sync longhorn-system/pvc-ac713d43-a4ec-4e9c-b257-75ca0788b875: unable to get engine image rancher/mirrored-longhornio-longhorn-engine:v1.5.3: engineimage.longhorn.io \"ei-8c052ab7\" not found" node=k8s-rancher-worker2

time="2023-12-25T12:23:15Z" level=warning msg="Rejected operation: Request (user: system:serviceaccount:longhorn-system:longhorn-service-account, longhorn.io/v1beta2, Kind=Setting, namespace: longhorn-system, name: taint-toleration, operation: CREATE)" func="admission.(*Handler).admit" file="admission.go:106" error="failed to set the setting taint-toleration with invalid value : current state prevents this: cannot modify toleration setting before all volumes are detached" service=admissionWebhook

time="2023-12-25T12:23:15Z" level=debug msg="admit result: CREATE longhorn.io/v1beta2, Kind=Setting longhorn-system/taint-toleration user=system:serviceaccount:longhorn-system:longhorn-service-account allowed=false err=<nil>" func="webhook.(*Router).admit" file="router.go:89"

time="2023-12-25T12:23:15Z" level=warning msg="Rejected operation: Request (user: system:serviceaccount:longhorn-system:longhorn-service-account, longhorn.io/v1beta2, Kind=Setting, namespace: longhorn-system, name: taint-toleration, operation: CREATE)" func="admission.(*Handler).admit" file="admission.go:106" error="failed to set the setting taint-toleration with invalid value : current state prevents this: cannot modify toleration setting before all volumes are detached" service=admissionWebhook

time="2023-12-25T12:23:15Z" level=debug msg="admit result: CREATE longhorn.io/v1beta2, Kind=Setting longhorn-system/taint-toleration user=system:serviceaccount:longhorn-system:longhorn-service-account allowed=false err=<nil>" func="webhook.(*Router).admit" file="router.go:89"

time="2023-12-25T12:23:15Z" level=warning msg="Failed to collect number of node disks" func="controller.(*ClusterInfo).collectNodeScope" file="setting_controller.go:1667" controller=longhorn-setting error="node.longhorn.io \"k8s-rancher-worker2\" not found" node=k8s-rancher-worker2

Error from server: Get "https://198.10.5.215:10250/containerLogs/longhorn-system/longhorn-manager-659t4/longhorn-manager?follow=true&tailLines=10": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 198.10.5.215

clever-dinner-28781

12/25/2023, 12:59 PM

I believe I found the problem. Older volumes are stuck in the deleting state... I can't delete them.

> k -n longhorn-system get volume
NAME                                       STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE                  AGE
pvc-00f57e78-60d8-4e5d-ad54-c1486efd034c   deleting   healthy                  10737418240   k8s-rancher-worker4   143d
pvc-029facf4-883c-460e-94fa-047c197fd9f9   deleting   healthy                  10737418240   k8s-rancher-worker4   143d
pvc-5a55f3c6-f22d-4a7b-9eea-5c3a359d8996   deleting   healthy                  10737418240   k8s-rancher-worker3   124d
pvc-95bcef12-9255-4d3a-a38e-3b2f7f766480   deleting   unknown                  10737418240   k8s-rancher-worker2   124d
pvc-ac713d43-a4ec-4e9c-b257-75ca0788b875   deleting   unknown                  10737418240   k8s-rancher-worker2   124d
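
To pull just the stuck volumes out of a listing like the one above, a small filter can help. This is a minimal sketch, assuming the column layout shown (NAME first, STATE second); the function name `stuck_volumes` is hypothetical and not part of Longhorn or kubectl.

```shell
# Print the names of volumes whose STATE column is "deleting".
# Reads the table produced by `kubectl -n longhorn-system get volume` on stdin.
# Assumption: NAME is column 1 and STATE is column 2, as in the output above.
stuck_volumes() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && $2 == "deleting" { print $1 }'
}

# Example usage (requires cluster access, so it is left commented out):
#   kubectl -n longhorn-system get volume | stuck_volumes
```

Each name it prints can then be inspected individually, for example with `kubectl -n longhorn-system get volume <name> -o yaml`, to see what is holding up the deletion.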

clever-dinner-28781

12/26/2023, 2:38 PM

I removed the objects from etcd and reinstalled Longhorn. It's OK now.

late-needle-80860

12/26/2023, 8:24 PM

So it works now?

clever-dinner-28781

12/26/2023, 8:53 PM

@late-needle-80860 yes! It's a test environment, so no problem losing the data 👍

clever-dinner-28781

12/26/2023, 8:55 PM

I need to understand these disaster-recovery scenarios better. For example, I have /var/lib/longhorn (on all nodes); how do I recover the volumes? As for recreating the PVs/PVCs, I believe that only works by restoring etcd.

late-needle-80860

12/26/2023, 9:02 PM

You use e.g. Velero or Kasten … to back up persistent storage.

✅ 1

famous-journalist-11332

12/27/2023, 2:21 AM

> I need to understand these disaster-recovery scenarios better. For example, I have /var/lib/longhorn (on all nodes); how do I recover the volumes? As for recreating the PVs/PVCs, I believe that only works by restoring etcd.

There are a few approaches, depending on the specific situation:

1. Back up and restore the volume using Longhorn's native feature: https://longhorn.io/docs/1.5.3/snapshots-and-backups/backup-and-restore/. This only handles the PV/PVC/Longhorn volume data.
2. Velero/Kasten, as @late-needle-80860 mentioned above. This also handles the manifests for your workloads (deployment/daemonset/statefulset/pod), secrets, configmaps, services, ... We have instructions for the Velero case: https://longhorn.io/docs/1.5.3/advanced-resources/system-backup-restore/restore-to-a-new-cluster-using-velero/#assumptions
3. If you don't have any backup but have some data on disk at /var/lib/longhorn, try https://longhorn.io/docs/1.5.3/advanced-resources/data-recovery/export-from-replica/
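
For the last case (no backup, only data under /var/lib/longhorn), a useful first step is to see which replica directories a node still holds and how large each volume is. The sketch below assumes the on-disk layout `<data dir>/replicas/<replica name>/volume.meta` with a numeric `"Size"` field in that JSON file; treat that layout as an assumption and check it against the export-from-replica page. The function name `list_replicas` is hypothetical.

```shell
# List replica directories and the volume size recorded in each volume.meta.
# Assumption: replicas live under <data_dir>/replicas/<name>/ and volume.meta
# is a small JSON file containing a numeric "Size" field.
list_replicas() {
  data_dir="${1:-/var/lib/longhorn}"
  for meta in "$data_dir"/replicas/*/volume.meta; do
    # Skip the unexpanded glob when no replica directories exist.
    [ -f "$meta" ] || continue
    replica_dir=$(dirname "$meta")
    # Extract the digits following "Size": without needing a JSON parser.
    size=$(sed -n 's/.*"Size":[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$meta")
    printf '%s %s\n' "$(basename "$replica_dir")" "${size:-unknown}"
  done
}

# Example usage on a node (left commented out):
#   list_replicas /var/lib/longhorn
```

The replica name and size are the inputs the linked data-recovery procedure asks for, so collecting them per node shows which volumes are still recoverable before attempting an export.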

🎯 1

✅ 1
