This message was deleted.
# longhorn-storage
n
Hi! What is the output of the following command? Do you see anything strange?
kubectl -n longhorn-system get lhr
p
Yes, I see a lot of PVCs in the stopped state.
This was around the time of the network outage. I think it is safe to remove them, right? Do you have any tips on how to do this safely?
n
Those are the replicas. Can you show me the replica associated with the problematic node? Like this: kubectl -n longhorn-system get lhr ID -o yaml
p
apiVersion: longhorn.io/v1beta2
kind: Replica
metadata:
  creationTimestamp: "2024-03-12T10:28:55Z"
  generation: 1
  labels:
    longhorn.io/backing-image: ""
    longhorndiskuuid: ""
    longhornnode: ""
  name: pvc-fe8c0606-efbb-4a32-9aa8-67b7a797b4bc-r-f8b5dea8
  namespace: longhorn-system
  ownerReferences:
  - apiVersion: longhorn.io/v1beta2
    kind: Volume
    name: pvc-fe8c0606-efbb-4a32-9aa8-67b7a797b4bc
    uid: 11c6a626-ad81-40b0-b4bd-85a6d1ce72a3
  resourceVersion: "485485953"
  uid: 002761f5-d850-46ab-8817-9c823b0abfa0
spec:
  active: true
  backingImage: ""
  baseImage: ""
  dataDirectoryName: ""
  dataPath: ""
  desireState: stopped
  diskID: ""
  diskPath: ""
  engineImage: longhornio/longhorn-engine:v1.4.0
  engineName: pvc-fe8c0606-efbb-4a32-9aa8-67b7a797b4bc-e-56db2b76
  failedAt: ""
  hardNodeAffinity: ""
  healthyAt: ""
  logRequested: false
  nodeID: ""
  rebuildRetryCount: 5
  revisionCounterDisabled: false
  salvageRequested: false
  unmapMarkDiskChainRemovedEnabled: false
  volumeName: pvc-fe8c0606-efbb-4a32-9aa8-67b7a797b4bc
  volumeSize: "10737418240"
status:
  conditions:
  - lastProbeTime: ""
    lastTransitionTime: "2024-03-12T10:28:55Z"
    message: ""
    reason: ""
    status: "True"
    type: InstanceCreation
  currentImage: ""
  currentState: stopped
  evictionRequested: false
  instanceManagerName: ""
  ip: ""
  logFetched: false
  ownerID: rr8-aw07
  port: 0
  salvageExecuted: false
  started: false
  storageIP: ""
n
Hm. And does your PV have other healthy replicas? Or is it stuck in the rebuilding phase?
p
yes
sorry, healthy replicas
n
If it has other replicas and you have a backup, my opinion is that you can delete the replica above, for example with the kubectl delete lhr command.
p
thanks, great help
Do you have a tip for deleting 8999 stopped replicas?
What field selector should I use if I need to get all stopped replicas?
n
wow 🙂
Do all of them belong to one PV?
p
yes i know
yes
real shit now, because it looks like Longhorn tries to start them randomly
one node has already OOM-killed Longhorn because of memory exhaustion
n
maybe you can do something like this:
kubectl  -n longhorn-system  get lhr -l longhornvolume=pvc-d97403bd-0d61-40a4-8a97-b2e9db8439b2
kubectl  -n longhorn-system  delete lhr -l longhornvolume=pvc-d97403bd-0d61-40a4-8a97-b2e9db8439b2
you can see the labels with the kubectl get lhr --show-labels command
p
I get this:
Error from server (BadRequest): Unable to find "longhorn.io/v1beta2, Resource=replicas" that match label selector "", field selector "longhornnode=rr8-aw05": field label not supported: longhornnode
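A note on the error above: `longhornnode` is a label, and label keys are matched with `-l`/`--selector`, not `--field-selector`. For custom resources such as Longhorn replicas, field selectors generally cover only `metadata.name` and `metadata.namespace`, so the replica state cannot be selected that way. One possible workaround, sketched here, is a JSONPath filter on `status.currentState` (the field visible in the replica YAML earlier in this thread). The function name `replicas_in_state` is hypothetical and not part of kubectl or Longhorn.

```shell
# Hypothetical helper: print the names of replicas whose
# status.currentState equals the first argument.
# Any further arguments (for example a label selector) are passed to kubectl.
replicas_in_state() {
  state="$1"
  shift
  # The JSONPath filter keeps only items in the requested state and
  # prints one name per line.
  kubectl -n longhorn-system get lhr "$@" \
    -o jsonpath="{range .items[?(@.status.currentState==\"${state}\")]}{.metadata.name}{\"\\n\"}{end}"
}

# Usage (read-only, lists names without changing anything):
# replicas_in_state stopped -l longhornnode=
```

Filtering on the state field rather than grepping the table output avoids matching the word "stopped" anywhere else in a row.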
n
What was your command?
p
I'm going with this now -> kubectl -n longhorn-system get replica --selector longhornnode= | grep stopped | awk '{print $1}' | kubectl -n longhorn-system delete replica
I get this: "error: resource(s) were provided, but no name was specified"
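The error above appears because `kubectl delete` takes resource names as arguments and does not read them from the pipe. A hedged sketch of a batched alternative: filter the names out of the table output, then let `xargs` append them to the delete command. The helper name `stopped_replica_names` is hypothetical; it assumes the replica name is the first column of the `kubectl get` table, and `-r` (do nothing on empty input) is a GNU `xargs` flag.

```shell
# Hypothetical helper: read `kubectl get` table output on stdin and print
# the first column (the name) of every row that has a field exactly equal
# to "stopped". The header row is skipped.
stopped_replica_names() {
  awk 'NR > 1 { for (i = 2; i <= NF; i++) if ($i == "stopped") { print $1; next } }'
}

# Usage (dry run first; remove --dry-run=client once the list looks right):
# kubectl -n longhorn-system get lhr --selector longhornnode= \
#   | stopped_replica_names \
#   | xargs -r -n 50 kubectl -n longhorn-system delete lhr --dry-run=client
```

Passing names in batches of 50 starts far fewer kubectl processes than one call per replica, which may matter with thousands of objects.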
n
Try it like this:
for i in $(kubectl  -n longhorn-system  get lhr | grep pvc-f19aed02-12b9-451f-8111-f54b5bc23019 | awk '{print $1}' ); do kubectl -n longhorn-system get lhr $i;  done
p
for i in $(kubectl -n longhorn-system get lhr --selector=longhornnode= | grep stopped | awk '{print $1}' ); do kubectl -n longhorn-system delete lhr $i; done
This works, it's running now.
What happens when you kill longhorn-manager? It's eating a lot of memory.
f
Deleting longhorn-manager is a safe operation. It should not cause anything interesting to happen on the data plane.
p
Check, I figured that out. Thanks for the confirmation.
f
Do you have a support bundle from when you had thousands of replicas you are willing to share @polite-alarm-96476? The Longhorn team would be interested in trying to discover what particular race condition led to the creation of all of those unnecessary replicas.
p
Hi Eric, I'm afraid I have not, but the next time (I hope not) I have some funky stuff, I will create one.