# longhorn-storage
i
What's your Longhorn version?
b
1.3
Now I'm having fun removing it. It obviously doesn't think it's removed, because when I go to the app catalog it says to update/edit it. Not sure what it checks for that. CRDs, or just the namespace. Not sure.
helm history check?
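(For anyone reading along: a minimal sketch of that check, assuming the release is named longhorn and lives in the longhorn-system namespace; adjust both if your install differs.)
# what Helm thinks the release state is
helm status longhorn -n longhorn-system
# revision history, including failed upgrades/uninstalls
helm history longhorn -n longhorn-system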
is SUSE comfortable with the 1.3.1 release? I should check the release notes.
grrrr. I hate helm 😉
Error: warning: Hook pre-delete longhorn/templates/uninstall-job.yaml failed: jobs.batch "longhorn-uninstall" already exists
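(One possible way out of that hook error, assuming the leftover job is the only blocker: delete the stale longhorn-uninstall job so the pre-delete hook can recreate it, then retry the uninstall. The job's namespace varies; later in this thread it turns up in default.)
# find the stale hook job
kubectl get jobs -A | grep longhorn-uninstall
# delete it in whichever namespace it was found, then retry the uninstall
kubectl delete job longhorn-uninstall -n <namespace>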
rancher + helm annoys me. LOL. Just give me a bunch of yaml!
Every time I go to cluster tools it says Longhorn is installed. Should I just edit it and go through the steps to set taints and tolerations again via the longhorn.io documentation? Since this is a brand new cluster, I'm at that frustrated stage and just thinking, screw this, I'll use Rook instead. But I'd like to keep everything under the SUSE umbrella, so if we're smart and get a support contract it's all under one roof, so to speak.
i
Hello, can you check if the pods under longhorn-system are removed? After a successful uninstallation, you can install v1.3.1. I think you don't need to taint any node. Longhorn will be deployed on each worker node.
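(The quick check for that would be:)
kubectl -n longhorn-system get pods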
b
I'm on GKE and would like to dedicate nodes to Longhorn, so I require nodes that can install the CSI driver. So I very much need to taint nodes.
and if I was going to do this on a personal cluster I would do the same thing.
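(A sketch of that setup; the node name and taint key/value below are made up for illustration. Taint the dedicated storage nodes, then give the Longhorn components a matching toleration, e.g. via the chart's defaultSettings.taintToleration string.)
# taint a node reserved for storage (hypothetical node name and key)
kubectl taint nodes gke-storage-node-1 storage=longhorn:NoSchedule
# matching toleration in the Longhorn setting string format:
#   defaultSettings.taintToleration: "storage=longhorn:NoSchedule"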
i
I see. Sure, you can taint some nodes according to your design.
b
only thing 'running' in that namespace is the longhorn-uninstall batch job
i
Can you check the log?
b
which log? helm, that job, cluster...
i
ah, sorry. How did you uninstall LH?
b
probably a combination of multiple things. As this is not production I was not taking detailed notes. And my mistake, the longhorn-uninstall is in the default namespace.
i
I see. But I'd like to know how you installed and uninstalled LH. Using kubectl or the app catalog?
b
I think the Rancher UI is just not in sync with what I might have done via kubectl. It shows the longhorn-uninstall run was successful, but it's still scheduled for deletion, and a kubectl for all jobs in the default namespace does not show anything.
I installed Longhorn via the app catalog and modified the taints and tolerations there initially.
i
Let me summarize it. You installed the LH using app catalog and uninstalled it using kubectl. Correct?
b
Well, the taint tolerations for the driver, UI, etc. components, I mean. Then I applied the taints on the nodes 'manually'.
I tried multiple things and I'm not sure of the order. I think a post hook failed and I got confused, so I tried to take a hammer to it.
i
I see. Typically, if you install by app catalog, you need to uninstall it with the same tool. But it seems you used different tools... Let me think about how to deal with this case. I'll give you an update if I find a solution.
b
Currently the only things 'running' in the longhorn-system namespace are a batch job in error and the catalog app in a transitioning state.
The batch job is actually in the default namespace but is listed in the Rancher UI as being in longhorn-system. The CRDs are removed, as I checked via the Lens IDE.
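(From the CLI, the equivalent CRD check is roughly:)
kubectl get crds | grep longhorn.io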
i
Got it. Looks like you uninstalled by kubectl. Can you finish all the steps in https://longhorn.io/docs/1.3.0/deploy/uninstall/#uninstalling-longhorn-using-kubectl for removing v1.3.0? Then, delete the LH app from the Rancher app catalog.
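(For reference, the kubectl uninstall flow on that page boils down to the commands used later in this thread, with the v1.3.0 manifests:)
kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml
kubectl get job/longhorn-uninstall -n default -w   # wait until COMPLETIONS shows 1/1
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/deploy/longhorn.yaml
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml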
b
sure... I'll give it a shot. Thanks for the poking and prodding help. Appreciate it!
i
Don't mention it. 🙂
b
running through all the kubectl steps again.
👍 1
kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
job.batch/longhorn-uninstall created
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml": podsecuritypolicies.policy "longhorn-uninstall-psp" already exists
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml": serviceaccounts "longhorn-uninstall-service-account" already exists
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml": clusterroles.rbac.authorization.k8s.io "longhorn-uninstall-role" already exists
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml": clusterrolebindings.rbac.authorization.k8s.io "longhorn-uninstall-bind" already exists
kubectl get job/longhorn-uninstall -n default -w
NAME                 COMPLETIONS   DURATION   AGE
longhorn-uninstall   0/1           29s        30s
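(Those AlreadyExists errors just mean the uninstaller's PSP/RBAC objects survived an earlier attempt. One way to get a clean slate, an assumption on my part rather than a documented step, is to delete the uninstall manifest first and recreate it:)
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml
kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml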
i
Can you show
kubectl get pods
?
b
kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
job.batch/longhorn-uninstall created
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml": podsecuritypolicies.policy "longhorn-uninstall-psp" already exists
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml": serviceaccounts "longhorn-uninstall-service-account" already exists
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml": clusterroles.rbac.authorization.k8s.io "longhorn-uninstall-role" already exists
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml": clusterrolebindings.rbac.authorization.k8s.io "longhorn-uninstall-bind" already exists
kubectl get job/longhorn-uninstall -n default -w
NAME                 COMPLETIONS   DURATION   AGE
longhorn-uninstall   0/1           29s        30s
wrong paste.. hold on.
kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
longhorn-uninstall-sc8b7   1/1     Running   0          4m57s
nginx-7c658794b9-jrgm6     1/1     Running   0          7d3h
nginx-7c658794b9-kmr2t     1/1     Running   0          7d3h
nginx-7c658794b9-ntmck     1/1     Running   0          7d3h
i
Then, show
kubectl logs longhorn-uninstall-sc8b7
b
lots of 'failed to find' which is to be expected. Anything I'm looking for specifically?
i
Can you show some whole error messages?
b
the end tail of the log:
W0824 15:35:39.514679 1 reflector.go:324] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: failed to list *v1beta2.Engine: the server could not find the requested resource (get engines.longhorn.io)
E0824 15:35:39.514725 1 reflector.go:138] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: Failed to watch *v1beta2.Engine: failed to list *v1beta2.Engine: the server could not find the requested resource (get engines.longhorn.io)
W0824 15:35:40.976955 1 reflector.go:324] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: failed to list *v1beta2.Setting: the server could not find the requested resource (get settings.longhorn.io)
E0824 15:35:40.976993 1 reflector.go:138] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: Failed to watch *v1beta2.Setting: failed to list *v1beta2.Setting: the server could not find the requested resource (get settings.longhorn.io)
W0824 15:35:41.221896 1 reflector.go:324] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: failed to list *v1beta2.Orphan: the server could not find the requested resource (get orphans.longhorn.io)
E0824 15:35:41.221933 1 reflector.go:138] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: Failed to watch *v1beta2.Orphan: failed to list *v1beta2.Orphan: the server could not find the requested resource (get orphans.longhorn.io)
time="2022-08-24T15:35:41Z" level=warning msg="worker error" controller=longhorn-uninstall error="validatingwebhookconfigurations.admissionregistration.k8s.io \"longhorn-webhook-validator\" not found"
time="2022-08-24T15:35:43Z" level=warning msg="worker error" controller=longhorn-uninstall error="validatingwebhookconfigurations.admissionregistration.k8s.io \"longhorn-webhook-validator\" not found"
time="2022-08-24T15:35:45Z" level=warning msg="worker error" controller=longhorn-uninstall error="validatingwebhookconfigurations.admissionregistration.k8s.io \"longhorn-webhook-validator\" not found"
time="2022-08-24T15:35:46Z" level=warning msg="worker error" controller=longhorn-uninstall error="validatingwebhookconfigurations.admissionregistration.k8s.io \"longhorn-webhook-validator\" not found"
time="2022-08-24T15:35:46Z" level=warning msg="worker error" controller=longhorn-uninstall error="validatingwebhookconfigurations.admissionregistration.k8s.io \"longhorn-webhook-validator\" not found"
time="2022-08-24T15:35:47Z" level=warning msg="worker error" controller=longhorn-uninstall error="validatingwebhookconfigurations.admissionregistration.k8s.io \"longhorn-webhook-validator\" not found"
W0824 15:35:48.380484 1 reflector.go:324] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: failed to list *v1beta2.BackingImage: the server could not find the requested resource (get backingimages.longhorn.io)
E0824 15:35:48.380522 1 reflector.go:138] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: Failed to watch *v1beta2.BackingImage: failed to list *v1beta2.BackingImage: the server could not find the requested resource (get backingimages.longhorn.io)
time="2022-08-24T15:35:49Z" level=warning msg="worker error" controller=longhorn-uninstall error="validatingwebhookconfigurations.admissionregistration.k8s.io \"longhorn-webhook-validator\" not found"
i
Then, can you show
kubectl -n longhorn-system get pods
kubectl -n longhorn-system get crds
b
kubectl -n longhorn-system get pods
NAME                           READY   STATUS    RESTARTS   AGE
longhorn-ui-768cf55d4d-m9nzw   1/1     Running   0          6d19h
kubectl -n longhorn-system get crds
NAME                                             CREATED AT
alertmanagerconfigs.monitoring.coreos.com        2022-08-17T11:05:02Z
alertmanagers.monitoring.coreos.com              2022-08-17T11:05:02Z
apiservices.management.cattle.io                 2022-08-17T10:59:13Z
apps.catalog.cattle.io                           2022-08-17T10:59:14Z
authconfigs.management.cattle.io                 2022-08-17T10:59:15Z
backendconfigs.cloud.google.com                  2022-08-17T10:57:37Z
capacityrequests.internal.autoscaling.gke.io     2022-08-17T10:57:03Z
certificaterequests.cert-manager.io              2022-08-17T12:36:43Z
certificates.cert-manager.io                     2022-08-17T12:36:44Z
challenges.acme.cert-manager.io                  2022-08-17T12:36:47Z
clusterissuers.cert-manager.io                   2022-08-17T12:36:52Z
clusterregistrationtokens.management.cattle.io   2022-08-17T10:59:13Z
clusterrepos.catalog.cattle.io                   2022-08-17T10:59:14Z
clusters.management.cattle.io                    2022-08-17T10:59:13Z
clusterscanbenchmarks.cis.cattle.io              2022-08-17T11:07:15Z
clusterscanprofiles.cis.cattle.io                2022-08-17T11:07:15Z
clusterscanreports.cis.cattle.io                 2022-08-17T11:07:15Z
clusterscans.cis.cattle.io                       2022-08-17T11:07:15Z
features.management.cattle.io                    2022-08-17T10:59:11Z
frontendconfigs.networking.gke.io                2022-08-17T10:57:37Z
groupmembers.management.cattle.io                2022-08-17T10:59:15Z
groups.management.cattle.io                      2022-08-17T10:59:15Z
ingressroutes.traefik.containo.us                2022-08-17T11:32:55Z
ingressroutetcps.traefik.containo.us             2022-08-17T11:32:55Z
ingressrouteudps.traefik.containo.us             2022-08-17T11:32:56Z
issuers.cert-manager.io                          2022-08-17T12:36:55Z
managedcertificates.networking.gke.io            2022-08-17T10:57:18Z
memberships.hub.gke.io                           2022-08-17T11:00:43Z
middlewares.traefik.containo.us                  2022-08-17T11:32:56Z
middlewaretcps.traefik.containo.us               2022-08-17T11:32:56Z
navlinks.ui.cattle.io                            2022-08-17T10:59:13Z
operations.catalog.cattle.io                     2022-08-17T10:59:14Z
orders.acme.cert-manager.io                      2022-08-17T12:36:56Z
podmonitors.monitoring.coreos.com                2022-08-17T11:05:03Z
preferences.management.cattle.io                 2022-08-17T10:59:13Z
probes.monitoring.coreos.com                     2022-08-17T11:05:03Z
prometheuses.monitoring.coreos.com               2022-08-17T11:05:03Z
prometheusrules.monitoring.coreos.com            2022-08-17T11:05:04Z
serverstransports.traefik.containo.us            2022-08-17T11:32:57Z
serviceattachments.networking.gke.io             2022-08-17T10:57:39Z
servicemonitors.monitoring.coreos.com            2022-08-17T11:05:05Z
servicenetworkendpointgroups.networking.gke.io   2022-08-17T10:57:38Z
settings.management.cattle.io                    2022-08-17T10:59:13Z
storagestates.migration.k8s.io                   2022-08-17T10:57:32Z
storageversionmigrations.migration.k8s.io        2022-08-17T10:57:32Z
thanosrulers.monitoring.coreos.com               2022-08-17T11:05:05Z
tlsoptions.traefik.containo.us                   2022-08-17T11:32:57Z
tlsstores.traefik.containo.us                    2022-08-17T11:32:57Z
tokens.management.cattle.io                      2022-08-17T10:59:15Z
traefikservices.traefik.containo.us              2022-08-17T11:32:58Z
updateinfos.nodemanagement.gke.io                2022-08-17T10:57:35Z
userattributes.management.cattle.io              2022-08-17T10:59:15Z
users.management.cattle.io                       2022-08-17T10:59:15Z
volumesnapshotclasses.snapshot.storage.k8s.io    2022-08-17T10:57:31Z
volumesnapshotcontents.snapshot.storage.k8s.io   2022-08-17T10:57:31Z
volumesnapshots.snapshot.storage.k8s.io          2022-08-17T10:57:32Z
i
Looks like the uninstallation is completed.
Run these two commands:
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/deploy/longhorn.yaml
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v1.3.0/uninstall/uninstall.yaml
b
cool, I'll try this but from the catalog and go through the steps to change the tolerations on the longhorn.io docs site. Thanks Derek. Sometimes just having someone there really helps.
last two yaml deletions changed the catalog from edit to install 😉
🙌 1
i
Cool😀 Just try to install v1.3.1.
b
that and the stuck batch job.. have I mentioned I hate helm? 😉
1.3.1... didn't I see a webinar or something for even a newer release soon? Not saying I want to be on bleeding edge. Just I recall something related.
i
Yes. You can check v1.3.0's webinar or introduction. In v1.3.1, we just fixed the bugs found in v1.3.0. So, it's much better.
b
🎉
🙌 1
how would I find the logs of a pod that failed? specifically the longhorn-driver-deployer.
kubectl get pods -n longhorn-system
NAME                                           READY   STATUS             RESTARTS        AGE
longhorn-admission-webhook-5985df5688-4cqhj    1/1     Running            0               11m
longhorn-admission-webhook-5985df5688-dh4d5    1/1     Running            0               11m
longhorn-conversion-webhook-7b74768796-94kvx   1/1     Running            0               11m
longhorn-conversion-webhook-7b74768796-gfzv5   1/1     Running            0               11m
longhorn-driver-deployer-5655d57559-5l78r      0/1     Init:0/1           0               11m
longhorn-manager-kt29w                         0/1     CrashLoopBackOff   6 (4m56s ago)   11m
longhorn-manager-n6t7n                         0/1     Error              7 (5m11s ago)   11m
longhorn-ui-675b96b9f7-7wpmr                   1/1     Running            0               11m
i
kubectl -n longhorn-system logs <pod name>
Can you show the longhorn-manager logs? I will check them tomorrow
b
kubectl get pods -n longhorn-system
NAME                                           READY   STATUS             RESTARTS         AGE
longhorn-admission-webhook-5985df5688-4cqhj    1/1     Running            0                59m
longhorn-admission-webhook-5985df5688-dh4d5    1/1     Running            0                59m
longhorn-conversion-webhook-7b74768796-94kvx   1/1     Running            0                59m
longhorn-conversion-webhook-7b74768796-gfzv5   1/1     Running            0                59m
longhorn-driver-deployer-5655d57559-5l78r      0/1     Init:0/1           0                59m
longhorn-manager-kt29w                         0/1     CrashLoopBackOff   16 (114s ago)    59m
longhorn-manager-n6t7n                         0/1     CrashLoopBackOff   16 (2m12s ago)   59m
longhorn-ui-675b96b9f7-7wpmr                   1/1     Running            0                59m
kubectl logs ^C
kubectl logs longhorn-driver-deployer-5655d57559-5l78r -n longhorn-system
Error from server (BadRequest): container "longhorn-driver-deployer" in pod "longhorn-driver-deployer-5655d57559-5l78r" is waiting to start: PodInitializing
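(Since the deployer is stuck in PodInitializing, its main container has no logs yet. The usual next steps, plain kubectl and nothing Longhorn-specific, are to describe the pod, pull the init container's logs, and use --previous on the crash-looping managers; the init container name is whatever describe shows.)
kubectl -n longhorn-system describe pod longhorn-driver-deployer-5655d57559-5l78r
kubectl -n longhorn-system logs longhorn-driver-deployer-5655d57559-5l78r -c <init-container-name>
# previous run of a crash-looping pod
kubectl -n longhorn-system logs longhorn-manager-kt29w --previous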
kubectl logs longhorn-manager-kt29w -n longhorn-system
time="2022-08-24T173842Z" level=error msg="Failed environment check, please make sure you have iscsiadm/open-iscsi installed on the host" time="2022-08-24T173842Z" level=fatal msg="Error starting manager: environment check failed: Failed to execute: nsenter [--mount=/host/proc/1/ns/mnt --net=/host/proc/1/ns/net iscsiadm --version], output , stderr, nsenter: failed to execute iscsiadm: No such file or directory\n, error exit status 127"
helm chart with changes for taint tolerations:
i
Ah
b
annotations: {} csi: attacherReplicaCount: null kubeletRootDir: null provisionerReplicaCount: null resizerReplicaCount: null snapshotterReplicaCount: null defaultSettings: allowNodeDrainWithLastHealthyReplica: null allowRecurringJobWhileVolumeDetached: null allowVolumeCreationWithDegradedAvailability: null autoCleanupSystemGeneratedSnapshot: null autoDeletePodWhenVolumeDetachedUnexpectedly: null autoSalvage: null backingImageCleanupWaitInterval: null backingImageRecoveryWaitInterval: null backupTarget: null backupTargetCredentialSecret: null backupstorePollInterval: null concurrentAutomaticEngineUpgradePerNodeLimit: null concurrentReplicaRebuildPerNodeLimit: null createDefaultDiskLabeledNodes: null defaultDataLocality: null defaultDataPath: null defaultLonghornStaticStorageClass: null defaultReplicaCount: null disableReplicaRebuild: null disableRevisionCounter: null disableSchedulingOnCordonedNode: null guaranteedEngineManagerCPU: null guaranteedReplicaManagerCPU: null kubernetesClusterAutoscalerEnabled: null mkfsExt4Parameters: null nodeDownPodDeletionPolicy: null orphanAutoDeletion: null priorityClass: null replicaAutoBalance: null replicaReplenishmentWaitInterval: null replicaSoftAntiAffinity: null replicaZoneSoftAntiAffinity: null storageMinimalAvailablePercentage: null storageNetwork: null storageOverProvisioningPercentage: null systemManagedComponentsNodeSelector: null systemManagedPodsImagePullPolicy: null taintToleration: key=value:NoSchedule upgradeChecker: null enablePSP: true global: cattle: systemDefaultRegistry: '' windowsCluster: defaultSetting: systemManagedComponentsNodeSelector: kubernetes.io/os:linux taintToleration: cattle.io/os=linux:NoSchedule enabled: false nodeSelector: kubernetes.io/os: linux tolerations: - effect: NoSchedule key: cattle.io/os operator: Equal value: linux systemProjectId: p-s67s8 image: csi: attacher: repository: rancher/mirrored-longhornio-csi-attacher tag: v3.4.0 nodeDriverRegistrar: repository: rancher/mirrored-longhornio-csi-node-driver-registrar tag: v2.5.0 provisioner: repository: rancher/mirrored-longhornio-csi-provisioner tag: v2.1.2 resizer: repository: rancher/mirrored-longhornio-csi-resizer tag: v1.2.0 snapshotter: repository: rancher/mirrored-longhornio-csi-snapshotter tag: v3.0.3 longhorn: backingImageManager: repository: rancher/mirrored-longhornio-backing-image-manager tag: v3_20220808 engine: repository: rancher/mirrored-longhornio-longhorn-engine tag: v1.3.1 instanceManager: repository: rancher/mirrored-longhornio-longhorn-instance-manager tag: v1_20220808 manager: repository: rancher/mirrored-longhornio-longhorn-manager tag: v1.3.1 shareManager: repository: rancher/mirrored-longhornio-longhorn-share-manager tag: v1_20220808 ui: repository: rancher/mirrored-longhornio-longhorn-ui tag: v1.3.1 pullPolicy: IfNotPresent defaultImage: true ingress: annotations: null enabled: false host: sslip.io ingressClassName: null path: / secrets: null secureBackends: false tls: false tlsSecret: longhorn.local-tls longhornDriver: nodeSelector: {} priorityClass: null tolerations: null longhornManager: log: format: plain nodeSelector: {} priorityClass: null serviceAnnotations: {} tolerations: null longhornUI: nodeSelector: {} priorityClass: null replicas: 1 tolerations: null namespaceOverride: '' persistence: backingImage: dataSourceParameters: null dataSourceType: null enable: false expectedChecksum: null name: null defaultClass: true defaultClassReplicaCount: 3 defaultDataLocality: disabled defaultFsType: ext4 migratable: false reclaimPolicy: Delete 
recurringJobSelector: enable: false jobList: [] privateRegistry: createSecret: null registryPasswd: null registrySecret: null registryUrl: null registryUser: null resources: {} service: manager: loadBalancerIP: '' loadBalancerSourceRanges: '' nodePort: '' type: ClusterIP ui: nodePort: null type: ClusterIP serviceAccount: annotations: {} longhorn: default_setting: false
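(For readability: the toleration-related pieces of the values dump above are the global Longhorn setting plus the per-component tolerations. A sketch of where they live in the chart values; key=value is the placeholder from the dump, not a real taint:)
defaultSettings:
  taintToleration: key=value:NoSchedule   # system-managed components (instance managers, CSI, etc.)
longhornManager:
  tolerations: null   # set these too if manager/driver/UI should schedule onto the tainted nodes
longhornDriver:
  tolerations: null
longhornUI:
  tolerations: null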
i
You have to install iscsi on each node
b
3 nodes can't have iscsi; 3 nodes are Ubuntu specifically so they can hold the drivers.
hence the taint tolerations
i
Check that all prerequisites (like open-iscsi) are installed.
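(A hedged sketch of getting iscsiadm onto the Ubuntu nodes; these are the standard Ubuntu package/service names, so check the Longhorn install requirements for your exact version:)
# on each Ubuntu storage node
sudo apt-get update && sudo apt-get install -y open-iscsi
sudo systemctl enable --now iscsid
# the longhorn-manager environment check looks for this binary
iscsiadm --version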
b
It was working before; I just had some resource scheduling issues that, I think, were also related to having dedicated nodes just for Longhorn.
It was working before with those Ubuntu nodes, that is.
i
I suggest checking all nodes again. It looks like the log complains that iscsi is not installed.
b
I'll try... the other nodes are 'Container-Optimized OS from Google' so shrug, at least that gives me some things to literally 'google' LOL
thanks again Derek!
i
I see…not sure they are supported🤔I will check tomorrow
b
Definitely not for the nodes that actually have the CSI driver, for sure. Hence my taints and tolerations.
I guess I could do a rolling replacement of those nodes with Ubuntu. I'm trying to simulate what I would do on bare metal anyway, but for now I wanted to keep the CPU/mem low for cloud pricing reasons.
👍 1
1. "GKE clusters must use the
Ubuntu
OS instead of
Container-Optimized
OS, in order to satisfy Longhorn’s
open-iscsi
dependency." but I assume for storage connectivity between workloads iscsi is the requirement everywhere. So this might be the case here.
👍 1
FYI to anyone else following along: not sure how much of this was required, but I still put taints and tolerations in place to ensure only a couple of nodes were dedicated to storage. I did a rolling replacement of Google's Container-Optimized OS nodes with Ubuntu with containerd, and things are now working as expected.
🙌 1
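(For anyone repeating this on GKE, that rolling replacement can be approximated by adding an Ubuntu/containerd node pool carrying the storage taint and then draining and deleting the old pool. Pool name, cluster, zone, and taint key below are placeholders:)
gcloud container node-pools create longhorn-pool \
  --cluster=my-cluster --zone=us-central1-a \
  --image-type=UBUNTU_CONTAINERD \
  --num-nodes=3 \
  --node-taints=storage=longhorn:NoSchedule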
i
Awesome! Thanks for the update!
b
And as much as I dislike Rancher, LOL, this made it easy to replace those worker nodes with the new Ubuntu ones.
😄 3
i
BTW, if you want Longhorn to support Container-Optimized OS, you can create a ticket at https://github.com/longhorn/longhorn/issues. Then, we can investigate it.
b
Not sure if Google is interested in supporting iscsi, but I'll see if anyone else has mentioned it in open issues. Thanks!
👍 2