nutritious-oxygen-89191
03/27/2023, 11:54 AM
Running kubectl get pods --namespace longhorn-system --watch, I get:
NAME                                           READY   STATUS             RESTARTS         AGE
longhorn-admission-webhook-56d6cdc68f-2c8f4    0/1     Init:0/1           0                4h50m
longhorn-admission-webhook-56d6cdc68f-5mf9n    0/1     Init:0/1           0                4h50m
longhorn-admission-webhook-668d9787dc-r8nnd    0/1     Init:0/1           0                4h19m
longhorn-conversion-webhook-55cf57895c-5dtc7   1/1     Running            0                4h50m
longhorn-conversion-webhook-55cf57895c-cn68m   1/1     Running            0                4h50m
longhorn-driver-deployer-77c46b4f54-gxnsx      0/1     Init:0/1           0                4h50m
longhorn-manager-bn4pr                         0/1     Init:0/1           0                4h50m
longhorn-manager-v9jm7                         0/1     Init:0/1           0                4h50m
longhorn-recovery-backend-ccfc78cb6-pbxfj      1/1     Running            0                4h50m
longhorn-recovery-backend-ccfc78cb6-pnxsz      1/1     Running            0                4h50m
longhorn-ui-b8c58884-fr45d                     0/1     CrashLoopBackOff   57 (4m27s ago)   4h50m
longhorn-ui-b8c58884-wq2gm                     1/1     Running            0                4h50m
So a lot of pods are in Init:0/1 and stuck there. I narrowed it down to this error in the longhorn-admission-webhook, which seems to depend on longhorn-conversion-webhook:
Defaulted container "longhorn-admission-webhook" out of: longhorn-admission-webhook, wait-longhorn-conversion-webhook (init)
Error from server (BadRequest): container "longhorn-admission-webhook" in pod "longhorn-admission-webhook-56d6cdc68f-5mf9n" is waiting to start: PodInitializing
But the log for longhorn-conversion-webhook seems fine, or not?
$ kubectl --namespace longhorn-system logs longhorn-conversion-webhook-55cf57895c-5dtc7
time="2023-03-27T06:58:39Z" level=info msg="Starting longhorn conversion webhook server"
W0327 06:58:39.336261 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-03-27T06:58:39Z" level=warning msg="Failed to init Kubernetes secret: secrets \"longhorn-webhook-tls\" not found"
time="2023-03-27T06:58:40Z" level=info msg="Listening on :9443"
time="2023-03-27T06:58:40Z" level=info msg="certificate CN=dynamic,O=dynamic signed by CN=dynamiclistener-ca,O=dynamiclistener-org: notBefore=2023-03-27 06:58:36 +0000 UTC notAfter=2024-03-26 06:58:40 +0000 UTC"
time="2023-03-27T06:58:40Z" level=info msg="Updating TLS secret for longhorn-webhook-tls (count: 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=8AE712382F7744254DB21B7BE49563465445AA91]"
time="2023-03-27T06:58:41Z" level=info msg="Active TLS secret longhorn-webhook-tls (ver=10272145) (count 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=8AE712382F7744254DB21B7BE49563465445AA91]"
time="2023-03-27T06:58:42Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2023-03-27T06:58:42Z" level=info msg="Building conversion rules..."
time="2023-03-27T06:58:42Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2023-03-27T06:58:42Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2023-03-27T06:58:42Z" level=info msg="Updating TLS secret for longhorn-webhook-tls (count: 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=8AE712382F7744254DB21B7BE49563465445AA91]"
time="2023-03-27T06:58:42Z" level=info msg="Update CRD for nodes.longhorn.io"
time="2023-03-27T06:58:43Z" level=info msg="Update CRD for volumes.longhorn.io"
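Note on the PodInitializing error above: kubectl logs defaulted to the main longhorn-admission-webhook container, which has not started yet. The container that is actually running (or blocked) is the init container wait-longhorn-conversion-webhook named in that same output, so its log and the pod events are probably the more useful things to read. A minimal sketch, assuming nothing beyond standard kubectl; the helper names container_logs and pod_events are made up for illustration:

```shell
# Hypothetical helpers (names are illustrative, not part of Longhorn or kubectl).

# Print the log of one named container in a pod, e.g. an init container.
# Usage: container_logs NAMESPACE POD CONTAINER
container_logs() {
  kubectl --namespace "$1" logs "$2" --container "$3"
}

# Show the pod description, including init container state and recent events.
# Usage: pod_events NAMESPACE POD
pod_events() {
  kubectl --namespace "$1" describe pod "$2"
}
```

For example: container_logs longhorn-system longhorn-admission-webhook-56d6cdc68f-5mf9n wait-longhorn-conversion-webhook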
aloof-hair-13897
03/27/2023, 1:28 PM
kubectl get pods --namespace longhorn-system ?
And could you check the logs of the other longhorn-conversion-webhook?
nutritious-oxygen-89191
03/27/2023, 3:19 PM
longhorn-conversion-webhook
$ kubectl --namespace longhorn-system logs longhorn-conversion-webhook-55cf57895c-cn68m
time="2023-03-27T06:58:35Z" level=info msg="Starting longhorn conversion webhook server"
W0327 06:58:35.776898 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-03-27T06:58:35Z" level=warning msg="Failed to init Kubernetes secret: secrets \"longhorn-webhook-tls\" not found"
time="2023-03-27T06:58:36Z" level=info msg="generated self-signed CA certificate CN=dynamiclistener-ca,O=dynamiclistener-org: notBefore=2023-03-27 06:58:36.345554557 +0000 UTC notAfter=2033-03-24 06:58:36.345554557 +0000 UTC"
time="2023-03-27T06:58:38Z" level=info msg="Listening on :9443"
time="2023-03-27T06:58:38Z" level=info msg="certificate CN=dynamic,O=dynamic signed by CN=dynamiclistener-ca,O=dynamiclistener-org: notBefore=2023-03-27 06:58:36 +0000 UTC notAfter=2024-03-26 06:58:38 +0000 UTC"
time="2023-03-27T06:58:38Z" level=info msg="Creating new TLS secret for longhorn-webhook-tls (count: 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=EAD10F1F7445D5B3C7456A20D58E6B9D6AEBBE3D]"
time="2023-03-27T06:58:39Z" level=info msg="Active TLS secret longhorn-webhook-tls (ver=10272129) (count 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=EAD10F1F7445D5B3C7456A20D58E6B9D6AEBBE3D]"
time="2023-03-27T06:58:41Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2023-03-27T06:58:41Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2023-03-27T06:58:41Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2023-03-27T06:58:41Z" level=info msg="Building conversion rules..."
time="2023-03-27T06:58:41Z" level=info msg="Updating TLS secret for longhorn-webhook-tls (count: 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=8AE712382F7744254DB21B7BE49563465445AA91]"
time="2023-03-27T06:58:41Z" level=info msg="Update CRD for backingimages.longhorn.io"
time="2023-03-27T06:58:41Z" level=info msg="Active TLS secret longhorn-webhook-tls (ver=10272145) (count 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=8AE712382F7744254DB21B7BE49563465445AA91]"
time="2023-03-27T06:58:42Z" level=info msg="Update CRD for backuptargets.longhorn.io"
time="2023-03-27T06:58:42Z" level=info msg="Update CRD for engineimages.longhorn.io"
time="2023-03-27T06:58:42Z" level=info msg="Update CRD for nodes.longhorn.io"
time="2023-03-27T06:58:42Z" level=info msg="Building conversion rules..."
kubectl get pods --namespace longhorn-system remains the same, still stuck
aloof-hair-13897
03/28/2023, 12:42 AM
nutritious-oxygen-89191
03/28/2023, 11:26 AM
kubectl
$ kubectl get pods --namespace longhorn-system --watch
NAME                                           READY   STATUS             RESTARTS      AGE
longhorn-admission-webhook-66bdd6674b-62tzp    0/1     Init:0/1           0             44s
longhorn-admission-webhook-66bdd6674b-lvqdm    1/1     Running            0             44s
longhorn-conversion-webhook-7b9fd878b6-jsl7p   1/1     Running            0             45s
longhorn-conversion-webhook-7b9fd878b6-p8vvq   1/1     Running            0             45s
longhorn-driver-deployer-7fddbc7f5c-sm9fp      0/1     Init:0/1           0             48s
longhorn-manager-6zv76                         0/1     Init:0/1           0             48s
longhorn-manager-bf48z                         0/1     Error              1 (20s ago)   48s
longhorn-recovery-backend-7d686cd599-47d2g     1/1     Running            0             47s
longhorn-recovery-backend-7d686cd599-vf7pd     1/1     Running            0             47s
longhorn-ui-59fdc4c7b9-8d2p8                   1/1     Running            0             46s
longhorn-ui-59fdc4c7b9-hpdp9                   1/1     Running            1 (21s ago)   46s
longhorn-ui-59fdc4c7b9-hpdp9                   0/1     Error              1 (22s ago)   47s
longhorn-manager-bf48z                         0/1     CrashLoopBackOff   1 (15s ago)   54s
longhorn-manager-bf48z                         0/1     Running            2 (16s ago)   55s
longhorn-ui-59fdc4c7b9-hpdp9                   0/1     CrashLoopBackOff   1 (13s ago)   59s
longhorn-ui-59fdc4c7b9-hpdp9                   1/1     Running            2 (14s ago)   60s
longhorn-manager-bf48z                         0/1     Error              2 (26s ago)   65s
longhorn-manager-bf48z                         0/1     CrashLoopBackOff   2 (15s ago)   79s
longhorn-ui-59fdc4c7b9-hpdp9                   0/1     Error              2 (34s ago)   80s
longhorn-manager-bf48z                         0/1     Running            3 (27s ago)   91s
longhorn-ui-59fdc4c7b9-hpdp9                   0/1     CrashLoopBackOff   2 (16s ago)   95s
longhorn-manager-bf48z                         0/1     Error              3 (38s ago)   102s
longhorn-ui-59fdc4c7b9-hpdp9                   1/1     Running            3 (30s ago)   109s
longhorn-manager-bf48z                         0/1     CrashLoopBackOff   3 (13s ago)   114s
$ kubectl --namespace longhorn-system logs longhorn-manager-bf48z
Defaulted container "longhorn-manager" out of: longhorn-manager, wait-longhorn-admission-webhook (init)
W0328 11:25:21.547992 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-03-28T11:25:21Z" level=info msg="cannot list the content of the src directory /var/lib/rancher/longhorn/engine-binaries for the copy, will do nothing: failed to execute: nsenter [--mount=/host/proc/1/ns/mnt --net=/host/proc/1/ns/net bash -c ls /var/lib/rancher/longhorn/engine-binaries/*], output , stderr ls: cannot access '/var/lib/rancher/longhorn/engine-binaries/*': No such file or directory\n: exit status 2"
I0328 11:25:21.600950 1 leaderelection.go:248] attempting to acquire leader lease longhorn-system/longhorn-manager-upgrade-lock...
I0328 11:25:21.628618 1 leaderelection.go:258] successfully acquired lease longhorn-system/longhorn-manager-upgrade-lock
time="2023-03-28T11:25:21Z" level=info msg="Start upgrading"
time="2023-03-28T11:25:21Z" level=info msg="setting default-engine-image not found"
time="2023-03-28T11:25:31Z" level=error msg="Upgrade failed: upgrade API version failed: cannot create CRDAPIVersionSetting: Internal error occurred: failed calling webhook \"validator.longhorn.io\": failed to call webhook: Post \"https://longhorn-admission-webhook.longhorn-system.svc:9443/v1/webhook/validaton?timeout=10s\": context deadline exceeded"
time="2023-03-28T11:25:31Z" level=info msg="Upgrade leader lost: geo-node1"
time="2023-03-28T11:25:31Z" level=fatal msg="Error starting manager: upgrade API version failed: cannot create CRDAPIVersionSetting: Internal error occurred: failed calling webhook \"validator.longhorn.io\": failed to call webhook: Post \"https://longhorn-admission-webhook.longhorn-system.svc:9443/v1/webhook/validaton?timeout=10s\": context deadline exceeded"
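Note: the --watch output above mixes healthy pods with failing ones and repeats rows as states change. A small filter can reduce a pod listing to the rows whose READY count is incomplete. This is only a sketch; the function name not_ready is made up, and it assumes the default kubectl get pods column order (NAME, READY, STATUS, ...):

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# the name and status of every pod whose READY column is not N/N.
not_ready() {
  awk '$1 != "NAME" {
    split($2, r, "/")              # READY looks like "0/1"
    if (r[1] != r[2]) print $1, $3 # ready count differs from total
  }'
}
```

Usage: kubectl get pods --namespace longhorn-system | not_ready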
aloof-hair-13897
03/28/2023, 11:48 AM
https://longhorn-admission-webhook.longhorn-system.svc:9443 is accessible?
nutritious-oxygen-89191
03/28/2023, 12:07 PM
curl: (6) Could not resolve host: longhorn-admission-webhook.longhorn-system.svc
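Note: names ending in .svc are served by the cluster DNS, so they normally resolve only from inside a pod. A curl run on a node or workstation can fail with "Could not resolve host" even when the Service itself is fine. The chat does not say where the curl above was run, so this is only a possible explanation. A sketch for probing from inside the cluster with a throwaway pod; the function name probe_in_cluster, the pod name, and the curlimages/curl image in the usage line are assumptions for illustration:

```shell
# Hypothetical helper: run curl in a temporary pod so that cluster DNS is used.
# --insecure is passed because the webhook serves a self-signed certificate.
# Usage: probe_in_cluster NAMESPACE POD_NAME IMAGE URL
probe_in_cluster() {
  kubectl --namespace "$1" run "$2" \
    --rm --attach --restart=Never --image="$3" \
    --command -- curl --insecure --silent --show-error --max-time 10 "$4"
}
```

Usage: probe_in_cluster longhorn-system webhook-probe curlimages/curl https://longhorn-admission-webhook.longhorn-system.svc:9443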
aloof-hair-13897
03/28/2023, 12:34 PM
kubectl -n longhorn-system get endpoints and kubectl -n longhorn-system get svc?
nutritious-oxygen-89191
03/28/2023, 12:58 PM
$ kubectl -n longhorn-system get endpoints
NAME                          ENDPOINTS                          AGE
longhorn-admission-webhook    10.42.1.90:9443                    95m
longhorn-backend                                                 95m
longhorn-conversion-webhook   10.42.0.159:9443,10.42.1.89:9443   95m
longhorn-engine-manager       <none>                             95m
longhorn-frontend             10.42.0.158:8000,10.42.1.88:8000   95m
longhorn-recovery-backend     10.42.0.157:9600,10.42.1.87:9600   95m
longhorn-replica-manager      <none>                             95m
$ kubectl -n longhorn-system get svc
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
longhorn-admission-webhook    ClusterIP   10.43.211.113   <none>        9443/TCP   95m
longhorn-backend              ClusterIP   10.43.162.232   <none>        9500/TCP   95m
longhorn-conversion-webhook   ClusterIP   10.43.104.101   <none>        9443/TCP   95m
longhorn-engine-manager       ClusterIP   None            <none>        <none>     95m
longhorn-frontend             ClusterIP   10.43.77.161    <none>        80/TCP     95m
longhorn-recovery-backend     ClusterIP   10.43.229.91    <none>        9600/TCP   95m
longhorn-replica-manager      ClusterIP   None            <none>        <none>     95m
aloof-hair-13897
03/28/2023, 1:35 PM
longhorn-manager pod?
# telnet longhorn-admission-webhook.longhorn-system.svc 9443
nutritious-oxygen-89191
03/28/2023, 1:57 PM
I have no name!@longhorn-admission-webhook-66bdd6674b-lvqdm:/> telnet longhorn-admission-webhook.longhorn-system.svc 9443
Trying 10.43.211.113...
Connected to longhorn-admission-webhook.longhorn-system.svc.
Escape character is '^]'.
aloof-hair-13897
03/28/2023, 2:09 PM
firewalld?
Following the tutorial: could you provide the link?
nutritious-oxygen-89191
03/28/2023, 2:14 PM
firewalld is not running. ufw is inactive on the manager and active on the worker.
aloof-hair-13897
03/28/2023, 2:21 PM
Prerequisites section?
And do you have Rancher installed in the cluster as well?
nutritious-oxygen-89191
03/28/2023, 2:24 PM
aloof-hair-13897
03/28/2023, 2:26 PM
longhorn-admission-webhook-xxxxx?
nutritious-oxygen-89191
03/28/2023, 2:27 PM
$ bash longhorn_environment_check.sh
[INFO] Required dependencies 'kubectl jq mktemp' are installed.
[INFO] Hostname uniqueness check is passed.
[INFO] Waiting for longhorn-environment-check pods to become ready (0/2)...
[INFO] All longhorn-environment-check pods are ready (2/2).
[INFO] Required packages are installed.
[WARN] multipathd is running on localhost.
[WARN] multipathd is running on geo-node1.
[WARN] multipathd would probably result in the Longhorn volume mount failure. Please refer to https://longhorn.io/kb/troubleshooting-volume-with-multipath for more information.
[INFO] MountPropagation is enabled.
[INFO] Cleaning up longhorn-environment-check pods...
[INFO] Cleanup completed.
aloof-hair-13897
03/28/2023, 2:40 PM