# longhorn-storage
a
Are your pods still stuck? Could you show the output of `kubectl get pods --namespace longhorn-system`? And could you check the logs of the other `longhorn-conversion-webhook` pod?
n
This is the log of the second `longhorn-conversion-webhook` pod:
```
$ kubectl --namespace longhorn-system logs longhorn-conversion-webhook-55cf57895c-cn68m
time="2023-03-27T06:58:35Z" level=info msg="Starting longhorn conversion webhook server"
W0327 06:58:35.776898       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2023-03-27T06:58:35Z" level=warning msg="Failed to init Kubernetes secret: secrets \"longhorn-webhook-tls\" not found"
time="2023-03-27T06:58:36Z" level=info msg="generated self-signed CA certificate CN=dynamiclistener-ca,O=dynamiclistener-org: notBefore=2023-03-27 06:58:36.345554557 +0000 UTC notAfter=2033-03-24 06:58:36.345554557 +0000 UTC"
time="2023-03-27T06:58:38Z" level=info msg="Listening on :9443"
time="2023-03-27T06:58:38Z" level=info msg="certificate CN=dynamic,O=dynamic signed by CN=dynamiclistener-ca,O=dynamiclistener-org: notBefore=2023-03-27 06:58:36 +0000 UTC notAfter=2024-03-26 06:58:38 +0000 UTC"
time="2023-03-27T06:58:38Z" level=info msg="Creating new TLS secret for longhorn-webhook-tls (count: 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=EAD10F1F7445D5B3C7456A20D58E6B9D6AEBBE3D]"
time="2023-03-27T06:58:39Z" level=info msg="Active TLS secret longhorn-webhook-tls (ver=10272129) (count 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=EAD10F1F7445D5B3C7456A20D58E6B9D6AEBBE3D]"
time="2023-03-27T06:58:41Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2023-03-27T06:58:41Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2023-03-27T06:58:41Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2023-03-27T06:58:41Z" level=info msg="Building conversion rules..."
time="2023-03-27T06:58:41Z" level=info msg="Updating TLS secret for longhorn-webhook-tls (count: 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=8AE712382F7744254DB21B7BE49563465445AA91]"
time="2023-03-27T06:58:41Z" level=info msg="Update CRD for backingimages.longhorn.io"
time="2023-03-27T06:58:41Z" level=info msg="Active TLS secret longhorn-webhook-tls (ver=10272145) (count 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=8AE712382F7744254DB21B7BE49563465445AA91]"
time="2023-03-27T06:58:42Z" level=info msg="Update CRD for backuptargets.longhorn.io"
time="2023-03-27T06:58:42Z" level=info msg="Update CRD for engineimages.longhorn.io"
time="2023-03-27T06:58:42Z" level=info msg="Update CRD for nodes.longhorn.io"
time="2023-03-27T06:58:42Z" level=info msg="Building conversion rules..."
```
The output of `kubectl get pods --namespace longhorn-system` remains the same; the pods are still stuck.
a
Which Longhorn version did you install? Did you install Longhorn via Rancher?
n
I have updated both nodes (one manager and one worker) to K3s v1.26.4 and Longhorn 1.4.1 using `kubectl`:
```
$ kubectl get pods --namespace longhorn-system --watch
NAME                                           READY   STATUS     RESTARTS      AGE
longhorn-admission-webhook-66bdd6674b-62tzp    0/1     Init:0/1   0             44s
longhorn-admission-webhook-66bdd6674b-lvqdm    1/1     Running    0             44s
longhorn-conversion-webhook-7b9fd878b6-jsl7p   1/1     Running    0             45s
longhorn-conversion-webhook-7b9fd878b6-p8vvq   1/1     Running    0             45s
longhorn-driver-deployer-7fddbc7f5c-sm9fp      0/1     Init:0/1   0             48s
longhorn-manager-6zv76                         0/1     Init:0/1   0             48s
longhorn-manager-bf48z                         0/1     Error      1 (20s ago)   48s
longhorn-recovery-backend-7d686cd599-47d2g     1/1     Running    0             47s
longhorn-recovery-backend-7d686cd599-vf7pd     1/1     Running    0             47s
longhorn-ui-59fdc4c7b9-8d2p8                   1/1     Running    0             46s
longhorn-ui-59fdc4c7b9-hpdp9                   1/1     Running    1 (21s ago)   46s
longhorn-ui-59fdc4c7b9-hpdp9                   0/1     Error      1 (22s ago)   47s
longhorn-manager-bf48z                         0/1     CrashLoopBackOff   1 (15s ago)   54s
longhorn-manager-bf48z                         0/1     Running            2 (16s ago)   55s
longhorn-ui-59fdc4c7b9-hpdp9                   0/1     CrashLoopBackOff   1 (13s ago)   59s
longhorn-ui-59fdc4c7b9-hpdp9                   1/1     Running            2 (14s ago)   60s
longhorn-manager-bf48z                         0/1     Error              2 (26s ago)   65s
longhorn-manager-bf48z                         0/1     CrashLoopBackOff   2 (15s ago)   79s
longhorn-ui-59fdc4c7b9-hpdp9                   0/1     Error              2 (34s ago)   80s
longhorn-manager-bf48z                         0/1     Running            3 (27s ago)   91s
longhorn-ui-59fdc4c7b9-hpdp9                   0/1     CrashLoopBackOff   2 (16s ago)   95s
longhorn-manager-bf48z                         0/1     Error              3 (38s ago)   102s
longhorn-ui-59fdc4c7b9-hpdp9                   1/1     Running            3 (30s ago)   109s
longhorn-manager-bf48z                         0/1     CrashLoopBackOff   3 (13s ago)   114s
```
```
$ kubectl --namespace longhorn-system logs longhorn-manager-bf48z
Defaulted container "longhorn-manager" out of: longhorn-manager, wait-longhorn-admission-webhook (init)
W0328 11:25:21.547992       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2023-03-28T11:25:21Z" level=info msg="cannot list the content of the src directory /var/lib/rancher/longhorn/engine-binaries for the copy, will do nothing: failed to execute: nsenter [--mount=/host/proc/1/ns/mnt --net=/host/proc/1/ns/net bash -c ls /var/lib/rancher/longhorn/engine-binaries/*], output , stderr ls: cannot access '/var/lib/rancher/longhorn/engine-binaries/*': No such file or directory\n: exit status 2"
I0328 11:25:21.600950       1 leaderelection.go:248] attempting to acquire leader lease longhorn-system/longhorn-manager-upgrade-lock...
I0328 11:25:21.628618       1 leaderelection.go:258] successfully acquired lease longhorn-system/longhorn-manager-upgrade-lock
time="2023-03-28T11:25:21Z" level=info msg="Start upgrading"
time="2023-03-28T11:25:21Z" level=info msg="setting default-engine-image not found"
time="2023-03-28T11:25:31Z" level=error msg="Upgrade failed: upgrade API version failed: cannot create CRDAPIVersionSetting: Internal error occurred: failed calling webhook \"validator.longhorn.io\": failed to call webhook: Post \"https://longhorn-admission-webhook.longhorn-system.svc:9443/v1/webhook/validaton?timeout=10s\": context deadline exceeded"
time="2023-03-28T11:25:31Z" level=info msg="Upgrade leader lost: geo-node1"
time="2023-03-28T11:25:31Z" level=fatal msg="Error starting manager: upgrade API version failed: cannot create CRDAPIVersionSetting: Internal error occurred: failed calling webhook \"validator.longhorn.io\": failed to call webhook: Post \"https://longhorn-admission-webhook.longhorn-system.svc:9443/v1/webhook/validaton?timeout=10s\": context deadline exceeded"
```
a
Can you check whether your DNS works well, and whether https://longhorn-admission-webhook.longhorn-system.svc:9443 is accessible?
n
Not sure how to check that, but if I curl from the host or from another pod I get `curl: (6) Could not resolve host: longhorn-admission-webhook.longhorn-system.svc`.
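A note on the curl result above: `*.svc` names only resolve through the cluster DNS, so failure from the host itself is expected; what matters is whether resolution fails from inside a pod. A minimal sketch for checking that with a throwaway pod (the image tag is an assumption, not from the thread):

```shell
# Hedged sketch: resolve the webhook Service through the cluster DNS
# from inside a temporary pod; busybox's nslookup is enough for this.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup longhorn-admission-webhook.longhorn-system.svc
```

If this fails too, the problem is cluster DNS (CoreDNS) rather than the webhook itself.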
a
Could you show the output of `kubectl -n longhorn-system get endpoints` and `kubectl -n longhorn-system get svc`?
n
```
$ kubectl -n longhorn-system get endpoints
NAME                          ENDPOINTS                          AGE
longhorn-admission-webhook    10.42.1.90:9443                    95m
longhorn-backend                                                 95m
longhorn-conversion-webhook   10.42.0.159:9443,10.42.1.89:9443   95m
longhorn-engine-manager       <none>                             95m
longhorn-frontend             10.42.0.158:8000,10.42.1.88:8000   95m
longhorn-recovery-backend     10.42.0.157:9600,10.42.1.87:9600   95m
longhorn-replica-manager      <none>                             95m
```
```
$ kubectl -n longhorn-system get svc
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
longhorn-admission-webhook    ClusterIP   10.43.211.113   <none>        9443/TCP   95m
longhorn-backend              ClusterIP   10.43.162.232   <none>        9500/TCP   95m
longhorn-conversion-webhook   ClusterIP   10.43.104.101   <none>        9443/TCP   95m
longhorn-engine-manager       ClusterIP   None            <none>        <none>     95m
longhorn-frontend             ClusterIP   10.43.77.161    <none>        80/TCP     95m
longhorn-recovery-backend     ClusterIP   10.43.229.91    <none>        9600/TCP   95m
longhorn-replica-manager      ClusterIP   None            <none>        <none>     95m
```
a
Could you test this command in the `longhorn-manager` pod?

```
# telnet longhorn-admission-webhook.longhorn-system.svc 9443
```
n
The manager always crashes, so I tried the command from the admission-webhook pod instead:
```
I have no name!@longhorn-admission-webhook-66bdd6674b-lvqdm:/> telnet longhorn-admission-webhook.longhorn-system.svc 9443
Trying 10.43.211.113...
Connected to longhorn-admission-webhook.longhorn-system.svc.
Escape character is '^]'.
```
Sorry, that was from the conversion-webhook pod.
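If telnet is missing from a pod image, bash's built-in `/dev/tcp` redirection can stand in for it. A small sketch (the `tcp_check` helper name is mine, not part of Longhorn):

```shell
# tcp_check HOST PORT: print "open" if a TCP connection succeeds within
# 3 seconds, "closed" otherwise. Uses bash's /dev/tcp pseudo-device, so
# it works in minimal images that lack telnet, curl, and nc.
tcp_check() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Inside a pod you would run, e.g.:
#   tcp_check longhorn-admission-webhook.longhorn-system.svc 9443
```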
a
Do you have `firewalld` turned on? If you were following a tutorial, could you provide the link?
n
No, `firewalld` is not running. `ufw` is inactive on the manager and active on the worker.
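An active `ufw` on only the worker fits the symptoms here: cross-node pod traffic (manager pod calling the admission webhook on the other node) times out while same-node connections succeed. A hedged sketch of the openings commonly needed on a K3s node with the default flannel backend (ports and CIDRs are K3s defaults, not taken from the thread; verify against the K3s networking requirements):

```shell
# Sketch for the worker node: allow K3s cluster traffic through ufw.
sudo ufw allow 6443/tcp            # Kubernetes API server
sudo ufw allow 10250/tcp           # kubelet metrics
sudo ufw allow 8472/udp            # flannel VXLAN (cross-node pod traffic)
sudo ufw allow from 10.42.0.0/16   # pod CIDR (K3s default)
sudo ufw allow from 10.43.0.0/16   # service CIDR (K3s default)
```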
a
Have you run the scripts in the Prerequisites section? And do you have Rancher installed in the cluster as well?
n
Yes, I ran the script and Rancher is installed. I will run the script again to see if it gives any indication.
a
Could you also show the logs of `longhorn-admission-webhook-xxxxx`?
n
```
$ bash longhorn_environment_check.sh
[INFO]  Required dependencies 'kubectl jq mktemp' are installed.
[INFO]  Hostname uniqueness check is passed.
[INFO]  Waiting for longhorn-environment-check pods to become ready (0/2)...
[INFO]  All longhorn-environment-check pods are ready (2/2).
[INFO]  Required packages are installed.
[WARN]  multipathd is running on localhost.
[WARN]  multipathd is running on geo-node1.
[WARN]  multipathd would probably result in the Longhorn volume mount failure. Please refer to https://longhorn.io/kb/troubleshooting-volume-with-multipath for more information.
[INFO]  MountPropagation is enabled.
[INFO]  Cleaning up longhorn-environment-check pods...
[INFO]  Cleanup completed.
```
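Regarding the multipathd warnings: the Longhorn KB article linked in the output suggests blacklisting Longhorn's device nodes so multipathd does not claim them. A sketch of that workaround (review the KB before applying; the devnode pattern is the one the article uses):

```shell
# Append a blacklist section so multipathd ignores /dev/sd* devices,
# per the Longhorn multipath troubleshooting KB, then restart multipathd.
cat <<'EOF' | sudo tee -a /etc/multipath.conf
blacklist {
    devnode "^sd[a-z0-9]+"
}
EOF
sudo systemctl restart multipathd
```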
a
And please also help check the kubelet log. Thank you.
w
I'm having exactly the same error. Any solution?