rich-shoe-36510
03/08/2023, 11:14 AM
able-library-94454
03/09/2023, 4:08 AM
able-library-94454
03/09/2023, 5:31 AM
big-judge-33880
03/09/2023, 9:18 AM
[Thu Mar 9 09:13:02 2023] Buffer I/O error on dev dm-2, logical block 2441200, async page read
[Thu Mar 9 09:13:32 2023] buffer_io_error: 23 callbacks suppressed
[Thu Mar 9 09:13:32 2023] Buffer I/O error on dev dm-0, logical block 25837552, async page read
crooked-cat-21365
03/13/2023, 7:21 AM
Mar 12 00:00:02 srvl062 kernel: [713683.338101] nfs: server 10.43.170.7 not responding, timed out
Mar 12 00:00:03 srvl062 kernel: [713683.914078] nfs: server 10.43.187.251 not responding, timed out
Mar 12 00:00:03 srvl062 kernel: [713684.170058] nfs: server 10.43.28.105 not responding, timed out
Mar 12 00:00:05 srvl062 kernel: [713685.453982] nfs: server 10.43.86.90 not responding, timed out
Mar 12 00:00:05 srvl062 kernel: [713685.610363] nfs: server 10.43.133.91 not responding, timed out
Mar 12 00:00:05 srvl062 kernel: [713686.217960] nfs: server 10.43.28.71 not responding, timed out
Mar 12 00:00:06 srvl062 kernel: [713687.245956] nfs: server 10.43.43.8 not responding, timed out
Mar 12 00:00:08 srvl062 kernel: [713689.289855] nfs: server 10.43.131.82 not responding, timed out
Mar 12 00:00:10 srvl062 kernel: [713690.501785] nfs: server 10.43.170.7 not responding, timed out
Mar 12 00:00:10 srvl062 kernel: [713690.569795] nfs: server 10.43.86.90 not responding, timed out
Mar 12 00:00:13 srvl062 kernel: [713693.385742] nfs: server 10.43.187.251 not responding, timed out
Mar 12 00:00:13 srvl062 kernel: [713694.185659] nfs: server 10.43.43.8 not responding, timed out
Mar 12 00:00:14 srvl062 kernel: [713695.013664] nfs: server 10.43.28.105 not responding, timed out
Mar 12 00:00:15 srvl062 kernel: [713695.693620] nfs: server 10.43.86.90 not responding, timed out
Mar 12 00:00:16 srvl062 kernel: [713696.521595] nfs: server 10.43.133.91 not responding, timed out
Mar 12 00:00:20 srvl062 kernel: [713700.557466] nfs: server 10.43.170.165 not responding, timed out
Mar 12 00:00:20 srvl062 kernel: [713700.617405] nfs: server 10.43.86.90 not responding, timed out
Mar 12 00:00:20 srvl062 kernel: [713700.813461] nfs: server 10.43.131.82 not responding, timed out
Mar 12 00:00:21 srvl062 kernel: [713701.641479] nfs: server 10.43.187.251 not responding, timed out
Mar 12 00:00:22 srvl062 kernel: [713703.241337] nfs: server 10.43.70.182 not responding, timed out
Mar 12 00:00:24 srvl062 kernel: [713704.717343] nfs: server 10.43.170.7 not responding, timed out
Mar 12 00:00:25 srvl062 kernel: [713705.485281] nfs: server 10.43.202.97 not responding, timed out
What is the recommended procedure to get out of this?
aloof-hair-13897
03/13/2023, 1:25 PM
bland-painting-61617
03/13/2023, 3:50 PM
crooked-cat-21365
03/14/2023, 8:42 AM
busy-judge-4614
03/15/2023, 6:40 PM
bland-painting-61617
03/15/2023, 8:07 PM
failed to set the setting priority-class with invalid value rancher-critical: cannot modify priority class setting before all volumes are detached
The change should be put in place and applied when possible, without causing downtime.
There is no easy way to detach all volumes without scaling all workloads to 0, which is painful.
Maybe a kill/maintenance switch could be implemented in Longhorn to make such changes easier.
creamy-pencil-82913
03/15/2023, 9:28 PM
numerous-lighter-90852
03/17/2023, 2:56 PM
late-needle-80860
03/17/2023, 3:12 PM
late-needle-80860
03/17/2023, 3:32 PM
late-needle-80860
03/17/2023, 4:03 PM
late-needle-80860
03/17/2023, 4:08 PM
red-printer-88208
03/18/2023, 4:49 AM
late-needle-80860
03/18/2023, 6:36 PM
late-needle-80860
03/18/2023, 6:37 PM
late-needle-80860
03/18/2023, 8:26 PM
busy-judge-4614
03/19/2023, 12:34 PM
bitter-tailor-6977
03/22/2023, 5:43 AM
enough-memory-12110
03/22/2023, 4:07 PM
quiet-area-89381
03/24/2023, 11:48 PM
MountVolume.MountDevice failed for volume "pvc-2e1be23b-4021-45d2-878b-d37cd71679dc" : rpc error: code = Internal desc = format of disk "/dev/longhorn/pvc-2e1be23b-4021-45d2-878b-d37cd71679dc" failed: type:("ext4") target:("/var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/c072c61d6cbcb55eac5a9120cce1da05cba3ab07bb0b540278a9cfcc67b87887/globalmount") options:("defaults") errcode:(exit status 1) output:(mke2fs 1.46.4 (18-Aug-2021)...
quiet-area-89381
03/24/2023, 11:48 PM
wonderful-kangaroo-15590
03/27/2023, 10:26 AM
refined-analyst-8898
03/27/2023, 11:12 AM
nutritious-oxygen-89191
03/27/2023, 11:48 AM
nutritious-oxygen-89191
03/27/2023, 11:54 AM
kubectl get pods --namespace longhorn-system --watch
I get:
NAME                                          READY   STATUS             RESTARTS          AGE
longhorn-admission-webhook-56d6cdc68f-2c8f4   0/1     Init:0/1           0                 4h50m
longhorn-admission-webhook-56d6cdc68f-5mf9n   0/1     Init:0/1           0                 4h50m
longhorn-admission-webhook-668d9787dc-r8nnd   0/1     Init:0/1           0                 4h19m
longhorn-conversion-webhook-55cf57895c-5dtc7  1/1     Running            0                 4h50m
longhorn-conversion-webhook-55cf57895c-cn68m  1/1     Running            0                 4h50m
longhorn-driver-deployer-77c46b4f54-gxnsx     0/1     Init:0/1           0                 4h50m
longhorn-manager-bn4pr                        0/1     Init:0/1           0                 4h50m
longhorn-manager-v9jm7                        0/1     Init:0/1           0                 4h50m
longhorn-recovery-backend-ccfc78cb6-pbxfj     1/1     Running            0                 4h50m
longhorn-recovery-backend-ccfc78cb6-pnxsz     1/1     Running            0                 4h50m
longhorn-ui-b8c58884-fr45d                    0/1     CrashLoopBackOff   57 (4m27s ago)    4h50m
longhorn-ui-b8c58884-wq2gm                    1/1     Running            0                 4h50m
So a lot of pods are stuck in Init:0/1. I narrowed it down to this error in longhorn-admission-webhook, which seems to depend on longhorn-conversion-webhook:
Defaulted container "longhorn-admission-webhook" out of: longhorn-admission-webhook, wait-longhorn-conversion-webhook (init)
Error from server (BadRequest): container "longhorn-admission-webhook" in pod "longhorn-admission-webhook-56d6cdc68f-5mf9n" is waiting to start: PodInitializing
But the log for longhorn-conversion-webhook seems fine, or not?
$ kubectl --namespace longhorn-system logs longhorn-conversion-webhook-55cf57895c-5dtc7
time="2023-03-27T06:58:39Z" level=info msg="Starting longhorn conversion webhook server"
W0327 06:58:39.336261 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-03-27T06:58:39Z" level=warning msg="Failed to init Kubernetes secret: secrets \"longhorn-webhook-tls\" not found"
time="2023-03-27T06:58:40Z" level=info msg="Listening on :9443"
time="2023-03-27T06:58:40Z" level=info msg="certificate CN=dynamic,O=dynamic signed by CN=dynamiclistener-ca,O=dynamiclistener-org: notBefore=2023-03-27 06:58:36 +0000 UTC notAfter=2024-03-26 06:58:40 +0000 UTC"
time="2023-03-27T06:58:40Z" level=info msg="Updating TLS secret for longhorn-webhook-tls (count: 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=8AE712382F7744254DB21B7BE49563465445AA91]"
time="2023-03-27T06:58:41Z" level=info msg="Active TLS secret longhorn-webhook-tls (ver=10272145) (count 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=8AE712382F7744254DB21B7BE49563465445AA91]"
time="2023-03-27T06:58:42Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2023-03-27T06:58:42Z" level=info msg="Building conversion rules..."
time="2023-03-27T06:58:42Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2023-03-27T06:58:42Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2023-03-27T06:58:42Z" level=info msg="Updating TLS secret for longhorn-webhook-tls (count: 1): map[listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=8AE712382F7744254DB21B7BE49563465445AA91]"
time="2023-03-27T06:58:42Z" level=info msg="Update CRD for nodes.longhorn.io"
time="2023-03-27T06:58:43Z" level=info msg="Update CRD for volumes.longhorn.io"
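For triage like the above, a small filter over the pod listing can pull out just the pods that are not healthy. This is only a sketch and not something posted in the thread: the helper name stuck_pods is made up here, and it assumes the default `kubectl get pods` column order (NAME, READY, STATUS, ...) on standard input.

```shell
# Hypothetical helper (the name is an assumption, not part of Longhorn or kubectl).
# Reads `kubectl get pods` output on stdin, skips the header row, and prints
# "NAME STATUS" for every pod that is not Running or whose READY count is short.
stuck_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")                      # READY looks like "0/1"
    if ($3 != "Running" || ready[1] != ready[2])
      print $1, $3
  }'
}

# Possible usage against a live cluster:
#   kubectl get pods --namespace longhorn-system | stuck_pods
```

The "waiting to start: PodInitializing" error appears because kubectl defaulted to the main container, which has not started yet. Passing the init container named in that output, for example `kubectl --namespace longhorn-system logs longhorn-admission-webhook-56d6cdc68f-5mf9n -c wait-longhorn-conversion-webhook`, should show what the pod is still waiting on.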
swift-student-44295
03/27/2023, 3:45 PM