big-judge-33880
03/09/2023, 9:18 AM[Thu Mar 9 09:13:02 2023] Buffer I/O error on dev dm-2, logical block 2441200, async page read
[Thu Mar 9 09:13:32 2023] buffer_io_error: 23 callbacks suppressed
[Thu Mar 9 09:13:32 2023] Buffer I/O error on dev dm-0, logical block 25837552, async page read
famous-shampoo-18483
03/09/2023, 4:28 PMbig-judge-33880
03/09/2023, 6:45 PMtime="2023-03-09T18:44:37Z" level=info msg="Failed backing image data source zzz-image-4gr75 is still in the backoff window, Longhorn cannot recreate pod for it" backingImageDataSource=zzz-image-4gr75 controller=longhorn-backing-image-data-source diskUUID=7606d6b6-7ee6-4f4b-a26c-5e1043c0b409 node=har-03 nodeID=har-03 parameters="map[url:<http://zzz/zzz-eval-datadisk.qcow2>]" sourceType=download
time="2023-03-09T18:45:07Z" level=info msg="Failed backing image data source zzz-image-4gr75 is still in the backoff window, Longhorn cannot recreate pod for it" backingImageDataSource=zzz-image-4gr75 controller=longhorn-backing-image-data-source diskUUID=7606d6b6-7ee6-4f4b-a26c-5e1043c0b409 node=har-03 nodeID=har-03 parameters="map[url:<http://zzz/zzz-eval-datadisk.qcow2>]" sourceType=download
[Thu Mar 9 18:44:47 2023] buffer_io_error: 17 callbacks suppressed
[Thu Mar 9 18:44:47 2023] Buffer I/O error on dev dm-0, logical block 2441200, async page read
[Thu Mar 9 18:44:47 2023] Buffer I/O error on dev dm-0, logical block 2441200, async page read
[Thu Mar 9 18:44:47 2023] Buffer I/O error on dev dm-0, logical block 2441200, async page read
[Thu Mar 9 18:44:47 2023] Buffer I/O error on dev dm-1, logical block 1220592, async page read
[Thu Mar 9 18:44:47 2023] Buffer I/O error on dev dm-1, logical block 1220592, async page read
[Thu Mar 9 18:44:47 2023] Buffer I/O error on dev dm-1, logical block 1220592, async page read
[Thu Mar 9 18:44:47 2023] Buffer I/O error on dev dm-2, logical block 24413168, async page read
[Thu Mar 9 18:44:47 2023] Buffer I/O error on dev dm-2, logical block 24413168, async page read
[Thu Mar 9 18:44:47 2023] Buffer I/O error on dev dm-2, logical block 24413168, async page read
[Thu Mar 9 18:44:47 2023] Buffer I/O error on dev dm-3, logical block 2441200, async page read
[Thu Mar 9 18:45:17 2023] buffer_io_error: 17 callbacks suppressed
[Thu Mar 9 18:45:17 2023] Buffer I/O error on dev dm-0, logical block 2441200, async page read
[Thu Mar 9 18:45:17 2023] Buffer I/O error on dev dm-0, logical block 2441200, async page read
famous-shampoo-18483
03/10/2023, 10:44 AM2023-03-05T08:28:33.122418486Z [pvc-4d2d11a4-1966-4d98-981b-9180f3d9f98b-r-8814ee8e] runtime/cgo: pthread_create failed: Resource temporarily unavailable
2023-03-05T08:28:33.129791854Z [pvc-4d2d11a4-1966-4d98-981b-9180f3d9f98b-r-8814ee8e] SIGABRT: abort
2023-03-05T08:28:33.148562182Z PC=0xc4ba6b m=25[pvc-4d2d11a4-1966-4d98-981b-9180f3d9f98b-r-8814ee8e] sigcode=18446744073709551610
2023-03-05T08:28:33.148587426Z
2023-03-05T08:28:33.148595775Z goroutine 0 [idle]:
2023-03-05T08:28:33.148602116Z runtime: unknown pc 0xc4ba6b
2023-03-05T08:28:33.148607809Z stack: frame={sp:0x7fe882ffcc10, fp:0x0} stack=[0x7fe8827fd688,0x7fe882ffd288)
2023-03-05T08:28:33.148613164Z 00007fe882ffcb10: 0000000000000000 0000000000000000
......
2023-03-05T08:28:33.148852993Z goroutine 1 [chan receive, 40 minutes]:
2023-03-05T08:28:33.148856114Z <http://github.com/longhorn/longhorn-engine/app/cmd.startReplica(0xc0001cd4a0|github.com/longhorn/longhorn-engine/app/cmd.startReplica(0xc0001cd4a0>, 0xeccbc3, 0x0)
2023-03-05T08:28:33.148859526Z /go/src/github.com/longhorn/longhorn-engine/app/cmd/replica.go:168 +0x4f8
......
2023-03-05T08:28:33.148902147Z goroutine 19 [IO wait]:
...
2023-03-05T08:28:33.148960009Z /go/src/github.com/longhorn/longhorn-engine/app/cmd/replica.go:123 +0x397
2023-03-05T08:28:33.148963065Z created by <http://github.com/longhorn/longhorn-engine/app/cmd.startReplica|github.com/longhorn/longhorn-engine/app/cmd.startReplica>
2023-03-05T08:28:33.148966036Z /go/src/github.com/longhorn/longhorn-engine/app/cmd/replica.go:108 +0x300
......
2023-03-05T08:28:33.149152608Z goroutine 22 [syscall, 40 minutes]:
2023-03-05T08:28:33.149155801Z os/signal.signal_recv(0x0)
2023-03-05T08:28:33.149158776Z /usr/local/go/src/runtime/sigqueue.go:147 +0x9c
2023-03-05T08:28:33.149161936Z os/signal.loop()
2023-03-05T08:28:33.149165031Z /usr/local/go/src/os/signal/signal_unix.go:23 +0x22
2023-03-05T08:28:33.149167964Z created by os/signal.Notify.func1
2023-03-05T08:28:33.149171057Z /usr/local/go/src/os/signal/signal.go:127 +0x44
......
2023-03-05T08:28:33.149248386Z goroutine 194484 [runnable]:
...
2023-03-05T08:28:33.149308242Z <http://github.com/longhorn/longhorn-engine/pkg/dataconn.(*Wire).Read(0xc00080e600|github.com/longhorn/longhorn-engine/pkg/dataconn.(*Wire).Read(0xc00080e600>, 0x111ef60, 0xc000120b40, 0xc00230acc0)
2023-03-05T08:28:33.149311502Z /go/src/github.com/longhorn/longhorn-engine/pkg/dataconn/wire.go:69 +0x76
2023-03-05T08:28:33.149314623Z <http://github.com/longhorn/longhorn-engine/pkg/dataconn.(*Server).readFromWire(0xc000120b40|github.com/longhorn/longhorn-engine/pkg/dataconn.(*Server).readFromWire(0xc000120b40>, 0xc000714180)
2023-03-05T08:28:33.149331016Z /go/src/github.com/longhorn/longhorn-engine/pkg/dataconn/server.go:37 +0x3d
2023-03-05T08:28:33.149346145Z created by <http://github.com/longhorn/longhorn-engine/pkg/dataconn.(*Server).read|github.com/longhorn/longhorn-engine/pkg/dataconn.(*Server).read>
2023-03-05T08:28:33.149350409Z /go/src/github.com/longhorn/longhorn-engine/pkg/dataconn/server.go:60 +0x83
......
This is probably one reason. But the problematic node har04
I found is not what you mentioned har03
...
For the thread issue, actually some users already reported it before. They fixed the problem by changing the pid limit of the container runtime.
Please check this for details: https://github.com/longhorn/longhorn/issues/3178samhandling-image-4gr75
big-judge-33880
03/10/2023, 11:57 AM● kubepods.slice - libcontainer container kubepods.slice
Loaded: loaded (/run/systemd/transient/kubepods.slice; transient)
Transient: yes
Drop-In: /run/systemd/transient/kubepods.slice.d
└─50-CPUShares.conf, 50-MemoryLimit.conf, 50-TasksMax.conf
Active: active since Thu 2023-02-16 15:47:13 UTC; 3 weeks 0 days ago
Tasks: 63609 (limit: 65536)
cat /proc/26609/cmdline
longhorn-manager-ddaemon--engine-imagelonghornio/longhorn-engine:v1.3.2--instance-manager-imagelonghornio/longhorn-instance-manager:v1_20221003--share-manager-imagelonghornio/longhorn-share-manager:v1_20221003--backing-image-manager-imagelonghornio/backing-image-manager:v3_20221003--manager-imagelonghornio/longhorn-manager:v1.3.2--service-accountlonghorn-service-account
famous-shampoo-18483
03/17/2023, 2:08 AM