# longhorn-storage
The disk I/O saturation is very high (142%, 180%). It appears when someone uploads a file to redis-master (1 replica); a single file is nearly 15 MB. It doesn't happen every time, only from time to time when I or someone else uploads a file to Redis.
While disk I/O rises, CPU saturation rises too, and here I can see the problem is on a specific node, 192.168.1.163, where it goes above 300% (Redis is running on this node).
When the error appears, I have failed replicas, and they are on 192.168.1.163. After a few moments, rebuilding starts on node 163 (once it comes back to life), then it updates the state on the other nodes (I guess), and finally everything is back to normal.
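To check whether the failed replicas always land on the same node, one option is to snapshot Longhorn's Replica custom resources while the incident is in progress and count failures per node. This is a sketch under assumptions: it expects the JSON printed by `kubectl get replicas.longhorn.io -n longhorn-system -o json` (longhorn-system being the default install namespace), and it relies on the `spec.nodeID` and `spec.failedAt` field names of the Longhorn v1.2 Replica CRD, which should be verified against the installed version. `nodeID` holds the Kubernetes node name, not the node IP.

```python
import json
from collections import Counter


def failed_replicas_by_node(replica_list_json: str) -> Counter:
    """Count replicas with a non-empty spec.failedAt, grouped by spec.nodeID.

    Assumes the input is a Kubernetes List object (an "items" array) of
    Longhorn Replica resources; field names are taken from the v1.2 CRD.
    """
    replicas = json.loads(replica_list_json).get("items", [])
    counts = Counter()
    for replica in replicas:
        spec = replica.get("spec", {})
        # An empty failedAt string means the replica has not been marked failed.
        if spec.get("failedAt"):
            counts[spec.get("nodeID", "<unknown>")] += 1
    return counts
```

Feeding it the saved kubectl output, e.g. `failed_replicas_by_node(open("replicas.json").read()).most_common()`, lists the nodes with the most failed replicas first. Failed replicas are replaced during rebuilding, so the snapshot only helps if it is taken before things are back to normal.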
dmesg | tail (on 192.168.1.163) returns:
[1594329.136250] sd 5:0:0:1: [sdd] Attached SCSI disk
[1594332.843527] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
[1594334.300191] IPv6: ADDRCONF(NETDEV_UP): lxc561a89fb6db7: link is not ready
[1594334.311581] eth0: renamed from tmpe73d5
[1594334.331348] IPv6: ADDRCONF(NETDEV_CHANGE): lxc561a89fb6db7: link becomes ready
[1594360.579431] EXT4-fs warning (device sdb): htree_dirblock_to_tree:984: inode #2: lblock 0: comm longhorn-manage: error -5 reading directory block
[1594364.917450] EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: (null)
[1594366.276998] IPv6: ADDRCONF(NETDEV_UP): lxc377ac94e1cc2: link is not ready
[1594366.285463] eth0: renamed from tmp35dfa
[1594366.300134] IPv6: ADDRCONF(NETDEV_CHANGE): lxc377ac94e1cc2: link becomes ready
The full log (without | tail) is in the attachment.
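The warning at [1594360.579431] stands out: kernel code reports failures as negated errno values, so "error -5" is errno 5, which is EIO (Input/output error) on Linux, raised while longhorn-manager read a directory on sdb (the command name appears as longhorn-manage because the kernel truncates it to 15 characters). A minimal Python sketch for pulling such lines out of the full dmesg attachment and decoding the errno could look like this; the regex is an assumption that only covers the EXT4 message shape shown above, not a general dmesg parser.

```python
import errno
import os
import re

# Matches EXT4 kernel messages such as:
#   EXT4-fs warning (device sdb): htree_dirblock_to_tree:984: ... error -5 reading directory block
EXT4_ERROR = re.compile(
    r"EXT4-fs (?:warning|error) \(device (?P<dev>\w+)\).*?error (?P<code>-\d+)"
)


def ext4_io_errors(dmesg_text: str) -> list[tuple[str, str, str]]:
    """Return (device, errno name, description) for each EXT4 line carrying a negative errno."""
    found = []
    for line in dmesg_text.splitlines():
        m = EXT4_ERROR.search(line)
        if m:
            code = -int(m.group("code"))  # the kernel logs the errno negated
            name = errno.errorcode.get(code, str(code))
            found.append((m.group("dev"), name, os.strerror(code)))
    return found
```

Running it over the whole attachment rather than the tail shows whether EIO on sdb repeats around each upload spike.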
I also see an alarming metric value:
sum(rate(container_memory_failures_total{pod!=""}[5m])) by (pod)
goes up to 7k+ for longhorn-manager, whereas most apps stay below 100 for this time series.
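To compare pods on that metric without eyeballing a dashboard, the same PromQL can be sent to the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and filtered for outliers. This is a sketch: `PROMETHEUS_URL` is a hypothetical placeholder for your Prometheus address, and the default threshold of 100 simply mirrors the "most apps stay below 100" baseline above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address; replace with your Prometheus service URL.
PROMETHEUS_URL = "http://prometheus.example:9090"
QUERY = 'sum(rate(container_memory_failures_total{pod!=""}[5m])) by (pod)'


def fetch_instant(query: str, base_url: str = PROMETHEUS_URL) -> str:
    """Run an instant query against the Prometheus HTTP API and return the raw JSON body."""
    params = urlencode({"query": query})
    with urlopen(f"{base_url}/api/v1/query?{params}") as resp:
        return resp.read().decode("utf-8")


def pods_above(response_json: str, threshold: float = 100.0) -> dict[str, float]:
    """Return {pod: value} for every series above `threshold`, largest first."""
    body = json.loads(response_json)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    outliers = {}
    for sample in body["data"]["result"]:
        pod = sample["metric"].get("pod", "<no pod label>")
        value = float(sample["value"][1])  # each sample is [unix_timestamp, "value as string"]
        if value > threshold:
            outliers[pod] = value
    return dict(sorted(outliers.items(), key=lambda kv: kv[1], reverse=True))
```

Usage would be `print(pods_above(fetch_instant(QUERY)))`; keeping the fetch and the filtering separate makes it easy to run the filter on a saved response captured during an incident.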
And I see this in the instance-manager logs:
[pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c] time="2022-07-19T09:21:35Z" level=warning msg="Received signal interrupt to shutdown"
This is the Redis PVC. I also have some more logs:
time="2022-07-19T09:19:43Z" level=error msg="Error reading from wire: read tcp 10.0.2.60:36200->10.0.1.253:10016: use of closed network connection"
And for longhorn-manager:
time="2022-07-19T08:37:38Z" level=debug msg="CheckEngineImageReadiness: nodes [testser.local] don't have the engine image longhornio/longhorn-engine:v1.2.3"
Full context for instance-manager:
[longhorn-instance-manager] time="2022-07-19T09:21:33Z" level=debug msg="Process Manager: start getting logs for process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c"
[longhorn-instance-manager] time="2022-07-19T09:21:33Z" level=debug msg="Process Manager: got logs for process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c"
[longhorn-instance-manager] time="2022-07-19T09:21:35Z" level=debug msg="Process Manager: prepare to delete process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c"
[longhorn-instance-manager] time="2022-07-19T09:21:35Z" level=debug msg="Process Manager: deleted process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c"
[longhorn-instance-manager] time="2022-07-19T09:21:35Z" level=debug msg="Process Manager: trying to stop process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c"
[longhorn-instance-manager] time="2022-07-19T09:21:35Z" level=info msg="wait for process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c to shutdown"
[longhorn-instance-manager] time="2022-07-19T09:21:35Z" level=debug msg="Process Manager: wait for process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c to shutdown before unregistering process"
[pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c] time="2022-07-19T09:21:35Z" level=warning msg="Received signal interrupt to shutdown"
time="2022-07-19T09:21:35Z" level=warning msg="Starting to execute registered shutdown func github.com/longhorn/longhorn-engine/app/cmd.startReplica.func4"
[longhorn-instance-manager] time="2022-07-19T09:21:35Z" level=info msg="Process Manager: process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c stopped"
[longhorn-instance-manager] time="2022-07-19T09:21:35Z" level=debug msg="Process Manager: prepare to delete process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c"
[longhorn-instance-manager] time="2022-07-19T09:21:35Z" level=debug msg="Process Manager: deleted process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c"
[longhorn-instance-manager] time="2022-07-19T09:21:35Z" level=info msg="Process Manager: successfully unregistered process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c"
[longhorn-instance-manager] time="2022-07-19T09:21:36Z" level=info msg="Process Manager: successfully unregistered process pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c"
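Lining up one replica process's lifecycle across a longer instance-manager log is easier with a small filter. The sketch below assumes the logrus text format seen above (optional `[source]` prefix, then `time=`, `level=`, `msg=`) and keeps every line either emitted by the process (its name in the prefix) or mentioning it in the message; it is not a general logrus parser and ignores escaped quotes inside messages.

```python
import re
from datetime import datetime

# logrus text format, optionally prefixed by the emitting process, e.g.
#   [longhorn-instance-manager] time="2022-07-19T09:21:35Z" level=info msg="..."
LINE = re.compile(
    r'^(?:\[(?P<source>[^\]]+)\]\s+)?'
    r'time="(?P<time>[^"]+)"\s+level=(?P<level>\w+)\s+msg="(?P<msg>.*)"$'
)


def process_timeline(log_text: str, process: str) -> list[tuple[datetime, str, str]]:
    """Collect (time, level, msg) for lines emitted by or about `process`, in log order."""
    events = []
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # skip lines that are not in the expected format
        if process in (m.group("source") or "") or process in m.group("msg"):
            ts = datetime.strptime(m.group("time"), "%Y-%m-%dT%H:%M:%SZ")
            events.append((ts, m.group("level"), m.group("msg")))
    return events
```

Called with `"pvc-280f7155-bee9-4ede-9110-ea0ce977aba3-r-9078586c"` on the excerpt above, the events run from 09:21:33 (log collection) to 09:21:36 (unregistered), so the replica was torn down within about three seconds; running the same filter for other PVCs can show whether their replicas fail at the same moments.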
This appears for many of my PVCs, for example Prometheus and the whole Redis stack.
Any tips on why it's happening?