
sticky-summer-13450

08/27/2022, 6:07 PM
I haven’t upgraded to v1.0.3 yet because I’m away from my three-node cluster and have no way to remotely manage the hosts, but with v1.0.2 I’m finding Longhorn and Harvester quite fragile. On several occasions I’ve had the issue with too many snapshots. At the moment I have one VM which I just cannot start, seemingly because Longhorn is in a loop attaching and detaching a volume every 5 seconds. And I have another volume which keeps failing the replica on node 1 and rebuilding it, then failing the replica on node 2 and rebuilding it, then failing the replica on node 1 again and rebuilding it, then failing the replica on node 3 again and rebuilding it. I’m a little scared to upgrade Harvester in case the situation gets worse.
It seems that I need to understand Longhorn and nurse it along in order to keep Harvester v1.0.2 working. But I really don’t understand Longhorn, and I’m very worried that I may screw things up through my ignorance.
deleted
I upgraded Harvester from v1.0.2 to v1.0.3, but I still have the issue I mentioned with one VM. It cannot be started or backed up. Longhorn is constantly looping - attaching and detaching the volume. Two of the instance managers are doing the same thing constantly. Any idea what's going on, or should I take this query to the #longhorn-storage channel? This is from an instance-manager's logs...
2022-09-06T15:14:04+01:00 time="2022-09-06T14:14:04Z" level=info msg="Listening on sync 0.0.0.0:10092"
2022-09-06T15:14:04+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:04Z" level=info msg="Process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6 has started at localhost:10090"
2022-09-06T15:14:05+01:00 [pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6] time="2022-09-06T14:14:05Z" level=info msg="New connection from: 10.52.2.235:58878"
2022-09-06T15:14:05+01:00 [pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6] time="2022-09-06T14:14:05Z" level=info msg="Opening volume /host/var/lib/longhorn/replicas/pvc-af69ea5b-9798-443a-8527-583c5fd35b70-a0da610f, size 21474836480/512"
2022-09-06T15:14:05+01:00 [pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6] time="2022-09-06T14:14:05Z" level=info msg="Lost connection from: 10.52.2.235:58878"
2022-09-06T15:14:07+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:07Z" level=debug msg="Process Manager: start getting logs for process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:08+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:08Z" level=debug msg="Process Manager: got logs for process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:10+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:10Z" level=debug msg="Process Manager: prepare to delete process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:10+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:10Z" level=debug msg="Process Manager: deleted process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:10+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:10Z" level=debug msg="Process Manager: wait for process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6 to shutdown before unregistering process"
2022-09-06T15:14:10+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:10Z" level=debug msg="Process Manager: trying to stop process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:10+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:10Z" level=info msg="wait for process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6 to shutdown"
2022-09-06T15:14:10+01:00 [pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6] time="2022-09-06T14:14:10Z" level=warning msg="Received signal interrupt to shutdown"
2022-09-06T15:14:10+01:00 [pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6] time="2022-09-06T14:14:10Z" level=warning msg="Starting to execute registered shutdown func github.com/longhorn/longhorn-engine/app/cmd.startReplica.func4"
2022-09-06T15:14:10+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:10Z" level=info msg="Process Manager: process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6 stopped"
2022-09-06T15:14:10+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:10Z" level=debug msg="Process Manager: prepare to delete process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:10+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:10Z" level=debug msg="Process Manager: deleted process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:10+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:10Z" level=info msg="Process Manager: successfully unregistered process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:10+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:10Z" level=info msg="Process Manager: successfully unregistered process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:11+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:11Z" level=info msg="Process Manager: prepare to create process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:11+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:11Z" level=debug msg="Process Manager: validate process path: /host/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.2.4/longhorn dir: /host/var/lib/longhorn/engine-binaries/ image: longhornio-longhorn-engine-v1.2.4 binary: longhorn"
2022-09-06T15:14:11+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:11Z" level=info msg="Process Manager: created process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"
2022-09-06T15:14:11+01:00 [pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6] time="2022-09-06T14:14:11Z" level=info msg="Listening on gRPC Replica server 0.0.0.0:10090"
2022-09-06T15:14:11+01:00 [pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6] time="2022-09-06T14:14:11Z" level=info msg="Listening on data server 0.0.0.0:10091"
2022-09-06T15:14:11+01:00 time="2022-09-06T14:14:11Z" level=info msg="Listening on sync agent server 0.0.0.0:10092"
2022-09-06T15:14:11+01:00 time="2022-09-06T14:14:11Z" level=info msg="Listening on sync 0.0.0.0:10092"
2022-09-06T15:14:11+01:00 [longhorn-instance-manager] time="2022-09-06T14:14:11Z" level=info msg="Process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6 has started at localhost:10090"
2022-09-06T15:14:12+01:00 [pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6] time="2022-09-06T14:14:12Z" level=info msg="New connection from: 10.52.2.235:47670"
2022-09-06T15:14:12+01:00 [pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6] time="2022-09-06T14:14:12Z" level=info msg="Opening volume /host/var/lib/longhorn/replicas/pvc-af69ea5b-9798-443a-8527-583c5fd35b70-a0da610f, size 21474836480/512"
2022-09-06T15:14:12+01:00 [pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6] time="2022-09-06T14:14:12Z" level=info msg="Lost connection from: 10.52.2.235:47670"
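
To put a number on the loop visible in the excerpt above, here is a rough Python sketch that counts how often each replica process is created and stopped. It assumes the log has been saved as plain text and that the "Process Manager: created process ..." and "Process Manager: process ... stopped" messages look exactly as they do above. The function name `count_restarts` and the shortened sample lines are only illustrative; this is not a Longhorn tool.

```python
import re
from collections import Counter

# Patterns taken from the message wording in the excerpt above.
# Assumption: other Longhorn versions may word these messages differently.
CREATED = re.compile(r'msg="Process Manager: created process (\S+)"')
STOPPED = re.compile(r'msg="Process Manager: process (\S+) stopped"')


def count_restarts(lines):
    """Return {process_name: (times_created, times_stopped)}."""
    created = Counter()
    stopped = Counter()
    for line in lines:
        match = CREATED.search(line)
        if match:
            created[match.group(1)] += 1
            continue
        match = STOPPED.search(line)
        if match:
            stopped[match.group(1)] += 1
    names = set(created) | set(stopped)
    return {name: (created[name], stopped[name]) for name in names}


# Two lines copied from the excerpt, with the timestamp prefix shortened.
sample = [
    'level=info msg="Process Manager: process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6 stopped"',
    'level=info msg="Process Manager: created process pvc-af69ea5b-9798-443a-8527-583c5fd35b70-r-2879f2f6"',
]

for name, (n_created, n_stopped) in count_restarts(sample).items():
    print(f"{name}: created {n_created}x, stopped {n_stopped}x")
```

If the same process name shows many create/stop pairs within a minute or so, that would match the attach/detach loop described here and gives a concrete figure to quote when asking in #longhorn-storage.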

prehistoric-balloon-31801

09/08/2022, 10:31 AM
Hi Mark, is the volume healthy? Could you check it in the Longhorn GUI to see whether there are any warning messages?

sticky-summer-13450

09/27/2022, 11:41 AM
Sorry - I went on holiday. When I got back there were no unhealthy volumes.
But just today a similar problem is happening: https://rancher-users.slack.com/archives/C01GKHKAG0K/p1664278785862039