# longhorn-storage
f
Interesting, @enough-knife-17390. My gut reaction is that this is actually not a Longhorn issue, but we need to dig a little to find out. Some context: the "used" figure shown by the UI is calculated as `storageMaximum - storageAvailable` (https://github.com/longhorn/longhorn-ui/blob/2015ad379a202275180705bc0dfde9ca26ace2ae/src/routes/host/HostList.js#L206-L211). These two values come from a system call that is the equivalent of something like `df /var/lib/longhorn`:
```
root@eweber-v126-worker-9c1451b4-6464j:~# df -h /var/lib/longhorn
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       155G   89G   67G  57% /
```
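To make the arithmetic concrete, here is a minimal sketch of the same `storageMaximum - storageAvailable` calculation done from a shell. It uses `/` as a stand-in for the Longhorn data path so it runs anywhere, and relies on GNU `df`'s `--output` flag:

```shell
# Sketch of the UI's "used" calculation: maximum minus available,
# taken from the filesystem backing the data path (here: /).
size=$(df --output=size -B1 / | tail -n 1)
avail=$(df --output=avail -B1 / | tail -n 1)
used=$((size - avail))
echo "used bytes: $used"
```

Note this measures the whole filesystem, not Longhorn's own directory, which is exactly why non-Longhorn consumers show up in the UI's figure.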
This calculation doesn't take into account how much space Longhorn itself is consuming, only how much space on the disk is used in general. So increased utilization by anything outside of Longhorn counts against usable space.
Of course, it is possible that Longhorn is the one using a huge amount of disk space. It could, for example, be creating a bunch of unnecessary replicas of the data on the disk. However, it's hard to understand how draining the node would fix the problem in that case. Draining the node would move all workloads (and Longhorn engines) off of it, but there's no particular reason that would lead to a decrease in Longhorn data consumption. (If Longhorn created a bunch of unnecessary replicas, the drain shouldn't lead to them being removed.) If this situation arises again:
• Can we check the consumption of space within the `/var/lib/longhorn` directory on the offending node versus another, to confirm the space usage is actually Longhorn's? (In fact, you could do that now on all nodes to see if the Longhorn usage actually differs.)
• Can we grab a support bundle so we can see how Longhorn thinks space is being used at the moment?
```
root@eweber-v126-worker-9c1451b4-rw5hf:~# du -h -s /var/lib/longhorn
180M    /var/lib/longhorn

root@eweber-v126-worker-9c1451b4-6464j:~# du -h -s /var/lib/longhorn
180M    /var/lib/longhorn

root@eweber-v126-worker-9c1451b4-kgxdq:~# du -h -s /var/lib/longhorn
181M    /var/lib/longhorn
```
e
Hi, yes, it is actually not Longhorn itself, as it turns out... it is /var/lib/docker/containers that is consuming all the space on one node, and it happened again this morning. I don't really know why this is happening. /var/lib/longhorn on that node is 814 MB; /var/lib/docker/containers is 6.3 TB. :(
I found the issue: it was due to Docker logging. I had not configured any log rotation for the Docker daemon on the cluster nodes, so one container was filling up the disk with logs.
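For reference, Docker's default `json-file` log driver has built-in rotation that is off by default; a minimal sketch of enabling it via `/etc/docker/daemon.json` (the `max-size`/`max-file` values here are illustrative, not a recommendation):

```shell
# Sketch: cap per-container log size for Docker's json-file driver.
# Values are illustrative; adjust for your environment.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF
sudo systemctl restart docker
```

Note that the new log options only apply to containers created after the daemon restart; existing containers keep their old logging configuration until they are recreated.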
f
Good to hear you found a resolution! If you dug into it, was it Longhorn containers contributing the terabytes of logs, or other containers? We are always interested to learn about any unexpected logging behavior from Longhorn.
e
No, it was an instance of Kafka that a user had misconfigured with way too much logging; for a few hours this morning it generated ~600 GB of logs. :)