longhorn-storage
  • blue-painting-33432 (03/03/2023, 12:23 AM)
    Hello, I'm having quite an urgent issue here with a basic problem; is anyone online to help? I'm using Longhorn on MicroK8s with local node storage. It worked fine for a few weeks, but now I restarted a pod and the volume just won't attach anymore; it times out, the pod restarts, and it keeps looping like this. In the Longhorn UI I only see "attached" and "detached" events repeating all the time, but no error.
  • blue-painting-33432 (03/03/2023, 12:23 AM)
    Unable to attach or mount volumes: unmounted volumes=[alertmanager-kube-prometheus-stack-alertmanager-db], unattached volumes=[web-config kube-api-access-8lfhh config-volume config-out tls-assets alertmanager-kube-prometheus-stack-alertmanager-db]: timed out waiting for the condition
  • blue-painting-33432 (03/03/2023, 12:24 AM)
    Is there any way to increase the timeout for the volume mount, or something along those lines?
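    A minimal sketch of how a stuck attachment like this can be inspected from the CLI, assuming a default Longhorn install in the longhorn-system namespace (the PVC, pod, and namespace names below are placeholders):

        # Kubernetes-side view: do the VolumeAttachment objects report attached=true?
        kubectl get volumeattachments

        # Longhorn-side view: state and owning node of each volume
        kubectl -n longhorn-system get volumes.longhorn.io

        # Events on the PVC and pod usually show why the kubelet gave up after its mount timeout
        kubectl -n <namespace> describe pvc <pvc-name>
        kubectl -n <namespace> describe pod <pod-name>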
  • blue-painting-33432 (03/03/2023, 12:30 AM)
    The pod remains in Pending and ContainerCreating for a few seconds, then immediately goes to deleting and starts again, always failing to attach the volume, but Longhorn seems to say it is attached, with no error.
  • blue-painting-33432 (03/03/2023, 1:05 AM)
    OK, never mind. I installed a Helm chart that created a bunch of namespaces and resources I didn't want, including a second Prometheus instance in another namespace, and uninstalling the Helm release left those resources in the cluster, which seemed to be causing issues with my real Prometheus instance's volumes. What a mess.
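    For the leftover-resources situation above, a rough sketch of how orphaned objects from an uninstalled release might be tracked down, assuming the chart applied the conventional app.kubernetes.io/instance label (the release name is a placeholder, and the Prometheus CRDs are only present if the prometheus-operator is installed):

        # Anything still carrying the release's instance label, across all namespaces
        kubectl get all,pvc,cm,secret -A -l app.kubernetes.io/instance=<release-name>

        # Duplicate Prometheus/Alertmanager instances are custom resources, so list those too
        kubectl get prometheus,alertmanager -A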
  • acceptable-soccer-28720 (03/03/2023, 2:04 AM)
    Restarting the node and detaching/attaching the volume did not help to solve the situation.
    rancher kubectl describe pod gitlab-postgresql-0 -n gitlab
    Events:
      Type     Reason              Age                  From                     Message
      ----     ------              ----                 ----                     -------
      Normal   Scheduled           14m                  default-scheduler        Successfully assigned gitlab/gitlab-postgresql-0 to vik8scases-w-2
      Warning  FailedMount         10m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[kube-api-access-g28wp custom-init-scripts postgresql-password dshm data]: timed out waiting for the condition
      Warning  FailedMount         7m49s                kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[postgresql-password dshm data kube-api-access-g28wp custom-init-scripts]: timed out waiting for the condition
      Warning  FailedMount         5m34s                kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[dshm data kube-api-access-g28wp custom-init-scripts postgresql-password]: timed out waiting for the condition
      Warning  FailedMount         3m18s (x2 over 12m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data kube-api-access-g28wp custom-init-scripts postgresql-password dshm]: timed out waiting for the condition
      Warning  FailedAttachVolume  2m9s (x6 over 12m)   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b59dafa1-3efa-44fc-92ba-e2be23e5d4a4" : timed out waiting for external-attacher of driver.longhorn.io CSI driver to attach volume pvc-b59dafa1-3efa-44fc-92ba-e2be23e5d4a4
      Warning  FailedMount         64s                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[custom-init-scripts postgresql-password dshm data kube-api-access-g28wp]: timed out waiting for the condition
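    When the attachdetach-controller times out waiting for the external-attacher, the usual next step is to check Longhorn's CSI sidecars and the manager. A sketch, where the workload names come from a default Longhorn install and the container name is an assumption:

        # External attacher sidecar: did it see and act on the VolumeAttachment?
        kubectl -n longhorn-system logs deploy/csi-attacher --tail=100

        # Node plugin (ideally the pod running on the node where gitlab-postgresql-0 is scheduled)
        kubectl -n longhorn-system logs ds/longhorn-csi-plugin -c longhorn-csi-plugin --tail=100

        # Longhorn's own view of the volume from the events above
        kubectl -n longhorn-system describe volumes.longhorn.io pvc-b59dafa1-3efa-44fc-92ba-e2be23e5d4a4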
  • plain-breakfast-5576 (03/03/2023, 6:02 PM)
    @famous-journalist-11332 we are seeing a deadlock while upgrading from 1.2.2 to 1.2.3. Do you have any update on this? If not, can you please update the KB: Longhorn.io/troubleshooting-engine-upgrading-struck-in-deadlock
  • rich-shoe-36510 (03/08/2023, 11:03 AM)
    Hey people! One simple question for which I can't find a simple answer. I have Longhorn (v1.4.0) running with one hard drive and everything is fine. Now I have added a second hard drive; how do I add the additional disk to the node? Can it be done through the UI, or do I bind-mount the two HDD mounts onto /var/lib/longhorn? Thanks!
  • rich-shoe-36510 (03/08/2023, 11:14 AM)
    My bad, the UI scrolls down further than is obvious. Add Disk is there!
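    For reference, the same Add Disk operation can also be done declaratively through Longhorn's node custom resource; a rough sketch, assuming the new disk is already formatted and mounted on the host (node name, disk name, and mount path are placeholders):

        # Inspect the node's current disk layout
        kubectl -n longhorn-system get nodes.longhorn.io <node-name> -o yaml

        # Add a second disk by adding an entry under spec.disks, e.g.:
        kubectl -n longhorn-system edit nodes.longhorn.io <node-name>
        #   spec:
        #     disks:
        #       disk-2:
        #         path: /mnt/disk2
        #         allowScheduling: true
        #         storageReserved: 0
        #         tags: []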
  • able-library-94454 (03/09/2023, 4:08 AM)
    I'm reading the Longhorn documentation and looking over the CSI Persistent Volume example. I see the spec.csi.volumeHandle parameter pointing to an already created Longhorn volume and am trying to understand the usage of the csi section. What I've been trying to find is a reference for the options that are possible under the spec.csi section. I figured there must be a reference somewhere, but I can only find examples rather than a formal spec. I've even tried reading the Go code and haven't found where that section is parsed. Can someone steer me in the direction of a definitive reference for this section?
  • able-library-94454 (03/09/2023, 5:31 AM)
    I just found this discussion, which I think is what I've been struggling to understand: https://github.com/longhorn/longhorn/discussions/2190
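    For anyone landing on the same question: spec.csi on a PersistentVolume is the standard Kubernetes CSIPersistentVolumeSource, which is where the formal field reference lives; the Longhorn-specific parts are the driver name, the volumeHandle (the pre-created Longhorn volume), and the volumeAttributes. A minimal example along the lines of the Longhorn docs (names and sizes are placeholders), applied with kubectl apply -f:

        apiVersion: v1
        kind: PersistentVolume
        metadata:
          name: existing-longhorn-vol-pv
        spec:
          capacity:
            storage: 2Gi
          accessModes:
            - ReadWriteOnce
          persistentVolumeReclaimPolicy: Retain
          storageClassName: longhorn
          csi:
            driver: driver.longhorn.io
            fsType: ext4
            volumeHandle: existing-longhorn-volume   # name of the already created Longhorn volume
            volumeAttributes:
              numberOfReplicas: "3"
              staleReplicaTimeout: "30"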
  • big-judge-33880 (03/09/2023, 9:18 AM)
    Is there anything in Longhorn that runs in a strict 30-second loop? I'm getting these strange errors across a cluster running Longhorn volumes, every 30 seconds on each node, but I have no workloads that fit such a pattern, and these errors don't surface to any running workloads as far as I can tell (I have a Rancher Prometheus volume that's stuck in a perpetual attach/detach loop, but that's much more frequent; it keeps looping even when the StatefulSet is scaled to 0):
    [Thu Mar  9 09:13:02 2023] Buffer I/O error on dev dm-2, logical block 2441200, async page read
    [Thu Mar  9 09:13:32 2023] buffer_io_error: 23 callbacks suppressed
    [Thu Mar  9 09:13:32 2023] Buffer I/O error on dev dm-0, logical block 25837552, async page read
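    To tie the dm-N devices from those kernel messages back to a specific Longhorn volume (and thus a workload), something along these lines can be run on the affected node; a sketch using standard tooling, not anything Longhorn-specific:

        # Block device tree; Longhorn volumes typically show up under /dev/longhorn/<volume-name>
        lsblk -o NAME,KNAME,SIZE,TYPE,MOUNTPOINT

        # Resolve the device-mapper name behind dm-0 / dm-2
        dmsetup info /dev/dm-2
        ls -l /dev/mapper/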
  • crooked-cat-21365 (03/13/2023, 7:21 AM)
    I have seen it several times now that one of my nodes gets stuck with a bazillion messages like
    Mar 12 00:00:02 srvl062 kernel: [713683.338101] nfs: server 10.43.170.7 not responding, timed out
    Mar 12 00:00:03 srvl062 kernel: [713683.914078] nfs: server 10.43.187.251 not responding, timed out
    Mar 12 00:00:03 srvl062 kernel: [713684.170058] nfs: server 10.43.28.105 not responding, timed out
    Mar 12 00:00:05 srvl062 kernel: [713685.453982] nfs: server 10.43.86.90 not responding, timed out
    Mar 12 00:00:05 srvl062 kernel: [713685.610363] nfs: server 10.43.133.91 not responding, timed out
    Mar 12 00:00:05 srvl062 kernel: [713686.217960] nfs: server 10.43.28.71 not responding, timed out
    Mar 12 00:00:06 srvl062 kernel: [713687.245956] nfs: server 10.43.43.8 not responding, timed out
    Mar 12 00:00:08 srvl062 kernel: [713689.289855] nfs: server 10.43.131.82 not responding, timed out
    Mar 12 00:00:10 srvl062 kernel: [713690.501785] nfs: server 10.43.170.7 not responding, timed out
    Mar 12 00:00:10 srvl062 kernel: [713690.569795] nfs: server 10.43.86.90 not responding, timed out
    Mar 12 00:00:13 srvl062 kernel: [713693.385742] nfs: server 10.43.187.251 not responding, timed out
    Mar 12 00:00:13 srvl062 kernel: [713694.185659] nfs: server 10.43.43.8 not responding, timed out
    Mar 12 00:00:14 srvl062 kernel: [713695.013664] nfs: server 10.43.28.105 not responding, timed out
    Mar 12 00:00:15 srvl062 kernel: [713695.693620] nfs: server 10.43.86.90 not responding, timed out
    Mar 12 00:00:16 srvl062 kernel: [713696.521595] nfs: server 10.43.133.91 not responding, timed out
    Mar 12 00:00:20 srvl062 kernel: [713700.557466] nfs: server 10.43.170.165 not responding, timed out
    Mar 12 00:00:20 srvl062 kernel: [713700.617405] nfs: server 10.43.86.90 not responding, timed out
    Mar 12 00:00:20 srvl062 kernel: [713700.813461] nfs: server 10.43.131.82 not responding, timed out
    Mar 12 00:00:21 srvl062 kernel: [713701.641479] nfs: server 10.43.187.251 not responding, timed out
    Mar 12 00:00:22 srvl062 kernel: [713703.241337] nfs: server 10.43.70.182 not responding, timed out
    Mar 12 00:00:24 srvl062 kernel: [713704.717343] nfs: server 10.43.170.7 not responding, timed out
    Mar 12 00:00:25 srvl062 kernel: [713705.485281] nfs: server 10.43.202.97 not responding, timed out
    What is the recommended procedure to get out of this?
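    The 10.43.x.x addresses in those messages are cluster service IPs; for Longhorn RWX volumes they normally belong to share-manager services, so a first step is mapping each unresponsive IP back to a volume and its share-manager pod. A rough sketch, assuming a default install in the longhorn-system namespace:

        # Which service owns the unresponsive IP?
        kubectl -n longhorn-system get svc -o wide | grep 10.43.170.7

        # Is the corresponding share-manager pod healthy, and on which node is it running?
        kubectl -n longhorn-system get pods -o wide | grep share-manager

        # On the stuck node, list the NFS mounts backing RWX volumes
        mount -t nfs4 | grep longhorn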
  • aloof-hair-13897 (03/13/2023, 1:25 PM)
    set the channel topic: The latest Longhorn release is v1.4.1. Community event at: https://community.cncf.io/cncf-longhorn-project/
  • bland-painting-61617 (03/13/2023, 3:50 PM)
    @aloof-hair-13897 do you happen to know why Longhorn 1.4.0 never made it to the Rancher marketplace? If not, that would be my #1 topic to mention on the community call. We're already on 1.4.1 and my expectations of seeing it in Rancher are quite low now... 😂 I was thinking of installing the Helm version, but it's too late now...
  • crooked-cat-21365 (03/14/2023, 8:42 AM)
    Is there some way to disable NFS support in Longhorn to improve stability?
  • busy-judge-4614 (03/15/2023, 6:40 PM)
    Hey, I don't suppose there is any unsupported way to use an NFS v3 server with Longhorn? Damn macOS only has a v3 server (along with SMB and AFS), and that's where my big disks are shared. This is only for home/tinkering use!
  • bland-painting-61617 (03/15/2023, 8:07 PM)
    I wanted to flag that I think this is not a good design:
    failed to set the setting priority-class with invalid value rancher-critical: cannot modify priority class setting before all volumes are detached
    The change should be accepted and applied when possible, without causing downtime. There is no easy way to detach all volumes without scaling all workloads to 0, which is painful. Maybe a kill/maintenance switch could be implemented in Longhorn to facilitate such changes in an easier manner.
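    Presumably the guard exists because changing the priority class means recreating Longhorn's system pods under running volumes. A quick way to see what would have to be detached before attempting the change (a sketch, assuming the default longhorn-system namespace and the status fields of the volume CR):

        # Current state and owning node of every Longhorn volume
        kubectl -n longhorn-system get volumes.longhorn.io

        # Only the attached ones, with the node they are attached to
        kubectl -n longhorn-system get volumes.longhorn.io \
          -o jsonpath='{range .items[?(@.status.state=="attached")]}{.metadata.name}{" -> "}{.status.currentNodeID}{"\n"}{end}'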
  • creamy-pencil-82913 (03/15/2023, 9:28 PM)
    How would you suggest safely detaching volumes out from under a running workload?
  • late-needle-80860 (03/17/2023, 3:12 PM)
    So I, silly me, did not follow https://longhorn.io/docs/1.4.1/volumes-and-nodes/maintenance/#updating-the-node-os-or-container-runtime when I removed a node (after first draining it) from a k3s v1.25.5 cluster. I can create a simple pod that consumes a PVC; it gets up and running, and I can exec into the pod and write data to a mount on the created PVC. But two of the volumes on the cluster (NOT RWX) cannot get attached. I've hard-deleted them, but after some time they get "deadline exceeded" errors and thereby fail to attach. Tailing all pods in the longhorn-system namespace with the tail krew plugin, I see the removed node mentioned, with a log line stating that the engine image is not ready on the removed worker. And of course it isn't, because the worker is gone. How can this situation be remediated? This issue > https://github.com/longhorn/longhorn/issues/5408 is somewhat similar. Thank you very much.
  • late-needle-80860 (03/17/2023, 3:32 PM)
    Looking at the Longhorn volume CRs representing the PVCs that cannot be attached, the old, removed worker node is listed in the pendingNodeID field.
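    A sketch of how that stale state can be inspected, and how Longhorn's record of the removed worker can be cleaned up once nothing references it anymore (field paths are assumptions based on the volume CR described above; back up the volumes before deleting anything):

        # Which node is the volume still pinned to?
        kubectl -n longhorn-system get volumes.longhorn.io <volume-name> \
          -o jsonpath='{.status.pendingNodeID}{"  "}{.status.currentNodeID}{"\n"}'

        # Longhorn keeps node and replica records for workers that have left the cluster
        kubectl -n longhorn-system get nodes.longhorn.io
        kubectl -n longhorn-system get replicas.longhorn.io -o wide | grep <removed-node>

        # Once the worker is really gone from the cluster, remove Longhorn's record of it
        kubectl -n longhorn-system delete nodes.longhorn.io <removed-node>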
  • late-needle-80860 (03/17/2023, 4:03 PM)
    Hmm, a restore of the volume via the backup panel in the Longhorn UI made things clear up. I really need to make sure I follow the required steps when removing a node that is a "Longhorn" node in the cluster.
  • late-needle-80860 (03/17/2023, 4:08 PM)
    Still interested in knowing whether there is a better option than restoring in the situation that happened here (besides following the guide), as my impression is that these attach issues can come from a plethora of different "places".
  • red-printer-88208 (03/18/2023, 4:49 AM)
    I keep seeing this issue with the Longhorn admission webhook. The version I am currently on is 1.3.2. Has anyone faced this issue before, or got any ideas why it might be happening? Thanks.
  • late-needle-80860 (03/18/2023, 6:36 PM)
    Interesting Longhorn replica eviction behavior I bumped into. I disabled scheduling on all disks on a Longhorn node, then set eviction to true on the disks of that same node. As there is no other node to actually take on the evicted replicas, nothing happens. I then enabled disk scheduling again, however, without setting "eviction requested" back to false on the disks. The replicas on the node got doubled. I know I may be fooling around a bit here, but still, isn't this unintended behavior? Thank you.
  • late-needle-80860 (03/18/2023, 6:37 PM)
    Another question: isn't the effect of disabling scheduling at the node level the same as disabling scheduling on all disks of the same node?
  • late-needle-80860 (03/18/2023, 8:26 PM)
    Feel VERY free to chime in on this discussion > https://github.com/longhorn/longhorn/discussions/5593 < 😄 thank you!
  • busy-judge-4614 (03/19/2023, 12:34 PM)
    Hi, is it possible to assign a persistent storage volume to an already running "app", using Rancher/k3s and Longhorn?
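    The short answer is that the pod template has to change, so the Deployment's pods will be restarted when the volume is added. A rough sketch with kubectl, assuming a PVC (hypothetically named app-data) has already been created from the longhorn StorageClass; namespace, workload, container, and mount path are placeholders:

        # Strategic merge patch: adds the volume and mounts it into the named container,
        # triggering a rolling restart of the Deployment
        kubectl -n <namespace> patch deployment <app> --type=strategic -p '{
          "spec": {"template": {"spec": {
            "volumes": [{"name": "app-data", "persistentVolumeClaim": {"claimName": "app-data"}}],
            "containers": [{"name": "<container>", "volumeMounts": [{"name": "app-data", "mountPath": "/data"}]}]
          }}}}'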
  • bitter-tailor-6977 (03/22/2023, 5:43 AM)
    Hi all, I have a 10-node cluster and created dynamic storage with Longhorn. To check the resilience a bit, I took two nodes down and checked whether the replicas get recreated on another node, but it is still not creating replicas on the other available nodes?
  • icy-agency-38675 (03/22/2023, 1:35 PM)
    Can you check the information in the longhorn-manager log?
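    A sketch of those follow-up checks: pulling the longhorn-manager logs as suggested, and looking at the settings that govern when Longhorn rebuilds replicas after a node goes down (setting names are taken from Longhorn's settings reference; the grep pattern is only an example):

        # Manager logs from all nodes (the daemonset pods carry the app=longhorn-manager label);
        # on larger clusters it may be easier to read them per pod
        kubectl -n longhorn-system logs -l app=longhorn-manager --tail=200 | grep -iE 'replica|schedul|error'

        # By default Longhorn waits a while before replenishing replicas from a node that may
        # only be temporarily unavailable
        kubectl -n longhorn-system get settings.longhorn.io replica-replenishment-wait-interval
        kubectl -n longhorn-system get settings.longhorn.io replica-soft-anti-affinity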