# longhorn-storage
f
This was in one of the instance managers
This seemed a little suspect, but I'm honestly not sure
```
[pvc-a52f5d04-76e6-4058-abbc-34d8b20331da-e-2565aa66] go-iscsi-helper: tgtd is already running
```
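Before digging further, a couple of node-side sanity checks I can run (assuming open-iscsi is installed on the host, which Longhorn requires):
```
# confirm the iSCSI initiator daemon is running on the node
systemctl status iscsid

# list active iSCSI sessions ("No active sessions" just means nothing is logged in yet)
iscsiadm -m session
```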
I also noticed this on the underlying system...
```
# audit2why < /var/log/audit/audit.log
type=AVC msg=audit(1695262967.474:1524): avc:  denied  { read } for  pid=48846 comm="iscsiadm" path="/dev/null" dev="tmpfs" ino=72 scontext=system_u:system_r:iscsid_t:s0 tcontext=system_u:object_r:container_runtime_tmpfs_t:s0 tclass=chr_file permissive=0

	Was caused by:
		Missing type enforcement (TE) allow rule.

		You can use audit2allow to generate a loadable module to allow this access.
```
which would translate to this
```
# audit2allow < /var/log/audit/audit.log

#============= iscsid_t ==============
allow iscsid_t container_runtime_tmpfs_t:chr_file read;
```
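If that denial does turn out to matter, audit2allow can build that rule into a local policy module and semodule can load it (the module name below is just a placeholder I picked):
```
# build a local policy module from the logged denials and load it
audit2allow -M longhorn-iscsid < /var/log/audit/audit.log
semodule -i longhorn-iscsid.pp
```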
Again, not sure if that's a red herring
Looking at the timestamp, it looks like that issue happened during system provisioning and hasn't occurred since
This is what I'm seeing in the volume event logs...
I'll poke at it tomorrow, but if anyone has any ideas where else I should look, please let me know
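For anyone following along, the places I'm planning to check next (assuming the default longhorn-system namespace):
```
# Longhorn's own view of the volume, engine, and replicas
kubectl -n longhorn-system get volumes.longhorn.io
kubectl -n longhorn-system get engines.longhorn.io,replicas.longhorn.io

# recent longhorn-manager logs usually surface attach/detach errors
kubectl -n longhorn-system logs -l app=longhorn-manager --tail=200
```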
c
You might try with 1.27 or 1.26, I don't know if LH has been tested on 1.28 yet.
f
I'll probably give that a try. Before I tear the thing down, is there someone I can send a support bundle to, in case it helps with testing/supporting 1.28?
I rebuilt the cluster from scratch using k3s v1.27.6+k3s1 and am running into the same / similar problem. Any ideas on what to check next?
The pod gets created and scheduled, then it fails, and the StatefulSet deletes and recreates it. The PVC shows as "Bound". Pod status:
```
Events:
  Type     Reason              Age                From                     Message
  ----     ------              ----               ----                     -------
  Normal   Scheduled           51s                default-scheduler        Successfully assigned logging-system/logging-system-fluentd-0 to kdevdev2-compute1.macc.ns.internet2.edu
  Warning  FailedAttachVolume  42s (x5 over 52s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-bf64b67b-c1ed-4f45-a408-21412a8503f6" : rpc error: code = DeadlineExceeded desc = volume pvc-bf64b67b-c1ed-4f45-a408-21412a8503f6 failed to attach to node kdevdev2-compute1.macc.ns.internet2.edu with attachmentID csi-ccc1ab4902965eeeb91a846c3604db8461757657f681720a5009ccd97fbfe592
```
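Side note: I believe that attachmentID corresponds to a cluster-scoped VolumeAttachment object, so it can be inspected directly (name copied from the event above):
```
kubectl get volumeattachments
kubectl describe volumeattachment csi-ccc1ab4902965eeeb91a846c3604db8461757657f681720a5009ccd97fbfe592
```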
PVC Describe:
```
Name:          logging-system-fluentd-buffer-logging-system-fluentd-0
Namespace:     logging-system
StorageClass:  longhorn
Status:        Bound
Volume:        pvc-bf64b67b-c1ed-4f45-a408-21412a8503f6
Labels:        app.kubernetes.io/component=fluentd
               app.kubernetes.io/managed-by=logging-system
               app.kubernetes.io/name=fluentd
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
               volume.kubernetes.io/storage-provisioner: driver.longhorn.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      20Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       logging-system-fluentd-0
Events:
  Type    Reason                 Age                From                                                                                      Message
  ----    ------                 ----               ----                                                                                      -------
  Normal  ExternalProvisioning   10m (x2 over 10m)  persistentvolume-controller                                                               waiting for a volume to be created, either by external provisioner "driver.longhorn.io" or manually created by system administrator
  Normal  Provisioning           10m                driver.longhorn.io_csi-provisioner-65cb5cc4ff-k2xmk_90d4fb49-de20-4aab-9a8f-86a1c1af5514  External provisioner is provisioning volume for claim "logging-system/logging-system-fluentd-buffer-logging-system-fluentd-0"
  Normal  ProvisioningSucceeded  10m                driver.longhorn.io_csi-provisioner-65cb5cc4ff-k2xmk_90d4fb49-de20-4aab-9a8f-86a1c1af5514  Successfully provisioned volume pvc-bf64b67b-c1ed-4f45-a408-21412a8503f6
```
I also created my own Deployment/Pod with my own PVC and it's having the same issue, so I don't think it's specific to the logging operator. Nevertheless, I can't figure out what's going wrong here
I did a fresh install with k3s v1.27.6+k3s1 and Longhorn v1.4.3 and it's still exhibiting the same behavior
I'm kind of at a loss here
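In case it helps anyone spot something, this is roughly where I'm pulling the CSI-side logs from (label selectors taken from the stock Longhorn manifests, so adjust if yours differ):
```
# external attacher sidecar (handles the ControllerPublishVolume / attach calls)
kubectl -n longhorn-system logs -l app=csi-attacher --tail=100

# Longhorn CSI node plugin on the affected node
kubectl -n longhorn-system logs -l app=longhorn-csi-plugin -c longhorn-csi-plugin --tail=100
```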
c
it sounds like you’re running this on selinux-enabled nodes?
f
I am...
c
f
I have not, is there a "quick fix"? or is it just to turn off SELinux? 😆
F*... That was it, disabling SELinux worked
c
there is some discussion of policies in that issue, I’m not sure where it ended up
f
There was a work-around that @eager-orange-89771 posted near the end of that ticket. https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/prerequisite/longhorn-iscsi-selinux-workaround.yaml
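For anyone else who lands here, applying that manifest is just (assuming cluster-admin on the cluster):
```
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/prerequisite/longhorn-iscsi-selinux-workaround.yaml
```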
I hate disabling SELinux to get around issues, but I'm flipping it into permissive mode for now
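For the record, "flipping it into permissive" here means the runtime toggle plus making it persistent across reboots:
```
# switch SELinux to permissive immediately (does not survive a reboot)
sudo setenforce 0

# make it persistent by setting SELINUX=permissive in /etc/selinux/config
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
```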