# k3s
d
I would appreciate replies in a thread so I get notified.
h
We are also a RHEL shop and have relied on the SUSE support matrix before upgrading RHEL versions. I do not see RHEL 8.9 as officially supported yet. https://www.suse.com/suse-k3s/support-matrix/all-supported-versions/k3s-v1-28/
In addition, you are better off with RHEL 8.8 because it has EUS and 8.9 does not; reference: https://access.redhat.com/support/policy/updates/errata
d
😲 Good catch. I will check the Rancher support matrix, because an RKE cluster is what originally broke.
@hundreds-evening-84071 I forgot to thank you for taking the time to help me troubleshoot. Thanks! We are rebuilding one of the nodes to RHEL EUS 8.8 to test the hypothesis.
h
No problem at all; good luck!
d
That is a crazy coincidence of timing. I literally just got access to the rebuilt 8.8 node and started installing k3s on it two minutes ago 🙂
🤞 1
Exact same behavior on the freshly provisioned node ☹️
Copy code
# hostnamectl
    ...
    Virtualization: vmware
  Operating System: Red Hat Enterprise Linux 8.8 (Ootpa)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:8::baseos
            Kernel: Linux 4.18.0-477.36.1.el8_8.x86_64
      Architecture: x86-64
h
are you specifying a k3s version?
d
No, letting it default to 1.28
oh, maybe that's too high...
h
maybe try this one?
v1.27.9+k3s1
✅ 1
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.27.9+k3s1" sh -s -
d
Originally I tried installing through a Rancher instance that we use for our other clusters, including the original cluster that failed after the OS update, but in that case I believe I also attempted v1.28.
h
I do not use k3s anymore, but I believe you can view the log with
journalctl -u k3s-server.service
or maybe it's
journalctl -u k3s.service
d
I tried 1.27 and 1.26 with the same behavior. Digging into the journalctl output (k3s v1.26), I see this message, which is consistent with the failing pod logs:
Copy code
E0108 09:52:01.408964    2354 controller.go:156] Unable to perform initial Kubernetes service initialization: Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.43.0.1"}: failed to allocate IP 10.43.0.1: cannot allocate resources of type serviceipallocations at this time
I get the same failure when I customize the service-cidr with
10.44.0.0/16
for example
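For reference, I'm overriding it via the standard k3s config file, roughly like this (excerpt, not my full config):
Copy code
# /etc/rancher/k3s/config.yaml (excerpt; my real file has more entries)
service-cidr: 10.44.0.0/16
# equivalent server flag: --service-cidr=10.44.0.0/16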
r
Your config states you're disabling traefik & servicelb. What did you install in their place?
d
Funny you mention that. I was just about to try enabling servicelb. So far I haven't installed anything in their place, because typically I first install ArgoCD and then sync a root app that includes Traefik and MetalLB.
On my other clusters, an ingress controller and load balancer are not necessary for basic operation.
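For completeness, the disable entries in question are just the standard k3s config keys, roughly:
Copy code
# same config file, excerpt only
disable:
  - traefik
  - servicelb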
r
I'm more familiar with RKE2 and generally only use K3s through k3d, but what's your network plugin? That's what should be assigning your pod IPs. For RKE2 that's canal or calico by default (depending on whether you install from the script or through Rancher).
d
k3s installs Flannel CNI with VXLAN backend by default
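The backend can be overridden at install time with the standard --flannel-backend flag, e.g.:
Copy code
# example only -- host-gw instead of the default vxlan
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --flannel-backend=host-gw" sh -s -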
r
I think that still has a pod. Do you see anything in the Flannel logs?
d
I tried Calico, and Flannel with the host-gw backend, a few days ago with no change in behavior
there are only three pods running after installation:
Copy code
$ kubectl get pod -A 
NAMESPACE     NAME                                      READY   STATUS    RESTARTS      AGE
kube-system   coredns-59b4f5bbd5-tfxs6                  0/1     Running   0             2m32s
kube-system   local-path-provisioner-76d776f6f9-nn8sf   1/1     Running   3 (42s ago)   2m32s
kube-system   metrics-server-68cf49699b-xj98p           0/1     Running   3 (37s ago)   2m32s
This does seem suspicious...
r
For RKE2 I don't recall if they're in the normal pods or the static pods.
(so you'd have to check with crictl to get the logs)
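Something along these lines should do it (standard crictl usage; <container-id> is whatever shows up in ps):
Copy code
crictl ps -a                          # list all containers, including exited ones
crictl logs <container-id>            # dump that container's log
crictl logs --tail 50 <container-id>  # just the most recent lines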
d
Copy code
[root@desapps3 ~]# crictl ps -a
CONTAINER           IMAGE               CREATED              STATE               NAME                     ATTEMPT             POD ID              POD
1f9ef6534209c       817bbe3f2e517       23 seconds ago       Running             metrics-server           4                   3ee7bec97a68d       metrics-server-68cf49699b-xj98p
83235bcf29ca0       b29384aeb4b13       25 seconds ago       Running             local-path-provisioner   4                   a7e4d5b3ce5e6       local-path-provisioner-76d776f6f9-nn8sf
d811b548cec97       817bbe3f2e517       About a minute ago   Exited              metrics-server           3                   3ee7bec97a68d       metrics-server-68cf49699b-xj98p
a2844d2e176ac       b29384aeb4b13       About a minute ago   Exited              local-path-provisioner   3                   a7e4d5b3ce5e6       local-path-provisioner-76d776f6f9-nn8sf
0f375e0292ee8       ead0a4a53df89       4 minutes ago        Running             coredns                  0                  2d2632d6282f9       coredns-59b4f5bbd5-tfxs6
this is a single-node cluster. I don't see anything via
ps aux
either.
r
I wonder where flannel runs then. It must be baked into the k3s binary, I guess? Maybe there are logging verbosity options for k3s server you could try?
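If I remember right, k3s server has a --debug flag and a -v <level> option, and the config file accepts the same; something like this (untested on my side):
Copy code
# /etc/rancher/k3s/config.yaml
debug: true
# then restart and follow the log:
#   systemctl restart k3s && journalctl -u k3s.service -f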
d
Copy code
[root@desapps3 ~]# journalctl -u k3s.service | grep flannel 
time="2024-01-08T10:19:36-06:00" level=info msg="Starting flannel with backend vxlan"
time="2024-01-08T10:19:49-06:00" level=info msg="The interface ens160 with ipv4 address 141.142.161.58 will be used by flannel"
time="2024-01-08T10:19:50-06:00" level=info msg="Wrote flannel subnet file to /run/flannel/subnet.env"
time="2024-01-08T10:19:50-06:00" level=info msg="Running flannel backend."

[root@desapps3 ~]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
I am not easily finding the binary that actually runs Flannel
r
Yeah, I know flannel just sets up interfaces and routing rules and is fairly simple in that vein, but I'd think it'd need something for assigning IPs, which, as I understand it, happens as part of the CNI. Since your error is about assigning IPs for services, I'm not sure where that would happen. Not to mention that what you're looking at only mentions the pod network, not the service network (which was your error).
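If you want to see where the service CIDR actually "lives": it isn't an interface at all, kube-proxy programs it as NAT rules, so (assuming the default iptables mode) something like this should show the 10.43.0.1 entries:
Copy code
iptables -t nat -L KUBE-SERVICES -n | grep 10.43.0.1
# or dump the raw rules:
iptables-save -t nat | grep 10.43.0.1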
h
I am curious; have you tried starting/deploying k3s without any of your custom entries in k3s.conf?
d
No, but that seems like an obvious and smart thing to do 😆. I'll try that now.
by the way, I see that there is a flannel.1 interface with 10.42.0.0, but I never see a service-CIDR-related interface. Is the 10.43.0.0 network completely virtual or something, where requests to 10.43.x.x are routed through the actual 10.42.x.x space?
Without any of my custom entries in k3s.conf, there are now 5 pods failing instead of 3, because the traefik helm install is stuck at
Copy code
+ helm_v3 install --set-string global.systemDefaultRegistry= traefik https://10.43.0.1:443/static/charts/traefik-25.0.2+up25.0.0.tgz --values /config/values-01_HelmChart.yaml
h
I just did the following on my RHEL 8.8 sandbox VM:
Copy code
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.27.9+k3s1" INSTALL_K3S_EXEC="server --cluster-init" sh -s -
The k3s service is already running and kubectl shows the node as Ready.
I did not have anything in the config file.
It is strange that it's not working for you.
d
I also see that the node is Ready, but the pods cannot reach a ready state because of connection errors like this one from coredns:
Copy code
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
h
Copy code
# kubectl get po -n kube-system
NAME                                     READY   STATUS      RESTARTS   AGE
coredns-77ccd57875-5zjzw                 1/1     Running     0          2m42s
helm-install-traefik-crd-qwjm9           0/1     Completed   0          2m43s
helm-install-traefik-q45x5               0/1     Completed   1          2m43s
local-path-provisioner-957fdf8bc-l96k9   1/1     Running     0          2m42s
metrics-server-648b5df564-dn98m          1/1     Running     0          2m42s
svclb-traefik-94d49914-75wns             2/2     Running     0          2m33s
traefik-768bdcdcdd-zqt2k                 1/1     Running     0          2m33s
r
Yeah, a lot of things will fail if the internal IP for base Kubernetes fails to register and work.
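Might be worth confirming the kubernetes service and its endpoint actually got registered; standard checks, nothing k3s-specific:
Copy code
kubectl get svc kubernetes -n default        # should show CLUSTER-IP 10.43.0.1
kubectl get endpoints kubernetes -n default  # should point at the node's apiserver address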
h
I'd say double-check against the requirements (if you have not done so already): https://docs.k3s.io/installation/requirements
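If I remember the RHEL notes on that page correctly, the usual culprits are firewalld and nm-cloud-setup; roughly this (please verify against the docs before running):
Copy code
# either turn firewalld off...
systemctl disable firewalld --now
# ...or open the apiserver port and trust the pod/service CIDRs
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16
firewall-cmd --permanent --zone=trusted --add-source=10.43.0.0/16
firewall-cmd --reload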
d
I feel like we've homed in on the root of the problem. I will carefully review the requirements again. I really appreciate y'all taking the time to help me troubleshoot.
๐Ÿ‘ 1