# k3s
d
I would appreciate replies in a thread so I get notified.
h
We are also a RHEL shop and have relied on the SUSE support matrix before upgrading RHEL versions. I do not see RHEL 8.9 as officially supported yet. https://www.suse.com/suse-k3s/support-matrix/all-supported-versions/k3s-v1-28/
In addition, you are better off with RHEL 8.8 because it has EUS and 8.9 does not; reference: https://access.redhat.com/support/policy/updates/errata
d
😲 Good catch. I will check the Rancher support matrix, because an RKE cluster is what originally broke.
@hundreds-evening-84071 I forgot to thank you for taking the time to help me troubleshoot. Thanks! We are rebuilding one of the nodes to RHEL EUS 8.8 to test the hypothesis.
h
No problem at all; good luck!
d
That is a crazy coincidence of timing. I literally just got access to the rebuilt 8.8 node and started installing k3s on it two minutes ago 🙂
🤞 1
Exact same behavior on the freshly provisioned node ☹️
Copy code
# hostnamectl
    ...
    Virtualization: vmware
  Operating System: Red Hat Enterprise Linux 8.8 (Ootpa)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:8::baseos
            Kernel: Linux 4.18.0-477.36.1.el8_8.x86_64
      Architecture: x86-64
h
are you specifying a k3s version?
d
No, letting it default to 1.28
oh, maybe that's too high...
h
maybe try this one?
v1.27.9+k3s1
✅ 1
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.27.9+k3s1" sh -s -
d
Originally I tried installing through a Rancher instance that we use for our other clusters, including the original cluster that failed after the OS update, but in that case I believe I also attempted v1.28.
h
I do not use k3s anymore, but I believe you can view the log with
journalctl -u k3s-server.service
or maybe it's
journalctl -u k3s.service
d
I tried 1.27 and 1.26 with the same behavior. Digging into the journalctl output (k3s v1.26), I see this message, which is consistent with the failing pod logs:
Copy code
E0108 09:52:01.408964    2354 controller.go:156] Unable to perform initial Kubernetes service initialization: Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.43.0.1"}: failed to allocate IP 10.43.0.1: cannot allocate resources of type serviceipallocations at this time
I get the same failure when I customize the service-cidr with
10.44.0.0/16
for example
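For reference, I'm overriding it via the standard k3s config file, roughly like this (excerpt, not my full config):
Copy code
# /etc/rancher/k3s/config.yaml (excerpt; my real file has more entries)
service-cidr: 10.44.0.0/16
# equivalent server flag: --service-cidr=10.44.0.0/16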
r
Your config states you're disabling traefik & servicelb. What did you install in their place?
d
Funny you mention that. I was just about to try enabling servicelb. So far I haven't installed anything in their place, because typically I first install ArgoCD and then sync a root app that includes Traefik and MetalLB.
On my other clusters, an ingress controller and load balancer are not necessary for basic operation.
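For completeness, the disable entries in question are just the standard k3s config keys, roughly:
Copy code
# same config file, excerpt only
disable:
  - traefik
  - servicelb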
r
I'm more familiar with RKE2 and generally only use K3s through k3d, but what's your network plugin? That's what should be assigning your pod IPs. For RKE2 that's canal or calico by default (depending on whether you install from the script or through Rancher).
d
k3s installs Flannel CNI with VXLAN backend by default
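The backend can be overridden at install time with the standard --flannel-backend flag, e.g.:
Copy code
# example only -- host-gw instead of the default vxlan
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --flannel-backend=host-gw" sh -s -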
r
I think that still has a pod. Do you see anything in the Flannel logs?
d
I tried Calico, and Flannel with the host-gw backend, a few days ago with no change in behavior
there are only three pods running after installation:
Copy code
$ kubectl get pod -A 
NAMESPACE     NAME                                      READY   STATUS    RESTARTS      AGE
kube-system   coredns-59b4f5bbd5-tfxs6                  0/1     Running   0             2m32s
kube-system   local-path-provisioner-76d776f6f9-nn8sf   1/1     Running   3 (42s ago)   2m32s
kube-system   metrics-server-68cf49699b-xj98p           0/1     Running   3 (37s ago)   2m32s
This does seem suspicious...
r
For RKE2 I don't recall if they're in the normal pods or the static pods.
(so you'd have to check with crictl to get the logs)
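Something along these lines should do it (standard crictl usage; <container-id> is whatever shows up in ps):
Copy code
crictl ps -a                          # list all containers, including exited ones
crictl logs <container-id>            # dump that container's log
crictl logs --tail 50 <container-id>  # just the most recent lines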
d
Copy code
[root@desapps3 ~]# crictl ps -a
CONTAINER           IMAGE               CREATED              STATE               NAME                     ATTEMPT             POD ID              POD
1f9ef6534209c       817bbe3f2e517       23 seconds ago       Running             metrics-server           4                   3ee7bec97a68d       metrics-server-68cf49699b-xj98p
83235bcf29ca0       b29384aeb4b13       25 seconds ago       Running             local-path-provisioner   4                   a7e4d5b3ce5e6       local-path-provisioner-76d776f6f9-nn8sf
d811b548cec97       817bbe3f2e517       About a minute ago   Exited              metrics-server           3                   3ee7bec97a68d       metrics-server-68cf49699b-xj98p
a2844d2e176ac       b29384aeb4b13       About a minute ago   Exited              local-path-provisioner   3                   a7e4d5b3ce5e6       local-path-provisioner-76d776f6f9-nn8sf
0f375e0292ee8       ead0a4a53df89       4 minutes ago        Running             coredns                  0                  2d2632d6282f9       coredns-59b4f5bbd5-tfxs6
this is a single-node cluster. I don't see anything via
ps aux
either.
r
I wonder where flannel runs then. It must be baked into the k3s binary, I guess? Maybe there are logging verbosity options for k3s server you could try?
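If I remember right, k3s server has a --debug flag and a -v <level> option, and the config file accepts the same; something like this (untested on my side):
Copy code
# /etc/rancher/k3s/config.yaml
debug: true
# then restart and follow the log:
#   systemctl restart k3s && journalctl -u k3s.service -f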
d
Copy code
[root@desapps3 ~]# journalctl -u k3s.service | grep flannel 
time="2024-01-08T10:19:36-06:00" level=info msg="Starting flannel with backend vxlan"
time="2024-01-08T10:19:49-06:00" level=info msg="The interface ens160 with ipv4 address 141.142.161.58 will be used by flannel"
time="2024-01-08T10:19:50-06:00" level=info msg="Wrote flannel subnet file to /run/flannel/subnet.env"
time="2024-01-08T10:19:50-06:00" level=info msg="Running flannel backend."

[root@desapps3 ~]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
I am not easily finding the binary that actually runs Flannel
r
Yeah, I know flannel just sets up interfaces and routing rules and is fairly simple in that vein, but I'd think it'd need something for assigning IPs, which, as I understand it, happens as part of the CNI. Since your error is about assigning IPs for services, I'm not sure where that would happen. Not to mention that what you're looking at only mentions the pod network, not the service network (which was your error).
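If you want to see where the service CIDR actually "lives": it isn't an interface at all, kube-proxy programs it as NAT rules, so (assuming the default iptables mode) something like this should show the 10.43.0.1 entries:
Copy code
iptables -t nat -L KUBE-SERVICES -n | grep 10.43.0.1
# or dump the raw rules:
iptables-save -t nat | grep 10.43.0.1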
h
I am curious; have you tried starting/deploying k3s without any of your custom entries in k3s.conf?
d
No, but that seems like an obvious and smart thing to do 😆. I'll try that now.
by the way, I see that there is a flannel.1 interface with 10.42.0.0, but I never see a service-CIDR-related interface. Is the 10.43.0.0 network completely virtual or something, where requests to 10.43.x.x are routed through the actual 10.42.x.x space?
Without any of my custom entries in k3s.conf, there are now 5 pods failing instead of 3, because the traefik helm install is stuck at
Copy code
+ helm_v3 install --set-string global.systemDefaultRegistry= traefik https://10.43.0.1:443/static/charts/traefik-25.0.2+up25.0.0.tgz --values /config/values-01_HelmChart.yaml
h
I just did the following on my RHEL 8.8 sandbox VM:
Copy code
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.27.9+k3s1" INSTALL_K3S_EXEC="server --cluster-init" sh -s -
The k3s service is already running and kubectl shows the node as Ready.
I did not have anything in the config file.
It is strange that it's not working for you.
d
I also see that the node is Ready, but the pods cannot reach a ready state because of connection errors like this one from coredns:
Copy code
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
h
Copy code
# kubectl get po -n kube-system
NAME                                     READY   STATUS      RESTARTS   AGE
coredns-77ccd57875-5zjzw                 1/1     Running     0          2m42s
helm-install-traefik-crd-qwjm9           0/1     Completed   0          2m43s
helm-install-traefik-q45x5               0/1     Completed   1          2m43s
local-path-provisioner-957fdf8bc-l96k9   1/1     Running     0          2m42s
metrics-server-648b5df564-dn98m          1/1     Running     0          2m42s
svclb-traefik-94d49914-75wns             2/2     Running     0          2m33s
traefik-768bdcdcdd-zqt2k                 1/1     Running     0          2m33s
r
Yeah, a lot of things will fail if the internal IP for base Kubernetes fails to register and work.
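Might be worth confirming the kubernetes service and its endpoint actually got registered; standard checks, nothing k3s-specific:
Copy code
kubectl get svc kubernetes -n default        # should show CLUSTER-IP 10.43.0.1
kubectl get endpoints kubernetes -n default  # should point at the node's apiserver address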
h
I'd say double-check against the requirements (if you have not done so already): https://docs.k3s.io/installation/requirements
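If I remember the RHEL notes on that page correctly, the usual culprits are firewalld and nm-cloud-setup; roughly this (please verify against the docs before running):
Copy code
# either turn firewalld off...
systemctl disable firewalld --now
# ...or open the apiserver port and trust the pod/service CIDRs
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16
firewall-cmd --permanent --zone=trusted --add-source=10.43.0.0/16
firewall-cmd --reload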
d
I feel like we've homed in on the root of the problem. I will carefully review the requirements again. I really appreciate y'all taking the time to help me troubleshoot.
๐Ÿ‘ 1