quiet-potato-9276

05/12/2023, 10:47 AM
Hello Team, I am trying to set up RKE on 4 CentOS Linux release 7.9.2009 (Core) VMs with 4 cores and 8 GB RAM each. The configuration is 1 master node and 3 worker nodes. rke up ran fine without errors, but I have issues with CoreDNS not running, which I think is caused by calico-kube-controllers being in CrashLoopBackOff. The error from Calico is that it cannot reach https://10.43.0.1:443/apis, with a "no route to host" error, even though I can curl that IP from all my nodes (including the master). I thought it might have been SELinux, so I disabled that, but the issue remains. I've attached my cluster.yaml and some logs.
Canal
CoreDNS
Calico
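(A minimal sketch of how the crashing pod could be inspected; the pod name below is a placeholder to be replaced with the actual name from kubectl get pods:)
kubectl -n kube-system get pods -o wide                                # find the calico-kube-controllers pod and the node it runs on
kubectl -n kube-system describe pod <calico-kube-controllers-pod>     # the Events section often explains the restarts
kubectl -n kube-system logs <calico-kube-controllers-pod> --previous  # logs from the previous, crashed container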

great-jewelry-76121

05/12/2023, 10:59 AM
I have issues with coredns not running which I think is caused by calico-kube-controllers in crash loop back off.
It won't be that: calico-kube-controllers doesn't do anything that affects the dataplane for other pods. It's essentially a garbage collector and label updater. Can you do a quick check for me: can pods talk to each other? On the same node? On different nodes?
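(A minimal sketch of such a check, using two throwaway busybox pods; pod names and IPs are placeholders:)
kubectl run busybox1 --image=busybox --restart=Never -- sleep 3600
kubectl run busybox2 --image=busybox --restart=Never -- sleep 3600
kubectl get pods -o wide                                     # confirm the two pods landed on different nodes and note their IPs
kubectl exec -ti busybox1 -- ping -c 2 <pod-IP-of-busybox2>  # repeat with pods on the same node for comparison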

quiet-potato-9276

05/12/2023, 11:05 AM
I can ping between busybox1 and busybox2 on different nodes:
[al@rkemaster01 ~]$ kubectl exec -ti busybox2 -- /bin/sh
E0512 12:03:20.136051   25714 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0512 12:03:20.138957   25714 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0512 12:03:20.143424   25714 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
/ #
/ # ping 10.42.3.7
PING 10.42.3.7 (10.42.3.7): 56 data bytes
64 bytes from 10.42.3.7: seq=0 ttl=62 time=1.524 ms
64 bytes from 10.42.3.7: seq=1 ttl=62 time=0.766 ms
^C
--- 10.42.3.7 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.766/1.145/1.524 ms
I had already started upgrading the kernel on CentOS to 5.x.
I've now upgraded the kernel, and although the warning has gone, there is no change:
[al@rkemaster01 ~]$ kubectl get pods -A
E0512 12:23:31.447388    4839 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0512 12:23:31.473213    4839 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0512 12:23:31.477644    4839 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0512 12:23:31.481753    4839 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
NAMESPACE       NAME                                      READY   STATUS             RESTARTS         AGE
default         busybox1                                  1/1     Running            2 (119s ago)     22m
default         busybox2                                  1/1     Running            1 (2m24s ago)    22m
default         nginx                                     1/1     Running            3 (104s ago)     72m
ingress-nginx   ingress-nginx-admission-create-p8z4t      0/1     Completed          0                73m
ingress-nginx   nginx-ingress-controller-kqczp            0/1     Running            30 (1s ago)      73m
ingress-nginx   nginx-ingress-controller-mdlxg            0/1     Running            30 (23s ago)     73m
ingress-nginx   nginx-ingress-controller-ms44h            0/1     Running            31 (2m36s ago)   73m
kube-system     calico-kube-controllers-85d56898c-swvqw   0/1     Running            30 (20s ago)     74m
kube-system     canal-h5lcp                               2/2     Running            6 (2m9s ago)     74m
kube-system     canal-rwz8j                               2/2     Running            6 (104s ago)     74m
kube-system     canal-sdfmv                               2/2     Running            6 (2m34s ago)    74m
kube-system     canal-trxkx                               2/2     Running            6 (3m3s ago)     74m
kube-system     coredns-autoscaler-74d474f45c-knhk7       1/1     Running            3 (2m34s ago)    74m
kube-system     coredns-dfb7f8fd4-7ncjq                   0/1     Running            3 (99s ago)      74m
kube-system     metrics-server-c47f7c9bb-g5jxw            0/1     CrashLoopBackOff   30 (52s ago)     74m
kube-system     rke-coredns-addon-deploy-job-2vqwq        0/1     Completed          0                74m
kube-system     rke-ingress-controller-deploy-job-6vk5x   0/1     Completed          0                74m
kube-system     rke-metrics-addon-deploy-job-rv5rp        0/1     Completed          0                74m
kube-system     rke-network-plugin-deploy-job-7d6zw       0/1     Completed          0                74m

great-jewelry-76121

05/12/2023, 11:33 AM
I can ping between busybox1 and busybox2 on different nodes:
Cool, that suggests that the Canal parts are working, at least. Is kube-proxy up and happy? That's what takes care of converting service IPs (like this one) into "real" IPs.
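(A sketch of how kube-proxy and its service rules could be checked on each node, assuming kube-proxy runs in its default iptables mode:)
ps -ef | grep kube-proxy                       # is the process running at all?
sudo iptables -t nat -L KUBE-SERVICES | head   # kube-proxy's service chains should exist in the nat table
sudo iptables-save | grep 10.43.0.1            # rules translating the kubernetes service cluster IP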

quiet-potato-9276

05/12/2023, 11:34 AM
There is no kube-proxy running in the list of pods I gave.

great-jewelry-76121

05/12/2023, 11:34 AM
Does kube-proxy run on the nodes directly instead?
Or is that the issue? That you don't have kube-proxy, so service IPs are all broken?
(sorry, I'm familiar with Calico, less familiar with RKE - I work for Tigera on the Calico team)

quiet-potato-9276

05/12/2023, 11:35 AM
Kube-proxy is running as a process on the nodes:
root      1852  1717  0 12:21 ?        00:00:01 kube-proxy --cluster-cidr=10.42.0.0/16 --hostname-override=192.168.0.170 --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-proxy.yaml --healthz-bind-address=127.0.0.1 --v=2
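(With --healthz-bind-address=127.0.0.1 and kube-proxy's default healthz port of 10256, a quick local health check could look like this; the port is an assumption based on the defaults:)
curl http://127.0.0.1:10256/healthz   # should return a small JSON payload if kube-proxy is healthy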

great-jewelry-76121

05/12/2023, 11:36 AM
Any interesting logs out of kube-proxy?
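(A sketch of how those logs could be pulled on a node, assuming RKE runs kube-proxy as a Docker container named "kube-proxy":)
docker logs --tail 200 kube-proxy 2>&1 | grep -iE 'error|warn'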

quiet-potato-9276

05/12/2023, 11:43 AM
Nothing of note; I'll check the other logs.
I'm getting these errors in the kube-controller-manager log:
{"log":"E0512 11:52:16.996066       1 resource_quota_controller.go:417] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request\n","stream":"stderr","time":"2023-05-12T11:52:16.996473805Z"}
{"log":"W0512 11:52:18.369195       1 garbagecollector.go:752] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]\n","stream":"stderr","time":"2023-05-12T11:52:18.370125543Z"}
So I now see that the error points to the metrics server. I can see the problem is there, as it is not providing the API response:
$ kubectl api-resources
error: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
and:
$ kubectl get apiservice
v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (MissingEndpoints)   114m
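(MissingEndpoints usually means the metrics-server Service has no ready backing pods; a quick way to confirm that, sketched:)
kubectl -n kube-system get endpoints metrics-server   # should list pod IPs once metrics-server is ready
kubectl describe apiservice v1beta1.metrics.k8s.io    # shows why the aggregated API is marked unavailable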
DNS is not working in pods, and I'm getting this error in coredns:
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
However, the API server itself is up:
[al@rkemaster01 ~]$ curl -k https://10.43.0.1:443/
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
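(To confirm the service IP is unreachable from inside the pod network specifically, and not from the nodes, a throwaway curl pod could be used; the image name is an assumption:)
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -k -m 5 https://10.43.0.1/version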
So it was DNS. I ran this on all the nodes:
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -F
And killed the existing coredns pod. Now everything is working:
NAMESPACE       NAME                                      READY   STATUS      RESTARTS         AGE
default         busybox1                                  1/1     Running     2 (72m ago)      92m
default         busybox2                                  1/1     Running     1 (72m ago)      92m
default         nginx                                     1/1     Running     3 (72m ago)      142m
ingress-nginx   ingress-nginx-admission-create-p8z4t      0/1     Completed   0                144m
ingress-nginx   nginx-ingress-controller-ck5bh            1/1     Running     0                27s
ingress-nginx   nginx-ingress-controller-ct84l            1/1     Running     0                50s
ingress-nginx   nginx-ingress-controller-kqczp            1/1     Running     50 (7m20s ago)   144m
kube-system     calico-kube-controllers-85d56898c-swvqw   1/1     Running     52 (5m46s ago)   144m
kube-system     canal-h5lcp                               2/2     Running     6 (72m ago)      144m
kube-system     canal-rwz8j                               2/2     Running     6 (72m ago)      144m
kube-system     canal-sdfmv                               2/2     Running     6 (72m ago)      144m
kube-system     canal-trxkx                               2/2     Running     6 (73m ago)      144m
kube-system     coredns-autoscaler-74d474f45c-knhk7       1/1     Running     3 (72m ago)      144m
kube-system     coredns-dfb7f8fd4-9gz8j                   1/1     Running     0                3m19s
kube-system     coredns-dfb7f8fd4-dpdcp                   1/1     Running     0                9m15s
kube-system     metrics-server-c47f7c9bb-kjt8f            1/1     Running     0                2m35s
kube-system     rke-coredns-addon-deploy-job-2vqwq        0/1     Completed   0                144m
kube-system     rke-ingress-controller-deploy-job-6vk5x   0/1     Completed   0                144m
kube-system     rke-metrics-addon-deploy-job-rv5rp        0/1     Completed   0                144m
kube-system     rke-network-plugin-deploy-job-7d6zw       0/1     Completed   0                144m
I'm not sure what is causing this issue, or whether it will return after a reboot. I've never really understood iptables.
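(Before flushing, the offending rules can usually be spotted by dumping the ruleset; a sketch of what that inspection could look like on a node:)
sudo iptables-save > /tmp/iptables-before.txt           # keep a copy of the full ruleset for comparison after a reboot
sudo iptables -L INPUT -n -v | grep -iE 'reject|drop'   # host firewalls typically insert REJECT rules here
sudo iptables -L FORWARD -n -v | grep -iE 'reject|drop' # REJECT rules in FORWARD break pod-to-service traffic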

great-jewelry-76121

05/12/2023, 12:50 PM
Do you have anything that might have been writing to iptables on the nodes? (Apart from kube-proxy and calico)
Do you have any network policy configured?
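(A quick way to check for network policies across all namespaces, as a sketch:)
kubectl get networkpolicy --all-namespaces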

quiet-potato-9276

05/12/2023, 1:03 PM
No. These are fresh VMs, and all I had done was bring the cluster up with rke.

great-jewelry-76121

05/18/2023, 9:14 AM
These are fresh VMs
Sure, but some OSes enable firewalls by default that write to iptables: ufw, firewalld, etc.
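(On CentOS 7 the usual suspect is firewalld, whose default REJECT rules with icmp-host-prohibited show up to applications as "no route to host"; a sketch of how to check and, if the host firewall isn't needed, disable it:)
sudo systemctl status firewalld          # is firewalld active and managing iptables?
sudo systemctl disable --now firewalld   # only if you are sure you don't need the host firewall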