# rke2
d
I might add that I have tried `calico` and `canal` as CNIs with their default settings. Things are running on Proxmox VMs and Ubuntu 22.04. Tried `v1.26.4+rke2r1` and `v1.25.9+rke2r1`.
c
Yes, the apiserver has options to work with those requirements, anticipating that the webhook will be deployed to the cluster, but that the apiserver might not be a part of the cluster
1. if you target a service, it looks up the service endpoints directly
2. if it is not a part of the cluster, you can set up an egress proxy (Konnectivity service) that the apiserver can use to reach the endpoints
You shouldn’t need to use either of these on RKE2, as the server nodes are also part of the cluster, and should have the ability to reach pod and service addresses, even from the host network namespace
cert-manager is required for rancher, we install it all the time without any issues. I suspect there is something blocking inter-node CNI traffic in your environment.
confirm that you have the correct CNI ports open, and that you don’t have any other iptables rules on your nodes blocking traffic to/from cluster CIDR ranges.
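One way to act on this is to scan each node's firewall rules for anything that drops or rejects traffic. A minimal sketch, assuming `iptables-save` is available and run as root; `blocking_rules` is a hypothetical helper name, not an RKE2 tool. kube-proxy and Canal install some DROP rules of their own, so hits in `KUBE-` or `cali-` chains are expected; the interesting ones are anything else that matches the cluster or service CIDR (10.42.0.0/16 and 10.43.0.0/16 by default on RKE2).

```shell
# Hypothetical helper: reads `iptables-save` output on stdin and prints
# only the rules whose target is DROP or REJECT.
# `|| true` keeps the exit status 0 when nothing matches.
blocking_rules() {
    grep -E -- '-j (DROP|REJECT)' || true
}

# Usage on a node (as root):
#   iptables-save | blocking_rules
```

An empty result only rules out iptables; it says nothing about filtering done outside the node, for example by the hypervisor's own firewall.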
d
Yes, I guess I am doing something wrong - but I really have tried all I can think of and I'm out of options. It's a HA cluster. I think the CNI traffic is working fine. pod to pod traffic is OK. And normal pods can reach services.
c
what version of rke2 are you on?
d
Firewall is inactive - so it should not be the issue.
$ /usr/local/bin/rke2 --version
rke2 version v1.25.9+rke2r1 (842d05e64bcbf78552f1db0b32700b8faea403a0)
go version go1.19.8 X:boringcrypto
These are the errors I am getting:
Error from server (InternalError): error when creating "cert-manager/k8s/letsencrypt-prod-issuer.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
Error from server (InternalError): error when creating "cert-manager/k8s/letsencrypt-staging-issuer.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
c
is the webhook pod healthy?
d
Yes, I can invoke it with curl from other pods. Both directly to its pod-IP and the service cluster IP.
But you say that the api server looks up the endpoint(s) directly? Does that imply that it does not use the cluster DNS at all? But rather make explicit connections directly to the pod instead?
c
yep
if it is defined as a Service at least
can you show the output of `kubectl get node -o yaml | grep node-args`?
you can also see that it targets the service directly:
brandond@dev01:~$ kubectl get validatingwebhookconfigurations cert-manager-webhook -o yaml | grep -A10 webhooks:
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    caBundle: XXX
    service:
      name: cert-manager-webhook
      namespace: cert-manager
      path: /validate
      port: 443
  failurePolicy: Fail
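Since the apiserver runs in the host network namespace on the server nodes, the test that matters is made from a server node's host, not from another pod. A rough sketch of such a probe, following the description above that the apiserver connects to the service's endpoints; `probe_endpoint` is a hypothetical helper, and `/validate` is simply the path shown in the webhook configuration. Any HTTP response, even an error status, counts as reachable, because only connectivity is being tested.

```shell
# Hypothetical helper: try an HTTPS connection to one endpoint by IP and
# port, bypassing cluster DNS. -k skips certificate verification and
# --max-time 10 mirrors the 10s webhook timeout from the error message.
probe_endpoint() {
    ip="$1"
    port="$2"
    if curl -sk -o /dev/null --max-time 10 "https://${ip}:${port}/validate"; then
        echo "reachable ${ip}:${port}"
    else
        echo "unreachable ${ip}:${port}"
    fi
}

# Usage, run on the host of each server node:
#   ns=cert-manager; svc=cert-manager-webhook
#   port=$(kubectl -n "$ns" get endpoints "$svc" -o jsonpath='{.subsets[0].ports[0].port}')
#   for ip in $(kubectl -n "$ns" get endpoints "$svc" -o jsonpath='{.subsets[*].addresses[*].ip}'); do
#       probe_endpoint "$ip" "$port"
#   done
```

The same helper can be pointed at the service's cluster IP on port 443 for comparison. If the probe succeeds from pods but times out from a server's host, that would point at host-to-pod traffic between nodes rather than at cert-manager itself.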
d
Output of the `node-args` command:
kubectl get node -o yaml | grep node-args
      rke2.io/node-args: '["server","--token","********","--data-dir","/var/lib/rancher/rke2","--cni","canal","--tls-san","cluster.local","--tls-san","v1111-dcs-master-1.dcs.tickup.net","--tls-san","k8s-api.dcs.tickup.net","--tls-san","v1111-dcs-master-1.dcs.tickup.net","--tls-san","10.101.0.11","--tls-san","v1112-dcs-master-2.dcs.tickup.net","--tls-san","10.101.0.12","--tls-san","v1113-dcs-master-3.dcs.tickup.net","--tls-san","10.101.0.13","--tls-san","v1114-dcs-master-4.dcs.tickup.net","--tls-san","10.101.0.14","--tls-san","v1115-dcs-master-5.dcs.tickup.net","--tls-san","10.101.0.15","--snapshotter","overlayfs","--node-name","v1111-dcs-master-1.dcs.tickup.net"]'
      rke2.io/node-args: '["server","--server","https://v1111-dcs-master-1.dcs.tickup.net:9345","--token","********","--data-dir","/var/lib/rancher/rke2","--cni","canal","--tls-san","cluster.local","--tls-san","v1111-dcs-master-1.dcs.tickup.net","--tls-san","k8s-api.dcs.tickup.net","--tls-san","v1111-dcs-master-1.dcs.tickup.net","--tls-san","10.101.0.11","--tls-san","v1112-dcs-master-2.dcs.tickup.net","--tls-san","10.101.0.12","--tls-san","v1113-dcs-master-3.dcs.tickup.net","--tls-san","10.101.0.13","--tls-san","v1114-dcs-master-4.dcs.tickup.net","--tls-san","10.101.0.14","--tls-san","v1115-dcs-master-5.dcs.tickup.net","--tls-san","10.101.0.15","--snapshotter","overlayfs","--node-name","v1112-dcs-master-2.dcs.tickup.net"]'
      rke2.io/node-args: '["server","--server","https://v1111-dcs-master-1.dcs.tickup.net:9345","--token","********","--data-dir","/var/lib/rancher/rke2","--cni","canal","--tls-san","cluster.local","--tls-san","v1111-dcs-master-1.dcs.tickup.net","--tls-san","k8s-api.dcs.tickup.net","--tls-san","v1111-dcs-master-1.dcs.tickup.net","--tls-san","10.101.0.11","--tls-san","v1112-dcs-master-2.dcs.tickup.net","--tls-san","10.101.0.12","--tls-san","v1113-dcs-master-3.dcs.tickup.net","--tls-san","10.101.0.13","--tls-san","v1114-dcs-master-4.dcs.tickup.net","--tls-san","10.101.0.14","--tls-san","v1115-dcs-master-5.dcs.tickup.net","--tls-san","10.101.0.15","--snapshotter","overlayfs","--node-name","v1113-dcs-master-3.dcs.tickup.net"]'
      rke2.io/node-args: '["server","--server","https://v1111-dcs-master-1.dcs.tickup.net:9345","--token","********","--data-dir","/var/lib/rancher/rke2","--cni","canal","--tls-san","cluster.local","--tls-san","v1111-dcs-master-1.dcs.tickup.net","--tls-san","k8s-api.dcs.tickup.net","--tls-san","v1111-dcs-master-1.dcs.tickup.net","--tls-san","10.101.0.11","--tls-san","v1112-dcs-master-2.dcs.tickup.net","--tls-san","10.101.0.12","--tls-san","v1113-dcs-master-3.dcs.tickup.net","--tls-san","10.101.0.13","--tls-san","v1114-dcs-master-4.dcs.tickup.net","--tls-san","10.101.0.14","--tls-san","v1115-dcs-master-5.dcs.tickup.net","--tls-san","10.101.0.15","--snapshotter","overlayfs","--node-name","v1114-dcs-master-4.dcs.tickup.net"]'
      rke2.io/node-args: '["server","--server","https://v1111-dcs-master-1.dcs.tickup.net:9345","--token","********","--data-dir","/var/lib/rancher/rke2","--cni","canal","--tls-san","cluster.local","--tls-san","v1111-dcs-master-1.dcs.tickup.net","--tls-san","k8s-api.dcs.tickup.net","--tls-san","v1111-dcs-master-1.dcs.tickup.net","--tls-san","10.101.0.11","--tls-san","v1112-dcs-master-2.dcs.tickup.net","--tls-san","10.101.0.12","--tls-san","v1113-dcs-master-3.dcs.tickup.net","--tls-san","10.101.0.13","--tls-san","v1114-dcs-master-4.dcs.tickup.net","--tls-san","10.101.0.14","--tls-san","v1115-dcs-master-5.dcs.tickup.net","--tls-san","10.101.0.15","--snapshotter","overlayfs","--node-name","v1115-dcs-master-5.dcs.tickup.net"]'
      rke2.io/node-args: '["agent","--server","https://v1111-dcs-master-1.dcs.tickup.net:9345","--token","********","--data-dir","/var/lib/rancher/rke2","--snapshotter","overlayfs","--node-name","v1121-dcs-worker-1.dcs.tickup.net"]'
      rke2.io/node-args: '["agent","--server","https://v1111-dcs-master-1.dcs.tickup.net:9345","--token","********","--data-dir","/var/lib/rancher/rke2","--snapshotter","overlayfs","--node-name","v1122-dcs-worker-2.dcs.tickup.net"]'
      rke2.io/node-args: '["agent","--server","https://v1111-dcs-master-1.dcs.tickup.net:9345","--token","********","--data-dir","/var/lib/rancher/rke2","--snapshotter","overlayfs","--node-name","v1123-dcs-worker-3.dcs.tickup.net"]'
      rke2.io/node-args: '["agent","--server","https://v1111-dcs-master-1.dcs.tickup.net:9345","--token","********","--data-dir","/var/lib/rancher/rke2","--snapshotter","overlayfs","--node-name","v1124-dcs-worker-4.dcs.tickup.net"]'
      rke2.io/node-args: '["agent","--server","https://v1111-dcs-master-1.dcs.tickup.net:9345","--token","********","--data-dir","/var/lib/rancher/rke2","--snapshotter","overlayfs","--node-name","v1125-dcs-worker-5.dcs.tickup.net"]'
c
I don’t see anything in there that would influence the apiserver’s behavior with regards to how it connects to the webhook service
d
OK, thanks!
c
there are a lot of unnecessary args in there, things set to their defaults and such, but other than that looks ok
are you by any chance using a proxy? do you have HTTP_PROXY/HTTPS_PROXY env vars set?
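The reason to ask: if proxy variables are set for the apiserver and the pod and service ranges are not covered by `NO_PROXY`, the webhook call could be sent to the proxy and time out there. A small sketch for checking this; `proxy_vars` is a hypothetical helper, and `/etc/default/rke2-server` is assumed to be where the service environment lives on an Ubuntu install (the file may not exist at all).

```shell
# Hypothetical helper: reads KEY=VALUE lines on stdin and prints only the
# proxy-related ones, matching upper- and lower-case variable names.
# `|| true` keeps the exit status 0 when nothing matches.
proxy_vars() {
    grep -iE '^(https?_proxy|no_proxy)=' || true
}

# Usage on each server node:
#   env | proxy_vars                        # the current shell
#   proxy_vars < /etc/default/rke2-server   # the service environment, if the file exists
```

No output from either command suggests that no proxy is configured at that level.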
d
Yeah, I got carried away with the san stuff. I should clean that up.
No, there are no proxies involved.
c
you might try setting `--egress-selector-mode=cluster` on the servers, just to see if that helps?
set that on the servers, then restart the services on the servers and agents
d
OK, I'll try that!
c
you’re on 1.25.9 on all of the servers and agents?
d
Yes, I just rebuilt the cluster from scratch (deploying new VMs) and everything is installed using Ansible, so it should be identical on all nodes.
Setting `--egress-selector-mode=cluster` did not seem to help.
c
hmm. if you add --debug to the server flags, do you get anything useful in the logs?
d
I don't find anything special. It's late now - I'm giving up for today. Thanks for all your help!