#k3s

bland-painting-61617

09/24/2022, 11:56 AM
I'm experimenting with the idea of hosting the control plane of my K3s-based cluster away from home, in another Kubernetes cluster hosted in the cloud. I've created the needed manifests and got it all working using the
--disable-agent
flag so that the control plane pod does not register itself as a node. The control plane is reached via a public IP, and that works well: I can get pod logs and a shell from my test workstation, and from a sidecar container on the control plane pod I can curl services inside the cluster, which confirms the built-in proxying is working. However, when I deploy Gatekeeper, the control plane is not able to call its webhook, which is strange because I can curl that webhook from the control plane pod's sidecar just fine.
Error from server (InternalError): error when creating "https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/namespace.yaml": Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": failed to call webhook: Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s": context deadline exceeded
The above is the error, and below is the curl from the CP sidecar:
/ # curl "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admit?timeout=3s" -Ivk
*   Trying 10.0.17.2:443...
* Connected to gatekeeper-webhook-service.gatekeeper-system.svc (10.0.17.2) port 443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=gatekeeper-webhook-service.gatekeeper-system.svc
*  start date: Aug 23 11:45:37 2022 GMT
*  expire date: Aug 23 11:55:37 2024 GMT
*  issuer: CN=gatekeeper-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
The connectivity is definitely there, but something still isn't connecting. The k3s control plane process logs:
k3s I0924 11:53:04.418026      46 trace.go:205] Trace[729582201]: "Proxy via http_connect protocol over tcp" address:10.42.0.8:8443 (24-Sep-2022 11:50:54.455) (total time: 129962ms):
k3s Trace[729582201]: [2m9.962952731s] [2m9.962952731s] END
k3s I0924 11:53:04.418106      46 trace.go:205] Trace[975991258]: "Proxy via http_connect protocol over tcp" address:10.42.0.7:8443 (24-Sep-2022 11:50:53.454) (total time: 130963ms):
k3s Trace[975991258]: [2m10.963777266s] [2m10.963777266s] END
k3s I0924 11:53:04.418387      46 trace.go:205] Trace[1687549248]: "Proxy via http_connect protocol over tcp" address:10.42.0.7:8443 (24-Sep-2022 11:50:53.454) (total time: 130964ms):
k3s Trace[1687549248]: [2m10.964212472s] [2m10.964212472s] END
k3s I0924 11:53:04.418121      46 trace.go:205] Trace[1695746900]: "Proxy via http_connect protocol over tcp" address:10.42.0.9:8443 (24-Sep-2022 11:50:54.944) (total time: 129473ms):
k3s Trace[1695746900]: [2m9.473126709s] [2m9.473126709s] END
k3s I0924 11:53:04.418155      46 trace.go:205] Trace[1578279874]: "Proxy via http_connect protocol over tcp" address:10.42.0.8:8443 (24-Sep-2022 11:50:54.457) (total time: 129961ms):
k3s Trace[1578279874]: [2m9.961131206s] [2m9.961131206s] END
k3s I0924 11:53:04.418160      46 trace.go:205] Trace[406850929]: "Proxy via http_connect protocol over tcp" address:10.42.0.7:8443 (24-Sep-2022 11:50:53.944) (total time: 130473ms):
k3s Trace[406850929]: [2m10.473654339s] [2m10.473654339s] END
k3s I0924 11:53:04.418208      46 trace.go:205] Trace[1213933259]: "Proxy via http_connect protocol over tcp" address:10.42.0.9:8443 (24-Sep-2022 11:50:54.455) (total time: 129962ms):
k3s Trace[1213933259]: [2m9.962336223s] [2m9.962336223s] END
k3s I0924 11:53:04.418239      46 trace.go:205] Trace[1402600249]: "Proxy via http_connect protocol over tcp" address:10.42.0.7:8443 (24-Sep-2022 11:50:55.121) (total time: 129297ms):
k3s Trace[1402600249]: [2m9.297126293s] [2m9.297126293s] END
k3s I0924 11:53:04.418302      46 trace.go:205] Trace[1277249521]: "Proxy via http_connect protocol over tcp" address:10.42.0.8:8443 (24-Sep-2022 11:50:54.119) (total time: 130298ms):
k3s Trace[1277249521]: [2m10.298506336s] [2m10.298506336s] END
k3s I0924 11:53:04.418305      46 trace.go:205] Trace[688495020]: "Proxy via http_connect protocol over tcp" address:10.42.0.8:8443 (24-Sep-2022 11:50:53.454) (total time: 130964ms):
k3s Trace[688495020]: [2m10.964143771s] [2m10.964143771s] END
k3s I0924 11:53:05.376142      46 trace.go:205] Trace[1922244545]: "Call mutating webhook" configuration:gatekeeper-mutating-webhook-configuration,webhook:mutation.gatekeeper.sh,resource:/v1, Resource=configmaps,subresource:,operation:UPDATE,UID:bfb30054-8598-496e-9cd7-cbb4765fa8e1 (24-Sep-2022 11:53:04.375) (total time: 1000ms):
k3s Trace[1922244545]: [1.000712635s] [1.000712635s] END
k3s W0924 11:53:05.376198      46 dispatcher.go:180] Failed calling webhook, failing open mutation.gatekeeper.sh: failed calling webhook "mutation.gatekeeper.sh": failed to call webhook: Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/mutate?timeout=1s": context deadline exceeded
k3s E0924 11:53:05.376218      46 dispatcher.go:184] failed calling webhook "mutation.gatekeeper.sh": failed to call webhook: Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/mutate?timeout=1s": context deadline exceeded
Any ideas why the proxying wouldn't work for the webhook but works for logs and curl?
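For reference, a minimal sketch of the kind of server invocation described above; every value below is a placeholder or default, not the poster's actual configuration:

# Hypothetical sketch: start a k3s control plane that does not register itself
# as a node and is reachable via a public address (placeholder values only).
k3s server \
  --disable-agent \
  --tls-san=cp.example.com \
  --data-dir=/var/lib/rancher/k3s \
  --token=REDACTED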

creamy-pencil-82913

09/26/2022, 12:39 AM
Without a local kubelet and CNI, the control plane node has no way to reach in-cluster endpoints. Try setting --egress-selector-mode=pod, or cluster if pod causes problems.

bland-painting-61617

09/26/2022, 9:26 AM
I thought this might be the case, but something is certainly working, since I can curl the webhook URL from the control plane pod. There's the
Proxy via http_connect protocol over tcp
message, which makes me believe the control plane is proxying the connection back to the node hosting the pod.
Where do I put the
egress-selector-mode
flag?
I put
egress-selector-mode: pod
in the control plane configmap, and I think it might have done the trick - will test more and report back.

creamy-pencil-82913

09/26/2022, 5:23 PM
What configmap? K3s doesn't use a configmap to configure itself. You would want to put it in the server CLI flags or in config.yaml.
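As a rough illustration of the two placements mentioned here (the file path is the k3s default; the mode value mirrors the suggestion above, not a confirmed final configuration):

# Option 1: pass it as a server CLI flag
k3s server --egress-selector-mode=pod

# Option 2: put it in the server's config.yaml (default path shown; in the
# setup discussed in this thread the file is mounted into the pod from a ConfigMap)
mkdir -p /etc/rancher/k3s
cat <<'EOF' > /etc/rancher/k3s/config.yaml
egress-selector-mode: pod
EOF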

bland-painting-61617

09/26/2022, 9:33 PM
Oh I'm sorry, that's what I meant - I put it in config.yaml, which is in a configmap since K3s is running in a pod... Either way, could you point me towards some documentation of what this setting does and what the differences between the modes are?

creamy-pencil-82913

09/26/2022, 9:37 PM

bland-painting-61617

09/27/2022, 10:16 PM
Thanks, I moved my cluster's control plane from 3 masters at home to an AKS pod - it seems to work so far. Though it would be nice to have this argument documented in the official docs. Also, I'm not sure the
pod
setting is the default, because until I set it, nothing was working...

creamy-pencil-82913

09/27/2022, 11:20 PM
agent is the default; it was changed after that PR.
👍 1

bland-painting-61617

10/01/2022, 6:13 PM
Thanks for your help @creamy-pencil-82913 - always on point 🙂
Hey @creamy-pencil-82913, I noticed a small issue with K3s starting in a K8s pod: etcd doesn't like the fact that the pod IP changes.
time="2022-10-05T08:59:35Z" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [k3st-control-plane-0-9ed411d0=<https://10.244.1.66:2380>], expect: k3st-control-plane-0-9ed411d0=<https://10.244.1.68:2380>"
I worked around it with a small script that updates the member's endpoint IP in etcd, allowing it to recognize itself as a member again, and all works well. But I thought it might be a good idea to raise an issue about it? Or is there already some kind of flag that would handle this?
--node-ip=$(NODE_IP)
with the variable set to the pod's IP doesn't do the trick.
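The script itself isn't shown; one plausible shape for the workaround is sketched below, assuming the embedded etcd is still reachable on localhost, that NODE_IP holds the pod's current IP, and using the usual k3s etcd certificate paths (the member-name pattern comes from the log above):

#!/bin/sh
# Sketch only: rewrite the existing etcd member's peer URL to the pod's current IP.
export ETCDCTL_API=3

etcd() {
  etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
    --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
    --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key "$@"
}

# 'member list' prints: ID, status, name, peer URLs, client URLs, ...
MEMBER_ID=$(etcd member list | awk -F', ' '/k3st-control-plane-0/ {print $1}')
etcd member update "$MEMBER_ID" --peer-urls="https://${NODE_IP}:2380"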

creamy-pencil-82913

10/05/2022, 9:51 AM
When using etcd, node IP addresses are expected to be static for the life of a server node in the cluster.
🤣 1
If you’re running a single-node k3s cluster in a pod, why not just use kine+sqlite?
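For a single server that is the default anyway: without --cluster-init, k3s uses kine with an embedded SQLite database, so there is no etcd member IP to go stale. A rough sketch of both variants, with a placeholder DSN for the external-database case:

# Default single-server datastore: kine + SQLite (no etcd member to update
# when the pod IP changes).
k3s server --disable-agent

# Or point kine at an external database instead (placeholder credentials/host):
k3s server --disable-agent \
  --datastore-endpoint="mysql://user:pass@tcp(db.example.com:3306)/k3s"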

bland-painting-61617

10/05/2022, 10:25 AM
That is a good idea, I just haven't explored kine yet. I wanted to keep the possibility of scaling the StatefulSet up to 2 or 3 replicas and have it still work - but there might be no point in doing that, so SQLite might be a good option.