rapid-jelly-9995 10/17/2022, 12:12 PM
Error from server: Get "https://192.168.1.100:10351/containerLogs/kubeedge/edgemesh-agent-q78bp/edgemesh-agent": x509: cannot validate certificate for 192.168.1.100 because it doesn't contain any IP SANs

Now, my question: is it somehow possible to accept nodes blindly, without validating their certificates? One thing to mention: this is a local-only dev cluster.
late-needle-80860 10/17/2022, 12:49 PM
rapid-jelly-9995 10/18/2022, 6:23 AM
$ kubectl get nodes -o wide
NAME                             STATUS   ROLES                  AGE   VERSION                                            INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k3d-edgefarm-core-dev-agent-1    Ready    <none>                 17h   v1.22.15+k3s1                                      192.168.16.3    <none>        K3s dev              5.17.0-1016-oem     containerd://1.5.13-k3s1
k3d-edgefarm-core-dev-server-0   Ready    control-plane,master   17h   v1.22.15+k3s1                                      192.168.16.4    <none>        K3s dev              5.17.0-1016-oem     containerd://1.5.13-k3s1
k3d-edgefarm-core-dev-agent-0    Ready    <none>                 17h   v1.22.15+k3s1                                      192.168.16.2    <none>        K3s dev              5.17.0-1016-oem     containerd://1.5.13-k3s1
clownfish                        Ready    agent,edge             10h   v1.22.6-kubeedge-v1.11.1-12+0d66acb85546eb-dirty   192.168.1.100   <none>        Ubuntu 22.04.1 LTS   5.15.0-1012-raspi   docker://20.10.18

Here are my pods running on my edge node:

$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=clownfish
NAMESPACE   NAME                                    READY   STATUS    RESTARTS   AGE     IP              NODE        NOMINATED NODE   READINESS GATES
kubeedge    edgemesh-agent-lcttg                    1/1     Running   0          3m29s   192.168.1.100   clownfish   <none>           <none>
nodegroup   example-app-clownfish-7c4f5654c-x5cqb   2/2     Running   0          60s     172.17.0.2      clownfish   <none>           <none>

kubectl exec and logs calls try to talk directly to the node to execute the request there. This cannot work for edge devices, because they are very likely located in a completely different network, or may be connected via an LTE modem. Thus, these calls from kubectl are forwarded, using iptables, to the cloudcore service running on a cloud node. So what's happening is that the request targets 192.168.1.100 but gets redirected to the control-plane nodes (via the iptables-manager pod).
$ kubectl get pods -o wide -n kubeedge
NAME                                           READY   STATUS    RESTARTS      AGE     IP              NODE                             NOMINATED NODE   READINESS GATES
edgemesh-agent-rhnzj                           1/1     Running   1 (11h ago)   17h     192.168.16.4    k3d-edgefarm-core-dev-server-0   <none>           <none>
kubeedge-controller-manager-85b5d8f6fb-q94x8   1/1     Running   1 (11h ago)   17h     10.42.2.10      k3d-edgefarm-core-dev-agent-1    <none>           <none>
edgemesh-agent-56t9b                           1/1     Running   2 (11h ago)   17h     192.168.16.3    k3d-edgefarm-core-dev-agent-1    <none>           <none>
edgemesh-agent-t5b4v                           1/1     Running   1 (11h ago)   17h     192.168.16.5    k3d-edgefarm-core-dev-agent-0    <none>           <none>
iptables-manager-x6bqt                         1/1     Running   0             10h     192.168.16.4    k3d-edgefarm-core-dev-server-0   <none>           <none>
cloudcore-59bbfcf5c7-lmxtx                     2/2     Running   0             10h     192.168.16.2    k3d-edgefarm-core-dev-agent-0    <none>           <none>
edgemesh-agent-lcttg                           1/1     Running   0             9m43s   192.168.1.100   clownfish                        <none>           <none>

See the iptables rules placed on the control-plane node:

$ kubectl exec -it iptables-manager-x6bqt sh
/ # iptables -t nat -L TUNNEL-PORT
Chain TUNNEL-PORT (2 references)
target   prot   opt   source     destination
DNAT     tcp    --    anywhere   anywhere      tcp dpt:10353 to:192.168.16.2:10003
DNAT     tcp    --    anywhere   anywhere      tcp dpt:10351 to:192.168.16.3:10003
DNAT     tcp    --    anywhere   anywhere      tcp dpt:10352 to:192.168.16.5:10003

So, when I exec into or get logs from a pod that is running on the edge node, kubectl calls it directly with the edge node's address, 192.168.1.100. The request gets redirected by the iptables rule to the node cloudcore is running on, to port 10003, where a cloudcore port is opened. Cloudcore acts as a proxy for API calls to the edge node. Now, the problem is that the kubectl request sent to the API server still has 192.168.1.100 in it, while the cloudcore server certificate the request gets redirected to does not have the edge node's IP address in its IP SANs. Therefore the request gets rejected. I added an IP address to the IP SANs of the cloudcore server certificate just to verify that this is the point of failure. When I retrieve logs from an edge node's pod I get the following error:
$ kubectl logs edgemesh-agent-lcttg
Error from server: Get "https://192.168.1.100:10353/containerLogs/kubeedge/edgemesh-agent-lcttg/edgemesh-agent": x509: certificate is valid for 0.0.0.0, not 192.168.1.100

So, this is definitely the issue.
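The failing SAN check can be reproduced locally with openssl, without the cluster. This is a minimal sketch (the file paths and the `0.0.0.0` SAN mirror the certificate described above; nothing here touches a real cloudcore cert):

```shell
# Generate a self-signed cert whose only IP SAN is 0.0.0.0, like the
# cloudcore serving certificate in the error above.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/cloudcore-test.key -out /tmp/cloudcore-test.crt \
  -subj "/CN=cloudcore" -addext "subjectAltName=IP:0.0.0.0"

# Chain verification succeeds (the cert is used as its own CA), but the
# IP check fails the same way kubectl's TLS verification does:
openssl verify -CAfile /tmp/cloudcore-test.crt \
  -verify_ip 192.168.1.100 /tmp/cloudcore-test.crt
```

The second command reports an IP address mismatch, which is exactly the condition behind the "certificate is valid for 0.0.0.0, not 192.168.1.100" error.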
However, kubectl logs does allow skipping this backend verification; from the help text:

--insecure-skip-tls-verify-backend=false: Skip verifying the identity of the kubelet that logs are requested from. In theory, an attacker could provide invalid log content back. You might want to use this if your kubelet serving certificates have expired.

So the same logs request with --insecure-skip-tls-verify-backend enabled lets me see the pod's logs:

$ kubectl logs edgemesh-agent-lcttg --insecure-skip-tls-verify-backend
I1018 05:56:26.085654       1 server.go:55] Version: v1.12.0-dirty
I1018 05:56:26.085891       1 server.go:89] Prepare agent to run
I1018 05:56:26.088853       1 netif.go:96] bridge device edgemesh0 already exists
I1018 05:56:26.089349       1 server.go:93] edgemesh-agent running on EdgeMode
...

I could live with that fact, but the kubectl exec feature doesn't allow this flag or anything that does the same. I hope you can understand my problem; if not, please let me know and I'll try to elaborate a bit more. The only possible solution I can think of right now: predefine a range of IP addresses my edge node can get in my local dev environment.
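That workaround can be sketched with openssl: issue the cloudcore serving certificate with every address the edge node could possibly get as an IP SAN. This is an illustration only, the address range is made up, and how the SANs actually end up in the cloudcore cert depends on your KubeEdge setup (as far as I know, the advertiseAddress list in cloudcore.yaml feeds the cert SANs, but check that for your version):

```shell
# Illustration: build a SAN list covering a predefined range the edge
# node may get from DHCP (192.168.1.100-103 here, made up).
SANS="IP:0.0.0.0"
for i in $(seq 100 103); do
  SANS="$SANS,IP:192.168.1.$i"
done

openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /tmp/cloudcore-range.key -out /tmp/cloudcore-range.crt \
  -subj "/CN=cloudcore" -addext "subjectAltName=$SANS"

# Any IP from the predefined range now passes the SAN check:
openssl verify -CAfile /tmp/cloudcore-range.crt \
  -verify_ip 192.168.1.102 /tmp/cloudcore-range.crt
```

With that, whichever address in the range the edge node receives, the redirected request would hit a certificate that covers it.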