$ kubectl get nodes -o wide
NAME                             STATUS   ROLES                  AGE   VERSION                                            INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k3d-edgefarm-core-dev-agent-1    Ready    <none>                 17h   v1.22.15+k3s1                                      192.168.16.3    <none>        K3s dev              5.17.0-1016-oem     containerd://1.5.13-k3s1
k3d-edgefarm-core-dev-server-0   Ready    control-plane,master   17h   v1.22.15+k3s1                                      192.168.16.4    <none>        K3s dev              5.17.0-1016-oem     containerd://1.5.13-k3s1
k3d-edgefarm-core-dev-agent-0    Ready    <none>                 17h   v1.22.15+k3s1                                      192.168.16.2    <none>        K3s dev              5.17.0-1016-oem     containerd://1.5.13-k3s1
clownfish                        Ready    agent,edge             10h   v1.22.6-kubeedge-v1.11.1-12+0d66acb85546eb-dirty   192.168.1.100   <none>        Ubuntu 22.04.1 LTS   5.15.0-1012-raspi   docker://20.10.18
Here are my pods running on my edge node.
$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=clownfish
NAMESPACE   NAME                                    READY   STATUS    RESTARTS   AGE     IP              NODE        NOMINATED NODE   READINESS GATES
kubeedge    edgemesh-agent-lcttg                    1/1     Running   0          3m29s   192.168.1.100   clownfish   <none>           <none>
nodegroup   example-app-clownfish-7c4f5654c-x5cqb   2/2     Running   0          60s     172.17.0.2      clownfish   <none>           <none>
`kubectl exec` and `kubectl logs` calls try to talk directly to the node to execute the request there. This cannot work for edge devices, because they are very likely located in a completely different network, or may be connected via an LTE modem. Thus, these exec and logs calls from kubectl are forwarded using iptables to the cloudcore service running on a cloud node. So, what's happening is that the request targets `192.168.1.100` but gets redirected to the control-plane nodes (via the iptables-manager pod).
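For context, the port the API server then targets on the node (10353 in the error further down) is just the kubelet port the edge node reports in its node status; it can be checked with plain kubectl, nothing KubeEdge-specific assumed:
$ kubectl get node clownfish -o jsonpath='{.status.daemonEndpoints.kubeletEndpoint.Port}'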
$ kubectl get pods -o wide -n kubeedge
NAME                                           READY   STATUS    RESTARTS      AGE     IP              NODE                             NOMINATED NODE   READINESS GATES
edgemesh-agent-rhnzj                           1/1     Running   1 (11h ago)   17h     192.168.16.4    k3d-edgefarm-core-dev-server-0   <none>           <none>
kubeedge-controller-manager-85b5d8f6fb-q94x8   1/1     Running   1 (11h ago)   17h     10.42.2.10      k3d-edgefarm-core-dev-agent-1    <none>           <none>
edgemesh-agent-56t9b                           1/1     Running   2 (11h ago)   17h     192.168.16.3    k3d-edgefarm-core-dev-agent-1    <none>           <none>
edgemesh-agent-t5b4v                           1/1     Running   1 (11h ago)   17h     192.168.16.5    k3d-edgefarm-core-dev-agent-0    <none>           <none>
iptables-manager-x6bqt                         1/1     Running   0             10h     192.168.16.4    k3d-edgefarm-core-dev-server-0   <none>           <none>
cloudcore-59bbfcf5c7-lmxtx                     2/2     Running   0             10h     192.168.16.2    k3d-edgefarm-core-dev-agent-0    <none>           <none>
edgemesh-agent-lcttg                           1/1     Running   0             9m43s   192.168.1.100   clownfish                        <none>           <none>
See the iptables rules that are placed on the control-plane node:
$ kubectl exec -it iptables-manager-x6bqt sh
/ # iptables -t nat -L TUNNEL-PORT
Chain TUNNEL-PORT (2 references)
target prot opt source destination
DNAT tcp -- anywhere anywhere tcp dpt:10353 to:192.168.16.2:10003
DNAT tcp -- anywhere anywhere tcp dpt:10351 to:192.168.16.3:10003
DNAT tcp -- anywhere anywhere tcp dpt:10352 to:192.168.16.5:10003
So, when I exec into or fetch logs from a pod running on the edge node, the request targets `192.168.1.100`. It gets redirected by the iptables rule to the node cloudcore is running on (k3d-edgefarm-core-dev-agent-0 with 192.168.16.2), port 10003, where a cloudcore port is open. Cloudcore acts as a proxy for API calls to the edge node. Now, the problem is that the kubectl request sent to the API server still has `192.168.1.100` in it, but the cloudcore server certificate the request gets redirected to does not have the edge node's IP address in its IP SANs. Therefore the request gets rejected.
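To double-check which IP SANs the certificate behind that DNAT target actually presents, the endpoint can be queried directly. A quick check from a machine that can reach the cloud node, assuming openssl is available and the port completes a plain TLS handshake:
$ openssl s_client -connect 192.168.16.2:10003 </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'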
I added `0.0.0.0` to the IP SANs of the cloudcore server certificate just to verify that this is the point of failure.
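As far as I know, in a stock KubeEdge setup the stream certificate's IP SANs come from the CLOUDCOREIPS environment variable used by certgen.sh from the kubeedge repo; I'm not sure this deployment generates its certs the same way, so take this only as a sketch of where the extra SAN entry went:
# assumption: stock KubeEdge stream cert generation; adjust to however your setup creates these certs
$ export CLOUDCOREIPS="192.168.16.2 0.0.0.0"
$ ./certgen.sh stream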
When I retrieve logs from an edge node's pod, I get the following error:
$ kubectl logs edgemesh-agent-lcttg
Error from server: Get "https://192.168.1.100:10353/containerLogs/kubeedge/edgemesh-agent-lcttg/edgemesh-agent": x509: certificate is valid for 0.0.0.0, not 192.168.1.100
So, this is definitely the issue. `kubectl logs` supports the flag
--insecure-skip-tls-verify-backend=false: Skip verifying the identity of the kubelet that logs are requested from.
In theory, an attacker could provide invalid log content back. You might want to use this if your kubelet serving
certificates have expired
So the same logs request with `--insecure-skip-tls-verify-backend` enabled allows me to see the pod's logs:
$ kubectl logs edgemesh-agent-lcttg --insecure-skip-tls-verify-backend
I1018 05:56:26.085654 1 server.go:55] Version: v1.12.0-dirty
I1018 05:56:26.085891 1 server.go:89] [1] Prepare agent to run
I1018 05:56:26.088853 1 netif.go:96] bridge device edgemesh0 already exists
I1018 05:56:26.089349 1 server.go:93] edgemesh-agent running on EdgeMode
...
I could live with that fact, but `kubectl exec` doesn't allow the `--insecure-skip-tls-verify-backend` flag or anything that does the same.
I hope you can understand my problem. If not, please let me know and I'll try to elaborate a bit more.
The only possible solution I can think of right now: predefine a range of IP addresses my edge node can get in my local dev environment.
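With such a predefined pool, those addresses could be baked into the server certificate's IP SANs up front. Just to illustrate the idea (the addresses and file names are made up, and in practice the certificate would have to be issued by whatever CA the setup already trusts):
$ openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout stream.key -out stream.crt -subj "/CN=cloudcore" \
    -addext "subjectAltName=IP:192.168.16.2,IP:192.168.1.100,IP:192.168.1.101,IP:192.168.1.102"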