# general
c
Use --node-ip to set the node's address to the correct interface IP. It's probably picking the wrong interface, as by default it uses the one with the lowest-cost default route.
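To see which interface that default would likely land on, you can look at the default routes yourself. The helper below is only a rough sketch of the "lowest-cost default route" rule described above, not RKE2's actual selection code. The function name and the interface names in the usage comment are made up for illustration.

```shell
# Print the interface of the default route with the lowest metric.
# Reads `ip route` style output on stdin; a route without an explicit
# metric is treated as metric 0. On a tie, the first route seen wins.
lowest_cost_default_iface() {
  awk '
    $1 == "default" {
      dev = ""; metric = 0
      for (i = 2; i < NF; i++) {
        if ($i == "dev")    dev = $(i + 1)
        if ($i == "metric") metric = $(i + 1) + 0
      }
      if (!found || metric < best_metric) {
        found = 1; best_dev = dev; best_metric = metric
      }
    }
    END { if (found) print best_dev }
  '
}

# Usage on a live host (requires iproute2):
#   ip -4 route show | lowest_cost_default_iface
#   ip -4 -o addr show dev enp0s8    # hypothetical interface name
```

If the interface it prints is not the one the cluster should use, that is the case where --node-ip has to be set explicitly.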
r
I see there is --node-ip and then --node-external-ip. What's the difference between the two?
c
One's external?
r
Yea.... I can see that but don't quite get what exactly that entails.
c
Kubernetes nodes have three kinds of address. Hostname, internal IP, and external IP.
Usually when running in a cloud environment, the nodes have an internal IP that is actually bound to an interface, and an external non-RFC1918 address that is NATed to the internal IP for inbound access from the internet.
All of those address types can be set by flags. The hostname and internal IP assume reasonable defaults, but the external IP must be set manually or via a cloud provider.
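For a quick check of whether an address counts as "internal" in this sense, the sketch below tests an IPv4 address against the three RFC 1918 private ranges. The function name is made up for illustration, and it only does string matching on dotted-quad input with no validation.

```shell
# Return 0 if the IPv4 address is in an RFC 1918 private range:
# 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
is_rfc1918() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# To list the address types Kubernetes has recorded for a node:
#   kubectl get node vm-01 \
#     -o jsonpath='{range .status.addresses[*]}{.type}={.address}{"\n"}{end}'
```

Both addresses that come up in this thread, 10.0.2.15 and 192.168.56.101, are in private ranges.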
r
I have restarted the server service with node-ip specified. 🤞
It's taking quite a while...
c
You will probably have to either uninstall and reinstall, or do a --cluster-reset. Etcd doesn't like it when IP addresses change.
You're not really supposed to change node IPs once the cluster is up.
r
Yikes! The status is showing this:
Jan 18 13:19:36 vm-01 rke2[94347]: time="2023-01-18T13:19:36+07:00" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [vm-01-1f2809ba=<https://10>>
Jan 18 13:19:36 vm-01 rke2[94347]: time="2023-01-18T13:19:36+07:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Se>
Jan 18 13:19:39 vm-01 rke2[94347]: time="2023-01-18T13:19:39+07:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jan 18 13:19:41 vm-01 rke2[94347]: time="2023-01-18T13:19:41+07:00" level=info msg="Defragmenting etcd database"
Jan 18 13:19:41 vm-01 rke2[94347]: time="2023-01-18T13:19:41+07:00" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [vm-01-1f2809ba=<https://10>>
Jan 18 13:19:41 vm-01 rke2[94347]: time="2023-01-18T13:19:41+07:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Se>
Jan 18 13:19:44 vm-01 rke2[94347]: time="2023-01-18T13:19:44+07:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jan 18 13:19:46 vm-01 rke2[94347]: time="2023-01-18T13:19:46+07:00" level=info msg="Defragmenting etcd database"
Jan 18 13:19:46 vm-01 rke2[94347]: time="2023-01-18T13:19:46+07:00" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [vm-01-1f2809ba=<https://10>>
Jan 18 13:19:46 vm-01 rke2[94347]: time="2023-01-18T13:19:46+07:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Se>
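The repeating "not a member of the etcd cluster" line is the symptom of the changed IP. A small sketch for spotting it in the service journal follows. The function name is made up for illustration, and the match string is taken from the log lines above.

```shell
# Return 0 if the log text on stdin contains the etcd membership error
# seen above, i.e. the server no longer matches its recorded etcd member.
has_etcd_membership_error() {
  grep -q 'not a member of the etcd cluster'
}

# Usage on the server node:
#   journalctl -u rke2-server --no-pager -n 200 | has_etcd_membership_error \
#     && echo "etcd membership mismatch"
```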
Wait! So I must set the node-ip before starting it?
c
Yes. The servers need static IPs, and those must be set correctly before joining the cluster (or creating the cluster by starting the first server).
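One way to do that is to write the address settings into the RKE2 config file before the service is started for the first time. The sketch below assumes the config keys mirror the CLI flags used elsewhere in this thread (node-ip, node-external-ip, advertise-address). The function name is made up for illustration, and the directory is a parameter so the function can be tried against a scratch directory first.

```shell
# Write a minimal RKE2 config that pins the node addresses.
# $1 = config directory, $2 = IP address of the desired interface.
write_node_ip_config() {
  conf_dir=$1
  ip=$2
  mkdir -p "$conf_dir"
  cat > "$conf_dir/config.yaml" <<EOF
node-ip: ${ip}
node-external-ip: ${ip}
advertise-address: ${ip}
EOF
}

# Usage, as root, before the first start of the server:
#   write_node_ip_config /etc/rancher/rke2 192.168.56.101
#   systemctl enable --now rke2-server
```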
r
In the file /etc/rancher/rke2/config.yaml I must specify the token, which is generated after the cluster has started (/var/lib/rancher/rke2/server/node-token). Does that mean I don't have to specify the token in that file?
I just uninstalled and re-installed the server. Then I used the config.yaml file from the previous installation before starting it. The server failed to start.
I re-installed, then ran the server with the following:
sudo rke2 server --node-ip 192.168.56.101 --node-external-ip 192.168.56.101
Afterwards I can see the nodes up and running, some with the IP specified above. However, when I run systemctl status rke2-server, it shows:
● rke2-server.service - Rancher Kubernetes Engine v2 (server)
     Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; enabled; vendor preset: enabled)
     Active: inactive (dead)
       Docs: https://github.com/rancher/rke2#readme

Jan 18 14:35:52 vm-01 rke2[156065]: time="2023-01-18T14:35:52+07:00" level=info msg="Active TLS secret kube-system/rke2-serving (ver=321) (count 10): map[listene>
Jan 18 14:35:52 vm-01 rke2[156065]: I0118 14:35:52.581635  156065 event.go:294] "Event occurred" object="kube-system/rke2-metrics-server" fieldPath="" kind="Helm>
Jan 18 14:35:52 vm-01 rke2[156065]: I0118 14:35:52.602986  156065 event.go:294] "Event occurred" object="kube-system/rke2-metrics-server" fieldPath="" kind="Helm>
Jan 18 14:47:20 vm-01 systemd[1]: Stopping Rancher Kubernetes Engine v2 (server)...
Jan 18 14:47:20 vm-01 rke2[156065]: W0118 14:47:20.130271  156065 reflector.go:442] k8s.io/client-go@v1.24.9-k3s1/tools/cache/reflector.go:167: watch of *v1.Endp>
Jan 18 14:47:20 vm-01 rke2[156065]: time="2023-01-18T14:47:20+07:00" level=info msg="Shutting down k3s.cattle.io/v1, Kind=Addon workers"
Jan 18 14:47:20 vm-01 rke2[156065]: time="2023-01-18T14:47:20+07:00" level=fatal msg="context canceled"
Jan 18 14:47:20 vm-01 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Jan 18 14:47:20 vm-01 systemd[1]: rke2-server.service: Failed with result 'exit-code'.
Jan 18 14:47:20 vm-01 systemd[1]: Stopped Rancher Kubernetes Engine v2 (server).
What must I do to get it to work properly after setting node-ip?
So I ran into this GitHub issue and followed the instructions there:
$ systemctl stop rke2-server
$ sudo rke2 server --cluster-reset
$ systemctl restart rke2-server
Afterwards, I can see the RKE2 server status as active. However, running kubectl get pods --output=wide -A shows that etcd-vm-01, kube-apiserver-vm-01, and kube-proxy-vm-01 are now using the old 10.0.2.15 IP again. This seems to be the kubelet default behavior stated in this issue. It looks like everything would be fixed if there were a way to specify the interface that kubelet must use. How can that be done?
I have made some progress by using the following steps:
$ systemctl enable rke2-server
$ systemctl start rke2-server
... WAIT FOR ALL PODS TO BE READY
$ sudo touch /etc/rancher/rke2/config.yaml
$ sudo nano /etc/rancher/rke2/config.yaml
... EDIT CONFIG FILE
$ systemctl stop rke2-server
$ sudo rke2 server --cluster-reset --node-ip 192.168.56.101 --node-external-ip 192.168.56.101  --advertise-address 192.168.56.101
$ sudo reboot
When the pods are all good, there are 2 pods that still bind to the 10.0.2.15 interface/IP:
helm-install-rke2-canal
helm-install-rke2-coredns
They are shown as Completed. Is that something I should worry about? How do I set them to the desired 192.168.56.101?
c
No, those are completed job pods. They finished before you changed the IP and will not be re-run. You're not rewriting history here, so stuff that ran with the old IP will continue to show that.
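To separate the harmless leftovers from pods that would actually be a problem, you can list only the pods that still report the old IP and have not completed. This is a minimal sketch. The function name is made up for illustration, and it expects the four-column output of the kubectl command shown in the usage comment, where completed job pods appear with phase Succeeded.

```shell
# Print namespace/name of pods that still report the given IP and are
# not completed. Expects lines of: NAMESPACE NAME PHASE IP
pods_still_on_ip() {
  old_ip=$1
  awk -v old="$old_ip" '$4 == old && $3 != "Succeeded" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pods -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase,IP:.status.podIP \
#     | pods_still_on_ip 10.0.2.15
```

An empty result means the only pods still showing the old address are finished jobs like the two helm-install pods above.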
r
Thank you so much!