This message was deleted Rancher Users #general

# general

adamant-kite-43734

01/18/2023, 4:56 AM

This message was deleted.

creamy-pencil-82913

01/18/2023, 5:59 AM

Use --node-ip to set the node's address to the IP of the correct interface. It's probably picking the wrong interface, since by default it uses the one with the lowest-cost default route.
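To see which interface that default would land on, one option is to inspect the host's default routes. This is a minimal sketch, assuming a Linux host with iproute2. The helper name lowest_cost_default_dev is made up for illustration, and treating a route without a metric field as metric 0 is an assumption about how the routes are ranked:

```shell
# Print the device of the default route with the lowest metric.
# Reads lines in the format of `ip -4 route show default` on stdin.
lowest_cost_default_dev() {
  awk '
    $1 == "default" {
      dev = ""; metric = 0            # assumption: no metric field means metric 0
      for (i = 2; i < NF; i++) {
        if ($i == "dev")    dev = $(i + 1)
        if ($i == "metric") metric = $(i + 1) + 0
      }
      # Keep the first route seen, then replace it only on a strictly lower metric.
      if (dev != "" && (best == "" || metric < best_metric)) {
        best = dev; best_metric = metric
      }
    }
    END { if (best != "") print best }
  '
}

# Usage on a live host:
#   ip -4 route show default | lowest_cost_default_dev
```

If the printed device is not the one carrying the address the cluster should use, that is a hint that --node-ip needs to be set explicitly.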

refined-eye-25557

01/18/2023, 6:01 AM

I see there is --node-ip and then node-external-ip. What's the difference between the two?

creamy-pencil-82913

01/18/2023, 6:15 AM

One's external?

refined-eye-25557

01/18/2023, 6:15 AM

Yeah... I can see that, but I don't quite get what exactly that entails.

creamy-pencil-82913

01/18/2023, 6:15 AM

Kubernetes nodes have three kinds of address: hostname, internal IP, and external IP.

creamy-pencil-82913

01/18/2023, 6:17 AM

Usually when running in a cloud environment, the nodes have an internal IP that is actually bound to an interface, and an external non-RFC1918 address that is NATed to the internal IP for inbound access from the internet.
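For readers unsure what counts as an RFC 1918 address, here is a rough sketch of a check in shell. The function name is_rfc1918 is invented for this note, and it assumes a well-formed dotted-quad IPv4 address as input:

```shell
# Return success if the IPv4 address is in one of the RFC 1918 private ranges:
# 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
is_rfc1918() {
  o1=${1%%.*}          # first octet
  rest=${1#*.}
  o2=${rest%%.*}       # second octet
  case "$o1" in
    10)  return 0 ;;
    172) [ "$o2" -ge 16 ] && [ "$o2" -le 31 ] ;;
    192) [ "$o2" -eq 168 ] ;;
    *)   return 1 ;;
  esac
}

# Usage:
#   is_rfc1918 192.168.56.101 && echo private || echo public
```

Both addresses that come up later in this thread (10.0.2.15 and 192.168.56.101) are private by this check, so on a local VM the "external" address is just another private address.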

creamy-pencil-82913

01/18/2023, 6:19 AM

All of those address types can be set by flags. The hostname and internal IP assume reasonable defaults, but the external IP must be set manually or via a cloud provider.

refined-eye-25557

01/18/2023, 6:19 AM

I have restarted the server service with node-ip specified. 🤞

refined-eye-25557

01/18/2023, 6:19 AM

It's taking quite a while...

creamy-pencil-82913

01/18/2023, 6:20 AM

You will probably have to either uninstall and reinstall, or do a --cluster-reset. etcd doesn't like it when IP addresses change.

creamy-pencil-82913

01/18/2023, 6:21 AM

You're not really supposed to change node IPs once the cluster is up.

refined-eye-25557

01/18/2023, 6:21 AM

Yikes! The status is showing this:

Jan 18 13:19:36 vm-01 rke2[94347]: time="2023-01-18T13:19:36+07:00" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [vm-01-1f2809ba=https://10>
Jan 18 13:19:36 vm-01 rke2[94347]: time="2023-01-18T13:19:36+07:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Se>
Jan 18 13:19:39 vm-01 rke2[94347]: time="2023-01-18T13:19:39+07:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jan 18 13:19:41 vm-01 rke2[94347]: time="2023-01-18T13:19:41+07:00" level=info msg="Defragmenting etcd database"
Jan 18 13:19:41 vm-01 rke2[94347]: time="2023-01-18T13:19:41+07:00" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [vm-01-1f2809ba=https://10>
Jan 18 13:19:41 vm-01 rke2[94347]: time="2023-01-18T13:19:41+07:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Se>
Jan 18 13:19:44 vm-01 rke2[94347]: time="2023-01-18T13:19:44+07:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jan 18 13:19:46 vm-01 rke2[94347]: time="2023-01-18T13:19:46+07:00" level=info msg="Defragmenting etcd database"
Jan 18 13:19:46 vm-01 rke2[94347]: time="2023-01-18T13:19:46+07:00" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [vm-01-1f2809ba=https://10>
Jan 18 13:19:46 vm-01 rke2[94347]: time="2023-01-18T13:19:46+07:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Se>

refined-eye-25557

01/18/2023, 6:21 AM

Wait! So I must set the node-ip before starting it?

creamy-pencil-82913

01/18/2023, 6:23 AM

Yes. The servers need static IPs, and those must be set correctly before joining the cluster (or creating the cluster by starting the first server).
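A minimal sketch of setting the addresses before the first start, assuming the node-ip, node-external-ip, and advertise-address keys in config.yaml mirror the CLI flags of the same names used later in this thread. CONFIG_PATH and NODE_IP are illustrative variable names, and the default path here is a scratch file rather than the real /etc/rancher/rke2/config.yaml:

```shell
# Illustrative variables (not part of RKE2): where to write, and which address to pin.
CONFIG_PATH="${CONFIG_PATH:-${TMPDIR:-/tmp}/rke2-config-sketch.yaml}"
NODE_IP="${NODE_IP:-192.168.56.101}"

# Write the address settings before the server is started for the first time.
cat > "$CONFIG_PATH" <<EOF
node-ip: ${NODE_IP}
node-external-ip: ${NODE_IP}
advertise-address: ${NODE_IP}
EOF

# Show what was written.
cat "$CONFIG_PATH"
```

On a real server the same content would go into /etc/rancher/rke2/config.yaml (as root) before running systemctl start rke2-server, so that etcd registers the intended address from the outset.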

refined-eye-25557

01/18/2023, 6:24 AM

In the file /etc/rancher/rke2/config.yaml I must specify the token, which is generated after the cluster has started (/var/lib/rancher/rke2/server/node-token). Does that mean I don't have to specify the token in that file?

refined-eye-25557

01/18/2023, 6:28 AM

I just uninstalled and reinstalled the server. Then I used the config.yaml file from the previous installation before starting it. The server fails to start.

refined-eye-25557

01/18/2023, 8:20 AM

I reinstalled, then ran the server with the following:

sudo rke2 server --node-ip 192.168.56.101 --node-external-ip 192.168.56.101

Afterwards I can see the nodes up and running, some with the IP specified above. However, when I run systemctl status rke2-server, it shows:

● rke2-server.service - Rancher Kubernetes Engine v2 (server)
     Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; enabled; vendor preset: enabled)
     Active: inactive (dead)
       Docs: https://github.com/rancher/rke2#readme

Jan 18 14:35:52 vm-01 rke2[156065]: time="2023-01-18T14:35:52+07:00" level=info msg="Active TLS secret kube-system/rke2-serving (ver=321) (count 10): map[listene>
Jan 18 14:35:52 vm-01 rke2[156065]: I0118 14:35:52.581635  156065 event.go:294] "Event occurred" object="kube-system/rke2-metrics-server" fieldPath="" kind="Helm>
Jan 18 14:35:52 vm-01 rke2[156065]: I0118 14:35:52.602986  156065 event.go:294] "Event occurred" object="kube-system/rke2-metrics-server" fieldPath="" kind="Helm>
Jan 18 14:47:20 vm-01 systemd[1]: Stopping Rancher Kubernetes Engine v2 (server)...
Jan 18 14:47:20 vm-01 rke2[156065]: W0118 14:47:20.130271  156065 reflector.go:442] k8s.io/client-go@v1.24.9-k3s1/tools/cache/reflector.go:167: watch of *v1.Endp>
Jan 18 14:47:20 vm-01 rke2[156065]: time="2023-01-18T14:47:20+07:00" level=info msg="Shutting down k3s.cattle.io/v1, Kind=Addon workers"
Jan 18 14:47:20 vm-01 rke2[156065]: time="2023-01-18T14:47:20+07:00" level=fatal msg="context canceled"
Jan 18 14:47:20 vm-01 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Jan 18 14:47:20 vm-01 systemd[1]: rke2-server.service: Failed with result 'exit-code'.
Jan 18 14:47:20 vm-01 systemd[1]: Stopped Rancher Kubernetes Engine v2 (server).

What must I do to get it to work properly after setting node-ip?

refined-eye-25557

01/18/2023, 9:01 AM

So I ran into this GitHub issue and followed the instructions there:

$ systemctl stop rke2-server
$ sudo rke2 server --cluster-reset
$ systemctl restart rke2-server

Afterwards, I can see the RKE2 server status as ACTIVE. However, running kubectl get pods --output=wide -A shows that etcd-vm-01, kube-apiserver-vm-01, and kube-proxy-vm-01 are now using the old 10.0.2.15 IP again. This seems to be the kubelet default behavior stated in this issue. Looks like everything will be fixed if there's a way to specify the interface that kubelet must use. How can it be done?

refined-eye-25557

01/23/2023, 4:39 PM

I have made some progress by using the following steps:

$ systemctl enable rke2-server
$ systemctl start rke2-server
... WAIT FOR ALL PODS TO BE READY
$ sudo touch /etc/rancher/rke2/config.yaml
$ sudo nano /etc/rancher/rke2/config.yaml
... EDIT CONFIG FILE
$ systemctl stop rke2-server
$ sudo rke2 server --cluster-reset --node-ip 192.168.56.101 --node-external-ip 192.168.56.101  --advertise-address 192.168.56.101
$ sudo reboot
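The stop, reset, and start part of those steps could be wrapped roughly as follows. This is only a sketch: the run and reset_with_node_ip helpers and the DRY_RUN and NODE_IP variables are invented for illustration, the final systemctl start stands in for the reboot, and by default the commands are printed rather than executed:

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 only on a server where a cluster reset is really intended.
DRY_RUN="${DRY_RUN:-1}"
NODE_IP="${NODE_IP:-192.168.56.101}"   # address used in this thread; adjust per host

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_with_node_ip() {
  run sudo systemctl stop rke2-server
  run sudo rke2 server --cluster-reset \
    --node-ip "$NODE_IP" \
    --node-external-ip "$NODE_IP" \
    --advertise-address "$NODE_IP"
  run sudo systemctl start rke2-server
}

# Usage:
#   reset_with_node_ip            # prints the three commands
#   DRY_RUN=0 reset_with_node_ip  # actually runs them
```

Flags passed on a one-off command line are presumably not seen by the systemd service on its next start, so the same values likely also need to be present in /etc/rancher/rke2/config.yaml (the "EDIT CONFIG FILE" step above).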

When the pods are all good, there are 2 pods that still bind to the 10.0.2.15 interface/IP:

helm-install-rke2-canal
helm-install-rke2-coredns

They are shown as Completed. Is that something I must worry about? How do I set them to the desired 192.168.56.101?

creamy-pencil-82913

01/23/2023, 8:00 PM

no, those are completed job pods. they finished before you changed the IP and will not be re-run. You’re not rewriting history here, so stuff that ran with the old IP will continue to show that.

refined-eye-25557

01/24/2023, 10:16 AM

Thank you so much!
