# rke2
a
a
Actual IPs
b
Yes, I have that
a
but also cluster-cidr:
and service-cidr:
all need both
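For reference, in a standalone RKE2 setup both address families go into each of these settings in /etc/rancher/rke2/config.yaml; a minimal dual-stack sketch with placeholder prefixes (not this cluster's actual values):

```yaml
# /etc/rancher/rke2/config.yaml - dual-stack sketch, placeholder prefixes
cluster-cidr: "10.42.0.0/16,fd00:42::/56"
service-cidr: "10.43.0.0/16,fd00:43::/112"
```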
b
yes
a
::1 localhost in hosts file?
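i.e. entries along these lines in /etc/hosts:

```
127.0.0.1   localhost
::1         localhost
```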
b
yes
b
@plain-byte-79620
b
reason: WaitingForNodeRef
severity: Info
status: 'False'
type: NodeHealthy
Looks like they get stuck on this
message: Cluster agent is not connected
reason: Disconnected
status: 'False'
type: Ready
a
Which rancher version
If you upgraded from 2.7.9 to 2.8.1, downgrade to 2.8.0 first, then go to 2.8.1+
b
Rancher: v2.8.2 Cluster: v1.27.10+rke2r1
It's a fresh 2.8.2 install
a
interesting
What if you downgrade either way?
😄
b
The Rancher is a manually installed RKE2 cluster on six fresh SUSE machines too.
And it's installed with FluxCD from helm chart.
a
huh
b
I am not sure how to bootstrap a Rancher environment correctly.
a
There is no right way, just many ways 🙂
p
Are you deploying them with the Rancher-generated script from the UI?
b
yes
p
I think you should configure the IP from that script; there should be a flag, I think it's named node-address or something.
b
Interesting, I tried to find some docs about that but was unable to. My Google-fu around Rancher is still in training mode, though.
p
I can check in a test UI where you can find them
b
Yes please.
"-a" | "--address") CATTLE_ADDRESS="$2"
is it this one?
"-i" | "--internal-address") CATTLE_INTERNAL_ADDRESS="$2"
or this?
p
--address
you have to add it manually with ipv4,ipv6
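For example, composed from the node's own static addresses (the IPs below are placeholders):

```shell
#!/bin/sh
# Build the comma-separated IPv4,IPv6 value for the agent install
# script's --address flag. Both addresses stand in for the node's
# own static IPs.
NODE_IPV4="172.16.135.51"
NODE_IPV6="2a07:beef:5:2002:be24:11ff:fe60:115d"
NODE_ADDRESS="${NODE_IPV4},${NODE_IPV6}"
echo "$NODE_ADDRESS"   # prints 172.16.135.51,2a07:beef:5:2002:be24:11ff:fe60:115d
```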
b
2024-03-05T09:14:17.629601+00:00 ranch1 rke2[1662]: time="2024-03-05T09:14:17Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
2024-03-05T09:14:21.167593+00:00 ranch1 rke2[1662]: time="2024-03-05T09:14:21Z" level=info msg="Waiting for API server to become available"
2024-03-05T09:14:22.359715+00:00 ranch1 rke2[1662]: time="2024-03-05T09:14:22Z" level=warning msg="Failed to list nodes with etcd role: runtime core not ready"
Here are the logs from the machine now; it would be interesting to know what the 500 error is.
p
you should wait until the node is up, I think
b
I'm giving it a few minutes then - fetching a cup of hot java
p
you can check the status with kubectl inside the node
b
ranch1:~ # /var/lib/rancher/rke2/data/v1.27.10-rke2r1-31de34f39de5/bin/kubectl --kubeconfig /var/lib/rancher/rke2/agent/kubelet.kubeconfig get nodes
NAME     STATUS   ROLES                       AGE     VERSION
ranch1   Ready    control-plane,etcd,master   3h15m   v1.27.10+rke2r1
After a long while this is showing on the first node, but in the Rancher GUI it still says "Waiting for node".
Seems like the ClusterCIDR/ServiceCIDR is ignored.
This is the cluster CIDR configured in Rancher for the cluster: fd7cb6d3041a5064:/64,10.60.0.0/16. And this is the IP of one of the pods in kube-system, which looks like the "default" CIDR when you don't provide one: IP: 10.42.0.18
And the same with the service IPs: they are from 10.43.0.0/16, but I configured 10.61.0.0/16 in Rancher.
p
where did you configure the CIDR?
b
In the rancher ui for the new cluster.
There is a node driver notice. But I checked those and nothing feels applicable.
Sorry for the mobile UI, I am out and about today.
p
But it's IPv6 only; you have to specify ipv4,ipv6, and I think there is an "enable IPv6" flag near the box where you configure the CNI
b
No, there are both an IPv4 and an IPv6 CIDR in both boxes.
And the ipv6 box is ticked.
p
are you sure that when you run the install script on the node there isn't any older RKE2 setup left? could you try to run
rke2-uninstall.sh
before installing the new setup through rancher?
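If it helps, a minimal sketch of what to look for afterwards -- the helper function and the path list are assumptions for illustration, not an official check:

```shell
#!/bin/sh
# Hypothetical helper: list well-known RKE2 state directories still
# present under a given root. On a real node you would pass "/" after
# running rke2-uninstall.sh; anything printed suggests stale state
# that could confuse a fresh registration.
leftover_rke2_state() {
  root="$1"
  for d in etc/rancher/rke2 var/lib/rancher/rke2 var/lib/kubelet; do
    if [ -e "$root/$d" ]; then
      echo "$d"
    fi
  done
}

# Demo against a throwaway root so the sketch runs anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/var/lib/rancher/rke2"
leftover_rke2_state "$demo"   # prints var/lib/rancher/rke2
rm -rf "$demo"
```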
b
Yes, I can do that. I'm away all day today; will check back here when done.
status:
  bootstrapReady: true
  conditions:
    - lastTransitionTime: '2024-03-05T09:08:01Z'
      status: 'True'
      type: Ready
    - lastTransitionTime: '2024-03-05T09:08:01Z'
      status: 'True'
      type: BootstrapReady
    - lastTransitionTime: '2024-03-05T09:07:59Z'
      status: 'True'
      type: InfrastructureReady
    - lastTransitionTime: '2024-03-05T09:07:59Z'
      reason: WaitingForNodeRef
      severity: Info
      status: 'False'
      type: NodeHealthy
  lastUpdated: '2024-03-05T09:08:01Z'
  observedGeneration: 2
  phase: Provisioning
Same effect, it gets stuck like this.
curl -fL https://rancher.domain.tld/system-agent-install.sh | sudo sh -s - --server https://rancher.domain.tld --label 'cattle.io/os=linux' --token <jfr...> --address "172.16.135.51,2a07:beef:5:2002:be24:11ff:fe60:115d" --etcd --controlplane --worker
This is the command line for the install, and the addresses are the static node IPs for IPv6 and IPv4.
p
the IPs are from the same interface? Could you check the RKE2 logs on the node?
b
yes, they are from the same interface
What specific log should I look in?
p
you could check the status of the server with
systemctl status rke2-server.service
to check if there are any errors
b
Mar 06 20:24:52 ranch1 rke2[19110]: time="2024-03-06T20:24:52Z" level=error msg="Failed to process config: lstat /var/lib/rancher/rke2/server/manifests: no such file or directory"
Mar 06 20:25:05 ranch1 rke2[19110]: {"level":"warn","ts":"2024-03-06T20:25:05.078178Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc000d12>>
Mar 06 20:25:05 ranch1 rke2[19110]: time="2024-03-06T20:25:05Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Mar 06 20:25:07 ranch1 rke2[19110]: time="2024-03-06T20:25:07Z" level=error msg="Failed to process config: lstat /var/lib/rancher/rke2/server/manifests: no such file or directory"
Mar 06 20:25:14 ranch1 rke2[19110]: {"level":"warn","ts":"2024-03-06T20:25:14.947945Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc000d12>>
Mar 06 20:25:14 ranch1 rke2[19110]: {"level":"info","ts":"2024-03-06T20:25:14.948009Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/client.go:210","msg":"Auto sync endpoints failed.","error":"context deadline exceeded"}
Mar 06 20:25:20 ranch1 rke2[19110]: {"level":"warn","ts":"2024-03-06T20:25:20.078694Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc000d12>>
Mar 06 20:25:20 ranch1 rke2[19110]: time="2024-03-06T20:25:20Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Mar 06 20:25:20 ranch1 rke2[19110]: time="2024-03-06T20:25:20Z" level=fatal msg="leaderelection lost for rke2-etcd"
Seems like etcd is nowhere to be found. I have only started (or rather installed) one node of three so far.
p
How many nodes are up now?
b
1
but 0 from Rancher's point of view; that one is stuck in WaitingForNodeRef
p
but if you check with kubectl inside the node, is it ready or not?
b
Hmm, it's looking strange now; I will tear it all down and rebuild the machines, I think.
I rebuilt the machines and deployed an identical cluster without any IPv6, and it just worked directly. I will try to rebuild them again, but now set a static IPv6 address instead of the permanent SLAAC address I used before, to see if it makes a difference.
I also defined a custom ClusterCIDR and ServiceCIDR and it got configured as expected.
p
so did it work?
b
For IPv4 it worked.
I have not yet had time to do it again with IPv6 and a static address for all nodes.