#rke2

adamant-kite-43734

06/21/2022, 8:17 PM
This message was deleted.

hundreds-evening-84071

06/21/2022, 8:22 PM
looks like it... all the nodes reporting okay?
kubectl get nodes

curved-caravan-26314

06/21/2022, 8:23 PM
root@rke2:/home/ilook# kubectl get nodes
NAME   STATUS   ROLES                       AGE   VERSION
rke2   Ready    control-plane,etcd,master   25m   v1.23.7+rke2r2

hundreds-evening-84071

06/21/2022, 8:24 PM
yeah - so it shows a 1-node cluster. if that is correct then you are ready to deploy Rancher or any app...

curved-caravan-26314

06/21/2022, 8:25 PM
Hoping for the best. I am going to add worker nodes via the UI
🤞 1

hundreds-evening-84071

06/21/2022, 8:31 PM
oh! you have not set any taints, correct, on that one node named "rke2"? like
controlplane=:NoSchedule
and
etcd=:NoExecute
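(Not from the thread itself, but a quick way to answer the taint question: inspect the node's taints directly. "rke2" is the node name seen earlier; an empty result or `<none>` means nothing is blocking scheduling on the control-plane node.)

```shell
# Show any taints set on the node named "rke2"
kubectl describe node rke2 | grep -i taints

# Equivalent, machine-readable form (prints nothing if no taints are set)
kubectl get node rke2 -o jsonpath='{.spec.taints}'
```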

curved-caravan-26314

06/21/2022, 8:31 PM
I didn't set any taints
👍 1
it's not letting me get to the login screen... it's like it can't see itself here @hundreds-evening-84071 can you see if my page loads? https://rke2.n2k8.com
The web page loads if I'm on cellular data, but the page doesn't load here locally on the FQDN. Where do I go to fix this?

rapid-helmet-86074

06/21/2022, 9:06 PM
I don't think you usually add worker nodes to the local cluster. Local tends to be worker role on the control plane since you're only running Rancher and its related services on it.
(and for downstream if setting up RKE2 separately and joining later I don't think you can add workers from the UI and you should add the taints to the control plane nodes)
My first guess on FQDN not working from local net but works externally is that internally your DNS for rke2.n2k8.com resolves to something different (maybe goes out externally through a proxy that won't let you go back in, maybe just completely different). You'd need the Rancher hostname to resolve to either nodes running the rke2-ingress-nginx-controller pod in your local cluster, or to a load balancer that points to said nodes in your local cluster that would pass the traffic through with the hostname.
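(A sketch of how to confirm that split-horizon-DNS guess, not something run in the thread: compare what the hostname resolves to on the local resolver versus a public one. If the answers differ, internal DNS, or a router that can't hairpin NAT, is the likely culprit for "works on cellular, fails on the LAN".)

```shell
# Answer from the local/default resolver
dig +short rke2.n2k8.com

# Answer from a public resolver for comparison
dig +short rke2.n2k8.com @8.8.8.8

# A common workaround (an assumption, not advice from the thread) is a
# local DNS override or /etc/hosts entry pointing the FQDN at the node's
# LAN IP, e.g.:
#   192.168.1.145 rke2.n2k8.com
```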

curved-caravan-26314

06/21/2022, 9:25 PM
What I'm seeing first is that there is a volume mount error reported

rapid-helmet-86074

06/21/2022, 9:25 PM
Never noticed that with Rancher, but if it works externally then I'm guessing that's not your problem for internal access.

curved-caravan-26314

06/21/2022, 9:29 PM
I found this error...
Name:         rancher-7bbd98588-2mqtg
Namespace:    cattle-system
Priority:     0
Node:         rke2/192.168.1.145
Start Time:   Tue, 21 Jun 2022 15:24:25 -0500
Labels:       app=rancher
              pod-template-hash=7bbd98588
              release=rancher
Annotations:  cni.projectcalico.org/containerID: 5b44982cc7f3302fc3a4623c2d2dad03476363e33cb9600aadc51ca3cfe521fa
              cni.projectcalico.org/podIP: 10.42.0.14/32
              cni.projectcalico.org/podIPs: 10.42.0.14/32
              kubernetes.io/psp: global-unrestricted-psp
Status:       Running
IP:           10.42.0.14
IPs:
  IP:           10.42.0.14
Controlled By:  ReplicaSet/rancher-7bbd98588
Containers:
  rancher:
    Container ID:  containerd://65256afbc4ed446301dff1488f7c3f291b6aa88bcd2035ca8d828df9b01f44a6
    Image:         rancher/rancher:v2.6.5
    Image ID:      docker.io/rancher/rancher@sha256:ae5135c25b2141bb2aac8a03a9afd77e845f36b9a6c000377c858233aae355d4
    Port:          80/TCP
    Host Port:     0/TCP
    Args:
      --no-cacerts
      --http-listen-port=80
      --https-listen-port=443
      --add-local=true
    State:          Running
      Started:      Tue, 21 Jun 2022 15:30:30 -0500
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:80/healthz delay=60s timeout=1s period=30s #success=1 #failure=3
    Readiness:      http-get http://:80/healthz delay=5s timeout=1s period=30s #success=1 #failure=3
    Environment:
      CATTLE_NAMESPACE:           cattle-system
      CATTLE_PEER_SERVICE:        rancher
      CATTLE_BOOTSTRAP_PASSWORD:  <set to the key 'bootstrapPassword' in secret 'bootstrap-secret'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2rwsm (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-2rwsm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 cattle.io/os=linux:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  35m                default-scheduler  Successfully assigned cattle-system/rancher-7bbd98588-2mqtg to rke2
  Normal   Pulling    35m                kubelet            Pulling image "rancher/rancher:v2.6.5"
  Normal   Pulled     29m                kubelet            Successfully pulled image "rancher/rancher:v2.6.5" in 6m0.181711241s
  Normal   Created    29m                kubelet            Created container rancher
  Normal   Started    29m                kubelet            Started container rancher
  Warning  Unhealthy  27m (x4 over 28m)  kubelet            Readiness probe failed: Get "http://10.42.0.14:80/healthz": dial tcp 10.42.0.14:80: connect: connection refused
Then
Name:         rancher-7bbd98588-hm495
Namespace:    cattle-system
Priority:     0
Node:         rke2/192.168.1.145
Start Time:   Tue, 21 Jun 2022 15:24:25 -0500
Labels:       app=rancher
              pod-template-hash=7bbd98588
              release=rancher
Annotations:  cni.projectcalico.org/containerID: fc821cf9e33787b6ba55ed7e7124d3194399ac1f419a56eb359cfd36027c320f
              cni.projectcalico.org/podIP: 10.42.0.15/32
              cni.projectcalico.org/podIPs: 10.42.0.15/32
              kubernetes.io/psp: global-unrestricted-psp
Status:       Running
IP:           10.42.0.15
IPs:
  IP:           10.42.0.15
Controlled By:  ReplicaSet/rancher-7bbd98588
Containers:
  rancher:
    Container ID:  containerd://f66b5e2765a33f9e04b8d9324644a4ae2ff4106f98207b571bc67866b3d38231
    Image:         rancher/rancher:v2.6.5
    Image ID:      docker.io/rancher/rancher@sha256:ae5135c25b2141bb2aac8a03a9afd77e845f36b9a6c000377c858233aae355d4
    Port:          80/TCP
    Host Port:     0/TCP
    Args:
      --no-cacerts
      --http-listen-port=80
      --https-listen-port=443
      --add-local=true
    State:          Running
      Started:      Tue, 21 Jun 2022 15:30:29 -0500
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:80/healthz delay=60s timeout=1s period=30s #success=1 #failure=3
    Readiness:      http-get http://:80/healthz delay=5s timeout=1s period=30s #success=1 #failure=3
    Environment:
      CATTLE_NAMESPACE:           cattle-system
      CATTLE_PEER_SERVICE:        rancher
      CATTLE_BOOTSTRAP_PASSWORD:  <set to the key 'bootstrapPassword' in secret 'bootstrap-secret'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wgcvc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-wgcvc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 cattle.io/os=linux:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  35m                default-scheduler  Successfully assigned cattle-system/rancher-7bbd98588-hm495 to rke2
  Normal   Pulling    35m                kubelet            Pulling image "rancher/rancher:v2.6.5"
  Normal   Pulled     29m                kubelet            Successfully pulled image "rancher/rancher:v2.6.5" in 5m59.329193041s
  Normal   Created    29m                kubelet            Created container rancher
  Normal   Started    29m                kubelet            Started container rancher
  Warning  Unhealthy  27m (x4 over 28m)  kubelet            Readiness probe failed: Get "http://10.42.0.15:80/healthz": dial tcp 10.42.0.15:80: connect: connection refused
finally
Name:         rancher-7bbd98588-q5h2f
Namespace:    cattle-system
Priority:     0
Node:         rke2/192.168.1.145
Start Time:   Tue, 21 Jun 2022 15:24:25 -0500
Labels:       app=rancher
              pod-template-hash=7bbd98588
              release=rancher
Annotations:  cni.projectcalico.org/containerID: cd3206f99bd04b1a5d65faa07d8e111877dda5e2b3a9eaac9f0fd2bc7a9449f5
              cni.projectcalico.org/podIP: 10.42.0.13/32
              cni.projectcalico.org/podIPs: 10.42.0.13/32
              kubernetes.io/psp: global-unrestricted-psp
Status:       Running
IP:           10.42.0.13
IPs:
  IP:           10.42.0.13
Controlled By:  ReplicaSet/rancher-7bbd98588
Containers:
  rancher:
    Container ID:  containerd://ff8feaa172c83c159f1094906c3d060b9b395960e354f7ceb986b3f86fd66291
    Image:         rancher/rancher:v2.6.5
    Image ID:      docker.io/rancher/rancher@sha256:ae5135c25b2141bb2aac8a03a9afd77e845f36b9a6c000377c858233aae355d4
    Port:          80/TCP
    Host Port:     0/TCP
    Args:
      --no-cacerts
      --http-listen-port=80
      --https-listen-port=443
      --add-local=true
    State:          Running
      Started:      Tue, 21 Jun 2022 15:32:15 -0500
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 21 Jun 2022 15:30:47 -0500
      Finished:     Tue, 21 Jun 2022 15:31:53 -0500
    Ready:          True
    Restart Count:  2
    Liveness:       http-get http://:80/healthz delay=60s timeout=1s period=30s #success=1 #failure=3
    Readiness:      http-get http://:80/healthz delay=5s timeout=1s period=30s #success=1 #failure=3
    Environment:
      CATTLE_NAMESPACE:           cattle-system
      CATTLE_PEER_SERVICE:        rancher
      CATTLE_BOOTSTRAP_PASSWORD:  <set to the key 'bootstrapPassword' in secret 'bootstrap-secret'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5ws94 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-5ws94:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 cattle.io/os=linux:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  35m                default-scheduler  Successfully assigned cattle-system/rancher-7bbd98588-q5h2f to rke2
  Normal   Pulling    35m                kubelet            Pulling image "rancher/rancher:v2.6.5"
  Normal   Pulled     29m                kubelet            Successfully pulled image "rancher/rancher:v2.6.5" in 5m59.832370366s
  Warning  Unhealthy  27m                kubelet            Liveness probe failed: Get "http://10.42.0.13:80/healthz": dial tcp 10.42.0.13:80: connect: connection refused
  Normal   Pulled     27m (x2 over 28m)  kubelet            Container image "rancher/rancher:v2.6.5" already present on machine
  Normal   Created    27m (x3 over 29m)  kubelet            Created container rancher
  Normal   Started    27m (x3 over 29m)  kubelet            Started container rancher
  Warning  Unhealthy  27m (x6 over 28m)  kubelet            Readiness probe failed: Get "http://10.42.0.13:80/healthz": dial tcp 10.42.0.13:80: connect: connection refused
Do I reboot the server and see if it auto-corrects?
I think for some reason it needs more time to finish the process... where do I go to give it more time?

rapid-helmet-86074

06/21/2022, 9:39 PM
You might be able to edit the Deployment to add more of a timeout for the health check, but that's less likely to be the cause. I'd look at the logs for the pods rather than describe and see if you see anything more useful. Describe just tells you the pods all self-identified to be killed. If they're doing that, then the external access was probably luck before they killed themselves.
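(For reference, a sketch of the log commands being suggested; pod names are taken from the describe output earlier in the thread.)

```shell
# List the Rancher pods in the cattle-system namespace
kubectl get pods -n cattle-system -l app=rancher

# Tail the logs of one pod
kubectl logs -n cattle-system rancher-7bbd98588-2mqtg

# --previous shows logs from the last terminated container, useful
# when a pod has already restarted (Restart Count > 0, as on q5h2f)
kubectl logs -n cattle-system rancher-7bbd98588-q5h2f --previous
```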

curved-caravan-26314

06/21/2022, 9:43 PM
Where do I edit the deployment to add more of a timeout for the health check?

rapid-helmet-86074

06/21/2022, 9:44 PM
I don't know where in the YAML it goes specifically, but you'd be looking for the part of the pod spec that's related to health checks. If that's not defined you may need to search docs or web to find what to add.
kubectl edit deployment rancher -n cattle-system
or something very similar is how you do the editing itself.
Unless your system's too bogged down, that's not too likely to be the cause so you may want to try looking through the pod logs instead.
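(One way to make that edit without hunting through the YAML interactively, a sketch rather than a tested fix for this cluster: a JSON patch against the probe fields shown in the describe output, delay=5s readiness / delay=60s liveness. The new values here are illustrative.)

```shell
# Relax the probe start delays on the rancher Deployment's container,
# giving a slow node more time before probes begin failing
kubectl -n cattle-system patch deployment rancher --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds", "value": 30},
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/initialDelaySeconds", "value": 120}
]'
```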

hundreds-evening-84071

06/21/2022, 10:16 PM
agree with above, you need to look at the pod logs before editing the deployment... how does this look:
kubectl get all -n cattle-system
also, if you have networking weirdness going on then that will cause issues. best to look at network requirements and resolve

faint-airport-83518

06/22/2022, 5:46 PM
@curved-caravan-26314 These are two repos that I've referenced for deployment patterns: rancherfederal/rke2-azure-tf (RKE2 provisioning on Azure with Terraform) https://github.com/rancherfederal/rke2-azure-tf and rancherfederal/rke2-aws-tf https://github.com/rancherfederal/rke2-aws-tf
basically they use bash scripts run by cloud-init to bootstrap the nodes via userdata; rke2-init.sh is the main script
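(A minimal sketch of that userdata pattern, assuming the standard get.rke2.io installer; the file contents and the server/token placeholders are illustrative, not taken verbatim from those repos.)

```shell
#!/bin/bash
# Example cloud-init userdata script to join a node to an existing
# RKE2 cluster as an agent (worker).
set -euo pipefail

# Install the RKE2 agent
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -

# Point the agent at the existing server; values are placeholders
mkdir -p /etc/rancher/rke2
cat > /etc/rancher/rke2/config.yaml <<EOF
server: https://<server-ip>:9345
token: <cluster-join-token>
EOF

systemctl enable --now rke2-agent.service
```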