
fierce-elephant-30846

09/28/2022, 12:10 AM
Hi everyone! I'm getting stuck when trying to register a cluster on my RKE HA Rancher. The message "Waiting for node to register. Either cluster is not ready for registering, cluster is currently provisioning, or etcd, controlplane and worker node have to be registered" repeats continuously after I run the registration command on the VM to be provisioned. I'm using this load balancer configuration in front of my Rancher HA instance: https://docs.ranchermanager.rancher.io/how-to-guides/new-user-guides/infrastructure-setup/nginx-load-balancer It looks like a proxy problem, because if I point my DNS directly at the RKE node where Rancher is deployed, bypassing the load balancer, the cluster is provisioned normally. I'm using RKE v1.3.10 (v1.22.9) and Rancher 2.6.8. Could anyone help me? Thanks very much!
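(For context, the linked page describes a plain layer-4 TCP pass-through in front of the RKE nodes running Rancher. A minimal sketch of that shape is below; the node IP is a placeholder, not taken from this thread, and TLS/websockets are still terminated by the Rancher ingress behind the load balancer.)

worker_processes 4;
worker_rlimit_nofile 40000;

events {
    worker_connections 8192;
}

stream {
    upstream rancher_servers_http {
        least_conn;
        server <IP_NODE_1>:80 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 80;
        proxy_pass rancher_servers_http;
    }

    upstream rancher_servers_https {
        least_conn;
        server <IP_NODE_1>:443 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 443;
        proxy_pass rancher_servers_https;
    }
}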

quick-sandwich-76600

09/28/2022, 12:20 AM
What do you see in the Rancher agent container log?
👍 1

fierce-elephant-30846

09/28/2022, 12:43 AM
Copy code
INFO: Arguments: --server https://rancher-dev.maringa.pr.gov.br --token REDACTED --address 172.16.3.182 --internal-address 172.16.3.182 --etcd --controlplane --worker
INFO: Environment: CATTLE_ADDRESS=172.16.3.182 CATTLE_INTERNAL_ADDRESS=172.16.3.182 CATTLE_NODE_NAME=k8s-worker-02 CATTLE_ROLE=,etcd,worker,controlplane CATTLE_SERVER=https://rancher-dev.maringa.pr.gov.br CATTLE_TOKEN=REDACTED
INFO: Using resolv.conf: search pmm.gov.br nameserver 10.0.2.3 nameserver 172.16.0.33 nameserver 172.16.0.13 options single-request-reopen
INFO: https://rancher-dev.maringa.pr.gov.br/ping is accessible
INFO: rancher-dev.maringa.pr.gov.br resolves to 172.16.3.175
INFO[0000] Listening on /tmp/log.sock
INFO[0000] Rancher agent version v2.6.8 is starting     
INFO[0000] Option customConfig=map[address:172.16.3.182 internalAddress:172.16.3.182 label:map[] roles:[etcd worker controlplane] taints:[]]
INFO[0000] Option etcd=true
INFO[0000] Option controlPlane=true
INFO[0000] Option worker=true
INFO[0000] Option requestedHostname=k8s-worker-02
INFO[0000] Option dockerInfo={GQ7O:2QZQ:OB3Y:LGMJ:5ZPY:QG7L:JFAL:4HWX:7LT6:TNFY:VSVH:J3QM 1 1 0 0 1 overlay2 [[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff false] [userxattr false]] [] {[local] [bridge host ipvlan macvlan null overlay] [] [awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} true true true true true true true true true true true true false 34 true 49 2022-09-24T06:14:39.943120344Z json-file cgroupfs 1 0 5.4.17-2136.311.6.el8uek.x86_64 Oracle Linux Server 8.6 8.6 linux x86_64 https://index.docker.io/v1/ 0xc000fee1c0 2 5925376000 [] /var/lib/docker k8s-worker-02 [] false 20.10.7 map[io.containerd.runc.v2:{runc [] <nil>} io.containerd.runtime.v1.linux:{runc [] <nil>} runc:{runc [] <nil>}] runc { inactive false [] 0 0 <nil> []} false docker-init {9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6} {v1.1.4-0-g5fd4c4d v1.1.4-0-g5fd4c4d} {de40ad0 de40ad0} [name=seccomp,profile=default] [] []}
INFO[0000] Connecting to wss://rancher-dev.maringa.pr.gov.br/v3/connect/register with token starting with mklvp5d9kc2lcpgrpp9xr6jh7dz
INFO[0000] Connecting to proxy                           url="wss://rancher-dev.maringa.pr.gov.br/v3/connect/register"
INFO[0000] Waiting for node to register. Either cluster is not ready for registering, cluster is currently provisioning, or etcd, controlplane and worker node have to be registered
INFO[0002] Waiting for node to register. Either cluster is not ready for registering, cluster is currently provisioning, or etcd, controlplane and worker node have to be registered
@quick-sandwich-76600 this is the command generated by Rancher
Copy code
sudo docker run -it --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:v2.6.8 --server https://rancher-dev.maringa.pr.gov.br --token mklvp5d9kc2lcpgrpp9xr6jh7dzgqtkm2fwkpkp26h4j2zzvnz6v4d --address 172.16.3.182 --internal-address 172.16.3.182 --etcd --controlplane --worker
I'm using Vagrant with NAT to a public IP, Oracle Linux 8, swap off, SELinux and the firewall disabled, and Rancher installed in TLS secret mode
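(Not the exact commands used in this thread, but node prep along those lines on an EL8 box usually looks roughly like the sketch below; package and service names may differ on other distros.)

# turn swap off now and keep it off across reboots
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab

# stop and disable firewalld
sudo systemctl disable --now firewalld

# put SELinux in permissive mode for this boot and after reboots
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config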
@quick-sandwich-76600 Using HAProxy instead, the result is the same
Copy code
docker run -it --restart=unless-stopped --name my-running-haproxy -p 80:80 -p 443:443 -v /home/vagrant/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg -v /home/vagrant/maringa.pr.gov.br.pem:/etc/haproxy/certs/maringa.pr.gov.br.pem haproxy
haproxy.cfg
Copy code
global
defaults
    mode http
    log global
    option httplog
    option  http-server-close
    option  dontlognull
    option  redispatch
    option  contstats
    retries 3
    backlog 10000
    timeout client          25s
    timeout connect          5s
    timeout server          25s
    # timeout tunnel available in ALOHA 5.5 or HAProxy 1.5-dev10 and higher
    timeout tunnel        3600s
    timeout http-keep-alive  1s
    timeout http-request    15s
    timeout queue           30s
    timeout tarpit          60s
    default-server inter 3s rise 2 fall 3
    option forwardfor
frontend port80-redirect
    mode http
    bind *:80 
    redirect scheme https    
frontend frontend_https
    bind *:443
    mode tcp
    ## routing based on Host header
    acl host_ws hdr_beg(Host) -i ws.
    use_backend backend_https if host_ws
    ## routing based on websocket protocol header
    acl hdr_connection_upgrade hdr(Connection)  -i upgrade
    acl hdr_upgrade_websocket  hdr(Upgrade)     -i websocket
    use_backend backend_https if hdr_connection_upgrade hdr_upgrade_websocket
    default_backend backend_https
backend backend_https
    balance roundrobin
    option httpchk HEAD / 
    mode tcp
    ## websocket protocol validation
    acl hdr_connection_upgrade hdr(Connection)                 -i upgrade
    acl hdr_upgrade_websocket  hdr(Upgrade)                    -i websocket
    acl hdr_websocket_key      hdr_cnt(Sec-WebSocket-Key)      eq 1
    acl hdr_websocket_version  hdr_cnt(Sec-WebSocket-Version)  eq 1
    http-request deny if ! hdr_connection_upgrade ! hdr_upgrade_websocket ! hdr_websocket_key ! hdr_websocket_version
    ## ensure our application protocol name is valid 
    ## (don't forget to update the list each time you publish new applications)
    acl ws_valid_protocol hdr(Sec-WebSocket-Protocol) echo-protocol
    http-request deny if ! ws_valid_protocol
    ## websocket health checking
    option httpchk GET / HTTP/1.1\r\nHost:\ ws.domain.com\r\nConnection:\ Upgrade\r\nUpgrade:\ websocket\r\nSec-WebSocket-Key:\ haproxy\r\nSec-WebSocket-Version:\ 13\r\nSec-WebSocket-Protocol:\ echo-protocol
    http-check expect status 101
    server srv1 172.16.3.177:443
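(As an aside: in mode tcp HAProxy does not parse HTTP, so the Host/Connection/Upgrade ACLs and http-request rules in that config generally have no effect. For Rancher a plain TCP pass-through is normally enough, since TLS and websockets are handled by the Rancher ingress. A rough, untested sketch, reusing the 172.16.3.177 node from above, follows; in a pure pass-through the mounted PEM isn't needed at all.)

global
    maxconn 4096

defaults
    mode    tcp
    option  tcplog
    timeout connect 5s
    timeout client  3600s   # long client/server timeouts keep the agent's websocket tunnel open
    timeout server  3600s

frontend rancher_http
    bind *:80
    default_backend rancher_nodes_http

frontend rancher_https
    bind *:443
    default_backend rancher_nodes_https

backend rancher_nodes_http
    balance roundrobin
    server node1 172.16.3.177:80 check

backend rancher_nodes_https
    balance roundrobin
    server node1 172.16.3.177:443 check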
Hi, switching to AWS VMs instead of the on-premise Vagrant boxes, I see this log
INFO: Arguments: --server https://rancher-dev.maringa.pr.gov.br --token REDACTED --etcd
INFO: Environment: CATTLE_ADDRESS=172.31.8.228 CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=ip-172-31-8-228 CATTLE_ROLE=,etcd CATTLE_SERVER=https://rancher-dev.maringa.pr.gov.br CATTLE_TOKEN=REDACTED
INFO: Using resolv.conf: nameserver 127.0.0.53 options edns0 trust-ad search ec2.internal
WARN: Loopback address found in /etc/resolv.conf, please refer to the documentation how to configure your cluster to resolve DNS properly
INFO: https://rancher-dev.maringa.pr.gov.br/ping is accessible
INFO: rancher-dev.maringa.pr.gov.br resolves to 3.232.184.180
time="2022-10-05T15:23:00Z" level=info msg="Listening on /tmp/log.sock"
time="2022-10-05T15:23:00Z" level=info msg="Rancher agent version v2.6.8 is starting"
time="2022-10-05T15:23:00Z" level=info msg="Option customConfig=map[address:172.31.8.228 internalAddress: label:map[] roles:[etcd] taints:[]]"
time="2022-10-05T15:23:00Z" level=info msg="Option etcd=true"
time="2022-10-05T15:23:00Z" level=info msg="Option controlPlane=false"
time="2022-10-05T15:23:00Z" level=info msg="Option worker=false"
time="2022-10-05T15:23:00Z" level=info msg="Option requestedHostname=ip-172-31-8-228"
time="2022-10-05T15:23:00Z" level=info msg="Option dockerInfo={3VNI25U6ULFPTOFAZDFSG5H5DO644JAJCKPOE5JATOLX:WBQJ 1 1 0 0 1 overlay2 [[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] [] {[local] [bridge host ipvlan macvlan null overlay] [] [awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} true true true true true true true true true true true true false 28 true 38 2022-10-05T15:23:00.879844199Z json-file cgroupfs 1 0 5.15.0-1019-aws Ubuntu 20.04.5 LTS 20.04 linux x86_64 https://index.docker.io/v1/ 0xc001ca6070 2 4051689472 [] /var/lib/docker ip-172-31-8-228 [] false 20.10.7 map[io.containerd.runc.v2:{runc [] <nil>} io.containerd.runtime.v1.linux:{runc [] <nil>} runc:{runc [] <nil>}] runc { inactive false [] 0 0 <nil> []} false docker-init {9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6} {v1.1.4-0-g5fd4c4d v1.1.4-0-g5fd4c4d} {de40ad0 de40ad0} [name=apparmor name=seccomp,profile=default] [] []}"
time="2022-10-05T15:23:00Z" level=info msg="Connecting to wss://rancher-dev.maringa.pr.gov.br/v3/connect/register with token starting with r6fstskk79qn2tm4qrgb68ss55s"
time="2022-10-05T15:23:00Z" level=info msg="Connecting to proxy" url="wss://rancher-dev.maringa.pr.gov.br/v3/connect/register"
time="2022-10-05T15:23:01Z" level=error msg="Failed to connect to proxy. Response status: 400 - 400 Bad Request. Response body: Operation cannot be fulfilled on nodes.management.cattle.io \"m-3bd692266880\": the object has been modified; please apply your changes to the latest version and try again" error="websocket: bad handshake"
time="2022-10-05T15:23:01Z" level=error msg="Remotedialer proxy error" error="websocket: bad handshake"
time="2022-10-05T15:23:11Z" level=info msg="Connecting to wss://rancher-dev.maringa.pr.gov.br/v3/connect/register with token starting with r6fstskk79qn2tm4qrgb68ss55s"
time="2022-10-05T15:23:11Z" level=info msg="Connecting to proxy" url="wss://rancher-dev.maringa.pr.gov.br/v3/connect/register"
time="2022-10-05T15:23:11Z" level=info msg="Waiting for node to register. Either cluster is not ready for registering, cluster is currently provisioning, or etcd, controlplane and worker node have to be registered"
time="2022-10-05T15:23:13Z" level=info msg="Waiting for node to register. Either cluster is not ready for registering, cluster is currently provisioning, or etcd, controlplane and worker node have to be registered"
@quick-sandwich-76600 I finally found the problem. The HTTP noProxy setting was missing from the helm command used to install Rancher: https://docs.ranchermanager.rancher.io/reference-guides/installation-references/helm-chart-options#http-proxy I think those directives should also appear in the infrastructure setup guide, because it doesn't mention anything about noProxy here: https://docs.ranchermanager.rancher.io/how-to-guides/new-user-guides/infrastructure-setup/nginx-load-balancer Big hug!
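(For anyone hitting the same thing: the relevant chart values are proxy and noProxy. With the hostname and TLS-secret setup from this thread it would look roughly like the sketch below; the proxy host/port is a placeholder, the noProxy list is the one suggested in the linked chart options page, and commas in --set have to be escaped.)

helm upgrade --install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher-dev.maringa.pr.gov.br \
  --set ingress.tls.source=secret \
  --set proxy=http://<proxy-host>:<proxy-port> \
  --set noProxy=127.0.0.0/8\,10.0.0.0/8\,172.16.0.0/12\,192.168.0.0/16\,.svc\,.cluster.local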
👍 1

creamy-pencil-82913

10/08/2022, 3:07 AM
nginx doesn’t need no-proxy settings, just the rancher pod (configured by the helm chart)
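(A quick way to confirm the chart actually applied those settings to the rancher pod is to list the env vars on the deployment; HTTP_PROXY, HTTPS_PROXY and NO_PROXY should show up there. Sketch assuming the standard cattle-system/rancher deployment names.)

kubectl -n cattle-system get deploy rancher \
  -o jsonpath='{range .spec.template.spec.containers[0].env[*]}{.name}={.value}{"\n"}{end}'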

quick-sandwich-76600

10/09/2022, 9:39 PM
Hi @fierce-elephant-30846. Thank you for the update.

agreeable-oil-87482

10/10/2022, 6:14 PM
I can't see the logs well on my phone. Are those the cluster agent logs from the sandbox cluster?

fierce-elephant-30846

10/10/2022, 6:46 PM
@agreeable-oil-87482 It's the Provisioning Log screen, in Rancher Cluster Management