#k3d

aloof-oxygen-4191

03/06/2023, 8:11 AM
Good morning. When I create a k3d (v5.4.7) cluster with CLI args, all is fine. When I create it from a YAML config, the agents can't join. docker logs on the agents shows:
error dialing load balancer servers: all servers failed
This happens even with
disableLoadbalancer: true
(not sure whether that is related to the error). My YAML is here (if some values are needed, I will supply them):
https://termbin.com/sw7km6
I am trying to make it more minimal. Meanwhile, this is the CLI command that works:
k3d cluster create $CLUSTER_NAME --registry-use $REGISTRY_NAME \
        --api-port $KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT \
        --servers=$KUBEVIRT_NUM_SERVERS \
        --agents=$KUBEVIRT_NUM_AGENTS \
        --k3s-arg "--disable=traefik@server:0" \
        --no-lb \
        --k3s-arg "--flannel-backend=none@server:*" \
        --k3s-arg "--kubelet-arg=cpu-manager-policy=static@agent:*" \
        --k3s-arg "--kubelet-arg=kube-reserved=cpu=500m@agent:*" \
        --k3s-arg "--kubelet-arg=system-reserved=cpu=500m@agent:*" \
        --volume "$(pwd)/cluster-up/cluster/k3d/manifests/calico.yaml:/var/lib/rancher/k3s/server/manifests/calico.yaml@server:0" \
        -v /dev/vfio:/dev/vfio@agent:* \
        -v /lib/modules:/lib/modules@agent:* \
        -v ${id1}:/etc/machine-id@server:0 \
        -v ${id2}:/etc/machine-id@agent:0 \
        -v ${id3}:/etc/machine-id@agent:1
Thanks
Maybe the extra args need quotes?
That didn't help.
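For comparison, here is a rough sketch of how the flags in the working CLI command might map onto a k3d v1alpha4 config file. The actual YAML behind the termbin link is not reproduced in the thread, so this is an assumption about its general shape, not the poster's file. The cluster name, registry name, node counts, API address, and the machine-id host path are placeholder values.

```yaml
# Sketch only: placeholder values, not taken from the thread.
apiVersion: k3d.io/v1alpha4
kind: Simple
metadata:
  name: mycluster                # stands in for $CLUSTER_NAME
servers: 1                       # stands in for $KUBEVIRT_NUM_SERVERS
agents: 2                        # stands in for $KUBEVIRT_NUM_AGENTS
kubeAPI:                         # --api-port HOST:PORT
  hostIP: "127.0.0.1"            # placeholder for $KUBERNETES_SERVICE_HOST
  hostPort: "6443"               # placeholder for $KUBERNETES_SERVICE_PORT
registries:
  use:
    - my-registry:5000           # placeholder for $REGISTRY_NAME (--registry-use)
volumes:
  - volume: /dev/vfio:/dev/vfio
    nodeFilters:
      - agent:*
  - volume: /lib/modules:/lib/modules
    nodeFilters:
      - agent:*
  - volume: /path/to/machine-id-1:/etc/machine-id   # hypothetical host path for ${id1}
    nodeFilters:
      - server:0
options:
  k3d:
    disableLoadbalancer: true    # --no-lb
  k3s:
    extraArgs:
      - arg: --disable=traefik
        nodeFilters:
          - server:0
      - arg: --flannel-backend=none
        nodeFilters:
          - server:*
      - arg: --kubelet-arg=cpu-manager-policy=static
        nodeFilters:
          - agent:*
```

The remaining --kubelet-arg and -v entries from the command would follow the same arg/volume plus nodeFilters pattern. Writing each @filter suffix as a separate nodeFilters entry, rather than leaving it inside the arg string, is the part most worth double-checking when a config behaves differently from the CLI.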

wide-garage-9465

03/21/2023, 8:15 AM
Hey 👋 Are the servers all up and running? (Check via docker ps and kubectl get nodes) Can you share the whole log of one of the failing nodes please?

aloof-oxygen-4191

03/21/2023, 8:58 AM
Hi, thanks for your reply. I will recreate it and paste the log when I have it.
time="2023-03-21T11:48:48Z" level=info msg="Starting k3s agent v1.25.6+k3s1 (9176e03c)"
time="2023-03-21T11:48:48Z" level=info msg="Running load balancer k3s-agent-load-balancer 127.0.0.1:6444 -> [k3d-sriov-server-0:6443]"
time="2023-03-21T11:48:53Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:48:53Z" level=debug msg="Incoming conn 127.0.0.1:47134, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:48:58Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:48:58Z" level=debug msg="Incoming conn 127.0.0.1:47164, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:48:58Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:47164->127.0.0.1:6444: read: connection reset by peer"
time="2023-03-21T11:49:05Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:49:05Z" level=debug msg="Incoming conn 127.0.0.1:59418, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:49:10Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:49:10Z" level=debug msg="Incoming conn 127.0.0.1:59436, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:49:10Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:59436->127.0.0.1:6444: read: connection reset by peer"
time="2023-03-21T11:49:17Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:49:17Z" level=debug msg="Incoming conn 127.0.0.1:58578, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:49:22Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:49:22Z" level=debug msg="Incoming conn 127.0.0.1:41722, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:49:22Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:41722->127.0.0.1:6444: read: connection reset by peer"
time="2023-03-21T11:49:29Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:49:29Z" level=debug msg="Incoming conn 127.0.0.1:41752, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:49:34Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:49:34Z" level=debug msg="Incoming conn 127.0.0.1:40784, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:49:34Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:40784->127.0.0.1:6444: read: connection reset by peer"
time="2023-03-21T11:49:41Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:49:41Z" level=debug msg="Incoming conn 127.0.0.1:40794, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:49:46Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:49:46Z" level=debug msg="Incoming conn 127.0.0.1:52154, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:49:46Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:52154->127.0.0.1:6444: read: connection reset by peer"
time="2023-03-21T11:49:53Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:49:53Z" level=debug msg="Incoming conn 127.0.0.1:43658, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:49:58Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:49:58Z" level=debug msg="Incoming conn 127.0.0.1:43682, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:49:58Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:43682->127.0.0.1:6444: read: connection reset by peer"
time="2023-03-21T11:50:05Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:50:05Z" level=debug msg="Incoming conn 127.0.0.1:38920, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:50:10Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:50:10Z" level=debug msg="Incoming conn 127.0.0.1:38932, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:50:10Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:38932->127.0.0.1:6444: read: connection reset by peer"
time="2023-03-21T11:50:17Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:50:17Z" level=debug msg="Incoming conn 127.0.0.1:45444, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:50:22Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:50:22Z" level=debug msg="Incoming conn 127.0.0.1:33084, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:50:22Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:33084->127.0.0.1:6444: read: connection reset by peer"
time="2023-03-21T11:50:29Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:50:29Z" level=debug msg="Incoming conn 127.0.0.1:33092, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:50:34Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:50:34Z" level=debug msg="Incoming conn 127.0.0.1:45878, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:50:34Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:45878->127.0.0.1:6444: read: connection reset by peer"
time="2023-03-21T11:50:41Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:50:41Z" level=debug msg="Incoming conn 127.0.0.1:45884, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:50:46Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:50:46Z" level=debug msg="Incoming conn 127.0.0.1:47222, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:50:46Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:47222->127.0.0.1:6444: read: connection reset by peer"
time="2023-03-21T11:50:53Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:50:53Z" level=debug msg="Incoming conn 127.0.0.1:47424, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:50:58Z" level=debug msg="Dial error from load balancer k3s-agent-load-balancer: dial tcp: lookup k3d-sriov-server-0: Try again"
time="2023-03-21T11:50:58Z" level=debug msg="Incoming conn 127.0.0.1:47446, error dialing load balancer servers: all servers failed"
time="2023-03-21T11:50:58Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:47446->127.0.0.1:6444: read: connection reset by peer"
This is the code I am trying: https://github.com/oshoval/kubevirtci/commit/f0aa327efc391c60c76e24de8f93128091ed735b
Maybe I should use a slightly different manifest? ChatGPT suggested a basic manifest that I can start with (unless it is an outdated one). Thanks
apiVersion: k3d.io/v1alpha4
kind: Simple
metadata:
  name: cluster
servers: 1
agents: 2
This worked, so I will add fields to it one by one to see which field is the bad one.
It seems that if I remove the network field it works better in that respect (unless there is some other bug?). The network field is needed to support podman in case it uses a different network (unless I can create an alias to that network as a workaround, which might be a good alternative).
But it still has other problems: nodes and pods don't start properly. I need to find which other field is problematic, but this might be a problem on my server at the moment, so I will wait for it to recover.
Hi, sorry that this took time, our machines had problems. I can confirm that removing
network: $NETWORK
fixed the problem, but we do need this field in order to support both podman and docker. It seems like a real bug? Should I open an issue? Is there a workaround, maybe some way to rename or alias a network? (Not sure we can, but I can try.) (Using v5.4.7.) Thanks
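If an issue is opened, a minimal reproducer could be the basic manifest that worked plus the one field that was isolated. The following is a sketch under assumptions: my-existing-net is a hypothetical name for a network that already exists in the container runtime (the thread only gives $NETWORK), and the thread does not confirm that this reduced config fails on its own.

```yaml
# Minimal manifest from earlier in the thread, plus the suspect field.
apiVersion: k3d.io/v1alpha4
kind: Simple
metadata:
  name: cluster
servers: 1
agents: 2
# Hypothetical network name; the original config used $NETWORK.
network: my-existing-net
```

Creating a cluster from this file with k3d cluster create --config and comparing the agent logs against the variant without the network line would show whether the field alone triggers the "lookup k3d-sriov-server-0: Try again" DNS failure seen above.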