# k3s
a
Error in journalctl on one of the workers is:
```
Mar 20 12:05:36 ip-10-200-1-76 k3s[4305]: E0320 12:05:36.246717    4305 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"rancher/mirrored-pause:3.6\": failed to pull image \"rancher/mirrored-pause:3.6\": failed to pull and unpack image \"docker.io/rancher/mirrored-pause:3.6\": failed to resolve reference \"docker.io/rancher/mirrored-pause:3.6\": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error"
```
c
Check the logs for other messages from Spegel / libp2p to confirm that it's able to connect to the p2p mesh to discover images from other nodes. You're sure the p2p ports are open? You've configured the registries.yaml identically on all nodes?
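(For context, a minimal sketch of the setup being discussed. File paths are the k3s defaults; the exact set of mirrored registries here is an assumption, not taken from the thread:)

```yaml
# /etc/rancher/k3s/config.yaml on the servers (sketch; the embedded
# Spegel mirror can also be enabled with the --embedded-registry flag)
embedded-registry: true

# /etc/rancher/k3s/registries.yaml, identical on every node -- listing
# a registry under mirrors with no endpoints tells k3s to serve it from
# the p2p mesh, which is why pulls are redirected to 127.0.0.1:6443
mirrors:
  docker.io:
  registry.k8s.io:
```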
a
Hmm, no logs at all on any of the nodes for Spegel or libp2p 🤔 All ports are open between the hosts, and registries.yaml is identical across all of them 😞
c
Are you sure you’re looking in the right place? grep for `dht` in the k3s/k3s-agent logs.
You can add `debug: true` to the config to enable additional logging.
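(A sketch of that change, assuming the default config path; after restarting the service, something like `journalctl -u k3s-agent | grep dht` should show the extra output:)

```yaml
# /etc/rancher/k3s/config.yaml (server), or the agent's config.yaml --
# restart k3s / k3s-agent for the change to take effect
debug: true
```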
a
Ahh I was looking for Spegel and libp2p... my bad! `dht` warnings every 10 mins though!
```
Mar 21 07:45:00 ip-10-200-1-24 k3s[4293]: 2024-03-21T07:45:00.857Z        WARN        dht/RtRefreshManager        rtrefresh/rt_refresh_manager.go:233        failed when refreshing routing table        {"error": "2 errors occurred:\n\t* failed to query for self, err=failed to find any peer in table\n\t* failed to refresh cpl=0, err=failed to find any peer in table\n\n"}
```
c
looks like it can’t connect to the other nodes on the DHT port (5001). You’re sure that’s open?
if you enable debug you should see the connection attempts
a
I can see one of these logs every so often on the controlplane:
```
Mar 21 09:55:05 ip-10-200-1-243 k3s[66500]: 2024-03-21T09:55:05.229Z        DEBUG        dht        go-libp2p-kad-dht@v0.25.2/routing.go:397        providing        {"cid": "bafkreiabxdoaa3ceibgvpy2mcazwu47ndya6xeh6sjrqtezqhf7jpzeo4a", "mh": "bciqadog4abweiqcnk7ruyebtnjz62hqb5oip5etdbgjtaol6s7si5ya"}
```
And one of these every so often on the agents:
```
Mar 21 09:57:04 ip-10-200-1-76 k3s[60114]: 2024-03-21T09:57:04.842Z        DEBUG        dht        go-libp2p-kad-dht@v0.25.2/routing.go:510        finding providers        {"cid": "bafkreig5lsn4dh3jstov5pvfqvqqjhjp2yq26zry5rk2maeqqxt5efsgee", "mh": "bciqn2xe3ygpwtfg5l27klblbasos7vrbv5tdr3cvuyajbbph2ilemii"}
```
And that's it? 😞
The port is open - can I view the config of DHT anywhere? Maybe hostnames/IPs are wrong? 🤔
Hmm maybe an internal CA issue?
```
Mar 21 10:11:01 ip-10-200-1-76 k3s[61080]: time="2024-03-21T10:11:01Z" level=info msg="spegel 2024/03/21 10:11:01 p2p: \"msg\"=\"could not get bootstrap addresses\" \"error\"=\"CA cert validation failed: Get \\\"https://ip-10-200-1-243.eu-west-2.compute.internal:6443/cacerts\\\": tls: failed to verify certificate: x509: certificate is valid for ip-10-200-1-243, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, localhost, not ip-10-200-1-243.eu-west-2.compute.internal\""
```
I added `tls-san` to my control plane's config.yaml and that fixed it! 😄 Thanks for your help @creamy-pencil-82913!
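(A sketch of that fix; the FQDN is the one the agent was rejecting in the log above. The server should regenerate its serving certificate with the extra SAN after a restart:)

```yaml
# /etc/rancher/k3s/config.yaml on the control plane (sketch)
tls-san:
  - ip-10-200-1-243.eu-west-2.compute.internal
```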
c
ah that’s interesting, we must be using node hostname instead of node name somewhere…