# rke2
n
Hello All! I'm on 1.28. Does Spegel - the embedded registry, integrated with k3d - perform gRPC image sharing between nodes in RKE2? I have a strong feeling that it does not, but only functions as a pull-through cache. Any advice?
c
I’m not sure what you mean by “grpc image sharing”
it acts similarly to a pull-through cache, except the backing store is the containerd image store on other nodes
it does not proactively push content between nodes.
you can read the docs in the spegel project repo for more info
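For reference, enabling it in RKE2 looks roughly like this (just a sketch - the hostname is a placeholder, and the RKE2 registry mirror docs are the authoritative reference):
```
# /etc/rancher/rke2/config.yaml on server nodes: turn on the embedded registry
embedded-registry: true
```
```
# /etc/rancher/rke2/registries.yaml on all nodes: if I recall the docs correctly,
# a registry only participates in node-to-node sharing if it is listed under
# mirrors (endpoints are optional)
mirrors:
  registry.example.com:
```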
n
Thank you for your reply. I guess what I am saying is that I do not see it working. I have an image on one node and am expecting a pod deployed explicitly to another node to start up, having the image discovered and pulled through via Spegel / peer-2-peer. Have you had any trouble with this feature?
c
No, generally it just works, if you enable it as covered in the docs.
What makes you think it's not using spegel? How have you configured your nodes?
n
I think the problem is that we also have Harbor listed as a mirror. Although Spegel is listed first, it just seems as though the listing order does not set precedence. I also ran across this post: https://github.com/spegel-org/spegel/issues/277
c
> Although Spegel is listed first, it just seems as though the listing order does not set precedence.
That is not the case. Mirrors are tried in the order listed, and spegel is always tried first. I’ll ask again - what makes you think that it’s not working?
This is covered in the RKE2 docs… https://docs.rke2.io/install/registry_mirror#enabling-registry-mirroring
Endpoints for registry mirrors may also be added as usual. In the following configuration, image pull attempts will first try the embedded mirror, then mirror.example.com, then finally `docker.io`:
```
mirrors:
  docker.io:
    endpoint:
      - https://mirror.example.com
```
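So in your case the effective order would be: embedded mirror (spegel) first, then your Harbor proxy endpoint, then the upstream registry - roughly like this (hostnames are placeholders):
```
mirrors:
  docker.io:
    endpoint:
      - https://harbor.example.com/v2/proxy-docker-io   # tried only after the embedded mirror
```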
n
First, thank you for your help. I have a private registry in use in my cluster. I have pushed a custom image to one of my nodes. Pulling metrics from Spegel indicates that it is aware of the image, implied by the image count under the registry listing - this differs from the same query on another node in the cluster. I then deploy a podspec with a nodeSelector so the pod runs on the node that does not have the image in containerd. With this scenario I expect Spegel to pull the image from the neighboring node, but instead the pod sits in ImagePullBackOff, since the image only exists on one of the nodes and not in any other registry.
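Roughly, the test pod looks like this (a sketch - pod name, node label value, and image are placeholders):
```
apiVersion: v1
kind: Pod
metadata:
  name: spegel-pull-test                          # placeholder name
spec:
  nodeSelector:
    kubernetes.io/hostname: node-without-image    # placeholder node name
  containers:
    - name: app
      image: registry.example.com/example/custom-image:1.0.0   # placeholder image
```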
c
is your custom image named as being from a registry that is enabled for mirroring?
n
Yes
c
can you show the image name as it appears in the pod spec, and your registries.yaml on the nodes? assuming you have the same registries.yaml on all nodes.
n
Image used in the podspec is `zot.k8s.example.com/example/customer-error-service:1.0.15`
registries.yaml looks like:
```
mirrors:
  zot.k8s.example.com:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-zot-k8s-example-com
  docker.elastic.co:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-docker-elastic-co
  docker.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-docker-io
  gcr.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-gcr-io
  ghcr.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-ghcr-io
  k8s.gcr.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-k8s-gcr-io
  nvcr.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-nvcr-io
  quay.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-quay-io
  registry.k8s.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-registry-k8s-io
  registry.opensource.zalan.do:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-registry-opensource-zalan-do

configs:
  "harbor.k8s.example.com":
    tls:
      insecure_skip_verify: False
```
I see `/var/lib/rancher/rke2/agent/etc/containerd/certs.d/` contains a dir and config for 127.0.0.1:9345, which I assume is Spegel / "embedded-registry". In my `/etc/rancher/rke2/config.yaml`, I set `embedded-registry: true`
c
if you run rke2 with `debug: true` you can see detailed logs from spegel and libp2p. Set that on both nodes, and then deploy your pod. Or just do `crictl pull <image>` on the other node to trigger an attempt to retrieve it.
do ensure that you have the same registries.yaml on all the nodes, and all nodes have been restarted since you enabled spegel.
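A sketch of that, assuming the rest of your config.yaml stays as it is:
```
# /etc/rancher/rke2/config.yaml - add on both nodes, then restart rke2
debug: true
```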
n
ok, will try this out
c
and just to confirm, the image appears on the seeding node’s `crictl image ls` as `zot.k8s.example.com/example/customer-error-service:1.0.15` - exact same as it is specified in your pod spec?
n
yes. I will confirm this as well
c
are both of the nodes the same architecture?
n
yes, all x86_64
Here's an interesting msg. This was in `/var/log/messages` from the agent without the image. x.x.x.79 is the server in the cluster:
```
May  5 20:30:55 ip-x-x-x-203 rke2[38610]: 2025-05-05T20:30:55.986Z#011DEBUG#011basichost#011basic/basic_host.go:340#011failed to fetch local IPv6 address#011{"error": "no route found for ::"}
May  5 20:30:57 ip-x-x-x-203 rke2[38610]: time="2025-05-05T20:30:57Z" level=debug msg="Wrote ping"
May  5 20:31:00 ip-x-x-x-203 rke2[38610]: 2025-05-05T20:31:00.986Z#011INFO#011dht/RtRefreshManager#011rtrefresh/rt_refresh_manager.go:322#011starting refreshing cpl 0 with key CIQAAAAFZUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (routing table size was 0)
May  5 20:31:00 ip-x-x-x-203 rke2[38610]: 2025-05-05T20:31:00.986Z#011WARN#011dht/RtRefreshManager#011rtrefresh/rt_refresh_manager.go:233#011failed when refreshing routing table#011{"error": "2 errors occurred:\n\t* failed to query for self, err=failed to find any peer in table\n\t* failed to refresh cpl=0, err=failed to find any peer in table\n\n"}
May  5 20:31:00 ip-x-x-x-203 rke2[38610]: 2025-05-05T20:31:00.986Z#011DEBUG#011basichost#011basic/basic_host.go:340#011failed to fetch local IPv6 address#011{"error": "no route found for ::"}
May  5 20:31:01 ip-x-x-x-203 rke2[38610]: time="2025-05-05T20:31:01Z" level=info msg="spegel 2025/05/05 20:31:01 p2p: \"msg\"=\"could not get bootstrap addresses\" \"error\"=\"CA cert validation failed: Get \\\"https://ip-x-x-x-79.ec2.internal:9345/cacerts\\\": tls: failed to verify certificate: x509: certificate is valid for ip-x-x-x-79, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, localhost, not ip-x-x-x-79.ec2.internal\""
```
c
yeah so something is odd with your network config
first of all you’re trying to use ipv6 but don’t have a default route on any of your interfaces so it can’t figure out what its primary ipv6 address should be
second, it looks like the node name does not match the node hostname? if it did the cert would have the correct hostname in its SAN list.
```
certificate is valid for ip-x-x-x-79, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, localhost, not ip-x-x-x-79.ec2.internal
```
it is only valid for the short name, not the fqdn (one possible workaround is sketched at the end of this message)
I’m confused how this is working at all. If you get this error when trying to bootstrap spegel, you should see the same error when trying to join the cluster.
1.28 is also VERY old and out of date though. We’ve changed quite a bit of stuff in this space as Spegel is fairly new and under active development.
Upgrade to a non-EOL version of RKE2 and see if this is still a problem
you should be on 1.30 at the oldest, and even that is about to go EOL now that 1.33 is out
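If the SAN mismatch is what's blocking the spegel bootstrap, here is the workaround I mentioned above (just a sketch - the cleaner fix is making the node name and hostname agree; `tls-san` is a server-side option that adds extra names to the serving cert):
```
# /etc/rancher/rke2/config.yaml on the server - assumes agents reach the server
# via its EC2 internal FQDN; adds that name to the cert's SAN list
tls-san:
  - ip-x-x-x-79.ec2.internal
```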
n
ok, will do