# rke2
n
Hello All! I'm on 1.28. Does Spegel - the embedded registry, integrated with k3d - perform gRPC image sharing between nodes in RKE2? I have a strong feeling that it does not, but only functions as a pull-through cache. Any advice?
c
I’m not sure what you mean by “grpc image sharing”
it acts similarly to a pull-through cache, except the backing store is the containerd image store on other nodes
it does not proactively push content between nodes.
you can read the docs in the spegel project repo for more info
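For reference, enabling it in RKE2 looks roughly like this (just a sketch - the hostname is a placeholder, and the RKE2 registry mirror docs are the authoritative reference):
```
# /etc/rancher/rke2/config.yaml on server nodes: turn on the embedded registry
embedded-registry: true
```
```
# /etc/rancher/rke2/registries.yaml on all nodes: if I recall the docs correctly,
# a registry only participates in node-to-node sharing if it is listed under
# mirrors (endpoints are optional)
mirrors:
  registry.example.com:
```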
n
Thank you for your reply. I guess what I am saying is that I do not see it working. I have an image on one node and am expecting a pod deployed explicitly to another node to start up, having the image discovered and pulled through via Spegel / peer-2-peer. Have you had any trouble with this feature?
c
No, generally it just works, if you enable it as covered in the docs.
What makes you think it's not using spegel? How have you configured your nodes?
n
I think the problem is that we also have Harbor listed as a mirror. Although Spegel is listed first, it just seems as though the listing order does not set precedence. I also ran across this post: https://github.com/spegel-org/spegel/issues/277
c
> Although Spegel is listed first, it just seems as though the listing order does not set precedence.
That is not the case. Mirrors are tried in the order listed, and spegel is always tried first. I’ll ask again - what makes you think that it’s not working?
This is covered in the RKE2 docs… https://docs.rke2.io/install/registry_mirror#enabling-registry-mirroring
Endpoints for registry mirrors may also be added as usual. In the following configuration, image pull attempts will first try the embedded mirror, then mirror.example.com, then finally `docker.io`:
```
mirrors:
  docker.io:
    endpoint:
      - https://mirror.example.com
```
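So in your case the effective order would be: embedded mirror (spegel) first, then your Harbor proxy endpoint, then the upstream registry - roughly like this (hostnames are placeholders):
```
mirrors:
  docker.io:
    endpoint:
      - https://harbor.example.com/v2/proxy-docker-io   # tried only after the embedded mirror
```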
n
First, thank you for your help. I have a private registry in use in my cluster. I have pushed a custom image to one of my nodes. Pulling metrics from Spegel indicates that it is aware of the image, implied by the image count under the registry listing - this differs from the same query on another node in the cluster. I then deploy a podspec with a nodeSelector so the pod runs on the node that does not have the image in containerd. With this scenario I expect Spegel to pull the image from the neighboring node, but instead the pod sits in ImagePullBackOff, since the image only exists on one of the nodes and not in any other registry.
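Roughly, the test pod looks like this (a sketch - pod name, node label value, and image are placeholders):
```
apiVersion: v1
kind: Pod
metadata:
  name: spegel-pull-test                          # placeholder name
spec:
  nodeSelector:
    kubernetes.io/hostname: node-without-image    # placeholder node name
  containers:
    - name: app
      image: registry.example.com/example/custom-image:1.0.0   # placeholder image
```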
c
is your custom image named as being from a registry that is enabled for mirroring?
n
Yes
c
can you show the image name as it appears in the pod spec, and your registries.yaml on the nodes? assuming you have the same registries.yaml on all nodes.
n
Image used in the podspec is `zot.k8s.example.com/example/customer-error-service:1.0.15`
registries.yaml looks like:
```
mirrors:
  zot.k8s.example.com:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-zot-k8s-example-com
  docker.elastic.co:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-docker-elastic-co
  docker.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-docker-io
  gcr.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-gcr-io
  ghcr.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-ghcr-io
  k8s.gcr.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-k8s-gcr-io
  nvcr.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-nvcr-io
  quay.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-quay-io
  registry.k8s.io:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-registry-k8s-io
  registry.opensource.zalan.do:
    endpoint:
    - https://harbor.k8s.example.com/v2/proxy-registry-opensource-zalan-do

configs:
  "harbor.k8s.example.com":
    tls:
      insecure_skip_verify: False
```
I see `/var/lib/rancher/rke2/agent/etc/containerd/certs.d/` contains a dir and config for 127.0.0.1:9345, which I assume is Spegel / "embedded-registry". In my `/etc/rancher/rke2/config.yaml`, I set `embedded-registry: true`
c
if you run rke2 with `debug: true` you can see detailed logs from spegel and libp2p. Set that on both nodes, and then deploy your pod. Or just do `crictl pull <image>` on the other node to trigger an attempt to retrieve it.
do ensure that you have the same registries.yaml on all the nodes, and all nodes have been restarted since you enabled spegel.
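A sketch of that, assuming the rest of your config.yaml stays as it is:
```
# /etc/rancher/rke2/config.yaml - add on both nodes, then restart rke2
debug: true
```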
n
ok, will try this out
c
and just to confirm, the image appears on the seeding node’s `crictl image ls` as `zot.k8s.example.com/example/customer-error-service:1.0.15` - exact same as it is specified in your pod spec?
n
yes. I will confirm this as well
c
are both of the nodes the same architecture?
n
yes, all x86_64
Here's an interesting msg. This was in `/var/log/messages` from the agent without the image. x.x.x.79 is the server in the cluster:
```
May  5 20:30:55 ip-x-x-x-203 rke2[38610]: 2025-05-05T20:30:55.986Z#011DEBUG#011basichost#011basic/basic_host.go:340#011failed to fetch local IPv6 address#011{"error": "no route found for ::"}
May  5 20:30:57 ip-x-x-x-203 rke2[38610]: time="2025-05-05T20:30:57Z" level=debug msg="Wrote ping"
May  5 20:31:00 ip-x-x-x-203 rke2[38610]: 2025-05-05T20:31:00.986Z#011INFO#011dht/RtRefreshManager#011rtrefresh/rt_refresh_manager.go:322#011starting refreshing cpl 0 with key CIQAAAAFZUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (routing table size was 0)
May  5 20:31:00 ip-x-x-x-203 rke2[38610]: 2025-05-05T20:31:00.986Z#011WARN#011dht/RtRefreshManager#011rtrefresh/rt_refresh_manager.go:233#011failed when refreshing routing table#011{"error": "2 errors occurred:\n\t* failed to query for self, err=failed to find any peer in table\n\t* failed to refresh cpl=0, err=failed to find any peer in table\n\n"}
May  5 20:31:00 ip-x-x-x-203 rke2[38610]: 2025-05-05T20:31:00.986Z#011DEBUG#011basichost#011basic/basic_host.go:340#011failed to fetch local IPv6 address#011{"error": "no route found for ::"}
May  5 20:31:01 ip-x-x-x-203 rke2[38610]: time="2025-05-05T20:31:01Z" level=info msg="spegel 2025/05/05 20:31:01 p2p: \"msg\"=\"could not get bootstrap addresses\" \"error\"=\"CA cert validation failed: Get \\\"https://ip-x-x-x-79.ec2.internal:9345/cacerts\\\": tls: failed to verify certificate: x509: certificate is valid for ip-x-x-x-79, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, localhost, not ip-x-x-x-79.ec2.internal\""
```
c
yeah so something is odd with your network config
first of all you’re trying to use ipv6 but don’t have a default route on any of your interfaces so it can’t figure out what its primary ipv6 address should be
second, it looks like the node name does not match the node hostname? if it did the cert would have the correct hostname in its SAN list.
```
certificate is valid for ip-x-x-x-79, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, localhost, not ip-x-x-x-79.ec2.internal
```
it is only valid for the short name, not the fqdn (one possible workaround is sketched at the end of this message)
I’m confused how this is working at all. If you get this error when trying to bootstrap spegel, you should see the same error when trying to join the cluster.
1.28 is also VERY old and out of date though. We’ve changed quite a bit of stuff in this space as Spegel is fairly new and under active development.
Upgrade to a non-EOL version of RKE2 and see if this is still a problem
you should be on 1.30 at the oldest, and even that is about to go EOL now that 1.33 is out
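If the SAN mismatch is what's blocking the spegel bootstrap, here is the workaround I mentioned above (just a sketch - the cleaner fix is making the node name and hostname agree; `tls-san` is a server-side option that adds extra names to the serving cert):
```
# /etc/rancher/rke2/config.yaml on the server - assumes agents reach the server
# via its EC2 internal FQDN; adds that name to the cert's SAN list
tls-san:
  - ip-x-x-x-79.ec2.internal
```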
n
ok, will do