# general
  • r

    rich-agent-52540

    06/20/2025, 12:59 PM
hello! I've been trying to run a k3d cluster managed by Rancher locally on my Windows 11 machine. I initialize a cluster locally, then import it from a Rancher server instance running locally as a container in Docker Desktop. When the cattle-cluster-agent is initialized, it is running, but the cluster in Rancher is stuck on provisioning. Here are the events of the agent pod:
    Events:
      Type     Reason          Age                   From               Message
      ----     ------          ----                  ----               -------
      Normal   Scheduled       44m                   default-scheduler  Successfully assigned cattle-system/cattle-cluster-agent-75f4675687-fk7tc to k3d-my-3node-cluster-server-0
      Warning  BackOff         34m (x7 over 40m)     kubelet            Back-off restarting failed container cluster-register in pod cattle-cluster-agent-75f4675687-fk7tc_cattle-system(c189e097-516a-4759-b532-bce2d18e14dc)
      Normal   Pulled          34m (x5 over 44m)     kubelet            Container image "rancher/rancher-agent:v2.11.2" already present on machine
      Normal   Created         34m (x5 over 44m)     kubelet            Created container cluster-register
      Normal   Started         34m (x5 over 44m)     kubelet            Started container cluster-register
      Normal   SandboxChanged  6m36s                 kubelet            Pod sandbox changed, it will be killed and re-created.
      Warning  BackOff         102s (x3 over 4m21s)  kubelet            Back-off restarting failed container cluster-register in pod cattle-cluster-agent-75f4675687-fk7tc_cattle-system(c189e097-516a-4759-b532-bce2d18e14dc)
      Normal   Pulled          90s (x3 over 6m34s)   kubelet            Container image "rancher/rancher-agent:v2.11.2" already present on machine
      Normal   Created         90s (x3 over 6m34s)   kubelet            Created container cluster-register
      Normal   Started         90s (x3 over 6m34s)   kubelet            Started container cluster-register
    I am setting two variables when initializing the import: CATTLE_SERVER = my Rancher container address, and CATTLE_AGENT_VARIANTS = my Rancher container address. Why is it stuck on provisioning?
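A first diagnostic step, sketched under the assumption that the agent was installed with the default deployment name cattle-cluster-agent, is to read the crashed cluster-register container's own log rather than the pod events — the last lines before the crash usually name the cause (unreachable CATTLE_SERVER, certificate mismatch, bad token):

```shell
ns='cattle-system'
echo "inspecting agent state in $ns"
if command -v kubectl >/dev/null 2>&1; then
  # Log of the previous (crashed) container instance.
  kubectl -n "$ns" logs deploy/cattle-cluster-agent --previous --tail=50 || true
  # Full pod description, including env vars passed to the agent.
  kubectl -n "$ns" describe pod -l app=cattle-cluster-agent || true
fi
```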
  • c

    clever-beach-58636

    06/21/2025, 9:54 AM
    hi community, I'm experiencing an issue with the Rancher agent installation on my k8s cluster: I'm unable to import the cluster into the Rancher server. The Rancher agent log is stuck at the point below. Is anyone else seeing the same issue?
    Copy code
    k logs -n cattle-system cattle-cluster-agent-8cf84cc4-4k5jc 
    INFO: Environment: CATTLE_ADDRESS=100.96.0.51 CATTLE_CA_CHECKSUM=74167253d96634095c04ba77e894382c38ff506b6489191fff4158968b5b0400 CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://100.64.12.75:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://100.64.12.75:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=100.64.12.75 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://100.64.12.75:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=100.64.12.75 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=100.64.12.75 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_CLUSTER_REGISTRY= CATTLE_CREDENTIAL_NAME=cattle-credentials-312345a2a1 CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=f1e2ebc5-2d22-494f-94fa-68b05d8a791f CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-8cf84cc4-4k5jc CATTLE_RANCHER_PROVISIONING_CAPI_VERSION=106.0.0+up0.7.0 CATTLE_RANCHER_WEBHOOK_VERSION=106.0.2+up0.7.2 CATTLE_SERVER=https://10.103.101.184 CATTLE_SERVER_VERSION=v2.11.2
    INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local nameserver 100.64.0.10 options ndots:5
    INFO: https://10.103.101.184/ping is accessible
    INFO: Value from https://10.103.101.184/v3/settings/cacerts is an x509 certificate
    time="2025-06-20T07:25:53Z" level=info msg="starting cattle-credential-cleanup goroutine in the background"
    time="2025-06-20T07:25:53Z" level=info msg="Rancher agent version v2.11.2 is starting"
    time="2025-06-20T07:25:53Z" level=info msg="Listening on /tmp/log.sock"
    time="2025-06-20T07:25:53Z" level=info msg="Testing connection to https://10.103.101.184 using trusted certificate authorities within: /etc/kubernetes/ssl/certs/serverca"
    time="2025-06-20T07:25:53Z" level=info msg="Connecting to wss://10.103.101.184/v3/connect/register with token starting with m9bx96bl7g2zckwwaaaaaxzx1xs"
    time="2025-06-20T07:25:53Z" level=info msg="Connecting to proxy" url="wss://10.103.101.184/v3/connect/register"
  • a

    adamant-traffic-5372

    06/22/2025, 11:08 PM
    Howdy folks! I've been trying to set up a local dev lab, specifically to practice Istio concepts. I wanted a realistic yet lightweight composition of resources. Rancher Desktop and k3d to the rescue... or so I had hoped. But I've found myself stuck on something that seems fairly trivial; I can't isolate the root cause and am hoping you folks could offer some guidance.
    Setup:
    • All resources are on a Windows host
      ◦ Windows 11 Pro
      ◦ 12-core 3.60 GHz processor
      ◦ 64 GB RAM
    • Rancher Desktop installed: https://rancherdesktop.io/
    • Rancher on Rancher Desktop using the Helm installation pattern: https://docs.rancherdesktop.io/how-to-guides/rancher-on-rancher-desktop/
    • k3d 3-node cluster: https://docs.rancherdesktop.io/how-to-guides/create-multi-node-cluster
    • Attempt to register said k3d cluster: https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/kubernetes-clusters-in-rancher-setup/register-existing-clusters#regis[…]luster
      ◦ Configured as a Generic import with all defaults
      ◦ Copied the --insecure curl command since I'm using self-signed certs
      ◦ Ran said command while the context is set to the k3d cluster
    Issue:
    • cattle-cluster-agent gets stuck in a boot loop with the following error:
    Copy code
    INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local nameserver 10.43.0.10 options ndots:5
    ERROR: https://rancher.rd.localhost/ping is not accessible (Failed to connect to rancher.rd.localhost port 443 after 0 ms: Couldn't connect to server)
    What I've tried:
    • Deployed a pod to the k3d cluster that runs infoblox/dnstools
      ◦ Pod is deployed to the cattle-system namespace
      ◦ With nodeName: <server-node>, which is where the cattle-cluster-agent is deployed
      ◦ nslookup resolves rancher.rd.localhost to the Rancher Server ingress IP
      ◦ curl --insecure https://rancher.rd.localhost/ping returns pong
    • Deployed a test agent pod to the k3d cluster that runs rancher/rancher-agent:v2.11.2
      ◦ Same results as above, except curl returns the same error as above
        ▪︎ curl: (7) Failed to connect to rancher.rd.localhost port 443 after 0 ms: Couldn't connect to server
    Conclusion: I'm missing something fundamental, but I can't see it. DNS is resolving properly, but curl is unable to connect. I don't think it's a cert problem, since it won't work with --insecure, nor will it work with http vs https, yet both work in my dnstools pod. I don't think it can be a firewall either, since, yet again, it works in the dnstools pod, which I would expect to have exactly the same network attributes as the agent pod. So something else is trapping this, or I have a misunderstanding of DNS, firewalls, and certificates. Any insight would be appreciated.
    P.S. The domain resolution maps to my Windows hosts file in all cases. In other words, if I change my hosts file, the resolution changes for all pods.
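One way to rule out differences between a fresh test pod and the failing pod itself is to reproduce the curl from inside the agent pod's own network namespace with an ephemeral debug container (a sketch; the pod name shown is hypothetical — substitute the real cattle-cluster-agent pod):

```shell
pod='cattle-cluster-agent-xxxxx'   # hypothetical name -- use the real pod
echo "attaching an ephemeral debug container to $pod"
if command -v kubectl >/dev/null 2>&1; then
  # The ephemeral container joins the target pod's network namespace, so a
  # failure here reproduces the agent's exact view of the network, unlike a
  # freshly scheduled dnstools pod.
  kubectl -n cattle-system debug "pod/$pod" -it --image=curlimages/curl \
    -- curl -kv https://rancher.rd.localhost/ping || true
fi
```

If curl succeeds there but the agent still fails, the difference is in the agent process itself (e.g. proxy env vars) rather than the pod network.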
  • i

    important-architect-36609

    06/23/2025, 12:20 AM
    Urgent need for help, production cluster down. I was trying to update a Rancher downstream cluster from the GUI, and somehow one of the etcd nodes got stuck draining (after the first two nodes upgraded without issues). In my attempt to get it back up I restored a snapshot from the GUI, but now nothing is running 😞 and I don't know what to do.
  • s

    steep-baker-67062

    06/24/2025, 5:46 AM
    Rancher Permissions Issue. We've encountered a problem in our Rancher environment: previously, some users were assigned as Cluster Owners. I've removed their owner access and updated their permissions to only allow the view, watch, and list verbs. However, they're still able to perform actions they had access to as owners (e.g., edit or delete resources), which shouldn't be the case. Has anyone seen this behavior before, or know if there's a delay/cache in permission updates? Any help is appreciated. 🙏
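One quick way to see what the cluster's RBAC actually grants right now, independent of what the Rancher UI shows, is kubectl impersonation from a cluster-admin kubeconfig (a sketch; the username is hypothetical — substitute one of the demoted users):

```shell
user='jane@example.com'   # hypothetical -- substitute the demoted user's name
for verb in get list watch update delete; do
  echo "kubectl auth can-i $verb pods --as=$user --all-namespaces"
  if command -v kubectl >/dev/null 2>&1; then
    # "yes" for update/delete here means a RoleBinding/ClusterRoleBinding
    # still grants write access and wasn't cleaned up by the role change.
    kubectl auth can-i "$verb" pods --as="$user" --all-namespaces || true
  fi
done
```

If write verbs still come back "yes", listing bindings for that subject (`kubectl get clusterrolebindings,rolebindings -A -o wide | grep <user>`) usually reveals a leftover binding from the old owner role.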
  • w

    worried-state-78253

    06/24/2025, 11:32 AM
    I've got a curious thing happening in Rancher when I add a worker to an existing cluster. It looks like the DNS servers are not being configured correctly on the new VM. This is Rancher 2.11.2, Harvester 1.4.1, an RKE2 cluster provisioned with openSUSE Micro VMs. When Rancher adds the new worker, it can't get the CA certificate because DNS isn't resolving -
    Copy code
    [  241.460390] cloud-init[2201]: [ERROR] 000 received while downloading the CA certificate. Sleeping for 5 seconds and trying again
    [  246.469395] cloud-init[2201]: curl: (6) Could not resolve host: rancher.web
    [  246.470034] cloud-init[2201]: [ERROR] 000 received while downloading the CA certificate. Sleeping for 5 seconds and trying again
    [  251.479164] cloud-init[2201]: curl: (6) Could not resolve host: rancher.web
    Now, we have two dedicated DNS servers in the building, one in the Harvester cluster as a VM and one bare metal, and the router advertises these as the DNS servers to use. I've tried editing the cloud-init to specify the DNS, and that didn't help, so I reverted it. Currently I have 2 workers stuck in a create/destroy loop as they can't see the Rancher installation. The hardware nodes can all see the DNS and resolve fine, and the existing nodes in this target cluster can all see the DNS - but the newly provisioned nodes cannot? I'm at a bit of a loss at present to see what is causing this.
  • a

    adamant-traffic-5372

    06/24/2025, 6:38 PM
    I think I'm giving up on trying to manage a k3d cluster with Rancher on Rancher Desktop. I can't for the life of me understand why every other pod I spin up in the k3d cluster can curl the Rancher ping endpoint, but the Rancher cattle agent cannot. If anyone else is having this problem trying to import a k3d cluster and figures out the resolution, let me know. My original breakdown: https://rancher-users.slack.com/archives/C3ASABBD1/p1750633692812039
  • d

    delightful-kitchen-63100

    06/24/2025, 7:30 PM
    The internal DNS cannot resolve the URL (https://rancher.rd.localhost/ping) because Rancher's DNS file (resolv.conf) only carries these search domains: cattle-system.svc.cluster.local svc.cluster.local cluster.local. But your issue comes before Rancher's: if not even curl can reach rancher.rd.localhost (443), Rancher won't be able to either. Can you check whether you can ping the URL from your agent node?
    ping rancher.rd.localhost
    Also, try to telnet from the agent node to Rancher's ingress IP/port:
    telnet <ingress IP> 443
    See if the IP is resolved by ping and the packets are delivered. If you don't know how to read the result of ping, something like this is expected:
    Copy code
    ping ip-xxx.sa-east-1.compute.internal
    PING ip-xxx.sa-east-1.compute.internal (xxx) 56(84) bytes of data.
    64 bytes from ip-xxx.sa-east-1.compute.internal (xxx): icmp_seq=1 ttl=64 time=0.031 ms
    64 bytes from ip-xxx.sa-east-1.compute.internal (xxx): icmp_seq=2 ttl=64 time=0.042 ms
    ^C
    --- ip-xxx.sa-east-1.compute.internal ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1035ms
    rtt min/avg/max/mdev = 0.031/0.036/0.042/0.005 ms
  • a

    adamant-traffic-5372

    06/24/2025, 10:42 PM
    There is no ping in the Rancher agent container. But nslookup resolves without issue, so DNS is not having a problem. There is something with the network from only the Rancher agent container. When I spin up a generic pod, I can curl the endpoint without issue. But inside the Rancher agent pod, no dice.
  • f

    future-fountain-82544

    06/26/2025, 12:12 AM
    Rancher / Rancher-Monitoring question. We have a cluster we've upgraded through 1.25 to 1.32 now. Rancher-Monitoring wound up not getting upgraded along the way; it's currently on 103.1.1+up45.31.1, and in attempting to upgrade to 106.1.2+up69.8.2-rancher.7 I'm running into the following error:
    Copy code
    Error: UPGRADE FAILED: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "rancher-monitoring-prometheus-node-exporter" namespace: "cattle-monitoring-system" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
    Somewhere along the way of upgrading k3s, the last remaining PodSecurityPolicy may have been deleted, either manually or by the upgrade process. I think Helm is trying to find it in order to remove it and is of course unsuccessful. Has anyone run into this with Helm and know how to resolve it? More details will be in-thread.
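One commonly suggested route for exactly this "no matches for kind" error is the helm-mapkubeapis plugin, which rewrites removed API versions (such as policy/v1beta1 PodSecurityPolicy) out of the stored release manifest so the next upgrade can build its objects. A hedged sketch, assuming the release name and namespace from the error message:

```shell
release='rancher-monitoring'
ns='cattle-monitoring-system'
echo "cleaning removed API versions from release $release in $ns"
if command -v helm >/dev/null 2>&1; then
  helm plugin install https://github.com/helm/helm-mapkubeapis 2>/dev/null || true
  # Edits the release secret in place; take a backup of the secret first.
  helm mapkubeapis "$release" -n "$ns" || true
fi
```

After the manifest is cleaned, the 106.x upgrade should no longer try to map the PodSecurityPolicy kind.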
  • c

    crooked-cat-21365

    06/26/2025, 7:39 AM
    I have installed the rancher-logging app (106.0.2+up4.10.0-rancher.6, via the Rancher GUI) on RKE2 with 4 worker nodes. Problem: fluent-bit seems to die with an OOM about 450 times per day. The fluent-bit version included in rancher-logging is 3.1.8. Is it possible it cannot handle cgroup v2 yet?
    Copy code
    [Thu Jun 26 09:59:41 2025] flb-pipeline invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
    [Thu Jun 26 09:59:41 2025] CPU: 2 PID: 2291474 Comm: flb-pipeline Not tainted 6.1.0-29-amd64 #1  Debian 6.1.123-1
    [Thu Jun 26 09:59:41 2025] Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.22.2 09/12/2024
    [Thu Jun 26 09:59:41 2025] Call Trace:
    [Thu Jun 26 09:59:41 2025]  <TASK>
    [Thu Jun 26 09:59:41 2025]  dump_stack_lvl+0x44/0x5c
    [Thu Jun 26 09:59:41 2025]  dump_header+0x4a/0x211
    [Thu Jun 26 09:59:41 2025]  oom_kill_process.cold+0xb/0x10
    [Thu Jun 26 09:59:41 2025]  out_of_memory+0x1fd/0x4c0
    [Thu Jun 26 09:59:41 2025]  mem_cgroup_out_of_memory+0x134/0x150
    [Thu Jun 26 09:59:41 2025]  try_charge_memcg+0x696/0x780
    [Thu Jun 26 09:59:41 2025]  charge_memcg+0x39/0xf0
    [Thu Jun 26 09:59:41 2025]  __mem_cgroup_charge+0x28/0x80
    [Thu Jun 26 09:59:41 2025]  __handle_mm_fault+0x95c/0xfa0
    [Thu Jun 26 09:59:41 2025]  handle_mm_fault+0xdb/0x2d0
    [Thu Jun 26 09:59:41 2025]  do_user_addr_fault+0x191/0x550
    [Thu Jun 26 09:59:41 2025]  exc_page_fault+0x70/0x170
    [Thu Jun 26 09:59:41 2025]  asm_exc_page_fault+0x22/0x30
    [Thu Jun 26 09:59:41 2025] RIP: 0033:0x7f3c1976ef4c
    [Thu Jun 26 09:59:41 2025] Code: 00 00 00 74 a0 83 f9 c0 0f 87 56 fe ff ff 62 e1 fe 28 6f 4e 01 48 29 fe 48 83 c7 3f 49 8d 0c 10 48 83 e7 c0 48 01 fe 48 29 f9 <f3> a4 62 c1 fe 28 7f 00 62 c1 fe 28 7f 48 01 c3 0f 1f 40 00 4c 8b
    [Thu Jun 26 09:59:41 2025] RSP: 002b:00007f3c181fa3c8 EFLAGS: 00010206
    [Thu Jun 26 09:59:41 2025] RAX: 00007f3c0dfb01aa RBX: 00000000001e0000 RCX: 000000000000e84b
    [Thu Jun 26 09:59:41 2025] RDX: 00000000000356a1 RSI: 00007f3c1177a196 RDI: 00007f3c0dfd7000
    [Thu Jun 26 09:59:41 2025] RBP: 00000000000356a1 R08: 00007f3c0dfb01aa R09: 0000000000400000
    [Thu Jun 26 09:59:41 2025] R10: 00000000001c0000 R11: 0000000000000048 R12: 00007f3c16169f40
    [Thu Jun 26 09:59:41 2025] R13: 00007f3c11753340 R14: 00007f3c1604db80 R15: 00007f3c120da740
    [Thu Jun 26 09:59:41 2025]  </TASK>
    [Thu Jun 26 09:59:41 2025] memory: usage 97656kB, limit 97656kB, failcnt 4194
    [Thu Jun 26 09:59:41 2025] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
    [Thu Jun 26 09:59:41 2025] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod8b0a4ee5_d9cb_4ae3_876c_df15c8b305b5.slice:
    [Thu Jun 26 09:59:41 2025] anon 97992704
                               file 4096
                               kernel 2002944
                               kernel_stack 147456
                               pagetables 356352
                               sec_pagetables 0
                               percpu 730296
                               sock 0
                               vmalloc 12288
                               shmem 0
                               zswap 0
                               zswapped 0
                               file_mapped 0
                               file_dirty 0
                               file_writeback 0
                               swapcached 0
                               anon_thp 46137344
                               file_thp 0
                               shmem_thp 0
                               inactive_anon 97984512
                               active_anon 8192
                               inactive_file 0
                               active_file 4096
                               unevictable 0
                               slab_reclaimable 250784
                               slab_unreclaimable 398248
                               slab 649032
                               workingset_refault_anon 0
                               workingset_refault_file 381
                               workingset_activate_anon 0
                               workingset_activate_file 1
                               workingset_restore_anon 0
                               workingset_restore_file 0
                               workingset_nodereclaim 26
                               pgscan 5984
                               pgsteal 3693
                               pgscan_kswapd 0
                               pgscan_direct 5984
                               pgsteal_kswapd 0
                               pgsteal_direct 3693
                               pgfault 1380514
                               pgmajfault 19
                               pgrefill 2120
                               pgactivate 2301
                               pgdeactivate 2120
                               pglazyfree 0
                               pglazyfreed 0
                               zswpin 0
                               zswpout 0
                               thp_fault_alloc 153
                               thp_collapse_alloc 0
    [Thu Jun 26 09:59:41 2025] Tasks state (memory values in pages):
    [Thu Jun 26 09:59:41 2025] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
    [Thu Jun 26 09:59:41 2025] [2231365] 65535 2231365      243        1    28672        0          -998 pause
    [Thu Jun 26 09:59:41 2025] [2291372]     0 2291372    52297    27372   348160        0           999 fluent-bit
    [Thu Jun 26 09:59:41 2025] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-d4c669550fc25bc650c28f72b7bad4d279f0f68a94c79d7c9ccb729f2b83e20d.scope,mems_allowed=0-1,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod8b0a4ee5_d9cb_4ae3_876c_df15c8b305b5.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod8b0a4ee5_d9cb_4ae3_876c_df15c8b305b5.slice/cri-containerd-d4c669550fc25bc650c28f72b7bad4d279f0f68a94c79d7c9ccb729f2b83e20d.scope,task=fluent-bit,pid=2291372,uid=0
    [Thu Jun 26 09:59:41 2025] Memory cgroup out of memory: Killed process 2291372 (fluent-bit) total-vm:209188kB, anon-rss:95464kB, file-rss:14024kB, shmem-rss:0kB, UID:0 pgtables:340kB oom_score_adj:999
    [Thu Jun 26 09:59:41 2025] Tasks in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod8b0a4ee5_d9cb_4ae3_876c_df15c8b305b5.slice/cri-containerd-d4c669550fc25bc650c28f72b7bad4d279f0f68a94c79d7c9ccb729f2b83e20d.scope are going to be killed due to memory.oom.group set
    [Thu Jun 26 09:59:41 2025] Memory cgroup out of memory: Killed process 2291474 (flb-pipeline) total-vm:209188kB, anon-rss:95508kB, file-rss:14024kB, shmem-rss:0kB, UID:0 pgtables:340kB oom_score_adj:999
  • i

    important-architect-36609

    06/26/2025, 9:37 AM
    does anybody know when Rocky 10 will be officially supported for downstream clusters?
  • b

    brash-waitress-85312

    06/26/2025, 1:41 PM
    I've been stuck for a week now. I am trying to recover my cluster using the DR guide. I am unable to add any node to my RKE2 cluster, which is managed by Rancher; the registration command is stuck after Generating Cattle id. My question is: is it compulsory to have a working CP node in order to register a node in the cluster using the Rancher registration token?
  • a

    abundant-chef-15101

    06/26/2025, 6:07 PM
    Hello! I'm running Rancher Desktop 1.19.3 on macOS 15.5 and it is running out of CPU for php-fpm
    Copy code
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root      3104 99.9  0.2 587884 64304 ?        Rsl  17:59   0:10 /usr/bin/qemu-x86_64 /usr/sbin/php-fpm8.4 php-fpm8.4 -F
    This is within the Docker container I use within Rancher - not sure how to prevent it from eating up all the CPU
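The qemu-x86_64 wrapper in that ps output suggests an amd64 image being emulated; on an Apple-silicon Mac that emulation alone can pin a core. A quick check worth trying (a sketch; the image tag shown is hypothetical — substitute whatever image your container runs):

```shell
img='php:8.4-fpm'   # hypothetical tag -- substitute your actual image
echo "platform check for $img"
if command -v docker >/dev/null 2>&1; then
  # If this prints linux/amd64 on an arm64 host, every php-fpm instruction
  # is being emulated by qemu-x86_64, which matches the 99.9% CPU in ps.
  docker image inspect --format '{{.Os}}/{{.Architecture}}' "$img" || true
fi
```

If the image is amd64-only, pulling an arm64 variant (or a multi-arch tag) would avoid the emulation entirely.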
  • t

    thousands-breakfast-69764

    06/30/2025, 2:48 PM
    Hi, I'm lost in resource CPU reservation in k8s. I'm trying to figure out how many cores my Rancher-managed cluster should have. I'm running RKE2 and only have 1 StatefulSet installed (a Tomcat application that uses an external database). I have reserved 2000m CPU for that StatefulSet, and I have found 3 Deployments that each have 100m CPU reserved (rke2-coredns-rke2-coredns, rke2-coredns-rke2-coredns-autoscaler and rke2-metrics-server). So in total 2300m CPU is reserved. My 3-node RKE2 cluster has 6 cores (4.89 reserved and 1.71 used). If I read the docs right, 2300m CPU would be equal to 2.3 physical or virtual CPU cores ("1 CPU unit is equivalent to 1 physical CPU core, or 1 virtual core"). So, what's reserving the other 2.59 cores?
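The arithmetic in the question checks out, and the gap is usually explained by requests that a workload list doesn't surface — DaemonSet pods (CNI, ingress, kube-proxy) and anything else in kube-system. A sketch that redoes the sum and then asks each node for its full tally:

```shell
# Requests accounted for in the question, in millicores.
statefulset=2000                      # the Tomcat StatefulSet
coredns=100; autoscaler=100; metrics=100
accounted=$((statefulset + coredns + autoscaler + metrics))
cores=$(awk -v m="$accounted" 'BEGIN{printf "%.1f", m/1000}')
echo "accounted for: ${accounted}m = ${cores} cores"
# The remaining reservation is visible per node: every pod's CPU request,
# DaemonSets included, shows up under "Allocated resources".
if command -v kubectl >/dev/null 2>&1; then
  kubectl describe nodes | grep -A 8 'Allocated resources' || true
fi
```

Comparing that per-node output against the 4.89 figure should name the pods holding the missing ~2.59 cores.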
  • a

    able-printer-16910

    06/30/2025, 2:56 PM
    Hello team, I need official support for a Rancher issue. There's a production incident, very critical and needs to be addressed ASAP. Where can I find urgent assistance?
  • s

    square-rose-18388

    06/30/2025, 4:27 PM
    Just did a migration/restore of Rancher from a Docker deployment (which died, but thankfully I had rancher-backup set up) to a 3-node k3s cluster. It worked. No issues. Thanks for the painless migration/restoration process, Rancher team.
  • p

    purple-rose-91888

    07/01/2025, 8:02 AM
    Hi, when will version v1.32.6+k3s1 be added to the stable release channel? https://update.k3s.io/v1-release/channels We're currently waiting to migrate from the SQLite to the etcd datastore, but are affected by the following issue: https://github.com/k3s-io/k3s/issues/12478 Thanks!
  • c

    creamy-tailor-26997

    07/01/2025, 7:24 PM
    Hello! I want to connect remotely to the Docker daemon, so I am trying to update the daemon.json file using a provisioning script to allow that connection
  • c

    creamy-tailor-26997

    07/01/2025, 7:24 PM
    nothing fancy
  • c

    creamy-tailor-26997

    07/01/2025, 7:24 PM
    #!/bin/sh
    cat <<EOF > /etc/docker/daemon.json
    {
      "features": { "containerd-snapshotter": false },
      "hosts": ["unix:///var/run/docker.sock", "tcp://0.0.0.0:2375"]
    }
    EOF
  • c

    creamy-tailor-26997

    07/01/2025, 7:24 PM
    no luck so far, does anyone have a setup like that?
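Two things may be worth checking with a script like that: the generated JSON must parse (a malformed daemon.json stops dockerd from starting at all), and on systemd-managed hosts a "hosts" key in daemon.json conflicts with the -H fd:// flag baked into docker.service's ExecStart, so a systemd drop-in removing that flag is typically also needed. A sketch, writing to a temporary path for illustration (the real target is /etc/docker/daemon.json):

```shell
conf="$(mktemp)"   # illustration only -- the real file is /etc/docker/daemon.json
cat <<'EOF' > "$conf"
{
  "features": { "containerd-snapshotter": false },
  "hosts": ["unix:///var/run/docker.sock", "tcp://0.0.0.0:2375"]
}
EOF
# Validate before restarting dockerd: a syntax error here takes the daemon down.
python3 -m json.tool "$conf" > /dev/null && echo "daemon.json parses as JSON"
```

Note that tcp://0.0.0.0:2375 is unauthenticated and unencrypted; exposing it beyond a trusted network generally calls for the TLS variant on 2376.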
  • f

    faint-soccer-70506

    07/01/2025, 7:31 PM
    How does one set a custom Grafana dashboard to be added to new project-monitoring Grafanas? Similarly, is there a mechanism to add said dashboards to existing project monitors other than copy-paste? Just adding it to the cattle-dashboard namespace like you would for the global rancher-monitoring Grafana doesn't appear to accomplish this, and I don't see anything in the docs about it. (Sorry if there's a better channel to ask this; I couldn't find one.)
  • r

    red-intern-36131

    07/02/2025, 7:10 AM
    Hello! We migrated Rancher from an EKS cluster to one of the downstream clusters in vSphere with RKE2 so that Rancher could manage itself. We decommissioned the Rancher instance on the EKS cluster, and now only the one we migrated to is up and running. The only problem is that the nodes of that downstream cluster appear outdated/unavailable even though they are functioning. And if we provision new nodes for this cluster, they remain in Provisioned status. Do you have any idea what could be causing this or what we are missing? Thank you very much!
  • b

    boundless-scientist-9417

    07/02/2025, 5:04 PM
    Hi, I am currently using the mongodb-sharded:7.0.6-debian-12-r0 image with my RKE2 deployment and I want to use the same version as a distroless image; how can I get it?
  • f

    flat-spring-34799

    07/02/2025, 8:52 PM
    Is there a way to restore a cluster that was scaled to 0 with only the etcd backups? And the various YAMLs that are still on the local cluster, of course.
  • b

    bland-easter-37603

    07/03/2025, 7:19 AM
    I'm running k3s in a Docker container, and systemctl restart docker breaks the k3s server. The only way to rescue it is by recreating the container.
    I0703 07:15:09.113085      17 factory.go:221] Registration of the crio container factory failed: Get "http://%2Fvar%2Frun%2Fcrio%2Fcrio.sock/info": dial unix /var/run/crio/crio.sock: connect: no such file or directory
    I0703 07:15:09.124398      17 kubelet_network_linux.go:49] "Initialized iptables rules." protocol="IPv4"
    I0703 07:15:09.127578      17 kubelet_network_linux.go:49] "Initialized iptables rules." protocol="IPv6"
    I0703 07:15:09.127611      17 status_manager.go:230] "Starting to sync pod status with apiserver"
    I0703 07:15:09.127634      17 watchdog_linux.go:127] "Systemd watchdog is not enabled or the interval is invalid, so health checking will not be started."
    I0703 07:15:09.127641      17 kubelet.go:2436] "Starting kubelet main sync loop"
    E0703 07:15:09.127698      17 kubelet.go:2460] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
    time="2025-07-03T07:15:09Z" level=info msg="Applying CRD addons.k3s.cattle.io"
    I0703 07:15:09.149273      17 factory.go:223] Registration of the containerd container factory successfully
    time="2025-07-03T07:15:09Z" level=info msg="Flannel found PodCIDR assigned for node 7343353f304f"
    time="2025-07-03T07:15:09Z" level=info msg="The interface eth0 with ipv4 address 172.24.0.4 will be used by flannel"
    I0703 07:15:09.210568      17 kube.go:139] Waiting 10m0s for node controller to sync
    I0703 07:15:09.210689      17 kube.go:469] Starting kube subnet manager
    I0703 07:15:09.216454      17 cpu_manager.go:221] "Starting CPU manager" policy="none"
    I0703 07:15:09.216571      17 cpu_manager.go:222] "Reconciling" reconcilePeriod="10s"
    I0703 07:15:09.216597      17 state_mem.go:36] "Initialized new in-memory state store"
    I0703 07:15:09.216762      17 state_mem.go:88] "Updated default CPUSet" cpuSet=""
    I0703 07:15:09.216967      17 state_mem.go:96] "Updated CPUSet assignments" assignments={}
    I0703 07:15:09.216984      17 policy_none.go:49] "None policy: Start"
    I0703 07:15:09.216996      17 memory_manager.go:186] "Starting memorymanager" policy="None"
    I0703 07:15:09.217005      17 state_mem.go:35] "Initializing new in-memory state store"
    I0703 07:15:09.217099      17 state_mem.go:75] "Updated machine memory state"
    E0703 07:15:09.219087      17 manager.go:517] "Failed to read data from checkpoint" err="checkpoint is not found" checkpoint="kubelet_internal_checkpoint"
    I0703 07:15:09.219438      17 eviction_manager.go:189] "Eviction manager: starting control loop"
    I0703 07:15:09.219559      17 container_log_manager.go:189] "Initializing container log rotate workers" workers=1 monitorPeriod="10s"
    I0703 07:15:09.222575      17 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.0.0/24]
    I0703 07:15:09.227004      17 plugin_manager.go:118] "Starting Kubelet Plugin Manager"
    I0703 07:15:09.244032      17 server.go:715] "Successfully retrieved node IP(s)" IPs=["172.24.0.2"]
    E0703 07:15:09.244440      17 server.go:245] "Kube-proxy configuration may be incomplete or incorrect" err="nodePortAddresses is unset; NodePort connections will be accepted on all local IPs. Consider using `--nodeport-addresses primary`"
    I0703 07:15:09.247644      17 server.go:254] "kube-proxy running in dual-stack mode" primary ipFamily="IPv4"
    I0703 07:15:09.247856      17 server_linux.go:145] "Using iptables Proxier"
    I0703 07:15:09.270383      17 proxier.go:243] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
    I0703 07:15:09.270940      17 server.go:516] "Version info" version="v1.33.2+k3s1"
    I0703 07:15:09.271050      17 server.go:518] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
    I0703 07:15:09.277080      17 config.go:199] "Starting service config controller"
    I0703 07:15:09.277265      17 shared_informer.go:350] "Waiting for caches to sync" controller="service config"
    I0703 07:15:09.277368      17 config.go:105] "Starting endpoint slice config controller"
    I0703 07:15:09.277453      17 shared_informer.go:350] "Waiting for caches to sync" controller="endpoint slice config"
    I0703 07:15:09.277525      17 config.go:440] "Starting serviceCIDR config controller"
    I0703 07:15:09.277597      17 shared_informer.go:350] "Waiting for caches to sync" controller="serviceCIDR config"
    I0703 07:15:09.278390      17 config.go:329] "Starting node config controller"
    I0703 07:15:09.278525      17 shared_informer.go:350] "Waiting for caches to sync" controller="node config"
    E0703 07:15:09.292754      17 eviction_manager.go:267] "eviction manager: failed to check if we have separate container filesystem. Ignoring." err="no imagefs label for configured runtime"
    I0703 07:15:09.318748      17 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="6b216c0f025b0efd2879ada393fc77bc808594146a1aa3759b37063c0103031c"
    I0703 07:15:09.353082      17 kubelet_node_status.go:75] "Attempting to register node" node="7343353f304f"
    I0703 07:15:09.391194      17 shared_informer.go:357] "Caches are synced" controller="node config"
    I0703 07:15:09.391310      17 shared_informer.go:357] "Caches are synced" controller="service config"
    I0703 07:15:09.391377      17 shared_informer.go:357] "Caches are synced" controller="endpoint slice config"
    I0703 07:15:09.394486      17 shared_informer.go:357] "Caches are synced" controller="serviceCIDR config"
    I0703 07:15:09.427079      17 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="a3534178e457d4f53f49bd0e60b9b322987b1ac7c3781a198dd067d12fc036a4"
    time="2025-07-03T07:15:09Z" level=info msg="Applying CRD etcdsnapshotfiles.k3s.cattle.io"
    time="2025-07-03T07:15:09Z" level=fatal msg="Failed to start networking: unable to initialize network policy controller: error getting node subnet: failed to find interface with specified node ip"
  • m

    most-memory-45748

    07/03/2025, 9:54 AM
    I get a bunch of 403 errors when trying to deploy something in my cluster via an Azure DevOps pipeline with a Kubernetes service connection. Previously it worked just fine. I have a RoleTemplate which looks like this:
    Copy code
    /usr/local/bin/kubectl get pods -n cattle-monitoring-system
    
    
    ##[error]E0703 11:02:37.026548 3323374 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: {\"Code\":{\"Code\":\"Forbidden\",\"Status\":403},\"Message\":\"clusters.management.cattle.io \\\"c-m-7m9crrsz\\\" is forbidden: User \\\"system:unauthenticated\\\" cannot get resource \\\"clusters\\\" in API group \\\"management.cattle.io\\\" at the cluster scope\",\"Cause\":null,\"FieldName\":\"\"}"
  • s

    steep-petabyte-14152

    07/03/2025, 4:45 PM
    Good morning. I have a question about Rancher, since I'm a new user. When I create a project and associate it with a namespace, I see that a namespace with a generic name is created in my cluster. Is this normal in this Rancher process?
  • b

    bright-address-70198

    07/04/2025, 10:18 AM
    Hi everyone, I would like some help: how can I programmatically install a Helm chart using catalog.cattle.io.app in a Rancher extension? I'm using this.$store.dispatch('cluster/create', ...), but the chart doesn't seem to deploy. Any working example or doc link would help.