ambitious-plastic-3551
03/04/2024, 4:40 PM
billions-accountant-52971
03/04/2024, 4:40 PM
billions-accountant-52971
03/04/2024, 4:40 PM
many-nightfall-61858
03/05/2024, 11:27 PM
authorization failed: no basic auth credentials
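On the "no basic auth credentials" error: that usually means containerd has no credentials for the registry it ended up pulling from. In RKE2 they can be supplied in /etc/rancher/rke2/registries.yaml under configs; a minimal sketch, with a placeholder registry name and credentials:

```yaml
# /etc/rancher/rke2/registries.yaml (restart rke2-server/rke2-agent after editing)
configs:
  "registry.example.com":   # placeholder registry
    auth:
      username: pull-user       # placeholder
      password: pull-password   # placeholder
```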
purple-australia-63663
03/05/2024, 8:48 AM
1.27.11~rke2r1-0
1.28.7~rke2r1-0
yum package versions have known bugs?
purple-australia-63663
03/05/2024, 8:48 AM
bored-area-23524
03/05/2024, 11:45 AM
brave-toddler-43854
03/05/2024, 3:36 PM
[ERROR] 000 received while testing Rancher connection. Sleeping for 5 seconds and trying again
curl: (60) SSL certificate problem: self signed certificate
thanks for your help in advance!
little-doctor-70130
03/06/2024, 12:31 PM
rancher-federal/rke2/hardened-kubernetes, rancher-federal/rke2/etcd, rancher-federal/rke2/rke2-runtime, etc.). A lot are identified as having no fix, but I also wonder whether it's even feasible to fix the outdated packages by building our own images off these as a base and updating the package versions - I worry we may just end up breaking the images entirely.
Does anyone have guidance, suggestions, or experience with this? I will need to submit scan information to an internal security team who will likely want justification for any High/Critical vulnerabilities in the scans that we can't or won't fix - an enormous task given the potentially dozens of container images in use by the control plane and core functionality.
narrow-waiter-80694
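For triage at that scale, one option is to script the justification list out of the scanner's JSON report instead of reading it by hand. A minimal sketch, assuming Trivy-style JSON output (the Results/Vulnerabilities layout below follows Trivy's JSON format; the image name and CVE IDs are placeholders, and other scanners will need different field names):

```python
import json

def high_critical(report: dict) -> list[dict]:
    """Collect High/Critical findings from a Trivy-style JSON report,
    noting whether upstream ships a fixed version."""
    findings = []
    for result in report.get("Results", []):
        for vuln in result.get("Vulnerabilities", []) or []:
            if vuln.get("Severity") in ("HIGH", "CRITICAL"):
                findings.append({
                    "id": vuln.get("VulnerabilityID"),
                    "target": result.get("Target"),
                    "pkg": vuln.get("PkgName"),
                    "installed": vuln.get("InstalledVersion"),
                    # an empty FixedVersion is how "no fix" shows up
                    "fixed": vuln.get("FixedVersion") or "no fix available",
                })
    return findings

# Stub report for illustration; a real one comes from `trivy image -f json <image>`
report = json.loads("""{
  "Results": [{
    "Target": "rancher-federal/rke2/hardened-kubernetes",
    "Vulnerabilities": [
      {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL",
       "PkgName": "openssl", "InstalledVersion": "1.1.1", "FixedVersion": ""},
      {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW",
       "PkgName": "bash", "InstalledVersion": "5.0", "FixedVersion": "5.1"}
    ]
  }]
}""")

for f in high_critical(report):
    print(f["id"], f["pkg"], f["fixed"])
# → CVE-0000-0001 openssl no fix available
```

Running this over every image's report gives the security team one flat list of High/Critical findings annotated with whether a fix even exists upstream.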
03/06/2024, 1:03 PM
great-action-21630
03/07/2024, 1:25 AM
docker.io, quay.io, etc.
These are air-gapped nodes and our registries.yaml file has been working great. Once we upgraded to RKE2 v1.27.10+rke2r1 for new installs, we noticed that during stand-up of some of the clusters we were getting ImagePullBackOff errors that would never resolve themselves.
Running crictl info on nodes running RKE2 v1.24.11+rke2r1, we see our mirrors listed under:
"registry": {
"configPath": "",
"mirrors": {
"<http://docker.io|docker.io>": {
"endpoint": [
"<https://harbor>.*.*",
"<https://harbor.2>.*.*",
"<https://harboe.3>.*.*"
],
"rewrite": null
},
"<http://ghcr.io|ghcr.io>": {
"endpoint": [
"<https://harbor>.*.*",
"<https://harbor.2>.*.*",
"<https://harboe.3>.*.*"
],
"rewrite": null
},
however the RKE2 v1.27.10+rke2r1 nodes have:
"registry": {
  "configPath": "/var/lib/rancher/rke2/agent/etc/containerd/certs.d",
  "mirrors": null,
  "configs": null,
  "auths": null,
  "headers": null
},
Exact same registries.yaml file in the same location.
Checking journalctl for rke2-server on the 1.27 clusters we see: level=info msg="Using private registry config file at /etc/rancher/rke2/registries.yaml"
with no errors listed after. Deleted the pod many times, and each time it comes back it still can't pull the image.
Describing the pod shows it can't pull from docker.io (or whatever the repo is) - can't connect.
Anyone have ideas or things to check?
quick-mechanic-78526
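One thing worth checking on the 1.27 nodes: newer RKE2 releases switched the embedded containerd over to the certs.d-style registry configuration, which is consistent with crictl info showing "mirrors": null alongside a populated configPath. registries.yaml should now be translated into a per-registry hosts.toml under that path, so it's worth verifying those files were actually generated and contain your mirror endpoints. A sketch of what such a file looks like (placeholder endpoints, not output from these nodes):

```toml
# /var/lib/rancher/rke2/agent/etc/containerd/certs.d/docker.io/hosts.toml
# (sketch of a generated hosts.toml; endpoints are placeholders)
server = "https://registry-1.docker.io"

[host."https://harbor.example.com"]
  capabilities = ["pull", "resolve"]
```

If the directory is empty or the hosts.toml files are missing your mirrors, the registries.yaml translation step is the place to dig, rather than the pods themselves.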
03/07/2024, 12:50 PM
shy-megabyte-75492
03/08/2024, 1:34 PM
ripe-intern-96703
03/08/2024, 2:00 PM
swift-sunset-4572
03/08/2024, 6:40 PM
calm-river-27740
03/12/2024, 4:37 AM
curved-piano-98970
03/12/2024, 8:27 AM
curved-piano-98970
03/12/2024, 8:51 AM
--audit-policy-file=/var/lib/rancher/rke2/etc/config-files/audit-policy-file --audit-log-maxage=30 --audit-log-maxbackup=10 --audit-log-maxsize=100 --audit-log-path=/var/lib/rancher/rke2/server/logs/audit.log
How do I modify this in RKE2? Can I do it?
miniature-notebook-6405
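These flags are normally set through /etc/rancher/rke2/config.yaml rather than by editing the generated apiserver arguments directly. RKE2 exposes a dedicated audit-policy-file server option, and the remaining apiserver flags can be overridden via kube-apiserver-arg; a sketch (the paths and values below are examples, not defaults):

```yaml
# /etc/rancher/rke2/config.yaml on the server nodes; restart rke2-server to apply
audit-policy-file: /etc/rancher/rke2/audit-policy.yaml
kube-apiserver-arg:
  - audit-log-maxage=60
  - audit-log-maxbackup=20
  - audit-log-maxsize=200
```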
03/12/2024, 2:19 PM
limited-motherboard-41807
03/13/2024, 12:05 PM
NotReady.
At midnight, for some reason, something happens and we see the following errors in the logs of the rke2-agent service:
level=info msg="Connecting to proxy" url="wss://10.76.116.29:9345/v1-rke2/connect"
level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 10.76.116.29:9345: connect: connection refused"
level=error msg="Remotedialer proxy error" error="dial tcp 10.76.116.29:9345: connect: connection refused"
Restarting the agents makes everything work perfectly again.
I have no clue what the reason could be, and why doesn't it "self-heal"?
Does anyone have any idea of what/where to investigate more?
miniature-piano-74169
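Those errors are plain TCP "connection refused" dials to the supervisor port, so before digging into the agent itself it may help to confirm whether 9345 on the server was actually reachable at that moment (midnight suggests a cron job, firewall reload, or log rotation restarting something). A minimal reachability probe that assumes nothing about RKE2, just a TCP dial like remotedialer performs:

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the supervisor port from an affected agent
# (IP taken from the log lines above)
# print(port_reachable("10.76.116.29", 9345))
```

Running this in a loop around midnight from an affected agent would tell you whether the port genuinely goes away (server-side problem) or stays open while the agent still fails (agent/tunnel problem).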
03/13/2024, 12:40 PM
disable-cloud-controller: true
cloud_provider:
name: external-aws
use_instance_metadata_hostname: true
kubelet-arg:
- cloud-provider=external
- provider-id=aws://<$provider_id>
kube-apiserver-arg: cloud-provider=external
kube-controller-manager-arg: cloud-provider=external
2. Is there a reference for the options available to use in cloud.conf?
miniature-notebook-6405
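On the cloud.conf question: the AWS provider reads an INI-style config whose [Global] keys come from the provider's cloud-config struct. A sketch of commonly used keys, with placeholder values (check the cloud-provider-aws documentation for the authoritative list; many of these are auto-discovered from instance metadata when omitted):

```ini
; cloud.conf sketch for the AWS cloud provider (placeholder values)
[Global]
Zone = us-east-1a
VPC = vpc-0123456789abcdef0
SubnetID = subnet-0123456789abcdef0
RouteTableID = rtb-0123456789abcdef0
KubernetesClusterTag = my-cluster
DisableSecurityGroupIngress = false
```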
03/15/2024, 3:26 PM
prehistoric-zoo-94477
03/15/2024, 4:22 PM
Mar 15 16:16:00 ber3-230434 rke2[47278]: time="2024-03-15T16:16:00Z" level=info msg="Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: connection refused\""
The only solution is to reboot the system. I tried rke2-killall.sh and rke2-uninstall.sh, but even that isn't enough to bring containerd back to a clean state. Any thoughts here?
freezing-parrot-4783
03/15/2024, 7:08 PM
orange-breakfast-78817
03/15/2024, 7:48 PM
refined-autumn-58375
03/17/2024, 3:29 PM
wooden-area-49191
03/18/2024, 7:35 AM
curved-piano-98970
03/18/2024, 10:09 AM
refined-autumn-58375
03/18/2024, 1:50 PM
Non-ready bootstrap machine bcfd5cfbb-2pj9w and join url to be available on bootstrap node