adamant-kite-43734
12/19/2023, 10:13 PM
creamy-pencil-82913
12/19/2023, 10:16 PM
abundant-hair-58573
12/19/2023, 10:21 PM
Running crictl ps -a on the control plane shows the aws-cloud-controller-manager with a bunch of failed restarts. The last line in its log suggests I must have an old tag lying around somewhere:
Cloud provider could not be initialized: could not init cloud provider "aws": Found multiple cluster tags with prefix kubernetes.io/cluster/
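This error means the instance carries more than one tag whose key starts with kubernetes.io/cluster/, and the AWS cloud provider does not accept more than one. The sketch below can help spot the extra tag. It assumes the tag sits on the node's own EC2 instance. list_cluster_tags and count_cluster_tags are hypothetical helpers that only filter tag keys read from stdin.

```shell
#!/bin/sh
# list_cluster_tags (hypothetical helper): read tag keys, one per line, on
# stdin and print only those carrying the kubernetes.io/cluster/ prefix.
# grep exits non-zero when nothing matches, so "|| true" keeps that quiet.
list_cluster_tags() {
  grep '^kubernetes\.io/cluster/' || true
}

# count_cluster_tags (hypothetical helper): same filter, but print only how
# many keys matched. More than 1 is what triggers the error above.
count_cluster_tags() {
  list_cluster_tags | wc -l | tr -d ' '
}
```

One way to feed it is shown below. This assumes the AWS CLI is configured with permission to describe tags, and <instance-id> is left as a placeholder:
aws ec2 describe-tags --filters "Name=resource-id,Values=<instance-id>" --query 'Tags[].Key' --output text | tr '\t' '\n' | list_cluster_tags
If this prints two or more lines, one of those keys is the stale tag.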
creamy-pencil-82913
12/19/2023, 10:22 PM
abundant-hair-58573
12/19/2023, 10:23 PM
creamy-pencil-82913
12/19/2023, 10:24 PM
creamy-pencil-82913
12/19/2023, 10:25 PM
abundant-hair-58573
12/19/2023, 10:28 PM
machineGlobalConfig:
  cni: canal
  disable-kube-proxy: false
  etcd-expose-metrics: false
  kube-apiserver-arg:
    - cloud-provider=external
  kube-controller-manager-arg:
    - cloud-provider=external
  kube-proxy-arg:
    - '--hostname-override="$(hostname -f)"'
  kube-scheduler-arg: []
machinePools: null
machineSelectorConfig:
  - config:
      cloud-provider-name: aws
      kubelet-arg:
        - '--hostname-override="$(hostname -f)"'
abundant-hair-58573
12/19/2023, 10:29 PM
creamy-pencil-82913
12/19/2023, 10:32 PM
- '--hostname-override="$(hostname -f)"'
does that… work? I wouldn’t have expected shell expressions to be supported like that.
creamy-pencil-82913
12/19/2023, 10:32 PM
creamy-pencil-82913
12/19/2023, 10:40 PM
creamy-pencil-82913
12/19/2023, 10:41 PM
abundant-hair-58573
12/19/2023, 10:41 PM
error syncing '<ip_address>our.domain': failed to get provider ID for node <ip_address>.our.domain at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing
creamy-pencil-82913
12/19/2023, 10:41 PM
spec:
  containers:
  - args:
    - --cluster-cidr=10.42.0.0/16
    - --conntrack-max-per-core=0
    - --conntrack-tcp-timeout-close-wait=0s
    - --conntrack-tcp-timeout-established=0s
    - --healthz-bind-address=127.0.0.1
    - --hostname-override="$(hostname -f)"
    - --kubeconfig=/var/lib/rancher/rke2/agent/kubeproxy.kubeconfig
    - --proxy-mode=iptables
    command:
    - kube-proxy
I get this:
root@rke2-server-1:/# cat /var/log/pods/kube-system_kube-proxy-rke2-server-1_e778576655760f9a1185c3599f8f57f0/kube-proxy/0.log
2023-12-19T22:38:09.257758437Z stderr F E1219 22:38:09.257613 1 server.go:1039] "Failed to retrieve node info" err="nodes \"\\\"$(hostname -f)\\\"\" not found"
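The log shows kube-proxy looking up a node literally named "$(hostname -f)", with the double quotes included. The values in the pod's args list reach kube-proxy as plain strings, and no shell processes them, so the command substitution never happens. The illustration below uses show_arg, a hypothetical stand-in for any program that simply receives its arguments.

```shell
#!/bin/sh
# show_arg (hypothetical): print the first argument exactly as received.
show_arg() {
  printf '%s\n' "$1"
}

# The single-quoted string is passed through untouched, just like a value
# from the static pod's args list; $(hostname -f) is never executed.
show_arg '--hostname-override="$(hostname -f)"'
# prints: --hostname-override="$(hostname -f)"
```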
abundant-hair-58573
12/19/2023, 10:41 PM
creamy-pencil-82913
12/19/2023, 10:43 PM
creamy-pencil-82913
12/19/2023, 10:43 PM
creamy-pencil-82913
12/19/2023, 10:43 PM
When IP-based naming is used, the nodes must be named after the instance followed by the regional domain name (ip-xxx-xxx-xxx-xxx.ec2.<region>.internal). If you have custom domain name set in the DHCP options, you must set --hostname-override on kube-proxy and kubelet to match the above-mentioned naming convention.
When resource based naming is used, the node must be named after the instance either with or without a domain name (i-1234567890abcdefg or i-1234567890abcdefg.<region>.compute.internal). A custom domain name, configured through DHCP options, may also be used.
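The two patterns in the quote can be formatted mechanically. The sketch below only builds strings following the conventions exactly as quoted. ip_based_name and resource_based_name are hypothetical names. Check the result against what the cloud provider actually reports for the instance before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers that format node names using the patterns quoted from
# the AWS cloud provider docs. They only build strings; nothing is looked up.

# ip_based_name IPV4 REGION -> ip-a-b-c-d.ec2.<region>.internal
ip_based_name() {
  printf 'ip-%s.ec2.%s.internal\n' "$(printf '%s' "$1" | tr '.' '-')" "$2"
}

# resource_based_name INSTANCE_ID REGION -> <instance-id>.<region>.compute.internal
# (the quote also allows the bare instance ID with no domain at all)
resource_based_name() {
  printf '%s.%s.compute.internal\n' "$1" "$2"
}
```

For example, ip_based_name 10.0.1.25 us-east-1 prints ip-10-0-1-25.ec2.us-east-1.internal, and resource_based_name i-0123456789abcdef0 us-east-1 prints i-0123456789abcdef0.us-east-1.compute.internal.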
creamy-pencil-82913
12/19/2023, 10:44 PM
hostname
abundant-hair-58573
12/19/2023, 10:44 PM
abundant-hair-58573
12/19/2023, 10:46 PM
creamy-pencil-82913
12/19/2023, 10:46 PM
creamy-pencil-82913
12/19/2023, 10:47 PM
ip-xxx-xxx-xxx-xxx.ec2.<region>.internal or i-1234567890abcdefg.<region>.compute.internal naming scheme
abundant-hair-58573
12/19/2023, 10:52 PM
creamy-pencil-82913
12/19/2023, 10:58 PM
#!/bin/sh
TOKEN=`curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
REGION=`curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/region`
INSTANCE=`curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id`
echo "overriding hostname as ${INSTANCE}.${REGION}.compute.internal"
mkdir -p /etc/rancher/rke2/config.yaml.d
echo <<EOF >/etc/rancher/rke2/config.yaml.d/99-aws-id.yaml
kubelet-arg+:
- --hostname-override=${INSTANCE}.${REGION}.compute.internal
kube-proxy-arg+:
- --hostname-override=${INSTANCE}.${REGION}.compute.internal
EOF
abundant-hair-58573
12/19/2023, 11:00 PM
creamy-pencil-82913
12/19/2023, 11:01 PM
abundant-hair-58573
12/19/2023, 11:02 PM
creamy-pencil-82913
12/19/2023, 11:14 PM
cat <<EOF, not echo <<EOF, of course, but otherwise it should work, I think.
abundant-hair-58573
12/20/2023, 12:27 AM
kubelet-arg+:
- --hostname-override=<ip_address>.ec2.us-east-1.internal
kube-proxy-arg+:
- --hostname-override=<ip_address>.ec2.us-east-1.internal
Now it doesn't even get as far as starting the cloud controller; the rke2-server.service log shows this error:
"Waiting for control-plane node <ip_address>.our.domain startup: nodes \"<ip_address>.our.domain\" not found"
So somewhere it's still trying to use the node's actual hostname, with our custom domain.
abundant-hair-58573
12/20/2023, 12:29 AM
creamy-pencil-82913
12/20/2023, 12:36 AM
creamy-pencil-82913
12/20/2023, 12:37 AM
abundant-hair-58573
12/20/2023, 12:41 AM
ps -ef | grep kubelet does show the kubelet running.
abundant-hair-58573
12/20/2023, 12:43 AM
ps output of the kubelet process, so that part definitely worked.
creamy-pencil-82913
12/20/2023, 12:48 AM
creamy-pencil-82913
12/20/2023, 12:48 AM
abundant-hair-58573
12/20/2023, 12:55 AM
"Attempting to register node" node="<IP>.ec2.us-east-1.internal"
"Unable to register node with API server" err="nodes \"<IP>.ec2.us-east-1.internal\" is forbidden: node \"<IP>.our.domain\" is not allowed to modify node \"<IP>.ec2.us-east-1.internal"
abundant-hair-58573
12/20/2023, 12:56 AM
abundant-hair-58573
12/20/2023, 1:00 AM
creamy-pencil-82913
12/20/2023, 1:00 AM
node-name: <whatever>
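This sketch combines the cat <<EOF correction from 11:14 PM with the node-name suggestion to give a possible version of the drop-in script. It makes two assumptions. First, the resource-based name from the earlier script is the one wanted. Second, RKE2 uses node-name both to register the node and as the kubelet's own identity, which the "not allowed to modify node" error suggests was out of sync. Under that assumption, RKE2 also passes the name to kubelet and kube-proxy as --hostname-override, so the separate -arg+ overrides are left out. Verify this on your RKE2 version. imds_get and write_node_name_config are hypothetical names.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that pin RKE2's node-name to the EC2
# resource-based name. Assumes IMDSv2 is reachable from the node.

# imds_get PATH: fetch one instance-metadata value using an IMDSv2 token.
imds_get() {
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
  curl -s -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$1"
}

# write_node_name_config INSTANCE_ID REGION [CONFIG_DIR]
# Writes 99-aws-id.yaml containing a single node-name key. Assumes RKE2
# derives the kubelet and kube-proxy hostname from node-name, so no
# --hostname-override args are written here.
write_node_name_config() {
  node_name="$1.$2.compute.internal"
  dir="${3:-/etc/rancher/rke2/config.yaml.d}"
  mkdir -p "$dir"
  # cat, not echo: echo ignores stdin, so with echo the heredoc is dropped
  # and the file ends up containing only a blank line.
  cat <<EOF >"$dir/99-aws-id.yaml"
node-name: ${node_name}
EOF
  echo "node-name set to ${node_name}"
}
```

On the node, for example from user data before rke2-server is started, a call such as write_node_name_config "$(imds_get instance-id)" "$(imds_get placement/region)" would write the drop-in.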
creamy-pencil-82913
12/20/2023, 1:01 AM
abundant-hair-58573
12/20/2023, 1:01 AM
creamy-pencil-82913
12/20/2023, 1:02 AM
abundant-hair-58573
12/20/2023, 1:02 AM
abundant-hair-58573
12/20/2023, 3:34 AM