# longhorn-storage
b
The script attempts to detect the host OS of the nodes. What OS are you running Kubernetes on?
n
The OS on all nodes is Ubuntu 22.04. The result of `grep -E "^ID_LIKE=" /etc/os-release | cut -d= -f2` on all nodes is `debian`.
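For reference, the pipeline can be tried against a sample file so it runs anywhere without touching the real `/etc/os-release` (the sample contents below are a sketch of what Ubuntu 22.04 typically ships, not copied from the cluster):

```shell
# Sketch of what the check script's pipeline extracts. A sample file stands
# in for /etc/os-release so this runs on any machine.
cat > /tmp/os-release.sample <<'EOF'
NAME="Ubuntu"
ID=ubuntu
ID_LIKE=debian
VERSION_ID="22.04"
EOF

# Extract the value of the ID_LIKE= line (the distro family).
grep -E "^ID_LIKE=" /tmp/os-release.sample | cut -d= -f2
```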
b
Oh wait, I missed the first line:
`Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not xxx.xxx.xxx.xxx`
Looks like the check script failed when executing kubectl. Are you running it locally on one of the K8s nodes?
n
I am running the script from my local machine.
K3s 1.25.6 with Rancher 2.7.2.
b
The certificate error appears to come from connecting to the Kubernetes API server. Could you try running the script on a control plane node?
n
Will try, give me a moment.
OK, I ran the script on one of the three control plane nodes and this is the result:

```
Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not xxx.xxx.xxx.xxx
[ERROR] Unable to detect kernel release on node geo-node1.
```

`geo-node1` is a worker (the 4th node), but xx.xxx.xxx.xxx is not the IP of the worker; it is the IP of another control plane node.
b
You need to resolve the certificate issue first. I'd suggest checking with the Rancher folks.
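The mismatch in the error can be reproduced locally with openssl: generate a certificate whose SAN list covers only 127.0.0.1, then inspect it. This is a sketch assuming OpenSSL 1.1.1+ (for `-addext`); the real kubelet serving certificate could be inspected the same way with `openssl x509 -noout -ext subjectAltName` to see which IPs it actually covers:

```shell
# Create a throwaway self-signed cert valid only for 127.0.0.1 -- the same
# shape as the kubelet serving cert in the error message.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/key.pem -out /tmp/cert.pem -days 1 \
  -subj "/CN=kubelet" -addext "subjectAltName=IP:127.0.0.1" 2>/dev/null

# List the certificate's Subject Alternative Names.
openssl x509 -in /tmp/cert.pem -noout -ext subjectAltName
```

A connection to an external node IP fails verification against such a certificate because that IP is absent from the SAN list.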
n
Could you help me figure out which command in the script is causing the certificate error? I assume it is not the same one you posted earlier. Thanks!
b
Are you able to run `kubectl get ns`?
n
```
NAME                                     STATUS   AGE
cattle-fleet-clusters-system             Active   39d
cattle-fleet-local-system                Active   39d
cattle-fleet-system                      Active   39d
cattle-global-data                       Active   39d
cattle-global-nt                         Active   39d
cattle-impersonation-system              Active   39d
cattle-monitoring-system                 Active   28d
cattle-system                            Active   44h
cluster-fleet-local-local-1a3d67d0a899   Active   39d
default                                  Active   39d
fleet-default                            Active   39d
fleet-local                              Active   39d
gi                                       Active   39d
kube-node-lease                          Active   39d
kube-public                              Active   39d
kube-system                              Active   39d
local                                    Active   39d
longhorn-system                          Active   13d
node-feature-discovery                   Active   27d
p-d92bp                                  Active   13d
p-h6p5f                                  Active   13d
p-lwjbx                                  Active   13d
p-xgtff                                  Active   13d
p-xsq75                                  Active   13d
user-6v7ft                               Active   13d
whatwhatwhy                              Active   33d
```
b
How about `kubectl exec -it <any-pod> -- bash`?
n
Works for a test pod in whatwhatwhy: `kubectl exec -it -n whatwhatwhy box1-798d6dd5d6-mjftc -- bash`
b
hmm
n
I can also do `kubectl exec -it overlaytest-4vfff -- bash`
b
Try to create this DaemonSet. Then execute

```
kubectl exec -i <daemonset-pod> -- nsenter --mount=/proc/1/ns/mnt -- bash -c 'grep -E "^ID_LIKE=" /etc/os-release | cut -d= -f2'
```

or

```
kubectl exec -i <daemonset-pod> -- nsenter --mount=/proc/1/ns/mnt -- bash -c 'uname -r'
```
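For readers without the linked manifest, a minimal DaemonSet that makes those nsenter commands work might look like the sketch below. This is not the actual Longhorn manifest; the name, label, and image are assumptions. `hostPID: true` is what makes `/proc/1` inside the pod refer to the host's init process, and `privileged: true` is needed for nsenter to enter its namespaces:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: longhorn-environment-check   # hypothetical name
spec:
  selector:
    matchLabels:
      app: longhorn-environment-check
  template:
    metadata:
      labels:
        app: longhorn-environment-check
    spec:
      hostPID: true                  # /proc/1 inside the pod is the host's init
      containers:
      - name: check
        image: busybox:1.36          # any small image with a shell works
        command: ["sleep", "86400"]  # keep the pod alive for kubectl exec
        securityContext:
          privileged: true           # required for nsenter into host namespaces
```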
n
```
$ kubectl get pods -o wide | grep longhorn-environment-check
longhorn-environment-check-6d64z   1/1     Running                 0                7m7s   10.42.2.157      gi-rm1      <none>           <none>
longhorn-environment-check-cwk62   1/1     Running                 0                7m7s   10.42.1.252      geo-node1   <none>           <none>
longhorn-environment-check-rjkhf   1/1     Running                 0                7m7s   10.42.0.159      gi-rm0      <none>           <none>
longhorn-environment-check-xmtqs   1/1     Running                 0                7m7s   10.42.3.21       gi-rm2      <none>           <none>
```

```
$ kubectl exec -i ${DS} -- nsenter --mount=/proc/1/ns/mnt -- bash -c 'grep -E "^ID_LIKE=" /etc/os-release | cut -d= -f2'
debian
$ kubectl exec -i ${DS} -- nsenter --mount=/proc/1/ns/mnt -- bash -c 'uname -r'
5.15.0-69-generic
$ DS=longhorn-environment-check-cwk62
$ kubectl exec -i ${DS} -- nsenter --mount=/proc/1/ns/mnt -- bash -c 'grep -E "^ID_LIKE=" /etc/os-release | cut -d= -f2'
debian
$ kubectl exec -i ${DS} -- nsenter --mount=/proc/1/ns/mnt -- bash -c 'uname -r'
5.15.0-69-generic
$ DS=longhorn-environment-check-rjkhf
$ kubectl exec -i ${DS} -- nsenter --mount=/proc/1/ns/mnt -- bash -c 'grep -E "^ID_LIKE=" /etc/os-release | cut -d= -f2'
Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not 82.165.18.79
$ kubectl exec -i ${DS} -- nsenter --mount=/proc/1/ns/mnt -- bash -c 'uname -r'
Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not 82.165.18.79
$ DS=longhorn-environment-check-xmtqs
$ kubectl exec -i ${DS} -- nsenter --mount=/proc/1/ns/mnt -- bash -c 'grep -E "^ID_LIKE=" /etc/os-release | cut -d= -f2'
debian
$ kubectl exec -i ${DS} -- nsenter --mount=/proc/1/ns/mnt -- bash -c 'uname -r'
5.15.0-69-generic
```
b
It seems that only one of the nodes has the certificate issue. This needs to be resolved first, since the check script runs those commands on all DaemonSet pods.
n
I replaced the node (it was just a small cloud server) and now the issue has moved to another node, `geo-node1`.
b
Maybe this is relevant.
n
I destroyed the cluster and recreated everything. The script now comes back without any complaints. Thanks!