# rke2
m
I have observed that rke2-metrics-server is set with hostNetwork: true by default. I tried setting it to false. Although rke2-metrics-server came up, I ended up with a different issue.
I cannot view logs or exec into a shell for any pod. Everywhere it shows the following or a similar error message:
Error from server: Get "https://<ip addr1>:10250/containerLogs/anamespace/lb987c5bc4-4vlzp/api?follow=true": x509: certificate is valid for 127.0.0.1, not <ip addr1>
I would like to know how to overcome this issue.
I have also observed the following behavior and am looking for guidance to resolve it:
>>> k get nodes
E0615 14:24:39.179242   83498 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0615 14:24:39.188277   83498 memcache.go:287] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0615 14:24:39.190885   83498 memcache.go:121] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0615 14:24:39.193944   83498 memcache.go:121] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0615 14:24:39.195687   83498 memcache.go:121] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
NAME                 STATUS   ROLES                       AGE   VERSION
server-1          Ready    control-plane,etcd,master   35d   v1.24.12+rke2r1
c
What else is using that port on your nodes? All the messages you're getting are due to metrics server not being able to run.
m
On the node, netstat shows kube-api-server listening on port 10250.
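Note: 10250 is the kubelet's default port, so it is worth confirming exactly which process holds it. A minimal sketch, assuming the usual `ss -ltnp` column layout (local address in the fourth column, process info last); the helper name `listeners_on_port` is hypothetical and not part of any tool:

```shell
# Hypothetical helper: filter `ss -ltnp` output (read from stdin) down to
# sockets whose local address ends in the given port.
# Prints the local address and the last column, which is the owning process
# when ss is run with enough privileges to see it.
listeners_on_port() {
  awk -v port="$1" 'NR > 1 && $4 ~ (":" port "$") { print $4, $NF }'
}

# Usage on a node (run as root so the process column is populated):
#   ss -ltnp | listeners_on_port 10250
```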
c
did you customize the metrics server deployment or something?
m
No, I didn't. I upgraded the environment from v1.22.x to v1.23.x and then to the current version.
c
was there a problem with the upgrade? can you confirm that the metrics-server deployment was successfully upgraded to the latest release?
m
Yes. I came across https://github.com/rancher/rke2-charts/blob/main/charts/rke2-metrics-server/rke2-metrics-server/2.11.100-build2023051508/values.yaml#L21-L27 and set hostNetwork to false, and ended up with the certificate error :(
c
It’s false by default, you shouldn’t have to change it
what do you get from
kubectl get helmchart -n kube-system rke2-metrics-server -o yaml |grep chart-url
and
kubectl get pod -n kube-system -l helmcharts.helm.cattle.io/chart=rke2-metrics-server
Did you by some chance deploy a different metrics-server to the cluster?
m
helm.cattle.io/chart-url: https://rke2-charts.rancher.io/assets/rke2-metrics-server/rke2-metrics-server-2.11.100-build2022101107.tgz
c
that’s the current one, and it has it set to false. What about the other command (get pods) and also
kubectl get helmchartconfig -A
m
kubectl get pod -n kube-system -l helmcharts.helm.cattle.io/chart=rke2-metrics-server

this returns empty
>> kubectl get pod -n kube-system | grep metric
rke2-metrics-server-8787c959c-z862k                     0/1     CrashLoopBackOff   196 (2m23s ago)   16h
c
no pods for the helm job? What about
kubectl get job -n kube-system -l helmcharts.helm.cattle.io/chart=rke2-metrics-server
m
NAME                               COMPLETIONS   DURATION   AGE
helm-install-rke2-metrics-server   1/1           15s        57d
c
how long ago did you upgrade the cluster? That indicates that the chart deployment was last updated 57 days ago.
can you also get the helmchartconfig list?
m
Yeah, I did this upgrade quite a while ago.
helmVersion: 3
  info:
    description: Upgrade complete
    firstDeployed: '2022-09-23T23:32:37Z'
    lastDeployed: '2023-05-17T03:30:10Z'
c
hmm that lines up
still not seeing
kubectl get helmchartconfig -A
output to confirm that you didn’t set it to host network true at some point
m
it returns
No resources found
c
ok. So I don’t see why the pod would be running with host network
what do you get from
kubectl get deployment -n kube-system rke2-metrics-server -o yaml
?
m
figured out the issue with
>>> k get nodes
E0615 14:24:39.179242   83498 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0615 14:24:39.188277   83498 memcache.go:287] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0615 14:24:39.190885   83498 memcache.go:121] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0615 14:24:39.193944   83498 memcache.go:121] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0615 14:24:39.195687   83498 memcache.go:121] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I upgraded RKE2 from 1.22 to 1.24. After that I tried to upgrade the Prometheus stack too, but it failed, and somehow 3 selectors got inserted into a few of the Prometheus services, which prevented those services from attaching to the appropriate pods. Fixing the service selectors resolved the issue.
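Note: "the server is currently unable to handle the request" for metrics.k8s.io or custom.metrics.k8s.io usually means the backing service of that aggregated API has no reachable endpoints, which matches the broken selectors described here. A minimal sketch for spotting this, assuming the standard `kubectl get apiservice` table layout (NAME, SERVICE, AVAILABLE, AGE); the helper name `unavailable_apiservices` is hypothetical:

```shell
# Hypothetical helper: read `kubectl get apiservice` table output on stdin
# and print the names of APIServices whose AVAILABLE column is not "True".
unavailable_apiservices() {
  awk 'NR > 1 && $3 !~ /^True/ { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get apiservice | unavailable_apiservices
```

For each service reported in the SERVICE column, the selector can then be compared against the pod labels with `kubectl get svc -n <namespace> <service> -o jsonpath='{.spec.selector}'` and `kubectl get pods -n <namespace> --show-labels`. An empty result from `kubectl get endpoints -n <namespace> <service>` suggests the selector matches no pods.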