
breezy-autumn-81048

03/16/2023, 11:44 AM
Hi community, I have deployed a K3s cluster using Rancher and installed cert-manager v1.11.0 on top of it. All pods are running; however, the cert-manager-webhook pod is logging some errors:
Trace[1068908304]: [30.003276269s] [30.003276269s] END
E0314 15:02:02.236947       1 reflector.go:140] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout
W0314 15:03:28.953687       1 reflector.go:424] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: failed to list *v1.Secret: Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout
I0314 15:03:28.953816       1 trace.go:219] Trace[516939538]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169 (14-Mar-2023 15:02:58.949) (total time: 30004ms):
Trace[516939538]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout 30004ms (15:03:28.953)
Trace[516939538]: [30.004226263s] [30.004226263s] END
E0314 15:03:28.953837       1 reflector.go:140] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout
W0314 15:04:44.919380       1 reflector.go:424] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: failed to list *v1.Secret: Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout
I0314 15:04:44.919458       1 trace.go:219] Trace[430405071]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169 (14-Mar-2023 15:04:14.918) (total time: 30000ms):
Trace[430405071]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout 30000ms (15:04:44.919)
Trace[430405071]: [30.000964846s] [30.000964846s] END
E0314 15:04:44.919472       1 reflector.go:140] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout
Can someone explain what's wrong? It feels like I can't fully install a Helm chart because of this issue. (I first noticed it while trying to install the actions-runner-controller Helm chart, and the error I got was:
Error: Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded )
There are also these events from the actions-runner-controller pod:
Warning  FailedMount  17m                  kubelet            Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[kube-api-access-v48zj secret tmp cert]: timed out waiting for the condition
  Warning  FailedMount  8m32s                kubelet            Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[tmp cert kube-api-access-v48zj secret]: timed out waiting for the condition
  Warning  FailedMount  6m18s (x5 over 19m)  kubelet            Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[secret tmp cert kube-api-access-v48zj]: timed out waiting for the condition
  Warning  FailedMount  103s (x2 over 4m1s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[cert kube-api-access-v48zj secret tmp]: timed out waiting for the condition
  Warning  FailedMount  86s (x18 over 21m)   kubelet            MountVolume.SetUp failed for volume "cert" : secret "actions-runner-controller-serving-cert" not found
Thanks in advance,
I was trying to verify the installation using the official cert-manager guide: https://cert-manager.io/v1.2-docs/installation/kubernetes/#verifying-the-installation. The result:
kubectl --kubeconfig .kube\config.yaml apply -f test-resources.yaml
namespace/cert-manager-test unchanged
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
So it's not an issue with actions-runner-controller, but with cert-manager or the K3s cluster deployed using Rancher. Could someone recommend how to fix this?
cert-manager-cainjector pod logs:
E0315 19:51:36.923104 1 cluster.go:161] cert-manager "msg"="Failed to get API Group-Resources" "error"="Get \"https://10.43.0.1:443/api?timeout=32s\": dial tcp 10.43.0.1:443: i/o timeout"
Error: error creating manager: Get "https://10.43.0.1:443/api?timeout=32s": dial tcp 10.43.0.1:443: i/o timeout
Usage:
ca-injector [flags]
Flags:
--add_dir_header If true, adds the file directory to the header of the log messages
--alsologtostderr log to standard error as well as files (no effect when -logtostderr=true)
--enable-profiling Enable profiling for cainjector
--feature-gates mapStringBool A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
AllAlpha=true|false (ALPHA - default=false)
AllBeta=true|false (BETA - default=false)
-h, --help help for ca-injector
--kubeconfig string Paths to a kubeconfig. Only required if out-of-cluster.
--leader-elect If true, cainjector will perform leader election between instances to ensure no more than one instance of cainjector operates at a time (default true)
--leader-election-lease-duration duration The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 1m0s)
--leader-election-namespace string Namespace used to perform leader election. Only used if leader election is enabled (default "kube-system")
--leader-election-renew-deadline duration The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. This is only applicable if leader election is enabled. (default 40s)
--leader-election-retry-period duration The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 15s)
--log-flush-frequency duration Maximum number of seconds between log flushes (default 5s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory (no effect when -logtostderr=true)
--log_file string If non-empty, use this log file (no effect when -logtostderr=true)
--log_file_max_size uint Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr log to standard error instead of files (default true)
--namespace string If set, this limits the scope of cainjector to a single namespace. If set, cainjector will not update resources with certificates outside of the configured namespace.
--one_output If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
--profiler-address string Address of the Go profiler (pprof) if enabled. This should never be exposed on a public interface. (default "localhost:6060")
--skip_headers If true, avoid header prefixes in the log messages
--skip_log_headers If true, avoid headers when opening log files (no effect when -logtostderr=true)
--stderrthreshold severity logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=false) (default 2)
-v, --v Level number for the log level verbosity (default 0)
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
error creating manager: Get "https://10.43.0.1:443/api?timeout=32s": dial tcp 10.43.0.1:443: i/o timeout
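(For reference, 10.43.0.1 is the in-cluster address of the Kubernetes API on a default K3s install, the first IP of the 10.43.0.0/16 service CIDR, so everything above is failing to reach the apiserver through the service network. A quick sanity check of what that address maps to, assuming the defaults, would be something like:
kubectl get svc kubernetes -n default
kubectl get endpoints kubernetes -n default
If the endpoints list the K3s server node IP(s) as expected, the timeouts point at pod-to-service routing on the nodes rather than at the apiserver itself.)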

rough-farmer-49135

03/16/2023, 1:04 PM
The only time I had problems with cert-manager (with RKE2) was when I'd forgotten and left firewalld running on a 3-master setup. That made it work randomly, roughly a third of the time: requests were spread randomly across the masters, so things worked when a request landed on the master the pod was running on and failed otherwise.
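For example, to double-check on each node (assuming systemd hosts with firewalld installed):
systemctl status firewalld
sudo firewall-cmd --state
and if it turns out to be running, either disable it or open the ports the distribution needs between nodes.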

breezy-autumn-81048

03/16/2023, 1:38 PM
Hi @rough-farmer-49135, thank you for your reply! Firewalld is disabled:
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
I tried to debug using this guide: https://cert-manager.io/docs/troubleshooting/webhook/#error-context-deadline-exceeded. Here is the result:
* Added cert-manager-webhook.cert-manager.svc:10250:127.0.0.1 to DNS cache
*   Trying 10.154.146.178:443...
* Connected to my.host.net (10.154.146.178) port 443 (#0)
* schannel: disabled automatic use of client certificate
* ALPN: offers http/1.1
* ALPN: server accepted http/1.1
* Server auth using Basic with user 'kubeconfig-user-9qc1111111'
> POST /k8s/clusters/c-m-qjr7kbg9/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy/validate HTTP/1.1
> Host: my.host.net
> Authorization: Basic gaewfawefwefwefdsfeturyurrtrte4534th4572r23fwef
> User-Agent: curl/7.83.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 517
>* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< Audit-Id: c5b1d7a6-c4e5-444f-a067-372e1e06f524
< Cache-Control: no-cache, no-store, must-revalidate
< Cache-Control: no-cache, private
< Content-Length: 276
< Content-Type: application/json
< Date: Thu, 16 Mar 2023 13:04:03 GMT
< X-Api-Cattle-Auth: true
< X-Content-Type-Options: nosniff
<{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "error trying to reach service: proxy error from 127.0.0.1:6443 while dialing 10.42.7.12:53, code 503: 503 Service Unavailable",
  "reason": "ServiceUnavailable",
  "code": 503
}*
10.42.7.12 is the coredns pod. So does this mean something is wrong with the network?
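(One way to take the Rancher API proxy out of the picture here would be to hit the webhook service from inside the cluster instead; the pod name and curl image below are just placeholders:
kubectl run -it --rm webhook-test --image=curlimages/curl --restart=Never --command -- curl -vk https://cert-manager-webhook.cert-manager.svc:443/
Any TLS handshake or HTTP response, even an error body, would mean pod-to-service networking works; another i/o timeout would point back at the overlay network.)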

rough-farmer-49135

03/16/2023, 1:43 PM
Maybe the internal network, maybe the coredns pod. I don't recall specifically, but I think I got a 404 instead of a 503 when I had my firewall problems, so you might have a problem with your network plugin and traffic routing rather than a firewall issue. You could check the coredns logs, and you could also start shells in pods on different nodes and see whether any of them can talk to it or whether all are blocked, using just nslookup, host, or ping. I'd check pods on all your nodes, and if I recall correctly,
kubectl get pods -A -o wide
will tell you what node a given pod is running on.
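For example (the image, tag, and label selector here are just one common way to do it; adjust to your setup):
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default.svc.cluster.local
Repeating the test pod pinned to different nodes (for instance with a nodeName override) would show whether the failure follows particular nodes.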
👍 1

breezy-autumn-81048

03/16/2023, 1:43 PM
Here are some logs from the coredns pod:
I0316 13:37:46.929570 1 trace.go:205] Trace[942292090]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:37:16.928) (total time: 30000ms):
Trace[942292090]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:37:46.929)
Trace[942292090]: [30.000559454s] [30.000559454s] END
E0316 13:37:46.929584 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:38:21.128007 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:38:21.128083 1 trace.go:205] Trace[1677112065]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:37:51.127) (total time: 30000ms):
Trace[1677112065]: ---"Objects listed" error:Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:38:21.127)
Trace[1677112065]: [30.000941767s] [30.000941767s] END
E0316 13:38:21.128102 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
W0316 13:38:30.196262 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:38:30.196330 1 trace.go:205] Trace[518327474]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:38:00.195) (total time: 30000ms):
Trace[518327474]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:38:30.196)
Trace[518327474]: [30.000532191s] [30.000532191s] END
E0316 13:38:30.196344 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:39:03.327155 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:39:03.327238 1 trace.go:205] Trace[472849857]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:38:33.326) (total time: 30000ms):
Trace[472849857]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:39:03.327)
Trace[472849857]: [30.000579133s] [30.000579133s] END
E0316 13:39:03.327258 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:39:25.231933 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:39:25.232044 1 trace.go:205] Trace[677905320]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:38:55.230) (total time: 30001ms):
Trace[677905320]: ---"Objects listed" error:Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:39:25.231)
Trace[677905320]: [30.001014748s] [30.001014748s] END
E0316 13:39:25.232059 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:39:59.853741 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:39:59.853855 1 trace.go:205] Trace[1629891480]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:39:29.852) (total time: 30000ms):
Trace[1629891480]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:39:59.853)
Trace[1629891480]: [30.000976129s] [30.000976129s] END
E0316 13:39:59.853872 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout
W0316 13:40:17.731262 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:40:17.731324 1 trace.go:205] Trace[1242097369]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:39:47.730) (total time: 30001ms):
Trace[1242097369]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout 30001ms (13:40:17.731)
Trace[1242097369]: [30.001195763s] [30.001195763s] END
E0316 13:40:17.731341 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:40:34.564570 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:40:34.564659 1 trace.go:205] Trace[1746491914]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:40:04.563) (total time: 30001ms):
Trace[1746491914]: ---"Objects listed" error:Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout 30001ms (13:40:34.564)
Trace[1746491914]: [30.001397329s] [30.001397329s] END
E0316 13:40:34.564671 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:41:04.424294 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:41:04.424391 1 trace.go:205] Trace[953581850]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:40:34.423) (total time: 30000ms):
Trace[953581850]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:41:04.424)
Trace[953581850]: [30.000987325s] [30.000987325s] END
E0316 13:41:04.424405 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout

rough-farmer-49135

03/16/2023, 1:49 PM
I don't have time to read through long logs, but at a glance your coredns logs are having trouble reaching 10.43.0.0/16 addresses, which by default are your service IPs, while your other log was trying to reach 10.42.x.y, which is a pod IP. I'd expect your pods to talk to coredns via the service IP rather than the pod IP, so that's odd, and it's also odd that coredns is timing out talking to what I presume is the kube-apiserver. So you've definitely got something hosed, and coredns not being able to talk to the kube-apiserver quite possibly means it never finishes startup and never listens for connections. The only other idea I have: if you've been running it for a year your certs might have expired, and something even further up the chain is timing out, leaving your apiserver unresponsive because of that.
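If it helps, a direct test of that exact path (a pod reaching the 10.43.0.1 service VIP) could look like this, assuming K3s defaults and with placeholder pod/image names:
kubectl run -it --rm api-test --image=curlimages/curl --restart=Never --command -- curl -k https://10.43.0.1:443/version
Any HTTP response at all, even a 401/403, proves the route works; a timeout means the service VIP isn't being routed from that node (kube-proxy rules or the overlay network). On a default K3s flannel VXLAN setup, node-to-node pod traffic also needs UDP 8472 open between hosts, which is worth ruling out.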

breezy-autumn-81048

03/16/2023, 1:52 PM
Thank you for your input, I really appreciate it! It's a fresh K3s cluster that was deployed about a month ago.

rough-farmer-49135

03/16/2023, 1:53 PM
As a note, if you're trying to get your kube-apiserver logs you'll probably have to use crictl and check the static pods, though you may also see the info with something like
journalctl -u k3s-server
(I presume, I've done more RKE2, so I might be off there).
❤️ 1
If it's a fresh K3s install, then it shouldn't be your certs, so I'd probably look further up the chain and see if some of your core components in static pods are failing.
I don't think these are exactly the same for K3s, but the RKE2 notes for using crictl can be found at https://gist.github.com/superseb/3b78f47989e0dbc1295486c186e944bf and might give you the hints you need to poke around.
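On K3s specifically, the rough equivalents would be something like:
sudo journalctl -u k3s -f             # server nodes; agents log under the k3s-agent unit
sudo k3s crictl ps
sudo k3s crictl logs <container-id>   # <container-id> is a placeholder for the actual ID from crictl ps
though note that K3s doesn't run the control plane as static pods the way RKE2 does; the apiserver, scheduler, etc. are embedded in the k3s process, so most of the interesting output lands in the k3s journal rather than in per-component containers.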

breezy-autumn-81048

03/16/2023, 1:56 PM
Thank you, I will definitely check.