
breezy-autumn-81048

03/16/2023, 11:44 AM
Hi community, I have deployed a K3s cluster using Rancher and installed cert-manager v1.11.0 on top of it. All pods are running; however, the cert-manager-webhook pod is logging some errors:
Trace[1068908304]: [30.003276269s] [30.003276269s] END
E0314 15:02:02.236947       1 reflector.go:140] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout
W0314 15:03:28.953687       1 reflector.go:424] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: failed to list *v1.Secret: Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout
I0314 15:03:28.953816       1 trace.go:219] Trace[516939538]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169 (14-Mar-2023 15:02:58.949) (total time: 30004ms):
Trace[516939538]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout 30004ms (15:03:28.953)
Trace[516939538]: [30.004226263s] [30.004226263s] END
E0314 15:03:28.953837       1 reflector.go:140] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout
W0314 15:04:44.919380       1 reflector.go:424] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: failed to list *v1.Secret: Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout
I0314 15:04:44.919458       1 trace.go:219] Trace[430405071]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169 (14-Mar-2023 15:04:14.918) (total time: 30000ms):
Trace[430405071]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout 30000ms (15:04:44.919)
Trace[430405071]: [30.000964846s] [30.000964846s] END
E0314 15:04:44.919472       1 reflector.go:140] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://10.43.0.1:443/api/v1/namespaces/cert-manager/secrets?fieldSelector=metadata.name%3Dcert-manager-webhook-ca&resourceVersion=360915": dial tcp 10.43.0.1:443: i/o timeout
Can someone explain what's wrong? It feels like I can't fully install a Helm chart because of this issue. (I first noticed it while trying to install the actions-runner-controller Helm chart, and the error I got was:
Error: Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded )
There are also these events from the actions-runner-controller pod:
Warning  FailedMount  17m                  kubelet            Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[kube-api-access-v48zj secret tmp cert]: timed out waiting for the condition
  Warning  FailedMount  8m32s                kubelet            Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[tmp cert kube-api-access-v48zj secret]: timed out waiting for the condition
  Warning  FailedMount  6m18s (x5 over 19m)  kubelet            Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[secret tmp cert kube-api-access-v48zj]: timed out waiting for the condition
  Warning  FailedMount  103s (x2 over 4m1s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[cert kube-api-access-v48zj secret tmp]: timed out waiting for the condition
  Warning  FailedMount  86s (x18 over 21m)   kubelet            MountVolume.SetUp failed for volume "cert" : secret "actions-runner-controller-serving-cert" not found
Thanks in advance,
I was trying to verify the installation using the official cert-manager guide: https://cert-manager.io/v1.2-docs/installation/kubernetes/#verifying-the-installation. The result:
kubectl --kubeconfig .kube\config.yaml apply -f test-resources.yaml
namespace/cert-manager-test unchanged
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
So it's not an issue with actions-runner-controller, but with cert-manager or the K3s cluster deployed using Rancher. Could someone recommend how to fix this?
cert-manager-cainjector pod logs:
E0315 19:51:36.923104 1 cluster.go:161] cert-manager "msg"="Failed to get API Group-Resources" "error"="Get \"https://10.43.0.1:443/api?timeout=32s\": dial tcp 10.43.0.1:443: i/o timeout"
Error: error creating manager: Get "https://10.43.0.1:443/api?timeout=32s": dial tcp 10.43.0.1:443: i/o timeout
Usage:
ca-injector [flags]
Flags:
--add_dir_header If true, adds the file directory to the header of the log messages
--alsologtostderr log to standard error as well as files (no effect when -logtostderr=true)
--enable-profiling Enable profiling for cainjector
--feature-gates mapStringBool A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
AllAlpha=true|false (ALPHA - default=false)
AllBeta=true|false (BETA - default=false)
-h, --help help for ca-injector
--kubeconfig string Paths to a kubeconfig. Only required if out-of-cluster.
--leader-elect If true, cainjector will perform leader election between instances to ensure no more than one instance of cainjector operates at a time (default true)
--leader-election-lease-duration duration The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 1m0s)
--leader-election-namespace string Namespace used to perform leader election. Only used if leader election is enabled (default "kube-system")
--leader-election-renew-deadline duration The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. This is only applicable if leader election is enabled. (default 40s)
--leader-election-retry-period duration The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 15s)
--log-flush-frequency duration Maximum number of seconds between log flushes (default 5s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory (no effect when -logtostderr=true)
--log_file string If non-empty, use this log file (no effect when -logtostderr=true)
--log_file_max_size uint Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr log to standard error instead of files (default true)
--namespace string If set, this limits the scope of cainjector to a single namespace. If set, cainjector will not update resources with certificates outside of the configured namespace.
--one_output If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
--profiler-address string Address of the Go profiler (pprof) if enabled. This should never be exposed on a public interface. (default "localhost:6060")
--skip_headers If true, avoid header prefixes in the log messages
--skip_log_headers If true, avoid headers when opening log files (no effect when -logtostderr=true)
--stderrthreshold severity logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=false) (default 2)
-v, --v Level number for the log level verbosity (default 0)
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
error creating manager: Get "https://10.43.0.1:443/api?timeout=32s": dial tcp 10.43.0.1:443: i/o timeout
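(For reference, 10.43.0.1 is the in-cluster address of the Kubernetes API on a default K3s install, the first IP of the 10.43.0.0/16 service CIDR, so everything above is failing to reach the apiserver through the service network. A quick sanity check of what that address maps to, assuming the defaults, would be something like:
kubectl get svc kubernetes -n default
kubectl get endpoints kubernetes -n default
If the endpoints list the K3s server node IP(s) as expected, the timeouts point at pod-to-service routing on the nodes rather than at the apiserver itself.)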

rough-farmer-49135

03/16/2023, 1:04 PM
The only time I had problems with cert-manager (with RKE2) was when I'd forgotten and left firewalld running on a 3-master setup. That made it work randomly, roughly a third of the time: requests were spread randomly across the masters, so things worked when a request landed on the master the pod was running on and failed otherwise.
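For example, to double-check on each node (assuming systemd hosts with firewalld installed):
systemctl status firewalld
sudo firewall-cmd --state
and if it turns out to be running, either disable it or open the ports the distribution needs between nodes.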

breezy-autumn-81048

03/16/2023, 1:38 PM
Hi @rough-farmer-49135, thank you for your reply! Firewalld is disabled:
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
I tried to debug using this guide: https://cert-manager.io/docs/troubleshooting/webhook/#error-context-deadline-exceeded. Here is the result:
* Added cert-manager-webhook.cert-manager.svc:10250:127.0.0.1 to DNS cache
*   Trying 10.154.146.178:443...
* Connected to my.host.net (10.154.146.178) port 443 (#0)
* schannel: disabled automatic use of client certificate
* ALPN: offers http/1.1
* ALPN: server accepted http/1.1
* Server auth using Basic with user 'kubeconfig-user-9qc1111111'
> POST /k8s/clusters/c-m-qjr7kbg9/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy/validate HTTP/1.1
> Host: my.host.net
> Authorization: Basic gaewfawefwefwefdsfeturyurrtrte4534th4572r23fwef
> User-Agent: curl/7.83.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 517
>* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< Audit-Id: c5b1d7a6-c4e5-444f-a067-372e1e06f524
< Cache-Control: no-cache, no-store, must-revalidate
< Cache-Control: no-cache, private
< Content-Length: 276
< Content-Type: application/json
< Date: Thu, 16 Mar 2023 13:04:03 GMT
< X-Api-Cattle-Auth: true
< X-Content-Type-Options: nosniff
<{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "error trying to reach service: proxy error from 127.0.0.1:6443 while dialing 10.42.7.12:53, code 503: 503 Service Unavailable",
  "reason": "ServiceUnavailable",
  "code": 503
}*
10.42.7.12 is the coredns pod. So does this mean something is wrong with the network?
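(One way to take the Rancher API proxy out of the picture here would be to hit the webhook service from inside the cluster instead; the pod name and curl image below are just placeholders:
kubectl run -it --rm webhook-test --image=curlimages/curl --restart=Never --command -- curl -vk https://cert-manager-webhook.cert-manager.svc:443/
Any TLS handshake or HTTP response, even an error body, would mean pod-to-service networking works; another i/o timeout would point back at the overlay network.)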

rough-farmer-49135

03/16/2023, 1:43 PM
Maybe the internal network, maybe the coredns pod. I don't recall specifically, but I think I got a 404 instead of a 503 when I had my firewall problems, so you might have a problem with your network plugin and traffic routing rather than a firewall issue. You could check the coredns logs, and you could also start shells in pods on different nodes and see whether any of them can talk to it or whether all are blocked, using just nslookup, host, or ping. I'd check pods on all your nodes, and if I recall correctly,
kubectl get pods -A -o wide
will tell you what node a given pod is running on.
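For example (the image, tag, and label selector here are just one common way to do it; adjust to your setup):
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default.svc.cluster.local
Repeating the test pod pinned to different nodes (for instance with a nodeName override) would show whether the failure follows particular nodes.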
👍 1

breezy-autumn-81048

03/16/2023, 1:43 PM
Here are some logs from the coredns pod:
I0316 13:37:46.929570 1 trace.go:205] Trace[942292090]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:37:16.928) (total time: 30000ms):
Trace[942292090]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:37:46.929)
Trace[942292090]: [30.000559454s] [30.000559454s] END
E0316 13:37:46.929584 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:38:21.128007 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:38:21.128083 1 trace.go:205] Trace[1677112065]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:37:51.127) (total time: 30000ms):
Trace[1677112065]: ---"Objects listed" error:Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:38:21.127)
Trace[1677112065]: [30.000941767s] [30.000941767s] END
E0316 13:38:21.128102 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
W0316 13:38:30.196262 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:38:30.196330 1 trace.go:205] Trace[518327474]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:38:00.195) (total time: 30000ms):
Trace[518327474]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:38:30.196)
Trace[518327474]: [30.000532191s] [30.000532191s] END
E0316 13:38:30.196344 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:39:03.327155 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:39:03.327238 1 trace.go:205] Trace[472849857]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:38:33.326) (total time: 30000ms):
Trace[472849857]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:39:03.327)
Trace[472849857]: [30.000579133s] [30.000579133s] END
E0316 13:39:03.327258 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:39:25.231933 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:39:25.232044 1 trace.go:205] Trace[677905320]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:38:55.230) (total time: 30001ms):
Trace[677905320]: ---"Objects listed" error:Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:39:25.231)
Trace[677905320]: [30.001014748s] [30.001014748s] END
E0316 13:39:25.232059 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:39:59.853741 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:39:59.853855 1 trace.go:205] Trace[1629891480]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:39:29.852) (total time: 30000ms):
Trace[1629891480]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:39:59.853)
Trace[1629891480]: [30.000976129s] [30.000976129s] END
E0316 13:39:59.853872 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout
W0316 13:40:17.731262 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:40:17.731324 1 trace.go:205] Trace[1242097369]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:39:47.730) (total time: 30001ms):
Trace[1242097369]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout 30001ms (13:40:17.731)
Trace[1242097369]: [30.001195763s] [30.001195763s] END
E0316 13:40:17.731341 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.43.0.1:443/api/v1/namespaces?resourceVersion=6721552": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:40:34.564570 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:40:34.564659 1 trace.go:205] Trace[1746491914]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:40:04.563) (total time: 30001ms):
Trace[1746491914]: ---"Objects listed" error:Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout 30001ms (13:40:34.564)
Trace[1746491914]: [30.001397329s] [30.001397329s] END
E0316 13:40:34.564671 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=6723908": dial tcp 10.43.0.1:443: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
W0316 13:41:04.424294 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout
I0316 13:41:04.424391 1 trace.go:205] Trace[953581850]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167 (16-Mar-2023 13:40:34.423) (total time: 30000ms):
Trace[953581850]: ---"Objects listed" error:Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout 30000ms (13:41:04.424)
Trace[953581850]: [30.000987325s] [30.000987325s] END
E0316 13:41:04.424405 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=6723076": dial tcp 10.43.0.1:443: i/o timeout

rough-farmer-49135

03/16/2023, 1:49 PM
I don't have time to read through long logs, but at a glance your coredns logs are having trouble reaching 10.43.0.0/16 addresses, which by default are your service IPs, while your other log was trying to reach 10.42.x.y, which is a pod IP. I'd expect your pods to talk to coredns via the service IP rather than the pod IP, so that's odd, and it's also odd that coredns is timing out talking to what I presume is the kube-apiserver. So you've definitely got something hosed, and coredns not being able to talk to the kube-apiserver quite possibly means it never finishes startup and never listens for connections. The only other idea I have: if you've been running it for a year your certs might have expired, and something even further up the chain is timing out, leaving your apiserver unresponsive because of that.
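If it helps, a direct test of that exact path (a pod reaching the 10.43.0.1 service VIP) could look like this, assuming K3s defaults and with placeholder pod/image names:
kubectl run -it --rm api-test --image=curlimages/curl --restart=Never --command -- curl -k https://10.43.0.1:443/version
Any HTTP response at all, even a 401/403, proves the route works; a timeout means the service VIP isn't being routed from that node (kube-proxy rules or the overlay network). On a default K3s flannel VXLAN setup, node-to-node pod traffic also needs UDP 8472 open between hosts, which is worth ruling out.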

breezy-autumn-81048

03/16/2023, 1:52 PM
Thank you for your input, I really appreciate it! It's a fresh K3s cluster that was deployed about a month ago.

rough-farmer-49135

03/16/2023, 1:53 PM
As a note, if you're trying to get your kube-apiserver logs you'll probably have to use crictl and check the static pods, though you may also see the info with something like
journalctl -u k3s-server
(I presume, I've done more RKE2, so I might be off there).
❤️ 1
If it's a fresh K3s install, then it shouldn't be your certs, so I'd probably look further up the chain and see if some of your core components in static pods are failing.
I don't think these are exactly the same for K3s, but the RKE2 notes for using crictl can be found at https://gist.github.com/superseb/3b78f47989e0dbc1295486c186e944bf and might give you the hints you need to poke around.
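On K3s specifically, the rough equivalents would be something like:
sudo journalctl -u k3s -f             # server nodes; agents log under the k3s-agent unit
sudo k3s crictl ps
sudo k3s crictl logs <container-id>   # <container-id> is a placeholder for the actual ID from crictl ps
though note that K3s doesn't run the control plane as static pods the way RKE2 does; the apiserver, scheduler, etc. are embedded in the k3s process, so most of the interesting output lands in the k3s journal rather than in per-component containers.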

breezy-autumn-81048

03/16/2023, 1:56 PM
Thank you, I will definitely check.