# k3s
q
One of the worst offenders is this little beauty:
```
# Query 1: 0.04 QPS, 11.47x concurrency, ID 0xD27FF547D625F3FDFECFFC0A53C5044E at byte 165076927
# This item is included in the report because it matches --limit.
# Scores: V/M = 2972.43
# Time range: 2023-06-24 15:03:19 to 2023-06-28 21:58:31
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         12   16335
# Exec time     64 4249389s      5s   6520s    260s   1571s    879s     15s
# Lock time      0     23s   146us   335ms     1ms     4ms     5ms   626us
# Rows sent      0  15.94k       0       1    1.00    0.99    0.02    0.99
# Rows examine  33 219.90M       0 132.21k  13.78k 101.89k  34.33k   24.84
# Rows affecte   0       0       0       0       0       0       0       0
# Bytes sent     0   2.35M       0     155  150.82  151.03   24.45  151.03
# Query size     8  10.59M     670     712  679.64  685.39   13.38  652.75
# String:
# Databases    k3s
# Hosts        192.168.68.130
# Users        k3s
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s  ##############################
#  10s+  ################################################################
# Tables
#    SHOW TABLE STATUS FROM `k3s` LIKE 'kine'\G
#    SHOW CREATE TABLE `k3s`.`kine`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT (
        SELECT MAX(rkv.id) AS id
        FROM kine AS rkv), COUNT(c.theid)
FROM (
    SELECT *
    FROM (
        SELECT (
                SELECT MAX(rkv.id) AS id
                FROM kine AS rkv),
            (
                SELECT MAX(crkv.prev_revision) AS prev_revision
                FROM kine AS crkv
                WHERE crkv.name = 'compact_rev_key'),
            kv.id AS theid, kv.name, kv.created, kv.deleted,
            kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value
        FROM kine AS kv
        JOIN (
            SELECT MAX(mkv.id) AS id
            FROM kine AS mkv
            WHERE mkv.name LIKE '/registry/events/%'
            GROUP BY mkv.name) AS maxkv
            ON maxkv.id = kv.id
        WHERE kv.deleted = 0 OR 0
    ) AS lkv
    ORDER BY lkv.theid ASC
) c\G
```
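(For reference, a report in that shape usually comes out of Percona Toolkit's pt-query-digest run against the MySQL slow query log - a sketch, with the log path being just an example:)
```
# Sketch only: generate a digest like the one above from the slow query log.
# The log path is an assumption; use whatever slow_query_log_file points at.
pt-query-digest /var/log/mysql/mysql-slow.log > digest.txt
```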
c
Do you have something that's flooding your cluster with events?
You could also check the server logs for compact messages to confirm that it is compacting old rows successfully.
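On a systemd install that's roughly this (a sketch - it assumes the default k3s unit name, and the exact log wording varies by version):
```
# Sketch: scan recent k3s server logs for kine compaction messages.
# Assumes k3s runs under the default systemd unit name.
journalctl -u k3s --since "1 hour ago" | grep -i compact
```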
q
Nothing spamming events that I know of ... Things went bad after I tried installing the Grafana Agent Operator. I uninstalled it shortly after, but I guess something could be left behind that spams events; Helm isn't always good at cleaning up. Any tips on tracking down something like that?
`kubectl get events -w`
and following along?
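If the apiserver stays up long enough, aggregating the events might point at the source faster than watching them scroll by - a sketch, assuming jq is available:
```
# Sketch: count events per involved object to spot whatever is flooding them.
kubectl get events -A -o json \
  | jq -r '.items[] | "\(.involvedObject.kind)/\(.involvedObject.namespace // "-")/\(.involvedObject.name)"' \
  | sort | uniq -c | sort -rn | head -20
```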
If it isn't compacting old rows, how can I help it along?
c
well, check the logs to see what it says. It should tell you how many rows it compacted and how long it took. If it says it’s failing, then you can try to figure out why.
The usual cause of failing compaction is load - it intentionally compacts with a short deadline to avoid locking the DB for too long, but if the datastore is overloaded it will hit the timeout without getting anything done. Worst case, it may be necessary to scale down some workloads to reduce datastore load, let it catch up with compaction, and get the database size back down.
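You can also read the two numbers kine compares straight out of the datastore - the query above already references them. A sketch, reusing the host/user/database shown in the digest:
```
# Sketch: compare the newest revision against the last compacted revision.
# Connection details are the ones shown in the pt-query-digest output above.
mysql -h 192.168.68.130 -u k3s -p k3s -e "
  SELECT MAX(id) AS current_revision FROM kine;
  SELECT prev_revision AS compact_revision FROM kine WHERE name = 'compact_rev_key';"
```
A large and growing gap between the two suggests compaction is falling behind.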
q
I think I'm going to see if I can track down those excess events, but trying to get them from the apiserver isn't flying. The datastore is so overworked that the server can't keep up with leader election and restarts every few minutes, causing any API operations to fail. I guess I can try digging directly in the database to see if I can find some clues as to where this is coming from. Thanks for the pointers, going to dig around more tomorrow.
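Something like this might show what's piling up - a sketch; the prefix split just assumes the usual /registry/<resource>/... key layout:
```
# Sketch: count kine rows per registry prefix to see what is accumulating.
# Connection details are from the digest above; adjust as needed.
mysql -h 192.168.68.130 -u k3s -p k3s -e "
  SELECT SUBSTRING_INDEX(name, '/', 3) AS prefix, COUNT(*) AS n
  FROM kine
  GROUP BY prefix
  ORDER BY n DESC
  LIMIT 20;"
```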
w
@quick-dentist-45681 The audit log is very helpful for tracking what is (over)using the apiserver. Best if you ship it to your logging tooling (e.g. ELK) or simply work on the log file directly. Then you should be able to get what you need easily.
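If auditing isn't enabled yet, here's a sketch of turning it on - these are the standard kube-apiserver audit flags passed through k3s; the paths and the catch-all policy are just examples:
```
# Sketch: enable apiserver audit logging on a k3s server.
# Paths are examples; the flags are standard kube-apiserver audit flags.
cat > /var/lib/rancher/k3s/server/audit-policy.yaml <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
EOF
k3s server \
  --kube-apiserver-arg=audit-policy-file=/var/lib/rancher/k3s/server/audit-policy.yaml \
  --kube-apiserver-arg=audit-log-path=/var/lib/rancher/k3s/server/logs/audit.log
```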
s
I'm facing the same error in one of my clusters, and it drives the load on the servers high enough to make them unresponsive. @creamy-pencil-82913 Kindly provide me with some ideas, or share the solution if you have one.