bright-lifeguard-9803
05/05/2025, 3:49 PMcreamy-pencil-82913
05/05/2025, 9:41 PMbright-lifeguard-9803
05/06/2025, 4:37 PMsqlite3
CLI.
1. VACUUM;
2. .backup
a. We are going to replace .backup
with VACUUM INTO "$file"
which uses much less resources in our testing. We don't need a block-for-block copy of the DB that .backup
gives. Just a logical copy is fine.
etcd
is a non-starter for us for a number of reasons I won't labor on about here.creamy-pencil-82913
05/06/2025, 5:11 PMthe task to compact every 5 mins, exits after ~1.5 hours and is never run again until the next RKE2 restartThat would be a problem. What version of RKE2 are you using? Can you provide logs from the last few compacts up until it stops running?
creamy-pencil-82913
05/06/2025, 5:13 PM?_journal=WAL&cache=shared&_busy_timeout=30000&_txlock=immediate
creamy-pencil-82913
05/06/2025, 5:14 PMsqlite://
datastore URL then this is probably self-inflicted.bright-lifeguard-9803
05/06/2025, 6:38 PMv1.29.15+rke2r1
is the running RKE2 version.bright-lifeguard-9803
05/06/2025, 6:52 PMtime="2025-05-05T18:48:42Z" level=info msg="Starting rke2 v1.29.15+rke2r1 (c228232895f83b6c3e17a9741e0c8d8e4439894d)"
...
time="2025-05-05T18:53:56Z" level=info msg="COMPACT compactRev=75025 targetCompactRev=76025 currentRev=1161724"
time="2025-05-05T18:53:57Z" level=info msg="COMPACT deleted 1126 rows from 1000 revisions in 196.169102ms - compacted to 76025/1161724"
time="2025-05-05T18:53:57Z" level=info msg="COMPACT compactRev=76025 targetCompactRev=77025 currentRev=1161724"
time="2025-05-05T18:53:57Z" level=info msg="COMPACT deleted 842 rows from 1000 revisions in 336.495043ms - compacted to 77025/1161724"
time="2025-05-05T18:53:57Z" level=info msg="COMPACT compactRev=77025 targetCompactRev=78025 currentRev=1161724"
time="2025-05-05T18:53:57Z" level=info msg="COMPACT deleted 940 rows from 1000 revisions in 53.591919ms - compacted to 78025/1161724"
time="2025-05-05T18:53:57Z" level=info msg="COMPACT compactRev=78025 targetCompactRev=79025 currentRev=1161724"
time="2025-05-05T18:53:57Z" level=info msg="COMPACT deleted 1000 rows from 1000 revisions in 18.16894ms - compacted to 79025/1161724"
time="2025-05-05T18:53:57Z" level=info msg="COMPACT compactRev=79025 targetCompactRev=80025 currentRev=1161724"
time="2025-05-05T18:53:57Z" level=info msg="COMPACT deleted 1218 rows from 1000 revisions in 81.003631ms - compacted to 80025/1161724"
...
2025-05-05 18:53:57.685
time="2025-05-05T18:53:57Z" level=info msg="COMPACT compactRev=80025 targetCompactRev=81025 currentRev=1161724"
time="2025-05-05T21:03:56Z" level=info msg="COMPACT compactRev=1165631 targetCompactRev=1165856 currentRev=1166856"
time="2025-05-05T21:03:57Z" level=info msg="COMPACT deleted 225 rows from 225 revisions in 58.338043ms - compacted to 1165856/1166856"
time="2025-05-05T21:03:57Z" level=info msg="COMPACT compacted from 1165631 to 1165856 in 1 transactions over 66ms"
time="2025-05-05T21:08:56Z" level=info msg="COMPACT compactRev=1165856 targetCompactRev=1166045 currentRev=1167045"
time="2025-05-05T21:08:57Z" level=info msg="COMPACT deleted 188 rows from 189 revisions in 65.004803ms - compacted to 1166045/1167045"
time="2025-05-05T21:08:57Z" level=info msg="COMPACT compacted from 1165856 to 1166045 in 1 transactions over 74ms"
time="2025-05-05T21:13:56Z" level=info msg="COMPACT compactRev=1166045 targetCompactRev=1166235 currentRev=1167235"
2025-05-05 21:13:57.069
time="2025-05-05T21:13:57Z" level=info msg="COMPACT deleted 184 rows from 190 revisions in 82.111953ms - compacted to 1166235/1167235"
time="2025-05-05T21:13:57Z" level=info msg="COMPACT compacted from 1166045 to 1166235 in 1 transactions over 94ms"
2025-05-05 21:18:56.998
time="2025-05-05T21:18:56Z" level=info msg="COMPACT compactRev=1166235 targetCompactRev=1166424 currentRev=1167424"
time="2025-05-05T21:18:57Z" level=info msg="COMPACT deleted 186 rows from 189 revisions in 63.675493ms - compacted to 1166424/1167424"
time="2025-05-05T21:18:57Z" level=info msg="COMPACT compacted from 1166235 to 1166424 in 1 transactions over 70ms"
time="2025-05-05T21:23:56Z" level=info msg="COMPACT compactRev=1166424 targetCompactRev=1166631 currentRev=1167631"
time="2025-05-05T21:23:57Z" level=info msg="COMPACT deleted 207 rows from 207 revisions in 63.629354ms - compacted to 1166631/1167631"
time="2025-05-05T21:23:57Z" level=info msg="COMPACT compacted from 1166424 to 1166631 in 1 transactions over 71ms"
...
time="2025-05-05T21:30:25Z" level=info msg="Slow SQL (started: 2025-05-05 21:28:02.020865697 +0000 GMT m=+9560.047370128) (total time: 9.015145073s): SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500"
time="2025-05-05T21:30:25Z" level=info msg="Slow SQL (started: 2025-05-05 21:28:52.023796848 +0000 GMT m=+9610.050301290) (total time: 1m33.113218343s): SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? AND mkv.name > ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.thename ASC LIMIT 1"
time="2025-05-05T21:32:22Z" level=info msg="Slow SQL (started: 2025-05-05 21:32:20.076529512 +0000 GMT m=+9818.103033953) (total time: 1.847318586s): SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500"
time="2025-05-05T21:32:23Z" level=info msg="Slow SQL (started: 2025-05-05 21:32:21.897961966 +0000 GMT m=+9819.924466417) (total time: 1.371286536s): SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? AND mkv.name > ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.thename ASC LIMIT 1"
time="2025-05-05T21:32:32Z" level=info msg="Slow SQL (started: 2025-05-05 21:32:27.708314229 +0000 GMT m=+9825.734818661) (total time: 3.522299989s): SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500"
time="2025-05-05T21:32:38Z" level=info msg="Slow SQL (started: 2025-05-05 21:32:36.612321933 +0000 GMT m=+9834.638826374) (total time: 1.632291266s): SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500"
time="2025-05-05T21:32:55Z" level=info msg="Slow SQL (started: 2025-05-05 21:32:43.081714625 +0000 GMT m=+9841.108219056) (total time: 10.558056181s): SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500"
time="2025-05-05T21:33:15Z" level=info msg="Slow SQL (started: 2025-05-05 21:32:58.264770761 +0000 GMT m=+9856.291275273) (total time: 16.887739505s): SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500"
time="2025-05-05T21:34:39Z" level=info msg="Slow SQL (started: 2025-05-05 21:34:31.837860217 +0000 GMT m=+9949.864364678) (total time: 1.893476126s): SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500"
time="2025-05-05T21:36:09Z" level=info msg="Slow SQL (started: 2025-05-05 21:36:03.119823112 +0000 GMT m=+10041.146327553) (total time: 5.709208552s): SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500"
time="2025-05-05T21:36:18Z" level=info msg="Slow SQL (started: 2025-05-05 21:36:17.214519515 +0000 GMT m=+10055.241023946) (total time: 1.108935611s): SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500"
time="2025-05-05T21:36:22Z" level=info msg="Slow SQL (started: 2025-05-05 21:36:19.17271006 +0000 GMT m=+10057.199214501) (total time: 1.319724577s): SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? AND mkv.name > ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.thename ASC ) c"
bright-lifeguard-9803
05/06/2025, 6:57 PMbright-lifeguard-9803
05/06/2025, 6:58 PMcreamy-pencil-82913
05/06/2025, 7:17 PMbright-lifeguard-9803
05/06/2025, 7:33 PMdisable-etcd: true
in our RKE2 config. I am poking around in k3s code to see what that does for the DSN.creamy-pencil-82913
05/06/2025, 7:35 PMbright-lifeguard-9803
05/06/2025, 7:36 PMcreamy-pencil-82913
05/06/2025, 7:36 PMbright-lifeguard-9803
05/06/2025, 7:36 PMcreamy-pencil-82913
05/06/2025, 7:37 PMcreamy-pencil-82913
05/06/2025, 7:38 PMcreamy-pencil-82913
05/06/2025, 7:38 PMbright-lifeguard-9803
05/06/2025, 7:41 PMv1.32.3+rke2r1
I'll check to see if I can reproduce the issue there. v1.29.15+rke2r1
is what's shipping now. Will have to look into upgrading that.bright-lifeguard-9803
05/06/2025, 7:47 PMcreamy-pencil-82913
05/06/2025, 7:48 PMcreamy-pencil-82913
05/06/2025, 7:48 PMbright-lifeguard-9803
05/06/2025, 7:49 PMbright-lifeguard-9803
05/07/2025, 2:54 PM1.29.15
system.
time="2025-04-29T11:30:54-04:00" level=info msg="COMPACT compactRev=5409199 targetCompactRev=5409409 currentRev=5410409"
time="2025-04-29T11:36:38-04:00" level=error msg="Compact failed: failed to begin transaction: context deadline exceeded"
bright-lifeguard-9803
05/07/2025, 3:04 PMtime="2025-05-05T21:30:25Z" level=error msg="Compact failed: failed to begin transaction: context deadline exceeded
(I was looking for compact
and COMPACT
not Compact
)
The prior failure to this one was somewhat different:
time="2025-04-16T21:39:44Z" level=error msg="Compact failed: failed to get current revision: sql: transaction has already been committed or rolled back
bright-lifeguard-9803
05/07/2025, 3:09 PMerr
?bright-lifeguard-9803
05/07/2025, 3:12 PMbright-lifeguard-9803
05/07/2025, 4:03 PMCompact failed
messages in the last 30 days.creamy-pencil-82913
05/07/2025, 4:35 PMbright-lifeguard-9803
05/07/2025, 5:39 PMbright-lifeguard-9803
05/07/2025, 7:03 PMv1.32.1+rke2r1
Compact failed: failed to record compact revision: sql: transaction has already been committed or rolled back
2
v1.32.3+rke2r1
Compact failed: failed to record compact revision: sql: transaction has already been committed or rolled back
1
v1.32.3+rke2r1
Compact failed: failed to get current revision: sql: Rows are closed
2
v1.32.3+rke2r1
Compact failed: failed to record compact revision: sql: transaction has already been committed or rolled back
1
v1.32.3+rke2r1
Compact failed: failed to compact to revision: sql: transaction has already been committed or rolled back
1
v1.32.3+rke2r1
Compact failed: failed to record compact revision: sql: transaction has already been committed or rolled back
3
creamy-pencil-82913
05/07/2025, 7:06 PMcreamy-pencil-82913
05/07/2025, 7:06 PMbright-lifeguard-9803
05/07/2025, 7:36 PMv1.32.3+rke2r1
, when this is logged:
time="2025-04-16T00:31:45Z" level=error msg="Compact failed: failed to record compact revision: sql: transaction has already been committed or rolled back
COMPACT
is absent from 04-16T00:31:45Z
until the next RKE2 start on the 25th:bright-lifeguard-9803
05/07/2025, 7:37 PMbright-lifeguard-9803
05/09/2025, 5:06 PMbright-lifeguard-9803
05/09/2025, 5:09 PMetcd_db_total_size_in_bytes
or apiserver_storage_db_total_size_in_bytes
in my API Server's /metrics
endpoint.creamy-pencil-82913
05/09/2025, 5:15 PMcreamy-pencil-82913
05/09/2025, 5:16 PMsupervisor-metrics: true
in config.yaml end then check /metrics on the supervisor port (9435)bright-lifeguard-9803
05/09/2025, 5:16 PMcreamy-pencil-82913
05/09/2025, 5:19 PMbright-lifeguard-9803
05/09/2025, 5:22 PMbright-lifeguard-9803
05/09/2025, 5:26 PM$ curl -k -sS <https://localhost:9345/metrics>
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "Unauthorized",
"reason": "Unauthorized",
"code": 401
}
creamy-pencil-82913
05/09/2025, 5:28 PMkubectl get --server <https://localhost:9345> --raw /metrics
bright-lifeguard-9803
05/09/2025, 5:29 PMcreamy-pencil-82913
05/09/2025, 5:31 PM2025-05-09T17:26:15.031291661Z stderr F I0509 17:26:15.031202 1 etcd3.go:492] Failed to get storage db size for ep unixs:///var/lib/rancher/rke2/server/kine.sock: rpc error: code = Unknown desc = no such table: dbstat
creamy-pencil-82913
05/09/2025, 5:31 PMbright-lifeguard-9803
05/09/2025, 5:32 PMcreamy-pencil-82913
05/09/2025, 5:32 PMcreamy-pencil-82913
05/09/2025, 5:33 PMbright-lifeguard-9803
05/09/2025, 5:47 PMkubectl get --server <https://localhost:9345> --raw /metrics
and I don't see anything related to Kine or db size or compactions or ...creamy-pencil-82913
05/09/2025, 5:54 PMbright-lifeguard-9803
05/09/2025, 5:55 PMbright-lifeguard-9803
05/09/2025, 5:56 PM$ sudo kubectl get --raw /metrics |grep apiserver_storage_objects |sort -k2,2nr |head
apiserver_storage_objects{resource="events"} 211
apiserver_storage_objects{resource="configmaps"} 105
apiserver_storage_objects{resource="serviceaccounts"} 87
apiserver_storage_objects{resource="replicasets.apps"} 83
apiserver_storage_objects{resource="<http://clusterroles.rbac.authorization.k8s.io|clusterroles.rbac.authorization.k8s.io>"} 80
apiserver_storage_objects{resource="<http://clusterrolebindings.rbac.authorization.k8s.io|clusterrolebindings.rbac.authorization.k8s.io>"} 70
apiserver_storage_objects{resource="pods"} 59
apiserver_storage_objects{resource="<http://endpointslices.discovery.k8s.io|endpointslices.discovery.k8s.io>"} 53
apiserver_storage_objects{resource="services"} 53
apiserver_storage_objects{resource="endpoints"} 52
creamy-pencil-82913
05/09/2025, 5:57 PMcreamy-pencil-82913
05/09/2025, 5:57 PMbright-lifeguard-9803
05/09/2025, 5:58 PMbright-lifeguard-9803
05/09/2025, 5:58 PMbright-lifeguard-9803
05/09/2025, 5:59 PMcreamy-pencil-82913
05/09/2025, 5:59 PMbright-lifeguard-9803
05/09/2025, 6:00 PMbright-lifeguard-9803
05/09/2025, 7:22 PMcreamy-pencil-82913
05/09/2025, 7:29 PMcreamy-pencil-82913
05/09/2025, 7:29 PMcreamy-pencil-82913
05/09/2025, 7:29 PMcreamy-pencil-82913
05/09/2025, 7:30 PMbright-lifeguard-9803
05/09/2025, 8:28 PM