Rancher Users #k3s

# k3s

adamant-kite-43734

01/31/2024, 4:56 PM

This message was deleted.

creamy-pencil-82913

01/31/2024, 4:57 PM

sounds like the sqlite file got corrupted perhaps? Did you run out of disk space?

creamy-pencil-82913

01/31/2024, 4:57 PM

mostly just sounds like you’re missing some rows from the database

broad-battery-84396

01/31/2024, 4:58 PM

I don't think so, but I might look around

creamy-pencil-82913

01/31/2024, 4:58 PM

are you using sqlite or an external db?

broad-battery-84396

01/31/2024, 4:58 PM

sqlite

creamy-pencil-82913

01/31/2024, 4:58 PM

yeah… were there any interesting messages at the start of the log?

creamy-pencil-82913

01/31/2024, 4:59 PM

both of those indicate that critical keys are missing from the datastore.

creamy-pencil-82913

01/31/2024, 4:59 PM

which probably means the sqlite file was corrupted when the node crashed

broad-battery-84396

01/31/2024, 4:59 PM

server1:/var/lib/rancher/k3s # /usr/local/bin/k3s server --flannel-backend none --disable-network-policy --token K106e0d23ed626d7fa3d9113a8f50d806f584c63e3a52325122bfc6f2d2d173abb4::server:7187c735c7335ea1ac123899986909eb 
INFO[0000] Starting k3s v1.27.9+k3s1 (2c249a39)         
INFO[0000] Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s 
INFO[0000] Configuring database table schema and indexes, this may take a moment... 
INFO[0000] Database tables and indexes are up to date   
INFO[0000] Kine available at unix://kine.sock
ERRO[0000] TTL event watch failed to get start revision 
INFO[0000] Bootstrap key already locked - waiting for data to be populated by another server 
ERRO[0000] TTL event watch failed to get start revision

broad-battery-84396

01/31/2024, 5:01 PM

I was able to make a full backup copy of the /data directory to the same disk when I started looking into it, so I don't think I am out of space

broad-battery-84396

01/31/2024, 5:01 PM

But corrupted sounds plausible

broad-battery-84396

01/31/2024, 5:05 PM

Any k3s specific advice, or should I just look at trying to fix the sqlite database?

broad-battery-84396

01/31/2024, 5:09 PM

https://www.sqlite.org/recovery.html

creamy-pencil-82913

01/31/2024, 5:24 PM

yeah you might grab the sqlite cli tools and poke at it :/

creamy-pencil-82913

01/31/2024, 5:24 PM

in general it is pretty robust, I’ve only seen it get trashed a few times and usually there is some underlying storage issue, like bad sd cards on a Raspberry Pi or something.
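
For anyone who would rather script the first look than use the sqlite CLI, here is a minimal sketch in Python using the standard `sqlite3` module. It only runs SQLite's built-in `PRAGMA integrity_check` against a file opened read-only. The function name `check_integrity` is made up for illustration, and the path in the usage note is an assumption about where k3s keeps its embedded database; point it at a copy of the file, with k3s stopped.

```python
import sqlite3


def check_integrity(db_path):
    """Return the lines reported by PRAGMA integrity_check.

    A single "ok" line means SQLite found no structural problems.
    """
    # mode=ro opens the file read-only, so the check cannot modify it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()
```

Usage would look like `check_integrity("/path/to/copy/of/state.db")`. On a healthy file the result is `["ok"]`; anything else is a list of the problems SQLite found. A clean result only says the file structure is sound, it does not prove that every row k3s needs is still present.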

broad-battery-84396

01/31/2024, 6:02 PM

I ran the .recover command and dumped that back, now I'm getting a different error.

FATA[0000] starting kubernetes: preparing server: creating storage endpoint: starting kine backend: rpc error: code = InvalidArgument desc = etcdserver: duplicate key given in txn request

creamy-pencil-82913

01/31/2024, 6:11 PM

sounds like you now have some duplicate rows. You might check for entries with the same (id, name) or (name, prev_revision)
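
The duplicate check suggested above could be sketched roughly like this in Python. It assumes the recovered database has a table named `kine` with `id`, `name` and `prev_revision` columns, which is an assumption about the kine schema rather than something shown in this thread. The helper name `find_duplicates` is hypothetical.

```python
import sqlite3


def find_duplicates(conn, columns, table="kine"):
    """Return (col values..., count) tuples for column combinations seen more than once.

    `columns` and `table` are interpolated into the SQL text, so only pass
    trusted identifiers (they are not user input in this sketch).
    """
    cols = ", ".join(columns)
    query = (
        f"SELECT {cols}, COUNT(*) FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 "
        f"ORDER BY COUNT(*) DESC"
    )
    return conn.execute(query).fetchall()
```

With a connection opened on a copy of the database, `find_duplicates(conn, ["name", "prev_revision"])` and `find_duplicates(conn, ["id", "name"])` list the combinations that occur more than once. An empty list means no duplicates for that pair. Deciding which of the duplicated rows to keep is a separate question that this sketch does not answer.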

broad-battery-84396

01/31/2024, 6:12 PM

What are the wal and shm db files for? Should I be looking at them?

creamy-pencil-82913

01/31/2024, 6:16 PM

sqlite internal stuff
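
For context on those files: in SQLite's write-ahead-log mode, the `-wal` file holds committed changes that have not yet been copied back into the main database file, and the `-shm` file is a shared-memory index for the WAL. A copy of the main file alone can therefore miss recent writes. A minimal Python sketch of folding the WAL back into the main file is below; the helper name `checkpoint_wal` is hypothetical, and it should only be run on a copy, with k3s stopped.

```python
import sqlite3


def checkpoint_wal(db_path):
    """Checkpoint the WAL into the main database file and report what happened.

    Returns (journal_mode, busy, wal_frames, checkpointed_frames).
    """
    conn = sqlite3.connect(db_path)
    try:
        # Reports the current journal mode, e.g. "wal" or "delete".
        mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
        # TRUNCATE copies all WAL frames into the database, then empties the WAL.
        busy, wal_frames, checkpointed = conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE)"
        ).fetchone()
        return mode, busy, wal_frames, checkpointed
    finally:
        conn.close()
```

A `busy` value of 0 means the checkpoint was not blocked by another connection. If the journal mode is not `wal`, there is nothing to checkpoint and the `-wal` and `-shm` files are not in use.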

broad-battery-84396

01/31/2024, 6:16 PM

got it

broad-battery-84396

01/31/2024, 7:22 PM

Thanks for the help, I think I'm just going to have to roll back to my backups from last month
