# rke2
a
a
I've done this before (accidentally!) and it worked OK. On the node it created, SSH in and grab the rancher-system-agent logs.
c
Hm. Yeah, it looks like an image issue. I have one from 3/27 that works fine, and the same one updated on 5/04 is not working. I just noticed my /etc/passwd looks different. Old:
```
docker:x:1028:100::/home/docker:/bin/sh
```
New:
```
ubuntu:x:1028:1029:Ubuntu:/home/ubuntu:/bin/bash
```
How strange. I believe Rancher uses the docker user to SSH in, which would explain the problem, but I have no idea why I'm getting an ubuntu user now.
a
Probably the default user in the cloud image. If the docker user doesn't exist, it hasn't ingested the new cloud-init config.
Has the node's hostname changed?
c
nope, never gets to that point
a
Yeah, if the node name hasn't changed then it hasn't ingested the Rancher-generated cloud-init user data config.
It actually does that before invoking any Rancher-specific activity.
c
Does rancher ssh into the node to start the process?
Or does it send cloud-init data on provisioning?
a
Not with rke2. The cloud-init user data config writes and executes a script that fires off the agent install. The cloud-init config does include the new hostname value, though.
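For illustration, a cloud-init user data config of that shape looks roughly like the sketch below. The hostname, script path, and script contents here are hypothetical stand-ins, not Rancher's actual payload:

```yaml
#cloud-config
hostname: my-new-node                        # hypothetical; Rancher injects its generated node name here
write_files:
  - path: /usr/local/bin/install-agent.sh   # hypothetical path
    permissions: "0700"
    content: |
      #!/bin/sh
      # placeholder for the rancher-system-agent install steps
      echo "install rancher-system-agent here"
runcmd:
  - [ /usr/local/bin/install-agent.sh ]
```

If cloud-init never ingests this config, none of it runs, which is why an unchanged hostname is a good tell.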
But the SSH key is used if you SSH into the node using the Rancher UI.
c
Hm. So I wonder if something is off with the new version of cloud-init
a
How did you make your new template?
c
A Jenkins job runs a Packer build. Packer creates it identically to the last image; the only differences are the build time and the OS updates.
a
Did you run cloud-init clean as part of the build?
c
Yup
a
Which OS is it?
c
Ubuntu 20.04
I also had to remove the contents of a dir to fully reset it.
c
Oh, there is one difference. Our images were trying to connect to the metadata URL and taking several minutes to boot, so we added this file:
```
root@ubuntu-server:/etc/cloud/cloud.cfg.d# cat 99_disable_metadata.cfg
datasource_list: [ None ]
```
a
Ah. That'll instruct cloud-init not to ingest from the ISO Rancher mounts.
c
dangit!
Thank you for the extra pair of 👀
a
You need to have the NoCloud data source added, IIRC.
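If it helps, a minimal sketch of what that drop-in might look like with NoCloud kept in the list. The exact datasource list you need depends on your platform, so verify against the cloud-init datasource docs:

```yaml
# /etc/cloud/cloud.cfg.d/99_disable_metadata.cfg (sketch)
# Keep NoCloud so cloud-init still reads the config ISO Rancher attaches,
# while skipping the network metadata services that slowed boot.
datasource_list: [ NoCloud, None ]
```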
No worries!
c
I’ll give that a shot, thank you!!
One more question if you have a second! If we are running Longhorn with local PVs, does the auto-upgrade mess things up with volume replicas? It's unfortunate we can't choose the node order and put the Longhorn node in maintenance first.
a
What do you mean longhorn with local pvs?
c
With Longhorn storing the volume replicas on the worker nodes. Then updating the template will replace the worker nodes with new ones.
a
Ah. Provided you have enough replicas and they're spread across worker nodes, sure. You can influence how many worker nodes are upgraded in parallel in the cluster options.
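For reference, in a Rancher v2 provisioned rke2 cluster that knob lives under the cluster's upgrade strategy. A sketch of the relevant fragment (field names from provisioning.cattle.io/v1; verify against your Rancher version):

```yaml
apiVersion: provisioning.cattle.io/v1
kind: Cluster
spec:
  rkeConfig:
    upgradeStrategy:
      workerConcurrency: "1"    # replace one worker node at a time
      workerDrainOptions:
        enabled: true           # drain pods before replacing the node
```

Draining before replacement gives Longhorn a chance to rebuild replicas elsewhere before the next node goes.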
c
Typically you would go to the Longhorn UI, put the node in maintenance, and offload the volume. But with a cluster update, upgrades are automatic and worker nodes are replaced quickly. You don't get a chance to go to the Longhorn UI.
a
Interesting. Can you post that in #longhorn-storage please?
c
Oh, I didn't know about the channel! Thank you!