creamy-autumn-87994

05/10/2023, 1:12 AM
Has anyone tried changing the template of an RKE2 cluster agent pool on vSphere? It attempts the upgrade and creates a node, but just stops there; it doesn't register the node or do anything. I'm able to manually delete a node and it'll auto-generate a new one using the old template, but that doesn't affect the update process. The update process is broken.

agreeable-oil-87482

05/10/2023, 6:24 AM
I've done this before (accidentally!) and it worked OK. SSH to the node it created and grab the rancher-system-agent logs.

creamy-autumn-87994

05/10/2023, 5:02 PM
Hm. Yeah, it looks like an image issue. I have one from 3/27 that works fine, and the same one rebuilt on 5/04 is not working. I just noticed my /etc/passwd looks different. Old:
docker:x:1028:100::/home/docker:/bin/sh
New:
ubuntu:x:1028:1029:Ubuntu:/home/ubuntu:/bin/bash
How strange. I believe Rancher uses the docker user to SSH in, which would explain the problem, but I have no idea why I'm getting an ubuntu user now.

agreeable-oil-87482

05/10/2023, 5:03 PM
Probably the default user in the cloud image. If the docker user doesn't exist, it hasn't ingested the new cloud-init config.
Has the node's hostname changed?
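For reference, the user data Rancher generates for a vSphere machine typically includes a users section along these lines (a rough sketch; the docker name is the node driver's default SSH user and the key is machine-specific):
#cloud-config
users:
  - name: docker
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-rsa AAAA... # key Rancher generated for this machine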

creamy-autumn-87994

05/10/2023, 5:04 PM
nope, never gets to that point

agreeable-oil-87482

05/10/2023, 5:04 PM
Yeah, if the node name hasn't changed then it hasn't ingested the Rancher-generated cloud-init user data config.
It actually does that before invoking any Rancher-specific activity.
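A quick way to confirm this on the node (standard cloud-init commands and default paths, nothing Rancher-specific):
cloud-init status --long
cat /var/lib/cloud/instance/user-data.txt
If the user data was ingested, that file should contain the Rancher-generated config, including the new hostname.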

creamy-autumn-87994

05/10/2023, 5:06 PM
Does Rancher SSH into the node to start the process?
Or send cloud-init data on provisioning?

agreeable-oil-87482

05/10/2023, 5:07 PM
Not with RKE2. The cloud-init user data config writes and executes a script that fires off the agent install. The cloud-init config does include the new hostname value, though.
But the SSH key is used if you SSH into the node using the Rancher UI.
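Roughly, that user data has this shape (an illustrative sketch only; the actual script path and contents are generated by Rancher):
#cloud-config
hostname: <rancher-generated node name>
write_files:
  - path: /usr/local/bin/install-agent.sh # hypothetical path
    content: ... # bootstrap script that installs and starts rancher-system-agent
runcmd:
  - sh /usr/local/bin/install-agent.sh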

creamy-autumn-87994

05/10/2023, 5:08 PM
Hm. So I wonder if something is off with the new version of cloud-init

agreeable-oil-87482

05/10/2023, 5:08 PM
How did you make your new template?

creamy-autumn-87994

05/10/2023, 5:09 PM
A Jenkins job runs a Packer build. Packer creates it identically to the last image; the only differences are the build time and the OS updates.

agreeable-oil-87482

05/10/2023, 5:09 PM
Did you run cloud-init clean as part of the build?

creamy-autumn-87994

05/10/2023, 5:09 PM
Yup

agreeable-oil-87482

05/10/2023, 5:10 PM
Which OS is it?

creamy-autumn-87994

05/10/2023, 5:10 PM
Ubuntu 20.04.
I had to also remove the contents of a dir to fully reset it.
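For context, a typical reset step in an image build looks something like this (assuming the dir in question is /var/lib/cloud, cloud-init's state directory; the thread doesn't name it):
cloud-init clean --logs
rm -rf /var/lib/cloud/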

creamy-autumn-87994

05/10/2023, 5:12 PM
Oh, there is one difference. Our images were trying to connect to the metadata URL and taking several minutes to boot up, so we added this file:
root@ubuntu-server:/etc/cloud/cloud.cfg.d# cat 99_disable_metadata.cfg
datasource_list: [ None ]

agreeable-oil-87482

05/10/2023, 5:13 PM
Ah. That'll instruct cloud-init not to ingest from the ISO Rancher mounts.

creamy-autumn-87994

05/10/2023, 5:13 PM
dangit!
Thank you for the extra pair of 👀

agreeable-oil-87482

05/10/2023, 5:13 PM
You need to have the NoCloud data source added, IIRC.
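i.e. something like this in place of the file above, so cloud-init still reads the NoCloud ISO Rancher mounts but skips probing network metadata sources (a sketch, assuming the same file path):
root@ubuntu-server:/etc/cloud/cloud.cfg.d# cat 99_disable_metadata.cfg
datasource_list: [ NoCloud, None ]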
No worries!

creamy-autumn-87994

05/10/2023, 5:14 PM
I’ll give that a shot, thank you!!
One more question if you have a second! If we're running Longhorn with local PVs, does the auto-upgrade mess things up with volume replicas? It's unfortunate we can't choose the node order and put the Longhorn node in maintenance first.

agreeable-oil-87482

05/10/2023, 5:20 PM
What do you mean, Longhorn with local PVs?

creamy-autumn-87994

05/10/2023, 5:20 PM
With Longhorn storing the volume replicas on the worker nodes. Updating the template will then replace the worker nodes with new ones.

agreeable-oil-87482

05/10/2023, 5:21 PM
Ah. Provided you have enough replicas and they're spread across worker nodes, sure. You can influence how many worker nodes are upgraded in parallel in the cluster options.
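For example, in the cluster's provisioning YAML (field names from provisioning.cattle.io/v1; the values here are illustrative):
spec:
  rkeConfig:
    upgradeStrategy:
      workerConcurrency: "1"
      workerDrainOptions:
        enabled: true
Enabling drain cordons a node and evicts its pods before it is replaced.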

creamy-autumn-87994

05/10/2023, 5:22 PM
Typically you would go to the Longhorn UI, put the node in maintenance, and offload the volumes first. But with a cluster update, the process is automatic and replaces worker nodes quickly; you don't get a chance to go to the Longhorn UI.

agreeable-oil-87482

05/10/2023, 5:24 PM
Interesting. Can you post that in #longhorn-storage please?

creamy-autumn-87994

05/10/2023, 5:24 PM
Oh, didn’t know about the channel! Thank you!