# harvester
f
Hi @here, hope you're doing well! I have a quick question:
• What images are you using to provision RKE2 clusters in Harvester?
• How long does it typically take you to provision a 3-node cluster?
For reference, I'm currently testing with:
• openSUSE Leap 15.6
• openSUSE Leap 16
• openSUSE Leap Micro 6
With these images, provisioning takes:
• ~1 hour when using Calico
• ~30–40 minutes when using Flannel
t
Hi:
• I prefer Rocky.
• 5-10 minutes. I would check how long it takes for a VM to spin up versus how long the RKE2 scripts take.
❤️ 1
w
That sounds a bit long. We're using openSUSE Leap Micro 6.1, and the last provision log shows about 30 minutes per node; this is on NVMe storage with a 10 Gb network, using Calico. I'll give Flannel a go next time we bring one up...

We did use Rocky the first time but found it slower and less flexible. Since we're using NFS storage, the SUSE image came preconfigured with a lot of what we needed out of the box and seemed to provision in half the time. We have used, and still use, Rocky for VMs, but since this is a Kubernetes cluster the OS has no bearing on the workloads you run within it; you want a lightweight OS with minimal features and just the basics for workloads. Leap Micro's storage requirement is a fifth the size of a Rocky install, and that should result in faster node creation.

What type of storage are you running? And how have you set up the network?
f
I'm using Harvester Longhorn with HDDs, unfortunately.
I added 1 SSD per server and installed Harvester, then added HDDs and configured them to be used by Longhorn.
So all VMs are running on HDDs, but I don't think that's the issue,
since the VMs get created in about 2 minutes, plus roughly 2 more minutes to finish cloud-init.
So I don't think that's the problem.
w
Ahh, that's your slowdown right there. Remember your disk images have to copy over, then the installs download packages, etc. HDDs are really very slow, and if you're doing it right (with replicas over a minimum of 3 nodes) that's a lot of data to shunt about to get a node up... Grab yourself some cheap 1 TiB SSDs...

Oh, and if you do have SSDs, just set up another storage type. You'll need to download the image in the same format, then you can seed the images on SSD; it will make a big difference. We're using NVMe. There is quite a bit of IO going on while these are set up, and RKE2 needs high IOPS to work properly. That's my experience anyway.
Would love to be wrong, as we have 20 TiB of HDD in each machine....
Can you see your SSDs listed in the Longhorn volumes?
f
No
Unfortunately
w
If so, you could try tagging them so you can set up a storage class for them...
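For reference, a rough sketch of what disk tagging plus a matching StorageClass can look like in Longhorn. Node name, disk name, and path here are illustrative, and Longhorn normally manages the Node CR itself, so in practice you'd set the tags via the Longhorn UI or `kubectl edit` rather than applying this wholesale:

```yaml
# Tag the SSD disk on a Longhorn node, then create a StorageClass
# that only schedules replicas on disks carrying that tag.
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: harvester-node-1        # illustrative node name
  namespace: longhorn-system
spec:
  disks:
    ssd-disk:                   # illustrative disk name
      path: /var/lib/harvester/extra-ssd   # illustrative mount path
      allowScheduling: true
      tags: ["ssd"]
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-ssd
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  diskSelector: "ssd"           # replicas land only on "ssd"-tagged disks
```

With that in place, VM images and guest-cluster volumes created against `longhorn-ssd` stay off the HDDs.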
f
They're only used for the OS; I chose the SSDs as OS disks and the HDDs as data volumes when I set them up.
I'm not really replicating directly.
But the strange part is that with Ubuntu
the cluster comes up in 15-20 minutes.
w
Are you sure they're not listed? Were they small SSDs? We used 1 TiB disks and ended up with about 600 GiB that we could provision to.
f
And a 4-node cluster (3 control plane, 1 worker) comes up in 40 minutes.
They are 800 GB HP SAS SSDs.
w
Something sounds strange there. Is it the image you're using? Maybe try a newer version of Leap; we used Micro 6.1. Ubuntu is a good fallback, but again, like Rocky, it's quite a lot of OS for what is just orchestration.
Mind you, it would be using a local image, wouldn't it...
I wonder if it's updating itself during install,
since that's an old OS, and on SUSE it would need to create another snapshot and restart to apply updates.
Anyway, let me know if you figure it out. Thought I'd share my experience, as our OS choice is very similar, just one minor version newer.
❤️ 1
f
Thanks a lot.
I will let you know if I figure it out.
I have other issues too.
For example, for some reason the clusters go offline every day for 1-3 hours,
with the kubelet not responding.
I think it's related to the CCM,
but it's making me doubt the choice of Harvester.
w
We did struggle early on. I think you need a good network; we use 2x 10 Gb SFP+ just for management/storage, and have configured the storage with its own dedicated network on a VLAN. Until we did that, we got lots of stability issues.
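Harvester exposes this as its `storage-network` setting, which moves Longhorn replica traffic onto a dedicated VLAN. A rough sketch of the setting object, where the VLAN ID, cluster network name, and IP range are illustrative; check the Harvester docs for your version before applying anything like this:

```yaml
# Sketch: dedicated storage network for Longhorn traffic in Harvester.
# All values below are placeholders for your own network layout.
apiVersion: harvesterhci.io/v1beta1
kind: Setting
metadata:
  name: storage-network
value: '{"vlan":100,"clusterNetwork":"storage","range":"192.168.100.0/24"}'
```

Applying or changing this setting restarts Longhorn components, so it's best done before the cluster carries real workloads.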
f
That’s my point too
w
I'm also close to getting rid of it, but we do still have some VM requirements... If we didn't, I'd wipe the Harvester cluster and go bare metal with Ceph. But Harvester is good at running mixed environments once you get the setup worked out, i.e. if you want to run VMs and RKE2 clusters at the same time.
f
Yeah, that's my requirement too.
w
That is the one thing it does very well, particularly if you're trying to get into k8s and want to experiment. It does make things easier, but there is an overhead, and you will need to make sure the network and storage are up to the job.
p
^ This is why I ❤️ single-node Harvester. IMO it makes a lot of sense for a good % of use cases.
f
And then it's a single point of failure.
For example (I'm running my startup in there),
one small issue and everything is gone forever.
p
Yeah, but the point is the cluster is already effectively a single point of failure. So if the cluster fails, you should be able to migrate your traffic to a new cluster quickly, and have one ready.
It won't be gone forever if you do backups etc. properly.
f
Yeah, I get you.
Actually, when Harvester came out I thought it would be something like OpenStack, with proper distribution of roles (hypothetically: 1 node for storage, 1 for CPU, 1 for memory).
That would make a 10 Gbps requirement make much more sense.
p
Working backwards from first principles, 10 Gbps shouldn't be required at all, but wanting snowflake VMs on distributed block storage kinda do be that way. I just think it's inefficient and causes more problems than it generally solves, but "it's HA".
w
If you're running 3 replicas and spread your cluster over enough nodes, it does mitigate failure nicely. The network needs to be up to the job, though. I've an SFP+ aggregation switch that would do a small cluster (UK based, though). You can pick up 10G SFP cards quite cheap, but they run hot, so get a fan pointing at them in the chassis.
p
Yeah, but IME actual hardware failures are quite rare, and if you can restore from a live backup in <15 minutes once every few years, you're fine.
In practice you can even just keep a "hot standby" ready to flip to.
But our main use case, for example, would be Postgres DBs, and it makes more sense to just rely on Postgres replication for that. It's 1000x more efficient and built for purpose. That's true of a lot of the underlying stuff you'd otherwise be putting on Longhorn VM block storage to replicate, filling the pipes.
w
Well, I've been doing this for 25 years now and seen a few servers go wrong in my time, most commonly HDDs and SSDs giving up. Having some automation or fallback built in, if you can afford it, does give peace of mind. Also, just being able to pull a node out without killing things, to change things around, add components, etc., is great when it works.
p
Yeah, like we effectively do realtime backup of the data that matters. Replicate-the-VM is .... a way to accomplish that, but it imposes a lot of costs and IMO should be more of a last resort for legacy projects with real snowflake VMs.
w
Well, it's kinda the same thing if your volume is already replicated. However, we're almost completely on containerised workloads now; as the volumes are smaller, fast replication is much more efficient. Anyway, putting that to one side, I was sorely tempted to try out an ARM server; you can get 98-core nodes now. However, back to the single point of failure! Though in this instance it's 20 minutes to provision a replacement, depending on your use case.... We still want HA, but with some DNS magic and a second site...
p
If the volumes are small enough, 1 Gbps should be enough 🫣
Like, "is the data I need replicated/backed up being written faster than 1 Gbps?" If it were, you'd probably already have better NICs.
f
I think 10 Gbps should only be needed at scale, when you have 5+ nodes and really need etcd kept in sync,
so you can have full replication.
But considering that everything in Harvester is an overlay on top of K8s, it kinda makes sense to require some speed; still, 10 Gbps and all-SSD is too much for homelabs or startups.
They push the ARM architecture (all fine with it), but IMO they should push these optimisations too.
Make it possible for the broke ones 😜
p
I don't 100% trust my ability to recover Longhorn volumes anyway, especially if the goal is <15 minutes of downtime versus begging for help on Slack 🤪
w
That's another reason why, when I'm done with VMs, we're moving to Ceph/bare-metal k8s managed by Rancher.... but until then Harvester is a handy tool. Having said that, "boxes" was pretty cool, and just reserving a machine for that gives us the VM capacity we need. Our cluster has 5 machines in total, and we've almost enough kit for another.

Also, for those just trying it out and experimenting with k8s, Rancher Desktop is very good for testing k8s on a local machine; you don't actually need a physical cluster. Single node is another way to go if you want to dedicate a box and just try things out without having to worry about networking.
p
I like VMs as an encapsulation/isolation/security mechanism; I just think they should be deployed more like containers. I.e., you wouldn't do replicated block storage for a container's ephemeral disk.
w
Technically you can do that already, but it would prevent live migration: simply have the images with single replicas and mount additional disks with replicas for your storage. I did something similar early on and realised it was a bit of a mistake come upgrade day. I'm pretty sure single replicas also mean the volume has to be cloned to another node while a node is being upgraded, though you might be able to change the policy there; you can certainly tell it a VM must be shut down during upgrade. Containerised workloads do replicate the disk image layers too, but it's handled more efficiently and they are usually much smaller (though I've seen some beastly containers....)
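A minimal sketch of that layout, assuming Longhorn's standard StorageClass parameters (the class name is illustrative, and as noted, single-replica non-migratable volumes rule out live migration and complicate node upgrades):

```yaml
# Sketch: a single-replica Longhorn class for "ephemeral" root disks,
# used alongside the default replicated class for data volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single-replica   # illustrative name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"   # no replica traffic for root disks
  migratable: "false"     # live migration needs migratable volumes
```

Data volumes would then keep `numberOfReplicas: "3"` via the default class, so only the state that matters crosses the storage network.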
p
Right, the Harvester live migrate won't work, but ideally you can just spin up an identical VM and attach the migrated storage with a few lines of code. YMMV.
w
For us it would mean having quite a few VMs running and remembering which ones need special treatment; it depends, I guess, on how you're managing it. Let's face it, upgrades feel a bit of a gamble, but the fewer edge cases you have, the easier it is to keep in step with upstream.
Reminds me of the old days and Vagrant...
p
Yeah, the easiest time to prevent edge cases is when you're starting. I get the migrating-legacy-workload issue for sure.