# harvester
f
Hi @here, hope you're doing well! I have a quick question:
• What images are you using to provision RKE2 clusters in Harvester?
• How long does it typically take you to provision a 3-node cluster?
For reference, I'm currently testing with:
• openSUSE Leap 15.6
• openSUSE Leap 16
• openSUSE Leap Micro 6
With these images, provisioning takes:
• ~1 hour when using Calico
• ~30–40 minutes when using Flannel
t
Hi:
• I prefer Rocky.
• 5-10 minutes. I would check how long it takes for a VM to spin up versus how long the RKE2 scripts take.
❤️ 1
w
That sounds a bit long. We're using openSUSE Leap Micro 6.1, and the last provision log shows about 30 minutes per node; this is on NVMe storage with a 10 Gb network, using Calico. I'll give Flannel a go next time we bring one up...

We did use Rocky the first time but found it slower and less flexible. Since we're using NFS storage, the SUSE image came preconfigured with a lot of what we needed out of the box and seemed to provision in half the time. We have used, and still use, Rocky for VMs, but since this is a Kubernetes cluster the OS has no bearing on the workloads you run within it; you want a lightweight OS with minimal features and just the basics for workloads. Leap Micro's storage requirement is a fifth the size of a Rocky install, and that should result in faster node creation.

What type of storage are you running? And how have you set up the network?
f
I'm using Harvester Longhorn with HDDs, unfortunately.
I added 1 SSD per server and installed Harvester, then added HDDs and configured them to be used by Longhorn.
So all VMs are running on HDDs, but I don't think that's the issue,
since the VMs get created in about 2 minutes, plus roughly 2 more minutes to finish cloud-init.
So I don't think that's the problem.
w
Ahh, that's your slowdown right there. Remember your disk images have to copy over, then the installs download packages, etc. HDDs are really very slow, and if you're doing it right (with replicas over a minimum of 3 nodes) that's a lot of data to shunt about to get a node up... Grab yourself some cheap 1 TiB SSDs...

Oh, and if you do have SSDs, just set up another storage type. You'll need to download the image in the same format, then you can seed the images on SSD; it will make a big difference. We're using NVMe. There is quite a bit of IO going on while these are set up, and RKE2 needs high IOPS to work properly. That's my experience anyway.
Would love to be wrong, as we have 20 TiB of HDD in each machine....
Can you see your SSDs listed in the Longhorn volumes?
f
No
Unfortunately
w
If so, you could try tagging them so you can set up a storage class for them...
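For reference, a rough sketch of what disk tagging plus a matching StorageClass can look like in Longhorn. Node name, disk name, and path here are illustrative, and Longhorn normally manages the Node CR itself, so in practice you'd set the tags via the Longhorn UI or `kubectl edit` rather than applying this wholesale:

```yaml
# Tag the SSD disk on a Longhorn node, then create a StorageClass
# that only schedules replicas on disks carrying that tag.
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: harvester-node-1        # illustrative node name
  namespace: longhorn-system
spec:
  disks:
    ssd-disk:                   # illustrative disk name
      path: /var/lib/harvester/extra-ssd   # illustrative mount path
      allowScheduling: true
      tags: ["ssd"]
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-ssd
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  diskSelector: "ssd"           # replicas land only on "ssd"-tagged disks
```

With that in place, VM images and guest-cluster volumes created against `longhorn-ssd` stay off the HDDs.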
f
They're only used for the OS; I chose the SSDs as OS disks and the HDDs as data volumes when I set them up.
I'm not really replicating directly.
But the strange part is that with Ubuntu
the cluster comes up in 15-20 minutes.
w
Are you sure they're not listed? Were they small SSDs? We used 1 TiB disks and ended up with about 600 GiB that we could provision to.
f
And a 4-node cluster (3 control plane, 1 worker) comes up in 40 minutes.
They are 800 GB HP SAS SSDs.
w
Something sounds strange there. Is it the image you're using? Maybe try a newer version of Leap; we used Micro 6.1. Ubuntu is a good fallback, but again, like Rocky, it's quite a lot of OS for what is just orchestration.
Mind you, it would be using a local image, wouldn't it...
I wonder if it's updating itself during install,
since that's an old OS, and on SUSE it would need to create another snapshot and restart to apply updates.
Anyway, let me know if you figure it out. Thought I'd share my experience, as our OS choice is very similar, just one minor version newer.
❤️ 1
f
Thanks a lot.
I will let you know if I figure it out.
I have other issues too.
For example, for some reason the clusters go offline every day for 1-3 hours,
with the kubelet not responding.
I think it's related to the CCM,
but it's making me doubt the choice of Harvester.
w
We did struggle early on. I think you need a good network; we use 2x 10 Gb SFP+ just for management/storage, and have configured the storage with its own dedicated network on a VLAN. Until we did that, we got lots of stability issues.
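Harvester exposes this as its `storage-network` setting, which moves Longhorn replica traffic onto a dedicated VLAN. A rough sketch of the setting object, where the VLAN ID, cluster network name, and IP range are illustrative; check the Harvester docs for your version before applying anything like this:

```yaml
# Sketch: dedicated storage network for Longhorn traffic in Harvester.
# All values below are placeholders for your own network layout.
apiVersion: harvesterhci.io/v1beta1
kind: Setting
metadata:
  name: storage-network
value: '{"vlan":100,"clusterNetwork":"storage","range":"192.168.100.0/24"}'
```

Applying or changing this setting restarts Longhorn components, so it's best done before the cluster carries real workloads.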
f
That’s my point too
w
I'm also close to getting rid of it, but we do still have some VM requirements... If we didn't, I'd wipe the Harvester cluster and go bare metal with Ceph. But Harvester is good at running mixed environments once you get the setup worked out, i.e. if you want to run VMs and RKE2 clusters at the same time.
f
Yeah, that's my requirement too.
w
That is the one thing it does very well, particularly if you're trying to get into k8s and want to experiment. It does make things easier, but there is an overhead, and you will need to make sure the network and storage are up to the job.
p
^ This is why I ❤️ single-node Harvester. IMO it makes a lot of sense for a good % of use cases.
f
And then it's a single point of failure.
For example (I'm running my startup in there),
one small issue and everything is gone forever.
p
Yeah, but the point is the cluster is already effectively a single point of failure. So if the cluster fails, you should be able to migrate your traffic to a new cluster quickly, and have one ready.
It won't be gone forever if you do backups etc. properly.
f
Yeah, I get you.
Actually, when Harvester came out I thought it would be something like OpenStack, with proper distribution of roles (hypothetically: 1 node for storage, 1 for CPU, 1 for memory).
That would make a 10 Gbps requirement make much more sense.
p
Working backwards from first principles, 10 Gbps shouldn't be required at all, but wanting snowflake VMs on distributed block storage kinda do be that way. I just think it's inefficient and causes more problems than it generally solves, but "it's HA".
w
If you're running 3 replicas and spread your cluster over enough nodes, it does mitigate failure nicely. The network needs to be up to the job, though. I've an SFP+ aggregation switch that would do a small cluster (UK based, though). You can pick up 10G SFP cards quite cheap, but they run hot, so get a fan pointing at them in the chassis.
p
Yeah, but IME actual hardware failures are quite rare, and if you can restore from a live backup in <15 minutes once every few years, you're fine.
In practice you can even just keep a "hot standby" ready to flip to.
But our main use case, for example, would be Postgres DBs, and it makes more sense to just rely on Postgres replication for that. It's 1000x more efficient and built for purpose. That's true of a lot of the underlying stuff you'd otherwise be putting on Longhorn VM block storage to replicate, filling the pipes.
w
Well, I've been doing this for 25 years now and seen a few servers go wrong in my time, most commonly HDDs and SSDs giving up. Having some automation or fallback built in, if you can afford it, does give peace of mind. Also, just being able to pull a node out without killing things, to change things around, add components, etc., is great when it works.
p
Yeah, like we effectively do realtime backup of the data that matters. Replicate-the-VM is .... a way to accomplish that, but it imposes a lot of costs and IMO should be more of a last resort for legacy projects with real snowflake VMs.
w
Well, it's kinda the same thing if your volume is already replicated. However, we're almost completely on containerised workloads now; as the volumes are smaller, fast replication is much more efficient. Anyway, putting that to one side, I was sorely tempted to try out an ARM server; you can get 98-core nodes now. However, back to the single point of failure! Though in this instance it's 20 minutes to provision a replacement, depending on your use case.... We still want HA, but with some DNS magic and a second site...
p
If the volumes are small enough, 1 Gbps should be enough 🫣
Like, "is the data I need replicated/backed up being written faster than 1 Gbps?" If it were, you'd probably already have better NICs.
f
I think 10 Gbps should only be needed at scale, when you have 5+ nodes and really need etcd kept in sync,
so you can have full replication.
But considering that everything in Harvester is an overlay on top of K8s, it kinda makes sense to require some speed; still, 10 Gbps and all-SSD is too much for homelabs or startups.
They push the ARM architecture (all fine with it), but IMO they should push these optimisations too.
Make it possible for the broke ones 😜
p
I don't 100% trust my ability to recover Longhorn volumes anyway, especially if the goal is <15 minutes of downtime versus begging for help on Slack 🤪
w
That's another reason why, when I'm done with VMs, we're moving to Ceph/bare-metal k8s managed by Rancher.... but until then Harvester is a handy tool. Having said that, "boxes" was pretty cool, and just reserving a machine for that gives us the VM capacity we need. Our cluster has 5 machines in total, and we've almost enough kit for another.

Also, for those just trying it out and experimenting with k8s, Rancher Desktop is very good for testing k8s on a local machine; you don't actually need a physical cluster. Single node is another way to go if you want to dedicate a box and just try things out without having to worry about networking.
p
I like VMs as an encapsulation/isolation/security mechanism; I just think they should be deployed more like containers. I.e., you wouldn't do replicated block storage for a container's ephemeral disk.
w
Technically you can do that already, but it would prevent live migration: simply have the images with single replicas and mount additional disks with replicas for your storage. I did something similar early on and realised it was a bit of a mistake come upgrade day. I'm pretty sure single replicas also mean the volume has to be cloned to another node while a node is being upgraded, though you might be able to change the policy there; you can certainly tell it a VM must be shut down during upgrade. Containerised workloads do replicate the disk image layers too, but it's handled more efficiently and they are usually much smaller (though I've seen some beastly containers....)
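A minimal sketch of that layout, assuming Longhorn's standard StorageClass parameters (the class name is illustrative, and as noted, single-replica non-migratable volumes rule out live migration and complicate node upgrades):

```yaml
# Sketch: a single-replica Longhorn class for "ephemeral" root disks,
# used alongside the default replicated class for data volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single-replica   # illustrative name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"   # no replica traffic for root disks
  migratable: "false"     # live migration needs migratable volumes
```

Data volumes would then keep `numberOfReplicas: "3"` via the default class, so only the state that matters crosses the storage network.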
p
Right, the Harvester live migrate won't work, but ideally you can just spin up an identical VM and attach the migrated storage with a few lines of code. YMMV.
w
For us it would mean having quite a few VMs running and remembering which ones need special treatment; it depends, I guess, on how you're managing it. Let's face it, upgrades feel a bit of a gamble, but the fewer edge cases you have, the easier it is to keep in step with upstream.
Reminds me of the old days and Vagrant...
p
Yeah, the easiest time to prevent edge cases is when you're starting. I get the migrating-legacy-workload issue for sure.