# harvester
g
likely not a master node.. you may want to check from the first node
kubectl get pods -A -o wide
to see what is going on
also
journalctl -fu rancherd
may tell us if something went wrong
followed by
journalctl -fu rke2-agent
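Taken together, a quick triage pass on the node that failed to join might look like this (a sketch; rancherd and rke2-agent are the standard unit names here, and the rke2-agent unit will simply be absent if rke2 was never installed):

```
# run on the node that failed to join
systemctl status rancherd                        # bootstrap service state
journalctl -u rancherd --no-pager | tail -n 50   # recent bootstrap logs
systemctl status rke2-agent                      # absent if rke2 never got installed
journalctl -u rke2-agent --no-pager | tail -n 50
```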
b
@great-bear-19718 So, on this new node, rke2 for some inexplicable reason was not installed. The /etc/rancher/rke2/rke2.yaml file is not present. The installation did not show any errors.
The master node remains accessible, 100% operational.
The difference between the two installations is that on this node, I selected the option to join an existing cluster. The VIP was successfully validated at the network level, and everything seemed OK in the installation console, which was very fast, by the way.
Do I really need to install a 3rd node for the second one to become available? I remember seeing someone on GitHub talking about an old bug in this regard, but the error was different.
g
2nd should become available
check the status of rke2-agent service
b
rke2-agent is not installed 😞
I think the join mode is the problem πŸ˜žπŸ˜“
@great-bear-19718 Can recovery mode repair this installation, or is a workaround necessary?
This is the latest Harvester version. Should I test rc-2?
g
systemctl list-units --type=service
it would be nice to see the output for this to see what is going on there
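If the full listing is noisy, systemctl's state filters can narrow it down to the interesting units, e.g.:

```
systemctl list-units --type=service --state=failed      # anything that died outright
systemctl list-units --type=service --state=activating  # anything stuck starting up
```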
b
```
UNIT                                   LOAD   ACTIVE     SUB     JOB   DESCRIPTION
  apparmor.service                       loaded active     exited        Load AppArmor profiles
  auditd.service                         loaded active     running       Security Auditing Service
  augenrules.service                     loaded active     exited        auditd rules generation
  boot-sysctl.service                    loaded active     exited        Apply Kernel Variables for 5.14.21-150400.24.60-default from /boot
  cos-setup-boot.service                 loaded active     exited        cOS system configuration
  cos-setup-fs.service                   loaded active     exited        cOS system after FS setup
  dbus.service                           loaded active     running       D-Bus System Message Bus
  detect-part-label-duplicates.service   loaded active     exited        Detect if the system suffers from bsc#1089761
  dracut-shutdown.service                loaded active     exited        Restore /run/initramfs on shutdown
  getty@tty1.service                     loaded active     running       Getty on tty1
  haveged.service                        loaded active     running       Entropy Daemon based on the HAVEGE algorithm
  iscsi.service                          loaded active     exited        Login and scanning of iSCSI devices
  kbdsettings.service                    loaded active     exited        Apply settings from /etc/sysconfig/keyboard
  kmod-static-nodes.service              loaded active     exited        Create List of Static Device Nodes
  lvm2-monitor.service                   loaded active     exited        Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
  rancherd.service                       loaded activating start   start Rancher Bootstrap
  rdma-load-modules@infiniband.service   loaded active     exited        Load RDMA modules from /etc/rdma/modules/infiniband.conf
  rdma-load-modules@rdma.service         loaded active     exited        Load RDMA modules from /etc/rdma/modules/rdma.conf
  rdma-load-modules@roce.service         loaded active     exited        Load RDMA modules from /etc/rdma/modules/roce.conf
  smartd.service                         loaded active     running       Self Monitoring and Reporting Technology (SMART) Daemon
  sshd.service                           loaded active     running       OpenSSH Daemon
  systemd-hwdb-update.service            loaded active     exited        Rebuild Hardware Database
  systemd-journal-catalog-update.service loaded active     exited        Rebuild Journal Catalog
  systemd-journal-flush.service          loaded active     exited        Flush Journal to Persistent Storage
  systemd-journald.service               loaded active     running       Journal Service
  systemd-logind.service                 loaded active     running       User Login Management
  systemd-modules-load.service           loaded active     exited        Load Kernel Modules
  systemd-random-seed.service            loaded active     exited        Load/Save Random Seed
  systemd-remount-fs.service             loaded active     exited        Remount Root and Kernel File Systems
  systemd-sysctl.service                 loaded active     exited        Apply Kernel Variables
  systemd-sysusers.service               loaded active     exited        Create System Users
  systemd-time-wait-sync.service         loaded active     exited        Wait Until Kernel Time Synchronized
  systemd-timesyncd.service              loaded active     running       Network Time Synchronization
  systemd-tmpfiles-setup-dev.service     loaded active     exited        Create Static Device Nodes in /dev
  systemd-tmpfiles-setup.service         loaded active     exited        Create Volatile Files and Directories
  systemd-udev-trigger.service           loaded active     exited        Coldplug All udev Devices
  systemd-udevd.service                  loaded active     running       Rule-based Manager for Device Events and Files
  systemd-update-done.service            loaded active     exited        Update is Completed
  systemd-update-utmp-runlevel.service   loaded inactive   dead    start Record Runlevel Change in UTMP
  systemd-update-utmp.service            loaded active     exited        Record System Boot/Shutdown in UTMP
  systemd-user-sessions.service          loaded active     exited        Permit User Sessions
  user-runtime-dir@0.service             loaded active     exited        User Runtime Directory /run/user/0
  user@0.service                         loaded active     running       User Manager for UID 0
  wicked.service                         loaded active     exited        wicked managed network interfaces
  wickedd-auto4.service                  loaded active     running       wicked AutoIPv4 supplicant service
  wickedd-dhcp4.service                  loaded active     running       wicked DHCPv4 supplicant service
  wickedd-dhcp6.service                  loaded active     running       wicked DHCPv6 supplicant service
  wickedd-nanny.service                  loaded active     running       wicked network nanny service
  wickedd.service                        loaded active     running       wicked network management service daemon
```
g
journalctl -fu rancherd
?
b
```
Jul 06 03:59:21 vmh-cgh-eqn001p rancherd[3134]: time="2023-07-06T03:59:21Z" level=info msg="[stderr]: time=\"2023-07-06T03:59:21Z\" level=info msg=\"Probe [kubelet] is unhealthy\""
Jul 06 03:59:25 vmh-cgh-eqn001p rancherd[3134]: time="2023-07-06T03:59:25Z" level=info msg="[stderr]: time=\"2023-07-06T03:59:25Z\" level=info msg=\"Probe [kubelet] is healthy\""
Jul 06 03:59:25 vmh-cgh-eqn001p rancherd[3134]: time="2023-07-06T03:59:25Z" level=info msg="[stderr]: time=\"2023-07-06T03:59:25Z\" level=info msg=\"All probes are healthy\""
Jul 06 03:59:25 vmh-cgh-eqn001p rancherd[3134]: time="2023-07-06T03:59:25Z" level=info msg="Successfully Bootstrapped Rancher (v2.6.11/v1.24.11+rke2r1)"
Jul 06 03:59:25 vmh-cgh-eqn001p systemd[1]: rancherd.service: Deactivated successfully.
Jul 06 03:59:25 vmh-cgh-eqn001p systemd[1]: Finished Rancher Bootstrap.
Jul 06 03:59:26 vmh-cgh-eqn001p systemd[1]: Starting Rancher Bootstrap...
Jul 06 03:59:26 vmh-cgh-eqn001p rancherd[8092]: time="2023-07-06T03:59:26Z" level=info msg="System is already bootstrapped. To force the system to be bootstrapped again run with the --force flag"
Jul 06 03:59:26 vmh-cgh-eqn001p systemd[1]: rancherd.service: Deactivated successfully.
Jul 06 03:59:26 vmh-cgh-eqn001p systemd[1]: Finished Rancher Bootstrap.
```
I discovered the problem!
g
did you re-use this node?
b
No! For some reason, the /etc/rancher/rancherd/config.yaml file had the DNS name of my VIP instead of the IP (I don't know how to explain it). I changed it, restarted rancherd.service, and the node became available in the cluster.
However, on this second node I still can't run the kubectl command; it still looks like it doesn't have the rke2 binaries.
```
vmh-cgh-eqn001p:/home/rancher # kubectl get pods -n kube-system
W0706 04:02:12.184752   26687 loader.go:223] Config not found: /etc/rancher/rke2/rke2.yaml
The connection to the server localhost:8080 was refused - did you specify the right host or port?
```
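For reference, the fix described above amounts to something like this (a hypothetical sketch: the real config.yaml also carries the join token and role, and both addresses below are placeholders):

```
# /etc/rancher/rancherd/config.yaml on the joining node
# before: the server URL used the VIP's DNS name
#   server: https://harvester-vip.example.local:443
# after: the server URL uses the VIP address directly
#   server: https://10.0.0.10:443
systemctl restart rancherd
```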
g
second node will not have the kubeconfig file.. you need to run this from the master node
βœ… 1
kubectl get nodes
on first node and see if it shows up there
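On the master node the kubeconfig sits at /etc/rancher/rke2/rke2.yaml (the same path the error above was looking for), so something like this should work there:

```
# on the master node
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes -o wide
```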
b
Both nodes are available!
g
looks like it joined 8m ago
βœ… 1
πŸ™Œ 1
kubectl get pods -A
b
I altered this file
/etc/rancher/rancherd/config.yaml
manually. If the node restarts, is this config lost?
g
yes
πŸ˜“ 1
g
rancherd will not come into play once rke2 is initialised
b
Ok. How do I make this setting permanent?
g
in
/oem/90_custom.yaml
you will find the config for /etc/rancher/rancherd/config.yaml
πŸ™Œ 1
it's a cloud-init.. you need to make changes there and they will persist across reboots
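The relevant fragment of /oem/90_custom.yaml follows the yip cloud-init schema and looks roughly like this (a sketch; the actual file on a node is longer, and the server address is a placeholder):

```
# excerpt of /oem/90_custom.yaml (yip schema)
stages:
  initramfs:
    - files:
        - path: /etc/rancher/rancherd/config.yaml
          content: |
            server: https://10.0.0.10:443   # keep the VIP IP here so the fix survives reboots
```

Edit the content under that files entry, and the OS will re-render /etc/rancher/rancherd/config.yaml from it on every boot.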
b
Ah. Ok!
I will modify it and restart the server to test
Once the modification is made, should I restart some service to make it effective before restarting the server?
g
no service restart is needed
OS will apply changes on reboot
b
Ok!
@great-bear-19718 you are my hero!
Everything worked! Thanks for the strong support!