# elemental
l
Hi everyone, I have been struggling for several days to set up an Elemental cluster. I am using the official Rancher server Helm chart for `2.11.2`, with the Elemental plugin. I then generate an ISO for bare metal, _*"SL Micro6.1 ISO v2.2.0-4.-linux/86_64"*_, and boot it on Hyper-V (8 GB RAM, 80 GB disk) with UEFI, Secure Boot and TPM enabled. The VM installation starts and the machine becomes visible in the "_*Inventory of Machines*_". After the automatic restart everything seems fine apart from one error in the logs, `[FAILED] Failed to start elemental health check` (is it relevant?), mixed in with plenty of `OK` lines. Then I try "Create Elemental Cluster" on the machine. Plenty of logs are printed, but then it stops, the VM prints "`unable to decode an event from the watch stream: INTERNAL_ERROR: received from peer`", and the Rancher UI says "`waiting for cluster agent to connect`". The cluster stays stuck in "`Updating`". I tried several combinations of OS and cluster versions (RKE2, K3s), but the issue is always the same. Does anyone have an idea?
m
A few things: please post logs as text, not as screenshots. What do the elemental-system-agent logs show?
journalctl -u elemental-system-agent
The "secret too old" error in the second log screenshot can be one of two things: either the timezone is way off from your Rancher server (keep all servers on UTC), or the same VM was used for a previous install and the Rancher directories were not cleaned up. Did you run the rke2-uninstall.sh or k3s-uninstall.sh script?
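For the first possibility, one rough way to see whether the VM and the Rancher server disagree about the time is to compare the VM's UTC clock with the `Date` header the server returns on any HTTP response. The snippet below is only a sketch under stated assumptions: the function names and the 60-second tolerance are made up for illustration, and it is not something Elemental or Rancher ship.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(http_date: str, local_now: datetime) -> float:
    """Return (local time - server time) in seconds.

    http_date is the value of an HTTP Date header, e.g.
    "Sat, 31 May 2025 06:50:13 GMT". local_now must be timezone-aware.
    """
    server_time = parsedate_to_datetime(http_date)
    if server_time.tzinfo is None:
        # A "-0000" offset parses as naive; HTTP dates are UTC by definition.
        server_time = server_time.replace(tzinfo=timezone.utc)
    return (local_now - server_time).total_seconds()


def is_skewed(http_date: str, local_now: datetime, tolerance_s: float = 60.0) -> bool:
    """True when the two clocks differ by more than tolerance_s (assumed value)."""
    return abs(clock_skew_seconds(http_date, local_now)) > tolerance_s


if __name__ == "__main__":
    # Hypothetical header value; paste the one your own server returns.
    header = "Sat, 31 May 2025 06:50:13 GMT"
    now = datetime.now(timezone.utc)
    print(f"skew: {clock_skew_seconds(header, now):.0f}s, skewed: {is_skewed(header, now)}")
```

The header value can be read with something like `curl -sI` against the Rancher URL, and compared with `date -u` on the VM. A large difference would point at the clock rather than at leftover directories.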
l
Dear @mysterious-animal-29850, thank you very much for your help! Sorry for the screenshots, but unfortunately I cannot copy-paste the logs: they are inside a VM and difficult to retrieve. As you suggested, I tried keeping the timezone on UTC, but the problem is still there. When I run this command in the VM:
journalctl -u elemental-system-agent
It then asks me for a `password:`, but I have no idea what it could be. The ISO is generated with the "Registration URL with registration token", and I never set up a password at any point. About the second option, "the same vm was used on a previous install and rancher directories were not cleaned up": I am not sure I understand what you mean. I am doing the ISO installation on a completely fresh VM for this experiment, and the virtual hard drive is created when the VM starts. Every time I try something, I delete everything and restart the ISO installation from scratch. As for "Did you run the rke2-uninstall.sh or k3s-uninstall.sh script?", I am not sure I understand this either. When I delete a VM, I destroy it completely in Hyper-V, and in the Rancher UI I click "Delete" for the cluster and the machine on the "Clusters" and "Inventory of machines" pages.
m
> It then asks me for a `password:`, but I have no idea about what it can be. The ISO is generated with the "Registration URL with registration token", I never setup a password at any moment.

When setting up a registration endpoint, are you setting any cloud-config options to add a user/passwd? If not, are you doing this with EIB or Combustion? I'd recommend using any of those three methods to get terminal access to the VM for troubleshooting.

> About the 2nd option "the same vm was used on a previous install and rancher directories were not cleaned up" I am not sure to understand what you mean, I am doing the ISO installation on a complete factory new VM for this experiment, the VM virtual-hard-drive was generated at the moment the VM starts. And every time I try something I delete everything and restart the ISO installation from scratch.

Ignore this question and the *uninstall.sh script; your answer confirms you're not reusing the same VM but creating new ones. Thanks!
Can you share your registration endpoint yaml? Please remove any sensitive information from it before posting. Thanks!
l
Sure, no problem. It's a test instance of Rancher in a dedicated EKS cluster for debugging, nothing sensitive at all. I can even give you the Rancher creds if you want.
apiVersion: elemental.cattle.io/v1beta1
kind: MachineRegistration
metadata:
  creationTimestamp: '2025-05-30T13:38:00Z'
  generation: 1
  managedFields:
    - apiVersion: elemental.cattle.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:config:
            .: {}
            f:cloud-config:
              .: {}
              f:users: {}
            f:elemental:
              .: {}
              f:install:
                .: {}
                f:device-selector: {}
                f:reboot: {}
                f:snapshotter:
                  .: {}
                  f:type: {}
              f:reset:
                .: {}
                f:reboot: {}
                f:reset-oem: {}
                f:reset-persistent: {}
          f:machineInventoryAnnotations:
            .: {}
            f:dev: {}
          f:machineInventoryLabels:
            .: {}
            f:dev: {}
      manager: rancher
      operation: Update
      time: '2025-05-30T13:38:00Z'
    - apiVersion: elemental.cattle.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:conditions: {}
          f:registrationToken: {}
          f:registrationURL: {}
          f:serviceAccountRef: {}
      manager: elemental-operator
      operation: Update
      subresource: status
      time: '2025-05-31T06:50:13Z'
  name: test
  namespace: fleet-default
  resourceVersion: '1534192'
  uid: ec4c9b61-e54b-4eff-a006-979071b17c42
spec:
  config:
    cloud-config:
      users:
        - name: root
          passwd: root
    elemental:
      install:
        device-selector:
          - key: Name
            operator: In
            values:
              - /dev/sda
              - /dev/vda
              - /dev/nvme0
          - key: Size
            operator: Gt
            values:
              - 25Gi
        reboot: true
        snapshotter:
          type: btrfs
      reset:
        reboot: true
        reset-oem: true
        reset-persistent: true
  machineInventoryAnnotations:
    dev: test
  machineInventoryLabels:
    dev: test
status:
  conditions:
    - lastTransitionTime: '2025-05-31T06:50:13Z'
      message: ''
      reason: SuccessfullyCreated
      status: 'True'
      type: Ready
  registrationToken: 4dbqmm5zjvhvqvn5bk55srgcnj6wdwkf9fxdqd269cbt62jh6f8w59
  registrationURL: >-
    https://rancher.org.cloud.stemys.ch/elemental/registration/4dbqmm5zjvhvqvn5bk55srgcnj6wdwkf9fxdqd269cbt62jh6f8w59
  serviceAccountRef:
    kind: ServiceAccount
    name: test
    namespace: fleet-default
> When setting up a registration endpoint, are you setting up any cloud config settings to add a user/passwd? If not, are you doing this with EIB or combustion? I'd recommend using either of those 3 methods to get access to the vm via terminal to troubleshoot.
When creating the "_*registration endpoint*_" I did it the naive way: I clicked Create in the UI and just entered a name, without modifying the proposed default YAML.
By the way I am following these tutorials:

https://www.youtube.com/watch?v=-uenjgsxI5U

https://elemental.docs.rancher.com/