# elemental

adamant-kite-43734

01/11/2023, 5:29 PM
This message was deleted.

ripe-mechanic-63260

01/11/2023, 6:01 PM
This was reported today as well, I believe this is due to an old iso file with a newer operator?

strong-shoe-72392

01/11/2023, 6:13 PM
Thanks! Yes, this worked before but I was on Elemental Operator 1.0.2 then.
I'm assuming a newer teal base ISO will be produced soon with the newer elemental-register?

ripe-mechanic-63260

01/11/2023, 6:17 PM
No idea, maybe @many-tiger-3407 can answer this tomorrow during working hours, as he is probably out by now :)
✅ 1

many-tiger-3407

01/12/2023, 8:04 AM
Hey @strong-shoe-72392 👋 Yep, the 1.1.x operator doesn't work well with 1.0.x ISOs, so you need an updated ISO. If you prepared the ISO with the elemental-iso-add-registration script, you can choose which ISO to use with the REPO env variable. So, to get the very latest image we built, use something like:
REPO=Dev ./elemental-iso-add-registration reg.yaml
Otherwise elemental-iso-add-registration takes the Stable image, and the stable version of elemental-operator is "still" 1.0.x.
REPO=Staging should be an option too (it should follow the latest release on GitHub, i.e., version 1.1.0).
But now I see the issue: our latest release on GitHub is 1.1.0, but the ISO we ship as Stable from OBS is 1.0.x.
The workaround for now should be to use:
REPO=Staging ./elemental-iso-add-registration reg.yaml
we will fix this soon
🙌 1
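The pairing described above (Stable ISO for a 1.0.x operator, Staging ISO for the 1.1.x release on GitHub) can be sketched as a small shell helper. This is only an illustration: `pick_repo` is a hypothetical function, not part of the elemental tooling, and the mapping reflects the state of the repos at the time of this thread.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with elemental): map an
# elemental-operator version to the OBS repo named in this thread.
pick_repo() {
  v="${1#v}"                   # accept both "1.1.0" and "v1.1.0"
  case "$v" in
    1.0.*) echo "Stable" ;;    # the Stable ISO is still 1.0.x
    1.1.*) echo "Staging" ;;   # Staging follows the latest GitHub release
    *)     echo "unknown operator version: $1" >&2; return 1 ;;
  esac
}

# Usage, with reg.yaml as in the messages above:
#   REPO="$(pick_repo 1.1.0)" ./elemental-iso-add-registration reg.yaml
```

Failing on unknown versions, rather than silently falling back to a default repo, avoids building an ISO that may not match the operator.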

strong-shoe-72392

01/12/2023, 1:43 PM
Hey, thank you @many-tiger-3407! I played around with the different repos yesterday and tried the ISOs from Dev/Staging (I'm pretty sure) and I think they all behaved the same way for me - cloud-init did not run. I wasn't able to tell what branches the ISOs corresponded to at first glance. I suppose I could always downgrade the elemental-operator version to 1.0.2 again as well, right?

many-tiger-3407

01/12/2023, 1:46 PM
That's weird, I'm pretty sure I used the ISO from Dev successfully 🤔 I'm going to double-check the releases we have right now against elemental-operator 1.1.0, then I'll get back to you here, David. But sure, in the meantime you can use elemental-operator <= 1.0.2.
✅ 1
@strong-shoe-72392: one more question, how did you install the operator 1.1.0?

strong-shoe-72392

01/12/2023, 1:55 PM
helm -n cattle-elemental-system install --create-namespace elemental-operator https://github.com/rancher/elemental-operator/releases/download/v1.1.0/elemental-operator-1.1.0.tgz
👍 1

many-tiger-3407

01/12/2023, 1:55 PM
I will follow your steps

strong-shoe-72392

01/12/2023, 2:44 PM
So, I just repeated my steps with a local Docker Rancher 2.7.0 and the Dev Elemental ISO running in a VirtualBox VM. cloud-init DOES work, but I have the same error I've been seeing when dealing with the Raspberry Pi as well - an unexpected EOF on the websocket communication between elemental-register and elemental-operator.
The operator says
time="2023-01-12T14:45:05Z" level=info msg="Incoming HTTP request for /elemental/registration/..."
Thu, Jan 12 2023 9:45:06 am	time="2023-01-12T14:45:06Z" level=info msg="Negotiated protocol version: 5"
Thu, Jan 12 2023 9:45:06 am	time="2023-01-12T14:45:06Z" level=error msg="websocket communication interrupted: unknown message"
The client says the following right after printing out all the system info string
level=error msg="failed to register machine inventory: failed to send system data: websocket: close 1006 (abnormal closure): unexpected EOF"
I can confirm that if I drop down to elemental-operator 1.0.2 and the Stable ISO, I get past machine registration and the rancher-system-agent gets stuck on appending the ca-cert to kube-scheduler, but that could be because I'm running a local Rancher Server with just an IP as the Rancher Server URL value - possibly.
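To pull only the failing lines out of logs like the ones quoted above, a minimal filter could look like the sketch below. `errors_only` is a hypothetical helper, and the deployment name in the usage comment is an assumption based on the Helm release name used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: keep only error-level lines from
# "level=..." style log output read on stdin.
errors_only() {
  # grep exits non-zero when nothing matches; treat that as "no errors".
  grep -F 'level=error' || true
}

# Usage (deployment name assumed, adjust to your install):
#   kubectl -n cattle-elemental-system logs deployment/elemental-operator | errors_only
```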

many-tiger-3407

01/12/2023, 3:13 PM
@strong-shoe-72392 thanks for sharing. It's fine to use a local IP, but try using an sslip.io hostname as the Rancher URL, something like: <your_local_ip>.sslip.io
I will look into the websocket error.
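As a quick illustration of the sslip.io suggestion: the hostname is just the IP address with `.sslip.io` appended, and sslip.io resolves it back to that IP. `sslip_host` below is a hypothetical helper and 192.168.1.10 is a placeholder address.

```shell
#!/bin/sh
# Hypothetical helper: build an sslip.io hostname from an IPv4 address.
sslip_host() {
  printf '%s.sslip.io\n' "$1"
}

# Usage with a placeholder IP:
#   sslip_host 192.168.1.10
# The resulting name can then be set as the Rancher Server URL,
# e.g. https://192.168.1.10.sslip.io
```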

strong-shoe-72392

01/12/2023, 3:58 PM
Switched over to using sslip.io (just changed Rancher Server URL in Global Settings). Still seeing the ca-cert error. Node successfully provisions k3s - can use kubectl on node. Does not join Rancher. Stuck in provisioning with
Non-ready bootstrap machine(s) elemental-demo-vb-cluster-elemental-demo-mgw-vb-pool1-5786l6s9w and join url to be available on bootstrap node
Not sure if I need to recreate Rancher from the beginning with the correct URL versus changing it in the Global Settings. I have successfully gotten a node to register, but it was a few weeks back on an older ISO.

many-tiger-3407

01/13/2023, 10:45 AM
Hey @strong-shoe-72392 👋 I haven't checked the 1.0.x version yet. I think updating the URL in Rancher should be enough, but I don't know for sure... I guess you need cert-manager installed if you haven't already. But take that as a wild guess.
On the 1.1.0 version, the ISO from the Staging REPO works fine: all went well, the node was provisioned correctly and registered on Rancher. But the ISO from the Dev repo doesn't work 😞 I reproduced the bug you experienced: on Dev we have extended the registration protocol, and this is the first interoperability bug... Will work on it.
So, thanks for the feedback David, you made us spot some unclear config and unveiled an interoperability bug 🙌. Here's what we'll do:
1. Clearly document on GitHub how to pair the latest GitHub chart with an ISO (basically, use the Staging ISO, which is built from the same version as the chart released on GitHub). That should always work.
2. Work on the interoperability bug in the Dev ISO.
I will post further updates here, just for information.
🙌 1

strong-shoe-72392

01/13/2023, 1:49 PM
Hey, thanks so much @many-tiger-3407! I appreciate you getting back to me. That's great. I'll try again on elemental-operator 1.1.0 and Staging ISO. I do NOT have cert-manager installed currently for local testing (but I did in our cloud cluster where the node did provision correctly a week ago). I just did a quick local Docker container deploy of latest Rancher Server/MCM to test. But, I'll do a true k8s cluster deploy with cert-manager + Rancher next time.
I'm curious if the interoperability issue you mentioned is the same issue I was seeing following the Raspberry Pi Getting Started steps too.
The error messages were nearly identical - see https://rancher-users.slack.com/archives/C028DVCAYLD/p1672959122230499

many-tiger-3407

01/13/2023, 1:52 PM
Yep, it is the same
👍 1
The issue is that the rpi sends data that the operator (1.1.0) doesn't understand

strong-shoe-72392

01/13/2023, 1:54 PM
Thanks! That's what I was assuming. Seemed like the System Data object that was getting sent over wasn't getting marshaled/unmarshaled correctly or something like that.
👍 1

many-tiger-3407

01/13/2023, 3:27 PM
Well, the elemental-operator did not recognize the message coming from the client (there is a custom protocol there) and decided to close the websocket. We just added some more info: now the operator should send the error back to the client before closing, but that is not released yet 😅
🙌 1
Just for info, we (well, @sticky-tailor-45974 did 🙏) released a new elemental-operator version on GitHub: https://github.com/rancher/elemental-operator/releases/tag/v1.1.1. As said, the latest version released on GitHub corresponds to the OBS Staging repo (we keep an older, more tested version as the Stable default in OBS). This time we added a small note to the Release Notes about which image to use (i.e., Staging), to allow playing around without issues. We will fix interoperability between different versions of the ISO/image and the operator, to try to avoid this kind of issue in the future. Thanks for all your help @strong-shoe-72392 🙂
rancher employee 1
🙌 1

strong-shoe-72392

01/13/2023, 4:55 PM
Awesome! Thanks for the really quick turnaround on these! I can't wait to give it another try.
🙏 1
@many-tiger-3407 Just wanted to let you know I finally had a chance to try again with latest 1.1.1 elemental-operator and updated Staging image, and I was able to successfully register a Raspberry Pi node with the cluster! Thanks!
🎉 1

many-tiger-3407

01/20/2023, 8:21 AM
Many thanks @strong-shoe-72392 for letting me know!
Just for information: the PR addressing that kind of issue has been merged into the main branch, so the next elemental-operator release will be able to correctly deploy nodes (including cloud-config) with old ISOs too.
🙌 1