#elemental

strong-shoe-72392

01/05/2023, 10:52 PM
Has anyone else seen the following errors when registering a Raspberry Pi 4 k3s cluster (using elemental-operator 1.1.0)? The Raspberry Pi boots up and reaches out to Rancher MCM, but it seems to fail to establish a websocket connection.
elemental-operator pod logs:
time="2023-01-05T22:41:38Z" level=info msg="Incoming HTTP request for /elemental/registration/9tb...9"
time="2023-01-05T22:41:39Z" level=info msg="Negotiated protocol version: 5"
time="2023-01-05T22:41:39Z" level=error msg="websocket communication interrupted: unknown message"
elemental client logs
Jan 05 22:46:04 rancher-31733 elemental[1417]: time="2023-01-05T22:46:04Z" level=info msg="Enable TPM emulation"
Jan 05 22:46:04 rancher-31733 elemental[1417]: time="2023-01-05T22:46:04Z" level=info msg="Connect to https://rancher.<redacted>"
Jan 05 22:46:05 rancher-31733 elemental[1417]: time="2023-01-05T22:46:05Z" level=info msg="Using TPMHash <redacted> to dial wss://rancher.<redacted>/elemental/registration/9tb...9"
Jan 05 22:46:05 rancher-31733 elemental[1417]: time="2023-01-05T22:46:05Z" level=debug msg="Start TPM attestation"
Jan 05 22:46:06 rancher-31733 elemental[1417]: time="2023-01-05T22:46:06Z" level=info msg="TPM attestation successful"
Jan 05 22:46:06 rancher-31733 elemental[1417]: time="2023-01-05T22:46:06Z" level=debug msg="elemental-register protocol version: 8"
Jan 05 22:46:06 rancher-31733 elemental[1417]: time="2023-01-05T22:46:06Z" level=info msg="Negotiated protocol version: 5"
Jan 05 22:46:06 rancher-31733 elemental[1417]: time="2023-01-05T22:46:06Z" level=info msg="Send SMBIOS data"
Jan 05 22:46:06 rancher-31733 elemental[1417]: time="2023-01-05T22:46:06Z" level=info msg="Send system data"
Jan 05 22:46:06 rancher-31733 elemental[1417]: time="2023-01-05T22:46:06Z" level=debug msg="system data:..."
Jan 05 22:46:06 rancher-31733 elemental[1417]: time="2023-01-05T22:46:06Z" level=error msg="failed to register machine inventory: failed to send system data: websocket: close 1006 (abnormal closure): unexpected EOF"
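When comparing operator and client logs like the ones above, it can help to pull out only the error-level entries. The following is a minimal Python sketch, assuming the `time="..." level=... msg="..."` layout seen in these logs; the helper names `parse_line` and `errors_only` are made up for illustration and are not part of any Elemental tooling.

```python
import re

# Matches the key/value fields seen in the logs above:
#   time="..." level=... msg="..."
# Any prefix (journald timestamp, host, unit) before time= is ignored.
LINE_RE = re.compile(
    r'time="(?P<time>[^"]+)"\s+level=(?P<level>\w+)\s+msg="(?P<msg>.*)"\s*$'
)

def parse_line(line):
    """Return a dict with time, level and msg, or None if the line does not match."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None

def errors_only(lines):
    """Keep only the parsed entries whose level is 'error'."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "error"]
```

Feeding the client log lines above to `errors_only` would leave only the `failed to register machine inventory` entry. Escaped quotes inside `msg` are kept as-is by this sketch rather than unescaped.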
This appears to be the same error that was addressed by https://github.com/rancher/elemental/issues/346 and https://github.com/rancher/elemental-operator/issues/176. Any help is greatly appreciated! Thanks!

careful-piano-35019

01/06/2023, 1:47 PM
I assume you're on Rancher 2.7.0 and the stable version of the elemental operator from
oci://registry.opensuse.org/isv/rancher/elemental/stable/charts/rancher/elemental-operator-chart
and it's all on the LAN, RPis to Rancher?

strong-shoe-72392

01/06/2023, 1:48 PM
Rancher 2.7.0: yes.
Elemental Operator: I was on the stable (1.0.2, I think, is what the chart pulled), but I upgraded to 1.1.0 to try that as well.
No, it is not all LAN. Pi is on home network connecting to Rancher in AWS (but same configuration worked for VirtualBox VM).
There shouldn't be any firewall or VPNs in the mix.

careful-piano-35019

01/06/2023, 1:49 PM
ok, that's a possible scenario but yeah it opens questions about networking / filtering
and which Kubernetes version is Rancher running on?
https://github.com/rancher/elemental-operator/issues/176 points to issues with Secrets for the ServiceAccount

strong-shoe-72392

01/06/2023, 1:51 PM
It's v1.21.12+rke2r2 (needs to be updated)

careful-piano-35019

01/06/2023, 1:51 PM
"The issue is in the operator when using kubernetes 1.24."
in dev / test we generally use the latest versions, so it might be that the operator needs a more recent version
although I understand @many-tiger-3407 provided a fix

strong-shoe-72392

01/06/2023, 1:52 PM
I can try deploying latest from main branch of elemental-operator
Right - the first issue said the second link was a fix.

careful-piano-35019

01/06/2023, 1:52 PM
I'm not saying it's the issue, just trying to guess here

strong-shoe-72392

01/06/2023, 1:52 PM
And that should have been in both 1.0.2 and 1.1.0
I think
I can try deploying latest dev/main.

careful-piano-35019

01/06/2023, 1:53 PM
if it's not too much of a hassle, you can give it a try; that will help narrow down the possibilities

strong-shoe-72392

01/06/2023, 1:53 PM
If that doesn't work I could also try standing up a local Rancher MCM instance to rule out some networking funniness.

careful-piano-35019

01/06/2023, 1:53 PM
but, in general, Elemental being fairly recent, we mostly worked and tested with k8s > 1.24

strong-shoe-72392

01/06/2023, 1:54 PM
Gotcha
One thing I did have to do that is non-standard...
The elemental-operator runs as a privileged container.

careful-piano-35019

01/06/2023, 1:54 PM
but the issues at first sight look network related

strong-shoe-72392

01/06/2023, 1:54 PM
We have PSP on cluster that prevents this.
I attached a PSP that allows this to the elemental-operator so the pod can start up, but there is no option in the Helm chart for this.
Was thinking of writing a GitHub issue to request.
And/or the new 1.25+ approach
c

careful-piano-35019

01/06/2023, 1:55 PM
I think it's worth a GitHub issue, it does not look like you're missing something obvious

strong-shoe-72392

01/06/2023, 1:56 PM
Thanks for your feedback! I'll poke around some more and update here.
Attempted both fixes:
1. elemental-operator - upgraded to latest of main and latest container image
2. Installed local Rancher 2.7.0 (docker install/v1.24.4+k3s1), redeployed cluster YAML files to local Rancher
3. Updated USB registration YAML and rebooted Raspberry Pi
Same results
I believe it's an error in marshaling/unmarshaling the system data.

careful-piano-35019

01/06/2023, 3:25 PM
ok, I'm currently in the process of installing on USFF format x86 systems, but when I'm done, I'll switch back to RPi
I'll try with an AWS cluster actually

strong-shoe-72392

01/06/2023, 3:27 PM
Thanks! Here's the info of the downloaded rpi.raw ISO I used to make the bootable USB
md5sum rpi.raw
a6181a673b70691dfc19a09dfd8c5a68  rpi.raw
@careful-piano-35019 did you have a chance to try the RPi again?

careful-piano-35019

01/11/2023, 1:32 PM
not yet, sorry. Possibly Friday

strong-shoe-72392

01/11/2023, 1:34 PM
No problem. Thanks again for the help.
@careful-piano-35019 this problem wasn't just a Raspberry Pi problem, it would seem. After the changes mentioned in this thread, I was able to resolve the issues for both Raspberry Pi and VirtualBox nodes: https://rancher-users.slack.com/archives/C028DVCAYLD/p1673458165577919

careful-piano-35019

01/19/2023, 8:05 PM
All good now?

strong-shoe-72392

01/19/2023, 8:22 PM
Yes, I was able to register my Pi 4 with Rancher!
One quick follow-up question if you don't mind. I did have to do this with a Rancher Server I have deployed in AWS that has a certificate from a valid public CA, because the rancher-agent did not like the self-signed certificate.
Am I missing something? Is there an easy way to test with a local Rancher Server (docker deploy or Rancher Desktop k3s cluster) and maybe using sslip.io for the URL that will work instead of needing a public CA?

careful-piano-35019

01/19/2023, 10:53 PM
Yes, I've done it with a local cluster without a public CA
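On the sslip.io idea: sslip.io resolves a hostname that embeds an IP address to that address, so a local Rancher server can get a DNS name without any DNS setup. A minimal Python sketch of building such a name follows; the `rancher` prefix and the example IP are placeholders, and the thread does not say how the local setup's certificate was actually handled, so this covers only the hostname part.

```python
import ipaddress

def sslip_hostname(ip, prefix="rancher"):
    """Build an sslip.io hostname that resolves to `ip` (IPv4 only in this sketch)."""
    # IPv4Address raises a ValueError subclass on malformed input.
    addr = ipaddress.IPv4Address(ip)
    return f"{prefix}.{addr}.sslip.io"

# Placeholder address of a hypothetical local Rancher server
print(sslip_hostname("192.168.1.10"))  # prints rancher.192.168.1.10.sslip.io
```

The resulting name could then be used as the Rancher server URL, for example `https://rancher.192.168.1.10.sslip.io`. Whether the agent accepts the certificate presented at that URL is a separate question that this sketch does not address.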