# rancher-desktop
r
Do you have `Administrative Access` checked under Preferences > Application > General? https://docs.rancherdesktop.io/ui/preferences/application/general#administrative-access
b
I do - this is what causes RD to ask for a password. But on subsequent starts in the same (I presume) macOS login session, it does not ask again.
When I run out of patience, I quit it and start it with rdctl, like this:
```shell
rdctl start --application.admin-access=false
```
but it does not start either.
Factory reset is not really an option. I'll try logout/login, and a reboot if the former doesn't work. But there must be a better way to fix it. Like deleting some file, perhaps? Or having a force checkbox in the UI, or a .force flag in rdctl start?
Sonoma 14.2 just came out FYI
This doesn't seem to work, either:
```shell
[oleg:~] $ rdctl set --application.admin-access=true
Status: UI is currently busy, but will eventually be reconfigured to apply requested changes.
[oleg:~] $
```
k
Is this on an M1 or Intel Mac?
b
M1
f
Rancher Desktop is working fine on my Sonoma 14.2 machine on M1, so I don't think that is the problem.
b
Looks like autostart in background combined with macOS upgrade is the problem
f
Rancher Desktop will not ask for the `sudo` password if it doesn't need to make any changes; that means that `/var/run/docker.sock` already exists, that `/etc/sudoers.d/zzzzz-rancher-desktop-lima` exists and has the right content, and maybe one other thing I don't remember right now.
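The two conditions above can be sketched as a small shell check. This is a sketch only: the real logic is internal to Rancher Desktop (including verifying the sudoers file content, which this does not), and the file names are just the ones mentioned above.

```shell
#!/bin/sh
# Sketch: guess whether Rancher Desktop will need to prompt for sudo.
# The real check is internal to Rancher Desktop; these are just the
# two conditions mentioned above.

check_admin_prompt() {
    docker_sock="$1"    # e.g. /var/run/docker.sock
    sudoers_file="$2"   # e.g. /etc/sudoers.d/zzzzz-rancher-desktop-lima
    # -e: socket path exists; -s: sudoers file exists and is non-empty
    if [ -e "$docker_sock" ] && [ -s "$sudoers_file" ]; then
        echo "no-prompt-expected"
    else
        echo "prompt-expected"
    fi
}

check_admin_prompt /var/run/docker.sock /etc/sudoers.d/zzzzz-rancher-desktop-lima
```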
k
So if you start manually it comes up without that problem?
b
In case I forgot to mention, I'm using containerd
f
You can force it to ask for your password by e.g. deleting the sudoers.d/zzzz* file, or the docker socket, but I doubt this will make any difference
b
Let me try
f
The bridged network not getting an IP address is completely outside the control of Rancher Desktop. The interface is created by Apple's VMNET framework, and the IP address is generated by the DHCP server on your local network.
Some networks, especially wireless networks don't like to support multiple IP addresses on a single MAC, so the issue can be your local network setup.
Try to use a wired connection to see if that makes a difference.
Or try rebooting the machine, as a desperate measure 🙂
b
> and the IP address is generated by the DHCP server on your local network.
as it should. I'm on a wired network I control, and until I upgraded to 14.2 it was working as expected
f
Now that I think about it, I haven't been using admin mode on the 14.2 box; let me quickly check that
b
Removing the sudoers file helped - I got asked and answered. Starting backend now, waiting
f
Yes, I get the bridged IP address on 14.2
b
It's nice if you want to connect from another comp, without fiddling with ingress
f
I know, but most people actually don't need it. And if you are using `moby`, it means you get asked for your password every time the machine reboots, because that deletes all of `/var/run`. So admin mode is no longer the default.
b
hmmm... it's still starting the backend, over 4 minutes now. I'd say that's a bit of a long wait for an M1 Ultra 🙂
f
It is; something is wrong. I would do the reboot, and if that doesn't help, it's time to dive into the log files...
I didn't expect that asking for the password would help; I suspect you still don't have a bridged IP address, but Rancher Desktop should work even without it.
b
> every time the machine reboots
macOS, 21st century. Who would ever reboot it 😃
> So admin mode is no longer the default.
we should accommodate this - we're using linkerd with CNI, so in theory it does not need root access, does it? How does one get an IP address for the ingress incoming endpoints?
> I would do the reboot,
it's been over 7 minutes, rebooting
Turned off start-at-login, just in case
f
I like to reboot computers all the time; it avoids so many problems. It is the basic idea behind Kubernetes: turning things off and back on again, as a service. 😛
b
> Turning things off and back on again
arguably, this is not the equivalent of a reboot - a container is just an image and takes milliseconds to start/stop, but an OS is a bit(!) different
f
Pod is unhealthy: kill it and it will be redeployed. Node has poor performance: evacuate and replace
Anything that uses heaps and garbage collection has by definition somewhat unpredictable runtime behaviour and benefits from being reset to a known state.
I know people are proud of their 4 years of uptime, but to me that just shows that they don't care about fixing vulnerabilities. Resilient systems can deal with parts going frequently offline
Anyways, quickly getting off-topic here 😄
b
Restart didn't help 😞 I should've removed the sudoers file before restarting - I removed it after, got asked for creds and answered, and it's still sitting on "starting backend"
f
I don't think the password prompt will help you; we need to go into the logs to figure out why it fails. Start with `background.log`...
b
> quickly getting off-topic here
that's where all the fun is - a difference of opinions, a chance to express your own and disagree with others... fun 🙂
> Start with `background.log`...
Kindly remind me where it is?
f
~/Library/Logs/rancher-desktop/
k
~/Library/logs/rancher-desktop
🙂
Jan got the case right
f
Tab-completion FTW
πŸ‘ 1
Actually, I have a keyboard macro for the first part, so just typed
;logs/raβ‡₯
b
I see 1.3 kb of JSON in background.log - what do I look for?
f
Look for something that looks like an error? 🙂 Sorry, log spelunking has no easy rules.
Anything that has a stacktrace/backtrace
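A quick first pass over the log directory can be automated with grep. This is a sketch; the keyword list is just a guess at what "looks like an error", and the directory is the one mentioned above.

```shell
#!/bin/sh
# scan_logs: print error-ish lines from every file in a log directory.
# The keyword list is a heuristic, not an official Rancher Desktop set.
scan_logs() {
    grep -riE 'error|panic|fatal|backtrace' "$1" 2>/dev/null
}

# Usage: scan_logs "$HOME/Library/Logs/rancher-desktop" | head -n 20
```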
b
nope - the only weird thing is seeing 172.16.x among the private networks - nothing else rings a bell
> Anything that has a stacktrace/backtrace
not at all
See for yourself:
```
2023-12-12T22:42:42.435Z: mainEvents settings-update: {"version":10,"application":{"adminAccess":true,"debug":false,"extensions":{"allowed":{"enabled":false,"list":[]},"installed":{}},"pathManagementStrategy":"rcfiles","telemetry":{"enabled":true},"updater":{"enabled":true},"autoStart":false,"startInBackground":true,"hideNotificationIcon":false,"window":{"quitOnClose":false}},"containerEngine":{"allowedImages":{"enabled":false,"patterns":[]},"name":"containerd"},"virtualMachine":{"memoryInGB":8,"numberCPUs":8,"hostResolver":true},"WSL":{"integrations":{}},"kubernetes":{"version":"1.27.7","port":6443,"enabled":true,"options":{"traefik":false,"flannel":true},"ingress":{"localhostOnly":false}},"portForwarding":{"includeKubernetesServices":false},"images":{"showAll":true,"namespace":"k8s.io"},"diagnostics":{"showMuted":false,"mutedChecks":{}},"experimental":{"virtualMachine":{"type":"vz","useRosetta":true,"socketVMNet":false,"mount":{"type":"virtiofs","9p":{"securityModel":"none","protocolVersion":"9p2000.L","msizeInKib":128,"cacheMode":"mmap"}},"networkingTunnel":false,"proxy":{"enabled":false,"address":"","password":"","port":3128,"username":"","noproxy":["0.0.0.0/8","10.0.0.0/8","127.0.0.0/8","169.254.0.0/16","172.16.0.0/12","192.168.0.0/16","224.0.0.0/4","240.0.0.0/4"]}}}}
```
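A one-line settings blob like this is much easier to read pretty-printed. `python3 -m json.tool` (python3 ships with recent macOS; `jq` works too, if installed) can do it; the JSON below is just a shortened stand-in for the payload above:

```shell
# Pretty-print a one-line JSON payload (sketch; pipe the real blob in
# instead of this shortened example).
printf '%s' '{"version":10,"application":{"adminAccess":true}}' \
  | python3 -m json.tool
```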
f
Is the VM running; does `rdctl shell` work?
b
Yup, I'm in. This does look like a problem, though:
```shell
lima-rancher-desktop:/Users/oleg$ nerdctl ps
FATA[0000] cannot access containerd socket "/run/k3s/containerd/containerd.sock": no such file or directory
lima-rancher-desktop:/Users/oleg$
```
f
Run `sudo rc-status` and check which services are running. It should be something like this, except you should have `containerd` instead of `docker`:
```
lima-rancher-desktop:~# rc-status
Runlevel: default
 lima-overlay                                                        [  started  ]
 sshd                                                                [  started  ]
 procfs                                                              [  started  ]
 lima-init-local                                                     [  started  ]
 networking                                                          [  started  ]
 qemu-binfmt                                                         [  started  ]
 lima-init                                                           [  started  ]
 crond                                                               [  started  ]
 udev-postmount                                                      [  started  ]
 lima-guestagent                                         [  started 01:22:01 (0) ]
 acpid                                                               [  started  ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
 sysfs                                                               [  started  ]
 fsck                                                                [  started  ]
 root                                                                [  started  ]
 cgroups                                                             [  started  ]
 localmount                                                          [  started  ]
 docker                                                  [  started 01:21:45 (0) ]
 udev-settle                                                         [  started  ]
Dynamic Runlevel: manual
 k3s                                                     [  started 01:21:24 (0) ]
```
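To pull just the failed services out of output like the above, a small filter works. This is a sketch; the exact column layout of `rc-status` output may vary between OpenRC versions.

```shell
#!/bin/sh
# failed_services: read rc-status output on stdin, print the names of
# services in the "failed" state (sketch; column layout may vary).
failed_services() {
    grep -E '\[ *failed *\]' | awk '{print $1}'
}

# Inside the VM: sudo rc-status | failed_services
printf ' containerd  [  failed  ]\n sshd  [  started  ]\n' | failed_services
# prints: containerd
```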
b
```
lima-rancher-desktop:/Users/oleg$ sudo rc-status
Runlevel: default
 lima-overlay                                            [  started  ]
 sshd                                                    [  started  ]
 udev-postmount                                          [  started  ]
 lima-init-local                                         [  started  ]
 networking                                              [  started  ]
 procfs                                                  [  started  ]
 qemu-binfmt                                             [  stopped  ]
 lima-init                                               [  started  ]
 acpid                                                   [  started  ]
 crond                                                   [  started  ]
 lima-guestagent                             [  started 00:13:08 (0) ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
 sysfs                                                   [  started  ]
 fsck                                                    [  started  ]
 cgroups                                                 [  started  ]
 root                                                    [  started  ]
 localmount                                              [  started  ]
 udev-settle                                             [  started  ]
Dynamic Runlevel: manual
 containerd                                              [  failed   ]
 buildkitd                                               [  failed   ]
lima-rancher-desktop:/Users/oleg$
```
f
So `containerd` failed to start...
b
yup
k
cat /var/log/containerd.log
?
f
Try this first:
```
lima-rancher-desktop:~# rc-service containerd start
 * /var/log/containerd.log: creating file
 * Starting Container Daemon ...
```
b
Near the end of containerd.log:
```
time="2023-12-12T22:43:46.227206227Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2023-12-12T22:43:46.227213602Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
time="2023-12-12T22:43:46.227218310Z" level=info msg="metadata content store policy set" policy=shared
panic: freepages: failed to get all reachable pages (page 879: multiple references (stack: [879]))
```
f
That doesn't sound good 😞
k
I have `could not use snapshotter devmapper in metadata plugin` in my log, but my `containerd` is working
b
containerd started, buildkit started, then containerd went back to failed state
k
Looks like the problem is just in that `freepages` panic
👍 1
b
containerd failed within 20 seconds. I've got the same in containerd.log:
```
time="2023-12-12T23:01:31.394568901Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2023-12-12T23:01:31.394575526Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
time="2023-12-12T23:01:31.394579151Z" level=info msg="metadata content store policy set" policy=shared
panic: freepages: failed to get all reachable pages (page 879: multiple references (stack: [879]))
```
f
Ok, just realized that you are using VZ, so doing one more test with VZ and 8 CPUs, 8 GB, to see if anything happens with that config
b
> just realized that you are using VZ
yup - need Rosetta - we deploy x86_64 images, so we need to test them
k
Googling shows the freepages error is associated with Rancher, k3s, and etcd.
Some hits suggest a corrupt `etcd`, but I'm not running `etcd` in my VM.
b
Why am I not surprised 🙂
k
One comment: out-of-memory or out-of-diskspace...
b
sudo rc-status does not show etcd
and guys, I have 128 GB of RAM in this box - not that it'd matter for the VM, but FYI
> One comment: out-of-memory or out-of-diskspace
for the VM, I wouldn't know - but for the host, I have plenty of both
f
It is some kind of database corruption; I just don't know what the database is. I think the error comes from some bbolt library
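If the corrupt bolt database really is containerd's metadata store, one possible recovery (an assumption, not a confirmed fix for this case) is to stop containerd inside the VM and move the database aside so containerd recreates it on the next start. This is destructive: containerd loses its image and container metadata, so images would need to be pulled again. The `meta.db` path in the comment below is the conventional containerd location and is an assumption; verify it inside the VM first.

```shell
#!/bin/sh
# Sketch: move a (possibly corrupt) bolt database aside so the daemon
# recreates it on its next start. The old file is kept for inspection.
reset_bolt_db() {
    db="$1"
    if [ -f "$db" ]; then
        mv "$db" "$db.corrupt.$(date +%s)"
        echo "moved"
    else
        echo "not-found"
    fi
}

# Inside the VM, after `sudo rc-service containerd stop` (path assumed):
# reset_bolt_db /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db
```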
b
Ah yes we can see it from the logs
BTW, before I forget - it doesn't start without admin access enabled, either :(
f
I'm out of time now, but it looks like something inside the VM is corrupted; I'm going to blame your linkerd experiments; they may have broken some containerd config?
b
> I'm going to blame your linkerd experiments; they may have broken some containerd config?
unlikely - we used it for quite a while, with varying success - and it does install some binaries directly on the nodes, like the CNI plugin. I'll do a factory reset and see if it fixes RD - running out of time myself, need to make progress
f
Good luck! Let me know if you figure it out!
b
Thank you guys for your help! @fast-garage-66093 @kind-iron-72902!
f
Note that you can always create a snapshot of the current setup, if you want to be able to take another look later.
But you have to do it before the factory reset 😄
b
> Let me know if you figure it out
if only by accident - need to progress with the current projects 😞