colossal-cartoon-44029

08/22/2022, 10:36 PM
Hey folks, I’m running into a problem with Rancher Desktop 1.5.0 and 1.5.1 on an M1 Mac, and I’m not sure how to troubleshoot it. I’m running the following containers with Docker, and everything periodically pauses for several minutes.
40bfe0359924   envoyproxy/ratelimit:master                            "/bin/ratelimit"         36 minutes ago       Up About a minute             0.0.0.0:46070->6070/tcp, :::46070->6070/tcp, 0.0.0.0:48080->8080/tcp, :::48080->8080/tcp, 0.0.0.0:48081->8081/tcp, :::48081->8081/tcp                                           dev-environment-ratelimit-1
67e2dcc091fd   ghcr.io/arm64-compat/confluentinc/cp-server:7.1.1      "/etc/confluent/dock…"   About an hour ago    Up About a minute             0.0.0.0:9092-9093->9092-9093/tcp, :::9092-9093->9092-9093/tcp                                                                                                                   dev-environment-kafka-1
827bb975a0a4   rabbitmq:management                                    "docker-entrypoint.s…"   About an hour ago    Up About a minute             4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp, :::5672->5672/tcp, 15671/tcp, 15691-15692/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp, :::15672->15672/tcp                             dev-environment-rabbitmq-1
f94a1a0a6d40   envoyproxy/envoy:v1.21-latest                          "/usr/local/bin/envo…"   About an hour ago    Up About a minute             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:8443->8443/tcp, :::8443->8443/tcp, 10000/tcp, 0.0.0.0:20980->8080/tcp, :::20980->8080/tcp, 0.0.0.0:443->8443/tcp, :::443->8443/tcp   dev-environment-proxy-1
7842309ea790   ghcr.io/arm64-compat/confluentinc/cp-zookeeper:7.1.1   "/etc/confluent/dock…"   About an hour ago    Up About a minute (healthy)   2888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 3888/tcp                                                                                                                   dev-environment-zookeeper-1
660891e288ed   redis                                                  "docker-entrypoint.s…"   About an hour ago    Up About a minute             0.0.0.0:6379->6379/tcp, :::6379->6379/tcp                                                                                                                                       dev-environment-redis-1
1d6a4a9e61f5   postgres:12.2                                          "docker-entrypoint.s…"   12 days ago          Up About a minute             0.0.0.0:5432->5432/tcp, :::5432->5432/tcp                                                                                                                                       dev-environment-postgres-1
Things come back and operate normally for a bit, and then it pauses again.
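To put a timestamp on each pause so it can be lined up against container and VM logs, a small watchdog could time a trivial request to one of the affected containers. This is a minimal sketch under assumptions that are not from the thread: it probes the dev-environment-redis-1 container from the list above with redis-cli ping, treats anything taking 5 seconds or more as a pause, and probes every 2 seconds. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pause watchdog (not from the thread): times a trivial
# request to a container and reports the ones that take too long.

# Print how many whole seconds one "redis-cli ping" via docker exec takes.
probe_seconds() {
  local container="$1" start end
  start=$(date +%s)
  docker exec "$container" redis-cli ping >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# Probe forever; print a timestamped line whenever a probe is slow.
# Defaults are assumptions: container name, threshold (s), interval (s).
watch_pauses() {
  local container="${1:-dev-environment-redis-1}"
  local threshold="${2:-5}"
  local interval="${3:-2}"
  local elapsed
  while true; do
    elapsed=$(probe_seconds "$container")
    if [ "$elapsed" -ge "$threshold" ]; then
      echo "$(date '+%Y-%m-%d %H:%M:%S') ${container} took ${elapsed}s to answer"
    fi
    sleep "$interval"
  done
}
```

Sourcing the file and running watch_pauses dev-environment-redis-1 5 2 in a spare terminal would print one line per slow probe. A probe that fails fast (for example, against a stopped container) is not reported, so this only catches slowness, not errors.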
qemu-system-aarch64 runs at over 100% while it’s hung, and then drops back down when things start working. One thing I noticed is that the Kafka JVM is running with a VSZ% of 90+. I’m dropping that down to see if it helps, but this did not happen with 1.4.x.
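To compare the host-side view of qemu-system-aarch64 with what each container is using inside the VM, two one-shot snapshots could be taken while things are hung and again when they are responsive. A sketch assuming a macOS host and a docker CLI that talks to Rancher Desktop; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing host and container resource use.

# Host side: PID, CPU %, memory % and command line of the VM process.
# The [q] pattern keeps grep from matching its own process.
vm_usage() {
  ps -Ao pid,%cpu,%mem,command | grep '[q]emu-system-aarch64'
}

# Guest side: one snapshot of CPU and memory per running container.
container_usage() {
  docker stats --no-stream \
    --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}'
}
```

If one container (for example, the Kafka broker) dominates container_usage whenever vm_usage shows the VM above 100%, that would narrow down which workload is driving the pauses.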
quick-keyboard-83126

08/23/2022, 1:02 AM
Is there a reason you aren't using a native image for your workload? (qemu is expensive...)
colossal-cartoon-44029

08/23/2022, 2:21 AM
Yes
The problem happens for me with envoyproxy/ratelimit:master, envoyproxy/envoy:v1.21-latest, and redis.
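One way to confirm that these images really run natively on the M1 is to ask Docker which platform each local image was pulled for: a linux/arm64 image runs directly in the VM, while a linux/amd64 one would need emulation. A minimal sketch; image_arch is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<image><TAB><os>/<architecture>" per image.
image_arch() {
  local image
  for image in "$@"; do
    printf '%s\t%s\n' "$image" \
      "$(docker image inspect --format '{{.Os}}/{{.Architecture}}' "$image")"
  done
}
```

For example: image_arch envoyproxy/ratelimit:master envoyproxy/envoy:v1.21-latest redis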
jolly-forest-99711

08/23/2022, 5:28 PM
qemu-system-aarch64 runs at over 100% while it’s hung
What do you mean by this? 100% CPU from the point of view of the host system? Have you tried giving the VM more CPUs and memory? I wonder if there's anything in the logs for envoy, redis, etc. that would say what is happening during a pause.
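Once the time of a pause is known, the container logs from just before and during it can be pulled in one go with the --since and --timestamps options of docker logs. A sketch; the helper name and the five-minute default window are assumptions, and the container names would be the ones from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print recent, timestamped logs for several containers.
# Usage: recent_logs <window> <container>...
recent_logs() {
  local since="${1:-5m}"
  shift || true
  local container
  for container in "$@"; do
    echo "=== ${container} (since ${since}) ==="
    docker logs --timestamps --since "$since" "$container" 2>&1
  done
}
```

For example: recent_logs 5m dev-environment-proxy-1 dev-environment-redis-1 dev-environment-ratelimit-1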
colossal-cartoon-44029

08/23/2022, 7:53 PM
Hi Adam
What do you mean by this? 100% CPU from the point of view of the host system?
Yes. When things are operating smoothly, qemu-system-aarch64 uses around 15% of system CPU as reported by macOS Activity Monitor.
Have you tried giving the VM more CPUs and memory?
I tried upping it to 8 GB of RAM and 4 CPUs, and the behavior didn’t change.
I wonder if there’s anything in the logs for envoy, redis etc that would say what is happening during a pause
Container logs don’t say anything interesting. One thing I did find is that there are bursts of DNS queries getting logged in lima.ha.stderr.log. I think I’m reproducing a similar problem with the following steps.
1. Run a loop querying DNS for host.docker.internal.
docker run --rm --name crashy-crashy -ti ubuntu:20.04 bash -c 'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y dnsutils psmisc && while true ; do dig host.docker.internal ; done'
2. Once that’s started, run another loop that kills dig over and over within the same container.
docker exec -ti crashy-crashy bash -c "while true ; do killall dig ; sleep .1 ; done"
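The two steps above could also be wrapped in one script, which makes the reproduction easier to rerun on different Rancher Desktop versions. This is a sketch that keeps the commands from the thread but changes a few things that are not from the original: the first container runs detached (-d instead of -ti), the exec drops -ti so it works without a terminal, the script waits until dig is installed before starting the kill loop, and it stops the container on exit. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two reproduction steps from the thread.
repro_dns_hang() {
  local name="${1:-crashy-crashy}"

  # Step 1: query host.docker.internal in a tight loop (detached).
  docker run -d --rm --name "$name" ubuntu:20.04 bash -c \
    'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y dnsutils psmisc && while true; do dig host.docker.internal; done'

  # Stop (and, because of --rm, remove) the container when the shell exits.
  trap "docker stop '$name' >/dev/null 2>&1" EXIT

  # Wait until apt-get has finished installing dig.
  until docker exec "$name" sh -c 'command -v dig' >/dev/null 2>&1; do
    sleep 2
  done

  # Step 2: kill dig over and over inside the same container.
  docker exec "$name" bash -c 'while true; do killall dig; sleep .1; done'
}
```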
If I let this sit for a bit, it seems to get qemu into the same state as the redis/envoy/ratelimit combo. I’m going to try this on Rancher Desktop 1.4 to see if it’s more stable. If it is, I’ll create a GitHub issue.
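While the reproduction runs, the DNS bursts in lima.ha.stderr.log can be watched live, or counted afterwards to compare versions. A sketch under two assumptions: that the log sits in Rancher Desktop's macOS log directory (~/Library/Logs/rancher-desktop), and that the relevant lines contain "dns" in some capitalization. Pass the real path as an argument if it differs; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed default location of the Lima host-agent log on macOS.
LIMA_HA_LOG="${LIMA_HA_LOG:-$HOME/Library/Logs/rancher-desktop/lima.ha.stderr.log}"

# Follow the log from its current end, printing only lines mentioning DNS.
watch_lima_dns() {
  tail -n 0 -F "${1:-$LIMA_HA_LOG}" | grep -i --line-buffered dns
}

# Count DNS-related lines already in the log (e.g. before and after a run).
count_lima_dns() {
  grep -ci dns "${1:-$LIMA_HA_LOG}"
}
```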
These reproduction steps don’t seem to hang 1.4.1. I created https://github.com/rancher-sandbox/rancher-desktop/issues/2811
jolly-forest-99711

08/23/2022, 9:58 PM
By any chance, did this problem first start happening after you upgraded to macOS 12.5.1? It seems that this update is messing with a bunch of different parts of RD.
colossal-cartoon-44029

08/23/2022, 10:00 PM
No, it’s been going on for a while. I tried RD 1.5.0 on macOS 12.5.0 and it had this problem, so I reverted to RD 1.4.1, upgraded my Mac, tried RD 1.5.1, and I continue to see the issue.
quick-keyboard-83126

08/23/2022, 10:00 PM
Probably worth mentioning