# rancher-desktop
f
I'm happy to try something... 🙂
q
Gimme a minute, I'm trying to finish a
bun
change
f
I have to review something here right now, so won't be able to try for 20-30min anyways
q
So, the command is basically something like this, to be run in
rdctl shell
where
~/gcloud/google-cloud-sdk
is an installation of
gcloud
and you've done
apk add py3-openssl
or similar, and you've created a
json
file at
$service_account_key
which is a google cloud service account (w/ enough permissions to identify and make cloud storage read operations I suppose....)
curl "$(~/gcloud/google-cloud-sdk/bin/gsutil signurl -r us -d 20m $service_account_key <gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abf>
f2a6d853109e4a948| tail -n 1 |awk '{print $5}')" -D - -o /dev/null
I know that's a lot of prerequisites, sorry...
Err, be very careful about copy paste, each time I try I end up w/ a stupid line wrap in the url which explodes when copied into the shell (!!)
curl "$(~/gcloud/google-cloud-sdk/bin/gsutil signurl -r us -d 20m $service_account_key <gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abff2a6d853109e4a948%7C> tail -n 1 |awk '{print $5}')" -D - -o /dev/null
Sigh, I hate copy/paste corruption... here's a version where my pipe isn't inexplicably percent encoded:
curl "$(~/gcloud/google-cloud-sdk/bin/gsutil signurl -r us -d 20m $service_account_key gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abff2a6d853109e4a948 | tail -n 1 |awk '{print $5}')" -D - -o /dev/null
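A minimal sketch of how the command could be split into steps, so that a wrapped or mangled URL is easier to spot before it reaches curl. It assumes, as the tail/awk part of the command implies, that the last line of the `gsutil signurl` output carries the signed URL in its fifth whitespace-separated field; `extract_signed_url` is a hypothetical helper name, not part of gsutil.

```shell
# Sketch: pull the signed URL out of `gsutil signurl` output on stdin.
# Assumption: the last output line holds the URL in field 5, matching
# the `tail -n 1 | awk '{print $5}'` pipeline used in this thread.
extract_signed_url() {
  tail -n 1 | awk '{print $5}'
}

# Possible usage (not executed here; paths are the ones from the thread):
#   signed_url="$(~/gcloud/google-cloud-sdk/bin/gsutil signurl -r us -d 20m \
#       "$service_account_key" "gs://BUCKET/OBJECT" | extract_signed_url)"
#   echo "$signed_url"    # inspect before use
#   curl "$signed_url" -D - -o /dev/null
```

Keeping the URL in a variable means the pasted text contains shorter lines, which may make line-wrap corruption less likely.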
And here's what it looks like when I run it
lima-rancher-desktop:/Users/jsoref$ curl --http1.1 -4 "$(~/gcloud/google-cloud-sdk/bin/gsutil signurl -r us -d 20m $service_account_key gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abff2a6d853109e4a948 | tail -n 1 |awk '{print $5}')" -D - -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0HTTP/1.1 200 OK
X-GUploader-UploadID: ADPycdvrNdaWfDDMITCajmstn98LaD4n43bMnn3VPmf4-AIVxmbmr_y-h7MY_KNTngpIRmv_qkJLyk8N6Sh0gNyvwyNAvg
Expires: Wed, 22 Mar 2023 23:14:37 GMT
Date: Wed, 22 Mar 2023 22:14:37 GMT
Cache-Control: public, max-age=3600
Last-Modified: Mon, 06 Mar 2023 23:41:48 GMT
ETag: "e453dccae0def69a0d0c2552647d58a7"
x-goog-generation: 1678146108297435
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 9599350
Content-Type: application/octet-stream
x-goog-hash: crc32c=gnP47A==
x-goog-hash: md5=5FPcyuDe9poNDCVSZH1Ypw==
x-goog-storage-class: STANDARD
Accept-Ranges: bytes
Content-Length: 9599350
Server: UploadServer
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

 16 9374k   16 1552k    0     0  23989      0  0:06:40  0:01:06  0:05:34  5607
Note that it more or less never finishes... The same equivalent command is fine when not run in the rancher lima vm
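For scale, curl's "Total" time estimate on the progress line is roughly the object size divided by the average download speed. A small sketch of that arithmetic, where `eta_seconds` is a hypothetical helper name:

```shell
# Sketch: seconds needed to move SIZE bytes at RATE bytes per second.
# Integer division, so the result is rounded down.
eta_seconds() {
  echo $(( $1 / $2 ))
}

# With the numbers from the run above (9599350 bytes at 23989 bytes/s):
#   eta_seconds 9599350 23989    # prints 400, i.e. the 0:06:40 curl shows
```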
f
Unfortunately I don't have a GCloud account ready for testing, so I can't actually run it.
q
 20 9374k   20 1930k    0     0   3099      0  0:51:37  0:10:37  0:41:00   272
f
But in the end this is just a
curl
command, so if this is a client-side problem, it should be reproducible with a curl to some other endpoint
q
fwiw, it's still running (and will basically not finish, it'll die when i leave the office)
we don't see this problem w/ curl from outside rancher and we don't see this problem pulling from other container repositories (mostly docker hub, but also others)
f
So I would try to find out if this is a specific networking bottleneck, and if it depends on the endpoint on the otherside or not
q
Right, but,... it works fine from outside rancher w/ the same exact command
I'm also talking w/ google (via our vendor)...
It doesn't matter if i add
-4
or
--http1.1
, i'm not sure what else i can do to get interesting bits out
i could set up some packet traces and compare i suppose
(i'm not a fan of trying to decipher https traffic...)
f
Are you also running plain Lima? If you are, then I would try to start both an Alpine and a Ubuntu VM, and try the same thing from both VMs, to determine if this is an issue with the distro
q
i can run anything someone tells me to run w/in reason
hmm, are you suggesting using
docker run
to run an alpine/ubuntu in the rancher managed thing? or to find the standalone lima and try that?
f
If you don't have Lima, use
brew install lima
. Then
limactl start --tty=false
to create a Ubuntu VM, and
limactl start --tty=false template://alpine
to create an Alpine one
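The two VM-creation steps can be sketched as a small script. The `run` wrapper and its `DRY_RUN` switch are assumptions made for this example (they are not Lima features); they only let the commands be printed instead of executed.

```shell
# Sketch only: wraps the Lima commands from this thread in functions.
# DRY_RUN is a hypothetical switch for this example: when it is set,
# commands are printed with a "+ " prefix instead of being run.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_test_vms() {
  run limactl start --tty=false                    # default template (Ubuntu)
  run limactl start --tty=false template://alpine  # Alpine template
}

# Possible usage:
#   DRY_RUN=1; create_test_vms    # print the commands only
#   unset DRY_RUN; create_test_vms
```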
q
ok
Error: lima HEAD-cce3561 is already installed
To install 0.15.0, first run:
  brew unlink lima
f
Oh, so you already have it, but switching to the latest release is probably a good idea
q
so unlink and install again?
f
Yes
q
🍺
f
🍻
Now you can create the 2 VMs
Then use
lima
to run a shell in the Ubuntu VM, and
limactl shell alpine
to run one in the Alpine one
I don't know if you need to write to your
$HOME
from inside the VM. If you do, then you need to edit the
lima.yaml
before start (don't use
--tty=false
) because the templates will mount it readonly
q
i only need a tiny json that i can paste into the vm
==> Pouring qt--6.4.3.arm64_ventura.bottle.tar.gz
f
Both VMs will have
/tmp/lima
mounted writable
q
mounted from the host?
f
Yes
q
% lima --version
limactl version 0.15.0
👍 1
f
I have no idea why it installed
qt
as well...
q
INFO[0019] [hostagent] Waiting for the essential requirement 3 of 5: "sshfs binary to be installed"
INFO[0025] READY. Run
limactl shell alpine
to open the shell.
I really wish that there was an easy way to install
gcloud
w/
apk
f
Good. I would expect you to get similar performance from the Alpine VM as you get from Rancher Desktop; this is just to make sure the issue doesn't come from something else RD is doing.
And then I'm really curious if Ubuntu shows the same behaviour or not
q
ubuntu is still setting up
f
I'll let you repeat your experiments here and get some coffee. Should be back in 10-15min
q
not finishing instantly:
lima-alpine:/tmp/google-cloud-sdk$ curl "$(./bin/gsutil signurl -r us -d 20m $service_account_key gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abff2a6d853109e4a948| tail -n 1 |awk '{print $5}')" -D - -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0HTTP/2 200
x-guploader-uploadid: ADPycdvTI4bzssRZPSewhhm1teYh9Cb7NziZPUyVDKe2j6ui2vIBI0JbDrKC6FxUlSYehdbsl22xsXQmoxpzoNl11jumVw
expires: Wed, 22 Mar 2023 23:42:49 GMT
date: Wed, 22 Mar 2023 22:42:49 GMT
cache-control: public, max-age=3600
last-modified: Mon, 06 Mar 2023 23:41:48 GMT
etag: "e453dccae0def69a0d0c2552647d58a7"
x-goog-generation: 1678146108297435
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 9599350
content-type: application/octet-stream
x-goog-hash: crc32c=gnP47A==
x-goog-hash: md5=5FPcyuDe9poNDCVSZH1Ypw==
x-goog-storage-class: STANDARD
accept-ranges: bytes
content-length: 9599350
server: UploadServer
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

  6 9374k    6  575k    0     0  32581      0  0:04:54  0:00:18  0:04:36 24920
INFO[0419] [hostagent] Waiting for the essential requirement 3 of 5: "sshfs binary to be installed"
ubuntu is still twiddling its thumbs
INFO[0579] [hostagent] Waiting for the essential requirement 3 of 5: "sshfs binary to be installed"
FATA[0600] did not receive an event with the "running" status
jsoref@lima-default:/Users/jsoref$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 21.10
Release:	21.10
Codename:	impish
um, i was able to copy stuff to/from /tmp/lima from the alpine shell, but the ubuntu shell doesn't see any files in /tmp/lima ...
f
Weird. Did the Ubuntu install eventually finish?
q
dunno, it was eventually running
f
I think it means
sshfs
was not yet installed. Do you see the files from your home directory? I would expect that to be missing as well then
q
also missing
thankfully none of that is necessary for this adventure
f
Maybe try
limactl stop; limactl start
to see if it fixes itself?
q
jsoref@lima-default:/tmp$ ./google-cloud-sdk/bin/gsutil signurl x gs://foo
Keystore password:
well gee, that's helpful ...
I have no idea who is asking for a keystore password or why, but i know I don't like it 😞
f
Must be
gsutil
, no?
q
well, sure, but it didn't do that on macOS, nor in rdctl nor in lima-alpine
why now?
f
I have no idea
q
I'm w/ the original reporter of that item
note: the file
x
was an empty file
it would have been better for it to say "there doesn't appear to be anything signed here,..."
jsoref@lima-default:/tmp$ curl "$(google-cloud-sdk/bin/gsutil signurl -r us -d 20m $service_account_key gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abff2a6d853109e4a948| tail -n 1 | awk '{print $5}')" -D - -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0HTTP/2 200
x-guploader-uploadid: ADPycdvXfd5Ls6eGM-shkeomb6v7Jl1wM-SLroVEfTj5fe2UoDjG3n-FvhDq8nXeoNZA-L4ECU6yxpvsocpdF_RoxlWe0A
expires: Wed, 22 Mar 2023 23:59:39 GMT
date: Wed, 22 Mar 2023 22:59:39 GMT
cache-control: public, max-age=3600
last-modified: Mon, 06 Mar 2023 23:41:48 GMT
etag: "e453dccae0def69a0d0c2552647d58a7"
x-goog-generation: 1678146108297435
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 9599350
content-type: application/octet-stream
x-goog-hash: crc32c=gnP47A==
x-goog-hash: md5=5FPcyuDe9poNDCVSZH1Ypw==
x-goog-storage-class: STANDARD
accept-ranges: bytes
content-length: 9599350
server: UploadServer
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

  7 9374k    7  668k    0     0  48877      0  0:03:16  0:00:14  0:03:02 65298
ok, lima-default(=ubuntu) is behaving like lima-alpine
f
which is behaving like lima-rancher-desktop, right?
q
right
I can try
UTM
...
f
something something MTU
q
hmm, I'd buy MTU
lemme see what UTM has to say
f
I remember @wide-mechanic-33041 talked about having to adjust the MTU settings in some cases to get the network to work properly, but I can't remember any details
q
UTM says "here's an ubuntu, I hope you remember its credentials". Me. "Um, that's a negative"
f
Did you try to change MTU to see if that makes a difference?
q
Don't know how to 🙂
f
E.g. try something quite low, like
ifconfig eth0 mtu 1000
and see if that changes anything?
q
jsoref@lima-default:/tmp$ sudo ifconfig eth0 mtu 1000
sudo: ifconfig: command not found
🙂
lemme google...
f
Try it with RD or Alpine first
we kind of have established that the issue is not the distro, so we can stop testing on Ubuntu, I think
q
lima-alpine:/Users/jsoref$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:55:55:af:99:94 brd ff:ff:ff:ff:ff:ff
lima-alpine:/Users/jsoref$ sudo ip link set dev eth0 mtu 1000
lima-alpine:/Users/jsoref$ ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1000 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:55:55:af:99:94 brd ff:ff:ff:ff:ff:ff
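A minimal sketch for reading the MTU back without scanning the output by eye. `mtu_from_ip_link` is a hypothetical helper name; it only assumes the `mtu <value>` pair that appears in the `ip link` output above.

```shell
# Sketch: print the first MTU value found in `ip link` style output on stdin.
mtu_from_ip_link() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# Possible usage inside the VM (restrict to one device, otherwise the
# loopback interface's MTU is reported first):
#   ip link show dev eth0 | mtu_from_ip_link
```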
f
Ok, now do the
curl
again
q
lima-alpine:/Users/jsoref$ curl "$(./bin/gsutil signurl -r us -d 20m $service_account_key gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abff2a6d853109e4a948| tail -n 1 |awk '{print $5}')" -D - -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0HTTP/2 200
x-guploader-uploadid: ADPycdtCZKWf_CGfTS9UMNFWe7Ust2QIL5dLccfCBGpim5F3FJoRqUExQqJ6XeG5LqbBhtm79So8RhOBk8XiaSYNyTwQntpvJUR8
expires: Thu, 23 Mar 2023 00:11:00 GMT
date: Wed, 22 Mar 2023 23:11:00 GMT
cache-control: public, max-age=3600
last-modified: Mon, 06 Mar 2023 23:41:48 GMT
etag: "e453dccae0def69a0d0c2552647d58a7"
x-goog-generation: 1678146108297435
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 9599350
content-type: application/octet-stream
x-goog-hash: crc32c=gnP47A==
x-goog-hash: md5=5FPcyuDe9poNDCVSZH1Ypw==
x-goog-storage-class: STANDARD
accept-ranges: bytes
content-length: 9599350
server: UploadServer
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

  5 9374k    5  520k    0     0  36373      0  0:04:23  0:00:14  0:04:09 32221
doesn't seem to be working
f
Well, it was worth a try
q
Sure
f
Unfortunately Nino is sick today; otherwise I would have asked him for ideas
q
this has been a problem for months
i have until the end of next week to poke things
otherwise, i'll have to revisit in the second half of april
f
Maybe wait until tomorrow, to see if Nino is better
Ping him and/or me tomorrow as a reminder!
q
This is UTM (w/ qemu):
ubuntu@ubuntu-utm:~$ uname -a
Linux ubuntu-utm 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@ubuntu-utm:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.1 LTS
Release:	22.04
Codename:	jammy 
ubuntu@ubuntu-utm:~$ curl -4 "$(./google-cloud-sdk/bin/gsutil signurl -r us -d 20m $service_account_key gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abff2a6d853109e4a948| tail -n 1 | awk '{print $5}')" -D - -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0HTTP/2 200
x-guploader-uploadid: ADPycdsXGbECQZIzMFeiBbbaPUgYeKArc7h0bCakwfQPW4LnVDnoxKPs73dr5pmfQMKlcwMuT5JyhK_wofUz61lvVd9e1eoPdXRD
expires: Thu, 23 Mar 2023 00:45:10 GMT
date: Wed, 22 Mar 2023 23:45:10 GMT
cache-control: public, max-age=3600
last-modified: Mon, 06 Mar 2023 23:41:48 GMT
etag: "e453dccae0def69a0d0c2552647d58a7"
x-goog-generation: 1678146108297435
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 9599350
content-type: application/octet-stream
x-goog-hash: crc32c=gnP47A==
x-goog-hash: md5=5FPcyuDe9poNDCVSZH1Ypw==
x-goog-storage-class: STANDARD
accept-ranges: bytes
content-length: 9599350
server: UploadServer
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

 15 9374k   15 1470k    0     0  14641      0  0:10:55  0:01:42  0:09:13   817
(I tried using UTM w/ Apple's emulator and the networking driver oopsed during the install ...)
f
So you get expected speed?
q
It hasn't finished. So it depends on what you're expecting 🙂
This is what should happen (macOS bare):
jsoref@jsoref-mbp ~ % time curl "$(CLOUDSDK_PYTHON=~/.pyenv/versions/3.10.1/bin/python ./bin/gsutil signurl -r us -d 20m $service_account_key gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abff2a6d853109e4a948| tail -n 1 |awk '{print $5}')" -D - -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0HTTP/2 200
x-guploader-uploadid: ADPycdsX26G5uYvSflNNM0FxgC4zhVvq1TsFagqZFQoE38QGV3C3vY7U2YUVeVBbKE9lEd-ofDENOMipnUkvW9K0aXqnrw
expires: Thu, 23 Mar 2023 00:46:40 GMT
date: Wed, 22 Mar 2023 23:46:40 GMT
cache-control: public, max-age=3600
last-modified: Mon, 06 Mar 2023 23:41:48 GMT
etag: "e453dccae0def69a0d0c2552647d58a7"
x-goog-generation: 1678146108297435
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 9599350
content-type: application/octet-stream
x-goog-hash: crc32c=gnP47A==
x-goog-hash: md5=5FPcyuDe9poNDCVSZH1Ypw==
x-goog-storage-class: STANDARD
accept-ranges: bytes
content-length: 9599350
server: UploadServer
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

100 9374k  100 9374k    0     0  7167k      0  0:00:01  0:00:01 --:--:-- 7194k
curl  -D - -o /dev/null  0.07s user 0.04s system 8% cpu 1.321 total
i.e. a near instant download
From my perspective, this seems to eliminate "lima", although UTM is using QEMU
So, I'm still left w/ QEMU + aarch64 + linux
I suppose I could try UTM w/ amd64 linux to see if that behaves differently...
f
Can you try to switch the networking layer in UTM
I can't remember what is the default, "Emulated VLAN"? Which is probably just QEMU slirp, same as Lima. Try setting e.g. "Bridged" or "Shared" and see if that makes a difference
We have support for this in Lima, but it is only used for ingress, not egress.
q
 17 9374k   17 1658k    0     0   2738      0  0:58:25  0:10:20  0:48:05     0
curl: (56) OpenSSL SSL_read: Connection reset by peer, errno 104
That's the end of the utm session 🙂
this is what i was using:
I don't think that's the default
f
Yeah, that's already what I wanted you to try, so I guess that's not it either 😞
q
the default (shared) doesn't seem to get me network connectivity which is why I switched it (I kinda needed the installer to install ubuntu...)
f
Maybe we do need some packet captures, not for the content, but just to see the traffic flow
q
alright. where do you want them captured? in lima?
on macOS?
f
I still somewhat suspect MTU and overflowing buffers because we go through several tunnel layers
q
i'm willing to buy that
f
I don't know; let's wait for Nino, who has more experience with this
q
i've run into MTU issues occasionally, but I've rarely been able to resolve them 🙂
f
Maybe it needs to be configured on the Gcloud side?
But you are not connecting to a VPC, so I don't know how that would even be possible
q
f
Normally the connection should negotiate to use the lower value of MTU on either side, but maybe this is not implemented correctly here?
And by "here" I mean the Gcloud side...
q
i do have unofficial access to a googler who is fairly network astute, i tossed him a bone, we'll see if he barks at me 🙂
f
Is
appspot.com
a Google service? Because that is the site creating the issue (I think)
q
yes appspot is google app engine
but, really, in general, most google ips are interchangeable
my naive understanding is that you say hello to a google ip and sni for the service you want and it routes you along
if you're using http3 it might reroute you w/ alt-svc, i suppose, although i haven't watched it do that much and know much less about alt-svc
f
yeah, I just wanted to make sure this is not some 3rd-party hosting service you use for your testing
q
oh, ... no appspot is functionally an implementation detail of GCR, https://cloud.google.com/container-registry/docs/access-control
The first image push to a hostname adds the registry host and its storage bucket to a project. For example, the first push to
gcr.io/my-project
adds the
gcr.io
registry host to the project with the project ID
my-project
and creates a storage bucket for the registry. The bucket name has one of the following formats:
artifacts.PROJECT-ID.appspot.com
for images stored on the host
gcr.io
STORAGE-REGION.artifacts.PROJECT-ID.appspot.com
for images stored on other registry hosts
f
I'm going to drop out now and hope by tomorrow that we have some response from your Googler, and also that Nino is back online
q
Thanks. I think now's a reasonable time for me to go home too
w
yeah usually I will use openssl to test for MTU related issues as well as TLS seems to start throwing handshake errors as well from fragmentation. Long thread, but did the ip link commands work to set it via rdctl shell for a session?
f
@wide-mechanic-33041 afaiu the command worked, but made no difference. I wonder if setting it to
1000
was too low; I read somewhere that the minimum value for Google Cloud is
1300
. So maybe the lower value is ignored in MSS negotiations.
w
yeah I mean too low probably would impact perceived speed and maybe some stability. I usually end up 1200-1250 with vms on my laptop assuming container, vm, vpn, and then some vxlan upstream. i would likely tcpdump the interface and see if there are a ton of retries or other TCP artifacts. Then work outward as best as possible
q
So I should try 1200?
c
@quick-keyboard-83126 Although I haven’t read all the messages I may have a few suggestions, are you able to use traceroute? If so, you can use the
--mtu
option to discover the MTU between your network and the destination, please note that
--mtu
is not always available on all traceroute utilities. Furthermore, I would also play around with the payload size and would capture that packet to see any signs of fragmentation of the packets or even you could look for
ICMP message-too-big
error messages or
ICMPv6 packet-too-big
errors (depending on your protocol)
Basically the idea here is when the packet is encapsulated beyond the network MTU, like if the packets go through
VXLAN
there will be an additional 50 bytes added to the overall size (+ IP header, etc), there usually will be some kind of
ICMP
messages sent back to the origin to indicate whether the packets need to be resized, etc.
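The header arithmetic behind this can be sketched as follows. The function names are hypothetical; the sketch assumes IPv4 and TCP headers without options (20 bytes each) and takes the 50-byte VXLAN overhead from the message above.

```shell
# Sketch: largest TCP payload (MSS) that fits in one packet of a given MTU.
# Assumes a 20-byte IPv4 header and a 20-byte TCP header, no options.
mss_for_mtu() {
  echo $(( $1 - 20 - 20 ))
}

# Sketch: MTU left for the inner packet after VXLAN encapsulation,
# using the 50-byte overhead quoted above.
inner_mtu_after_vxlan() {
  echo $(( $1 - 50 ))
}

# Examples:
#   mss_for_mtu 1500             # prints 1460
#   inner_mtu_after_vxlan 1500   # prints 1450
#   mss_for_mtu 1250             # prints 1210
```

Each extra tunnel layer subtracts its own overhead, which is why values such as 1200-1250 come up when several layers are stacked.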
w
yeah if you just do an “ip” in alpine it should show the mtu of the interface
q
I'm naturally in Alpine. I'm happy to run whatever commands people suggest
The more precise the commands the happier I am
w
there are tons of layers with remote workers in the mix so dialing it in can take a few to settle. I found the biggest flag of fragmentation was TLS issues (handshake errors) but any socket could be impacted.
f
For the benefit of those who did not read the whole backlog (hah), Josh already set the MTU to 1000 yesterday and it made no difference: https://rancher-users.slack.com/archives/C0200L1N1MM/p1679526617272239?thread_ts=1679508822.589629&cid=C0200L1N1MM
w
and you can always tcpdump eth0 to see if you are seeing other types of TCP issues
q
should i be in lima or rancher?
(i don't care, i just prefer precision, and the least amount of thinking on my part -- i'm trying to solve another production problem atm so this has my secondary focus)
w
in alpine (rdctl shell) “ip link set dev eth0 mtu 1250” which I think i had a config script on my windows machines
f
I would from now on just test in RD, as we have seen yesterday that the behaviour is the same everywhere (rd shell, lima alpine, lima ubuntu, utm ubuntu)
👍 1
q
lima-rancher-desktop:/Users/jsoref$ ip link set dev eth0 mtu 1250
ip: ioctl 0x8922 failed: Operation not permitted
lima-rancher-desktop:/Users/jsoref$ sudo ip link set dev eth0 mtu 1250
lima-rancher-desktop:/Users/jsoref$
f
sudo
?
w
yeah was thinking the same thing?!
q
Yeah, i did, see the 3rd line 🙂
👍 1
so, what do you want for a traceroute?
w
did something else get installed in alpine that has you in as non-root?
q
afaiu,
rdctl shell
gives me non root by default
ok, what
sudo apk add
package gets me a
traceroute
you'll like?
w
you had mentioned an endpoint that was underperforming. Target that with the --mtu and see what shows up
q
busybox's traceroute isn't mtu friendly
w
heh always the case. maybe do a tcpdump on the interface and do some basic curl statements to see if you can spot retries or other TCP errors
f
sudo apk add traceroute
?
lima-rancher-desktop:~# traceroute 2>&1 | grep mtu
  --mtu                       Discover MTU along the path being traced. Implies
q
lima-rancher-desktop:/Users/jsoref$ traceroute 2>&1 | grep mtu
lima-rancher-desktop:/Users/jsoref$
lima-rancher-desktop:/Users/jsoref$ traceroute 2>&1 | grep mtu
  --mtu                       Discover MTU along the path being traced. Implies
ok
lima-rancher-desktop:/Users/jsoref$ traceroute --mtu artifacts.gcr-public-test-12345.appspot.com
traceroute to artifacts.gcr-public-test-12345.appspot.com (142.251.32.84), 30 hops max, 65000 byte packets
 1  host.lima.internal (192.168.5.2)  0.678 ms F=1250  0.459 ms  0.362 ms
 2  * * *
 3  * * *
c
did it not finish?
is ICMP getting blocked?
q
lima-rancher-desktop:/Users/jsoref$ traceroute --mtu artifacts.gcr-public-test-12345.appspot.com
traceroute to artifacts.gcr-public-test-12345.appspot.com (142.251.32.84), 30 hops max, 65000 byte packets
 1  host.lima.internal (192.168.5.2)  0.678 ms F=1250  0.459 ms  0.362 ms
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * *
c
ha, ok something is blocking the ICMP
f
Yeah, traceroute doesn't seem to work inside the VM 😞
c
this might explain why the source fails to adjust the packets sizes which could explain why this whole thing might be failing 😐
q
i'll wait for someone to translate (we're fighting GKE in the foreground)
c
because the entire network fragmentation is driven based on the ICMP messages that are sent back from destination to source, I think
q
ok... so what do you folks want me to do? 🙂
w
so you confirmed that even w 1250 your performance to the destination is still poor?
q
yes.
29  * * *
30  * * *
lima-rancher-desktop:/Users/jsoref$ curl --http1.1 -4 "$(~/gcloud/google-cloud-sdk/bin/gsutil signurl -r us -d 20m $service_account_key gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abff2a6d853109e4a948| tail -n 1 |awk '{print $5}')" -D - -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0HTTP/1.1 200 OK
X-GUploader-UploadID: ADPycdsf8PjzNktWQt3o6jaexZkbRD_ofKSEYUSdLTmx4xyxu_TXz387MTzS7DLPOdk-sD9euV_K5StA51LY2ICDq8oKag
Expires: Thu, 23 Mar 2023 21:41:06 GMT
Date: Thu, 23 Mar 2023 20:41:06 GMT
Cache-Control: public, max-age=3600
Last-Modified: Mon, 06 Mar 2023 23:41:48 GMT
ETag: "e453dccae0def69a0d0c2552647d58a7"
x-goog-generation: 1678146108297435
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 9599350
Content-Type: application/octet-stream
x-goog-hash: crc32c=gnP47A==
x-goog-hash: md5=5FPcyuDe9poNDCVSZH1Ypw==
x-goog-storage-class: STANDARD
Accept-Ranges: bytes
Content-Length: 9599350
Server: UploadServer
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

  9 9374k    9  856k    0     0  45934      0  0:03:28  0:00:19  0:03:09 72842
(this transfer should finish nearly instantaneously)
(instead, it will effectively never finish)
w
apk add tcpdump
q
lima-rancher-desktop:/Users/jsoref$ sudo apk add tcpdump
OK: 470 MiB in 174 packages
w
tcpdump -w gcp.pcap -i eth0
q
lima-rancher-desktop:/Users/jsoref$ tcpdump -w gcp.pcap -i eth0
tcpdump: eth0: You don't have permission to capture on that device
(socket: Operation not permitted)
lima-rancher-desktop:/Users/jsoref$ sudo tcpdump -w gcp.pcap -i eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
w
and while that is running you will need to do your curl in a different session
q
ok, killing the curl and then ^c to the tcpdump...
w
once you have let curl go for a few seconds you can stop the dump with a ctrl+c
q
lima-rancher-desktop:/Users/jsoref$ sudo tcpdump -w gcp.pcap -i eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C498 packets captured
503 packets received by filter
0 packets dropped by kernel
w
copy out to your host using like the c$ mapping and use like wireshark to open it up. https://www.golinuxcloud.com/packet-fragmentation-wireshark/
q
i didn't need to copy it out... because rdctl aliases
~
into the shell
w
wireshark tends to highlight unexpected things
q
well, there are a couple of gray FIN,ACK, but, otherwise i don't see anything particularly exciting
w
yup looks pretty clean
q
i suppose i could capture a narrow host equivalent for the same traffic, but that'd require a bit more effort
w
and this is only w gcp destinations or all flows?
q
that tcp dump is the one you asked for
w
no i meant the slowness
q
but
docker pull
from docker hub doesn't have this behavior
👍 1
Support has asked me to run
gsutil perfdiag
, so i'm now running that...
👍 1
lima-alpine:/Users/jsoref$ sudo apk add net‑tools
ERROR: unable to select packages:
  net‑tools (no such package):
    required by: world[net‑tools]
what am i doing wrong?
Fwiw, I'm going to restart my rancher ...
w
and be good to test those curl statements from other WSL2 distros to try and narrow the cause further
f
@wide-mechanic-33041 Josh is on macOS using M1
q
I'm on macOS
we tried ubuntu and alpine
both in rancher desktop and in lima w/o rancher desktop
w
hah sorry
q
and iirc i reproduced w/ UTM w/ ubuntu as well
f
Works on Intel:
lima-rancher-desktop:~# apk add net-tools
(1/2) Installing mii-tool (2.10-r0)
(2/2) Installing net-tools (2.10-r0)
Executing busybox-1.35.0-r17.trigger
OK: 326 MiB in 109 packages
w
was helping others with WSL2 stuff just before this thread lit up. 😅
q
s'ok
w
any chance https://techmonitor.ai/technology/cloud/google-cloud-problems-server-latency-us-east could be related? I felt like I had seen something about their ingress the last couple days
q
this has been a problem for months
👍 1
f
I don't think it is related. Otherwise access from the host would be slow too.
w
yeah agreed. just quite an odd issue
f
I think the fact that ICMP is not working might be a hint. I just don't know if it is used for MSS negotiation or not. Reading Cisco's "Resolve IPv4 Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPsec", it looks like MSS negotiation only uses TCP headers, so it should work even when ICMP is blocked.
q
so, this is 1.8.1:
lima-rancher-desktop:/tmp/google-cloud-sdk/bin$ curl "$(~/gcloud/google-cloud-sdk/bin/gsutil signurl -r us -d 20m $service_account_key gs://artifacts.gcr-public-test-12345.appspot.com/containers/images/sha256:7f65636102fd1f499092cb075baa95784488c0bbc3e0abff2a6d853109e4a948 | tail -n 1 |awk '{print $5}')" -D - -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0HTTP/2 200
x-guploader-uploadid: ADPycdvx73Fnw-tKT0XRFBXC0xG6g4trrvF8nlfTZ1pJHd8iFyjtD9Cjx37gxB9rSdCi1lJYfgs73Z7bp42Qoj2FrA4dF6AWg0bT
expires: Thu, 23 Mar 2023 22:27:20 GMT
date: Thu, 23 Mar 2023 21:27:20 GMT
cache-control: public, max-age=3600
last-modified: Mon, 06 Mar 2023 23:41:48 GMT
etag: "e453dccae0def69a0d0c2552647d58a7"
x-goog-generation: 1678146108297435
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 9599350
content-type: application/octet-stream
x-goog-hash: crc32c=gnP47A==
x-goog-hash: md5=5FPcyuDe9poNDCVSZH1Ypw==
x-goog-storage-class: STANDARD
accept-ranges: bytes
content-length: 9599350
server: UploadServer
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

  8 9374k    8  806k    0     0   4160      0  0:38:27  0:03:18  0:35:09   268
f
Same as before, right? I did not expect the RD version to make any difference here
q
right
i did that mostly because it reset my MTU and i didn't want to think about any more variables
now i get to try to figure out this fun
netstat
/
net-tools
thing
f
Not sure why it fails for you. Just tested on M1 and works too:
$ rd shell sudo -i
lima-rancher-desktop:~# apk add net-tools
WARNING: Ignoring /media/vda/apks: No such file or directory
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/main/aarch64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/community/aarch64/APKINDEX.tar.gz
(1/2) Installing mii-tool (2.10-r0)
(2/2) Installing net-tools (2.10-r0)
Executing busybox-1.35.0-r17.trigger
OK: 286 MiB in 107 packages
lima-rancher-desktop:~#
q
it worked on rd1.8.1's, just not lima's
really puzzled
f
Worked for me on Lima too:
$ l shell alpine sudo -i
lima-alpine:~# apk add net-tools
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/x86_64/APKINDEX.tar.gz
(1/2) Installing mii-tool (2.10-r0)
(2/2) Installing net-tools (2.10-r0)
Executing busybox-1.35.0-r29.trigger
OK: 43 MiB in 86 packages
lima-alpine:~#
q
ok
the problem was that some stupid web page gave me a fancy
en-dash
instead of a
-
😞 1
🤦‍♂️
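A minimal sketch of a check that might catch this kind of paste problem before running a command. `has_non_ascii` is a hypothetical helper name; it flags any byte outside printable ASCII, which includes typographic dashes copied from web pages.

```shell
# Sketch: succeed (exit 0) if the argument contains any byte outside the
# printable ASCII range (space through tilde), fail (exit 1) otherwise.
# LC_ALL=C makes grep compare raw bytes rather than locale characters.
has_non_ascii() {
  printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

# Possible usage before installing a pasted package name:
#   pkg='net-tools'
#   if has_non_ascii "$pkg"; then
#     echo "suspicious characters in: $pkg" >&2
#   else
#     sudo apk add "$pkg"
#   fi
```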