best-wall-17038

09/21/2022, 7:12 PM
Hi, I am using Tempo in my local k8s cluster, and the tracing spans seem to get weirder and weirder over time. I suspect the cause might be clock skew. I found something similar here, but how should I solve this? https://github.com/rancher-sandbox/rancher-desktop/issues/839

creamy-vr-37996

09/21/2022, 7:49 PM
Hey Semih, how about enabling qemu-guest-agent on the machine where Rancher Desktop is running? I saw it suggested in a comment on the issue you linked.

best-wall-17038

09/21/2022, 7:50 PM
I saw that as well, but I'm not sure how to set it up on macOS 😕
Is it something that always runs in the background?

creamy-vr-37996

09/21/2022, 7:51 PM
Is it running directly on your macOS laptop, or in a VM on top of your laptop?

best-wall-17038

09/21/2022, 7:52 PM
How should I install and enable it? Via Docker or a package manager?

creamy-vr-37996

09/21/2022, 7:53 PM
If you’re running Rancher Desktop inside a VM, then yes, you should install it if that applies. If RD is installed directly on your macOS laptop, there is no need; the cause must be something else.

best-wall-17038

09/21/2022, 7:53 PM
RD is running directly on my macOS laptop.
So will it work if I pull the image and run the container on my Mac?

creamy-vr-37996

09/21/2022, 7:56 PM
It should; that would be a good test to see whether it's a Rancher Desktop issue or a system-wide issue.

best-wall-17038

09/21/2022, 7:57 PM
Which Docker image should I use for this? 😕
I am on an M1 Mac, btw 😕

creamy-vr-37996

09/21/2022, 7:57 PM
Which image do you want to use? If you’re referring to qemu, it doesn't apply on macOS, so the cause must be something else.

best-wall-17038

09/21/2022, 7:58 PM
OK, I am confused 😕
So to fix the clock skew I need to enable the qemu agent on my Mac, right?

creamy-vr-37996

09/21/2022, 7:59 PM
No, the qemu agent should only be enabled if you’re running RD inside a VM, not directly on your macOS laptop.

best-wall-17038

09/21/2022, 8:00 PM
I see. So what should I do in this case? 😕

creamy-vr-37996

09/21/2022, 8:01 PM
How did you detect the time skew? Is it a skew of a few seconds, or longer?

best-wall-17038

09/21/2022, 8:04 PM
If I reset the cluster and redeploy everything, it works fine at first, but later it breaks.
c

creamy-vr-37996

09/21/2022, 8:06 PM
Hmm, that’s weird, it hasn't happened to me yet. Can you confirm your RD and k8s versions?

best-wall-17038

09/21/2022, 8:06 PM
Have you had a similar setup locally before?
Rancher Desktop 1.5.1
kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:25:17Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4+k3s1", GitCommit:"c3f830e9b9ed8a4d9d0e2aa663b4591b923a296e", GitTreeState:"clean", BuildDate:"2022-08-25T03:46:35Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/arm64"}

creamy-vr-37996

09/21/2022, 8:07 PM
I don’t have anything running right now with that many proxies.
Can you check the logs and post some of them here?

best-wall-17038

09/21/2022, 8:08 PM
I'll try. Which ones do you need exactly?

creamy-vr-37996

09/21/2022, 8:11 PM
k3s, <machineName>_serial, <machineName>.ha.stdout
Maybe there is something relevant there

best-wall-17038

09/21/2022, 8:16 PM
It gets worse during macOS sleep, btw.
Probably when I wake up tomorrow and try something, I won't be able to see any spans at all.

creamy-vr-37996

09/21/2022, 8:38 PM
Go to the Virtual Machine settings in the Rancher Desktop preferences; maybe you need to give it more resources there to prevent skew.
By default it uses 4 GB and 2 CPUs; depending on how loaded your cluster is, it may be a lack of resources.

best-wall-17038

09/21/2022, 8:39 PM
It is already 24 GB and 8 CPUs.

creamy-vr-37996

09/21/2022, 8:42 PM
That's odd. I’ll look into it tomorrow, thanks Semih.

fast-garage-66093

09/21/2022, 9:31 PM
If you are on an M1 mac, you need to use macOS 12.4+ to be able to use more than 3GiB. If you are on an older release, then Lima will automatically reduce the memory to 3GiB to avoid a macOS kernel bug with qemu 7+.
If you see more than a couple of seconds of time drift, please open a new bug for it. Lima tries to sync the VM clock back to the host clock. It uses a somewhat crude mechanism, so the clock may drift by a few seconds, but that should always be corrected within a minute or so. There may be larger drift directly after waking from sleep, but it should always catch up quickly.

best-wall-17038

09/21/2022, 9:55 PM
@fast-garage-66093 it's already macOS Monterey 12.5.1

fast-garage-66093

09/21/2022, 9:57 PM
Ok, then it should be able to use the 24GB you have configured for it

best-wall-17038

09/21/2022, 9:58 PM
And even if I trigger a new one, it shows up as 1 hour ago.

fast-garage-66093

09/21/2022, 10:00 PM
I can't tell where these times come from. Can you run this command to check for any time drift between the host and the VM:
$ TZ=UTC date;rdctl shell date
Wed 21 Sep 2022 22:00:21 UTC
Wed Sep 21 22:00:20 UTC 2022

best-wall-17038

09/21/2022, 10:01 PM
host is 00:01 now

fast-garage-66093

09/21/2022, 10:02 PM
Yes, but you are probably in a different timezone than UTC

best-wall-17038

09/21/2022, 10:02 PM
yeah UTC+2

fast-garage-66093

09/21/2022, 10:02 PM
Your screenshot shows that the VM and host have the same time within 1s

best-wall-17038

09/21/2022, 10:02 PM
Sorry GMT+2

fast-garage-66093

09/21/2022, 10:03 PM
So whatever time drift you see from Prom has to be due to other issues.
If I had to guess, I'd say it is using ICMP to measure something, and that doesn't work properly within qemu on macOS.

best-wall-17038

09/21/2022, 10:04 PM
So what should I do in this case?
I am also pretty sure it's not related to Prom.

fast-garage-66093

09/21/2022, 10:07 PM
idk, I have no idea how Tempo generates its span metrics.
I would go to the Grafana Slack and see if you can find some help there

best-wall-17038

09/21/2022, 10:07 PM
I asked already

fast-garage-66093

09/21/2022, 10:08 PM
Or at least an explanation on how the metrics are generated

best-wall-17038

09/21/2022, 10:08 PM
They said it is probably a clock skew issue.
I might test this without RD to see how it goes.

fast-garage-66093

09/21/2022, 10:09 PM
Yeah, but I've shown you that the clocks are pretty much in sync
It may very well be something about the VM, but without knowing what Tempo does, it is hard to figure out
I have never used Tempo, so I don't know how it works. It looks like it ingests traces, so the first step is to understand how these traces are generated, and if they have the correct timing information.

best-wall-17038

09/21/2022, 10:22 PM
let me ask
Tempo uses the timestamps provided in the spans it ingests. For example, these are the fields when using OTel proto

fast-garage-66093

09/22/2022, 4:07 PM
Yeah, but what/where is the call that collects the timestamps? If you can show a Linux command or small C/Go program that demonstrates that the VM or container clocks are significantly out of sync with the host clock, then please file a bug. But we won't have time to debug the tempo app or trace collector ourselves; we need a small repro case that demonstrates the issue directly.
Lima has a hwclock sync loop in its guest agent ("Add hwclock sync loop to the guestagent" by mikluko, Pull Request #490 in lima-vm/lima) to keep the clocks in sync, but maybe it doesn't always work.
This is how I checked if the RTC and regular clock are in sync:
$ rdctl shell sudo -i sh -c 'date;hwclock'
Thu Sep 22 16:13:22 UTC 2022
Thu Sep 22 16:13:23 2022  0.000000 seconds

best-wall-17038

09/22/2022, 4:23 PM
rdctl shell sudo -i sh -c 'date;hwclock'
Thu Sep 22 16:23:38 UTC 2022
Thu Sep 22 16:23:39 2022  0.000000 seconds

fast-garage-66093

09/22/2022, 4:24 PM
Yeah, they seem to match pretty closely

best-wall-17038

09/22/2022, 4:24 PM
seems so 😕

fast-garage-66093

09/22/2022, 4:24 PM
I've been trying to run hwclock inside a container, but neither Alpine nor Ubuntu seems to have an RTC device configured.
Anyways, this is just poking at things in the dark; we need to know how the timestamps in the traces are retrieved to look into this further

best-wall-17038

09/22/2022, 4:28 PM
I don't know what I should do, to be honest.

fast-garage-66093

09/22/2022, 4:30 PM
You will have to go back to the team that produces the traces and find out how they capture the timestamps. Since they use nanosecond resolution, they aren't using the regular time call, which has only full-second granularity.

best-wall-17038

09/22/2022, 4:31 PM
I am the one who produces them 🙂 but there is nothing related to timestamps in the instrumentation.

fast-garage-66093

09/22/2022, 4:32 PM
Once you know which API they use, you can create a small test program to query the time yourself, to see if you see any clock drift to the system or hardware clocks. If you do, then we can take a look. But if you can't, then I would claim it is a bug in Tempo...
There must be some code that fills in the start and end times in the trace packets; you need to find the place it is done, and how it is done.
Or you could start writing sample programs that use 64-bit time interfaces and call them, to see if you see any problems there.