#fleet

limited-potato-16824

10/12/2022, 3:00 PM
Hi, I have a k3s cluster that is managed by Rancher, and the fleet-agent deployment to the k3s cluster is behaving very strangely. It's almost as if two deployments of fleet-agent are competing with each other: I can see with `helm list` that `REVISION` is increased a couple of times, then the Helm chart for fleet-agent is removed, and the whole thing repeats. One cycle of installing fleet-agent with Helm, bumping `REVISION`, and uninstalling the fleet-agent chart takes about 10 seconds. We used this cluster for upgrade tests earlier, which might have caused this, but I would be very grateful for any hints on where to look for the cause, since I have spent too much time going round in circles. Notes:
1. The fleet-agent pod is not restarted, but the fleet-agent bundle keeps jumping between the states "Pending" and "Wait applied".
2. We manage 12 clusters with this Rancher installation, and this k3s cluster is the only one fighting the fleet-agent bundle.
```
fleet-agent time="2022-10-12T14:38:44Z" level=error msg="error syncing 'cluster-fleet-default-k3s-rnd-11e91fbf4d78/fleet-agent-k3s-rnd': handler bundle-deploy: Operation cannot be fulfilled on bundledeployments.fleet.cattle.io \"fleet-agent-k3s-rnd\": StorageError: invalid object, Code: 4, Key: /registry/fleet.cattle.io/bundledeployments/cluster-fleet-default-k3s-rnd-11e91fbf4d78/fleet-agent-k3s-rnd, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: a70272d3-e2fb-4864-aa9f-a7565c7570b4, UID in object meta: , handler bundle-monitor: Operation cannot be fulfilled on bundledeployments.fleet.cattle.io \"fleet-agent-k3s-rnd\": StorageError: invalid object, Code: 4, Key: /registry/fleet.cattle.io/bundledeployments/cluster-fleet-default-k3s-rnd-11e91fbf4d78/fleet-agent-k3s-rnd, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: a70272d3-e2fb-4864-aa9f-a7565c7570b4, UID in object meta: , requeuing"
fleet-agent time="2022-10-12T14:38:46Z" level=info msg="Helm: Deleting release fleet-agent-k3s-rnd 1"
fleet-agent time="2022-10-12T14:38:46Z" level=info msg="Helm: Installing fleet-agent-k3s-rnd"
fleet-agent time="2022-10-12T14:38:46Z" level=info msg="getting history for release fleet-agent-k3s-rnd"
fleet-agent time="2022-10-12T14:38:46Z" level=info msg="getting history for release fleet-agent-k3s-rnd"
fleet-agent time="2022-10-12T14:38:46Z" level=info msg="getting history for release fleet-agent-k3s-rnd"
fleet-agent time="2022-10-12T14:38:46Z" level=info msg="getting history for release fleet-agent-k3s-rnd"
fleet-agent time="2022-10-12T14:38:46Z" level=info msg="getting history for release fleet-agent-k3s-rnd"
fleet-agent time="2022-10-12T14:38:48Z" level=info msg="Helm: Deleting release fleet-agent-k3s-rnd 1"
fleet-agent time="2022-10-12T14:38:48Z" level=info msg="Helm: Installing fleet-agent-k3s-rnd"
fleet-agent time="2022-10-12T14:38:48Z" level=info msg="getting history for release fleet-agent-k3s-rnd"
fleet-agent time="2022-10-12T14:38:48Z" level=info msg="getting history for release fleet-agent-k3s-rnd"
fleet-agent time="2022-10-12T14:38:48Z" level=error msg="error syncing 'cluster-fleet-default-k3s-rnd-11e91fbf4d78/fleet-agent-k3s-rnd': handler bundle-deploy: Operation cannot be fulfilled on bundledeployments.fleet.cattle.io \"fleet-agent-k3s-rnd\": StorageError: invalid object, Code: 4, Key: /registry/fleet.cattle.io/bundledeployments/cluster-fleet-default-k3s-rnd-11e91fbf4d78/fleet-agent-k3s-rnd, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: aae9c7b4-452f-418a-a5bc-4a43b6f2fa66, UID in object meta: , handler bundle-monitor: Operation cannot be fulfilled on bundledeployments.fleet.cattle.io \"fleet-agent-k3s-rnd\": StorageError: invalid object, Code: 4, Key: /registry/fleet.cattle.io/bundledeployments/cluster-fleet-default-k3s-rnd-11e91fbf4d78/fleet-agent-k3s-rnd, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: aae9c7b4-452f-418a-a5bc-4a43b6f2fa66, UID in object meta: , requeuing"
```
I should also add that I have removed the k3s cluster from Rancher and re-added it twice, but the issue is still there.
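A minimal Python sketch for reading fleet-agent logs like the ones above: it collects the distinct "UID in precondition" values and the timestamps of the "Helm: Deleting release" lines. In this log the UID changes from a70272d3 to aae9c7b4 between the two errors, which suggests (but does not prove) that the BundleDeployment is being deleted and recreated between attempts, fitting the install/uninstall loop. The helper name `summarize` and the shortened sample lines are illustrative only, not part of Fleet.

```python
import re
from datetime import datetime

# Timestamp at the start of each fleet-agent log line, e.g. time="2022-10-12T14:38:44Z".
TIME_RE = re.compile(r'time="([^"]+)"')
# UID the failed update expected the BundleDeployment to have.
UID_RE = re.compile(r"UID in precondition: ([0-9a-f-]+)")


def summarize(lines):
    """Return (distinct precondition UIDs in first-seen order, Helm delete timestamps)."""
    uids = []
    deletes = []
    for line in lines:
        for uid in UID_RE.findall(line):
            if uid not in uids:
                uids.append(uid)
        match = TIME_RE.search(line)
        if match and "Helm: Deleting release" in line:
            deletes.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ"))
    return uids, deletes


# Shortened excerpts of the fleet-agent log above (hypothetical trimming with "...").
SAMPLE = [
    'time="2022-10-12T14:38:44Z" level=error msg="... UID in precondition: '
    'a70272d3-e2fb-4864-aa9f-a7565c7570b4, UID in object meta: , requeuing"',
    'time="2022-10-12T14:38:46Z" level=info msg="Helm: Deleting release fleet-agent-k3s-rnd 1"',
    'time="2022-10-12T14:38:48Z" level=info msg="Helm: Deleting release fleet-agent-k3s-rnd 1"',
    'time="2022-10-12T14:38:48Z" level=error msg="... UID in precondition: '
    'aae9c7b4-452f-418a-a5bc-4a43b6f2fa66, UID in object meta: , requeuing"',
]

if __name__ == "__main__":
    uids, deletes = summarize(SAMPLE)
    print("precondition UIDs:", uids)
    # Gap between consecutive Helm deletes, i.e. the length of one loop iteration.
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(deletes, deletes[1:])]
    print("seconds between Helm deletes:", gaps)
```

Feeding the full agent log (for example from `kubectl logs`) instead of `SAMPLE` would show whether a new UID appears on every cycle.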

quick-sandwich-76600

10/14/2022, 6:46 AM
Hi @limited-potato-16824, could you please remove the k3s cluster from Rancher and then clean up any Rancher resources that may remain on it using this process: https://github.com/rancher/rancher-cleanup. Once it's cleaned, try to register it again and let me know if it makes any difference.

limited-potato-16824

10/14/2022, 8:52 AM
Thanks for replying @quick-sandwich-76600. I did run the rancher-cleanup job and re-joined the cluster to Rancher, but I still have the same issue. I have also tried:
• joining the cluster with another name
• deleting the table k3s.kine in MySQL and starting up the cluster on a single node
I still have the same issue, so I think this is related to Rancher and not to something in the k3s cluster.
One step closer to the truth: I can now see this error message in Rancher in the fleet-agent bundle deployment: "ErrApplied(1) [Cluster fleet-default/k3s-rnd: cannot re-use a name that is still in use]"
I also deleted the /var/lib/rancher directory and re-joined the cluster, but no luck.
How is the fleet-agent deployment generated in Rancher?
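The text "cannot re-use a name that is still in use" matches the error Helm's install action returns when a release with that name already exists. As a hedged sketch, assuming the fleet-agent's Helm uses Helm 3's default Secret storage driver (not confirmed in this thread), the helper below lists leftover release records from `kubectl get secrets -A -o json` on the downstream cluster. The function name `helm_releases` and the script name in the usage comment are hypothetical; `helm list -A -a` should show the same releases, including ones in a failed or pending state.

```python
import json
import re
import sys
from collections import defaultdict

# Helm 3's Secret storage driver keeps each release revision in a Secret
# named "sh.helm.release.v1.<release>.v<revision>".
HELM_SECRET_RE = re.compile(r"^sh\.helm\.release\.v1\.(?P<release>.+)\.v(?P<rev>\d+)$")


def helm_releases(secret_list_json, release_filter=""):
    """Map (namespace, release) -> sorted revisions found in a Secret list.

    secret_list_json is the text printed by `kubectl get secrets -A -o json`.
    """
    found = defaultdict(list)
    for item in json.loads(secret_list_json).get("items", []):
        meta = item.get("metadata", {})
        match = HELM_SECRET_RE.match(meta.get("name", ""))
        if match and release_filter in match.group("release"):
            key = (meta.get("namespace"), match.group("release"))
            found[key].append(int(match.group("rev")))
    return {key: sorted(revs) for key, revs in found.items()}


if __name__ == "__main__":
    # Usage: kubectl get secrets -A -o json | python helm_releases.py fleet-agent
    wanted = sys.argv[1] if len(sys.argv) > 1 else ""
    for (namespace, release), revs in helm_releases(sys.stdin.read(), wanted).items():
        print(f"{namespace}/{release}: revisions {revs}")
```

A release that shows up here while Fleet is trying to install it fresh would be one candidate for the name clash.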

quick-sandwich-76600

10/14/2022, 1:51 PM
I don't know how that process works in detail. In any case, the last error you mentioned can help: the object may be stuck somewhere and not shown in the UI, but still exist.

limited-potato-16824

10/14/2022, 1:54 PM
Yeah, I think so too and I have been digging for hours now but I can't find it. It's hiding 😄

quick-sandwich-76600

10/14/2022, 1:57 PM
Could you please run this script on your local cluster? https://gist.github.com/juanbrny/6fd755c5019745678a73933ed1c1638c Make sure that FASTMODE=0 and ONLY_SYSTEM_NAMESPACES=1 (the opposite of the default values you can see in the script). Because it can dump confidential data, please share the output with me privately, or do your own debugging by looking for references to "k3s-rnd" in the results.
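For the "look for references to k3s-rnd" step, a small Python sketch that walks a directory and prints every matching line with its file and line number. It assumes the script's results end up as text files (YAML, JSON, plain text) under one directory; the thread does not describe the output format, and the function and script names are hypothetical. On a machine with GNU grep, `grep -rn k3s-rnd <dir>` does the same job.

```python
import os
import sys


def find_references(root, needle="k3s-rnd"):
    """Yield (path, line_number, line) for each line under root that contains needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            # errors="replace" keeps the search going over binary or oddly encoded files
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Usage: python find_refs.py <dump-directory> [needle]
    root = sys.argv[1]
    needle = sys.argv[2] if len(sys.argv) > 2 else "k3s-rnd"
    for path, number, line in find_references(root, needle):
        print(f"{path}:{number}: {line}")
```

Searching for the cluster ID as well as the display name may surface objects that only reference the ID.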

limited-potato-16824

10/14/2022, 2:04 PM
I just did a sanity check: I added the k3s cluster to our Rancher dev environment, and everything works fine there.
I'll take a look at that script. Thank you

quick-sandwich-76600

10/14/2022, 2:05 PM
Yes, that confirms that an object that wasn't deleted properly must still exist somewhere...
A silly question: is this the k3s cluster where you changed INT to BIGINT a few weeks ago, or a different one?

limited-potato-16824

10/14/2022, 2:06 PM
No, this is a separate k3s cluster
I'm still holding my breath that the one we made the changes to will behave 😄
I ended up installing a new Rancher for this site, and the cluster connects successfully to that Rancher installation. I will keep investigating why this still fails on our older Rancher installation, even with a new cluster.

quick-sandwich-76600

10/20/2022, 9:46 AM
There's clearly some data there that needs to be removed. You mentioned the other day that you removed most of the results but not the bundle. Can you go to Continuous Delivery / Bundles and delete the bundle with the cluster ID (or delete it with the CLI)?
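For the CLI route, a hedged Python sketch that reads the output of `kubectl get bundles.fleet.cattle.io -n fleet-default -o json` on the Rancher local cluster and prints, without running, a `kubectl delete` command for each bundle whose name contains the given cluster name or ID. It assumes the stale bundle lives in `fleet-default` (as the "fleet-default/k3s-rnd" error suggests) and that its name includes the cluster reference; Fleet may link bundles to clusters in other ways, such as labels, which this sketch does not check. The function name and sample bundle names are hypothetical.

```python
import json
import shlex
import sys


def bundle_delete_commands(bundle_list_json, cluster_ref):
    """Return `kubectl delete` commands for Bundles whose name contains cluster_ref.

    bundle_list_json is the text printed by
    `kubectl get bundles.fleet.cattle.io -n fleet-default -o json`.
    The commands are only built, never run, so they can be reviewed first.
    """
    commands = []
    for item in json.loads(bundle_list_json).get("items", []):
        meta = item.get("metadata", {})
        name = meta.get("name", "")
        if cluster_ref in name:
            namespace = meta.get("namespace", "fleet-default")
            commands.append(
                f"kubectl delete bundles.fleet.cattle.io {shlex.quote(name)} -n {shlex.quote(namespace)}"
            )
    return commands


if __name__ == "__main__":
    # Usage: kubectl get bundles.fleet.cattle.io -n fleet-default -o json | python bundles.py k3s-rnd
    for command in bundle_delete_commands(sys.stdin.read(), sys.argv[1]):
        print(command)
```

Printing the commands instead of executing them leaves room to double-check each match before deleting anything.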