# general
a
This message was deleted.
s
have a look in the cattle-fleet-system namespace for failing fleet deployments and stateful sets
i think though it's generally advised to only run rancher in docker locally / dev world and the upgrade path is not supported.
a
Thanks, I've checked on the managed cluster and can't see any errors.
#k get all -n cattle-fleet-system
NAME                               READY   STATUS    RESTARTS   AGE
pod/fleet-agent-6b6cbb454d-qjzq8   1/1     Running   0          10d

NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/fleet-agent   1/1     1            1           133d

NAME                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/fleet-agent-6b6cbb454d   1         1         1       133d
replicaset.apps/fleet-agent-8689d9f67d   0         0         0       133d
#k get sts -n cattle-fleet-system
No resources found in cattle-fleet-system namespace.
#k logs pod/fleet-agent-6b6cbb454d-qjzq8 -n cattle-fleet-system | tail -3
time="2024-11-08T09:01:32Z" level=info msg="Deleting orphan bundle ID rke2, release kube-system/rke2-canal"
time="2024-11-08T09:14:57Z" level=info msg="getting history for release fleet-agent-eic-sb"
time="2024-11-08T09:14:57Z" level=info msg="getting history for release fleet-agent-eic-sb"
as we have only a few clusters to manage so far, it was decided to use a standalone docker installation for Rancher, but we didn't know that upgrades are not supported for this type of setup
are the conditions for those fleet resources ok?
a
Yes, we follow this upgrade procedure; these are still non-production managed clusters. However, I'm not sure what you mean by conditions for fleet resources, or how to check them.
I also have a very similar setup as a sandbox where it looks OK: Rancher 2.9.2 + RKE2 1.29.9 with internet access, compared to the error-reporting Rancher 2.9.2 + RKE2 1.28.10. Not sure if that could be the problem. And indeed the resources there look different, with a statefulset in cattle-fleet-system
# k get all -n cattle-fleet-system
NAME                READY   STATUS    RESTARTS   AGE
pod/fleet-agent-0   2/2     Running   0          10d

NAME                  TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/fleet-agent   ClusterIP   None         <none>        <none>    32d

NAME                           READY   AGE
statefulset.apps/fleet-agent   1/1     32d
s
conditions can be seen when describing the resource. though the overall state can be checked in the UI (state column). there could be some helm operations that are failing due to that connection issue as well, maybe that's stopping some of the fleet resources coming up?
a
IMHO those look OK on the managed cluster
#k describe deployment.apps/fleet-agent -n cattle-fleet-system | sed '/Conditions:/,$!d'
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  fleet-agent-8689d9f67d (0/0 replicas created)
NewReplicaSet:   fleet-agent-6b6cbb454d (1/1 replicas created)
Events:          <none>
#k describe pod/fleet-agent-6b6cbb454d-qjzq8 -n cattle-fleet-system | sed '/Conditions/,/Volumes/!d'
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
Also deployment & pod state health looks fine in cattle-fleet-system
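As a rough aid for the check above, the following sketch filters `kubectl describe` output down to the conditions whose status is not True. The function name `failing_conditions` is made up for this example, and it assumes the plain-text Conditions table layout shown in the pasted output; it prints nothing when every condition is True.

```shell
# Print every condition from `kubectl describe` output whose Status is not "True".
# Reads the describe text on stdin. `failing_conditions` is a hypothetical helper name.
failing_conditions() {
  awk '
    /^Conditions:/      { in_block = 1; next }   # start of the Conditions table
    in_block && /^[^ ]/ { in_block = 0 }         # next top-level key ends the table
    # Skip the header row and the dashed separator row; report anything not True.
    in_block && NF >= 2 && $1 != "Type" && $1 !~ /^-+$/ && $2 != "True" { print $1 ": " $2 }
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl describe deployment/fleet-agent -n cattle-fleet-system | failing_conditions
```

It only reads the Conditions table, so a missing resource (such as an absent statefulset) will not show up here.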
s
what are the conditions on the other cattle-fleet-system resources?
depending on whether the cluster is upstream (contains rancher) or downstream (all others)
• upstream - checks stateful set (if available) and deployment
  ◦ stateful set - cattle-fleet-local-system/fleet-agent
  ◦ deployment - cattle-fleet-system/fleet-controller
• downstream - checks stateful set
  ◦ stateful set - cattle-fleet-system/fleet-agent
for reference, this is the core block where fleet state is determined https://github.com/rancher/dashboard/blob/v2.9.2/shell/pages/c/_cluster/explorer/index.vue#L312-L340
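The downstream rule described above can be approximated from the command line. This is a sketch based only on the description in this thread, not on the dashboard code itself: it reports whether fleet-agent exists as a statefulset, a deployment, or not at all. The function name `fleet_agent_workload_kind` is hypothetical.

```shell
# Classify how fleet-agent is deployed, given `kubectl get ... -o name` output on stdin.
# Per the thread, a downstream cluster is expected to have the statefulset.
fleet_agent_workload_kind() {
  names="$(cat)"
  if printf '%s\n' "$names" | grep -qxF 'statefulset.apps/fleet-agent'; then
    echo statefulset
  elif printf '%s\n' "$names" | grep -qxF 'deployment.apps/fleet-agent'; then
    echo deployment
  else
    echo missing
  fi
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get deployments,statefulsets -n cattle-fleet-system -o name | fleet_agent_workload_kind
```

A result of `deployment` on a downstream cluster would match the situation reported here, where the agent is running but the statefulset the UI looks for is absent.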
a
rancher/upstream runs as a docker deployment, where "Fleet" is marked as OK. In downstream clusters, there is no other resource except the deployment, pod & replica set in cattle-fleet-system
#k get all -n cattle-fleet-system
NAME                               READY   STATUS    RESTARTS   AGE
pod/fleet-agent-6b6cbb454d-qjzq8   1/1     Running   0          13d

NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/fleet-agent   1/1     1            1           136d

NAME                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/fleet-agent-6b6cbb454d   1         1         1       136d
replicaset.apps/fleet-agent-8689d9f67d   0         0         0       136d
so could it be that, if Rancher is checking for a statefulset, we have an older version of fleet-agent?
as these systems do not have access to the internet
(downstream) #k describe deployment.apps/fleet-agent -n cattle-fleet-system | grep Image
    Image:      rancher/fleet-agent:v0.7.1

(sandbox with internet access where it's OK) # k describe statefulset.apps/fleet-agent -n cattle-fleet-system | grep Image
    Image:      rancher/fleet-agent:v0.10.2
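To compare agent versions across several clusters, the image tag can be pulled out of the describe output instead of read by eye. A minimal sketch, assuming `Image:` lines in the `repository:tag` form shown above (an image pinned by digest would not yield a version tag); `image_tags` is a hypothetical helper name.

```shell
# Extract the tag from each "Image:" line of `kubectl describe` output (stdin).
# Lines such as "Image ID:" are ignored because they do not match "Image:".
image_tags() {
  sed -n 's/^[[:space:]]*Image:[[:space:]]*[^[:space:]]*:\([^[:space:]:]*\)$/\1/p'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl describe deployment/fleet-agent -n cattle-fleet-system | image_tags
```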
s
to confirm, where do you see the fleet warning box, in the upstream cluster containing rancher or a downstream cluster?
a
on downstream clusters, on all of them
s
thanks. you might be best talking to #C013SSBKB6U , though i'm not sure many people know or have looked at the upgrade path for docker
👍 1
a
Thanks a lot for your time, expertise and advice.
👍 1
For the record, after enabling the proxy on the docker container where rancher is running, the fleet-agent pods were upgraded and the rancher check for statefulset resources is green again, except for one cluster where Conditions now report "[ContainersNotInitialized] containers with incomplete status: [fleet-agent-register]" for the fleet-agent-register init container, which I'm looking into