# general
a
This message was deleted.
b
This usually means there is an issue starting the eks-operator in the Rancher cluster. The CRD and the operator are created at the same time. There would be logs in Rancher indicating what the problem is.
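(As an aside, a minimal way to look for those operator errors, assuming Rancher runs in the cattle-system namespace with the usual app=rancher label; adjust both for a custom install:)
# Filter the Rancher pod logs for eks-operator related messages
kubectl -n cattle-system logs -l app=rancher --tail=500 | grep -i "eks"
# Check whether the eks-operator deployment itself is present and healthy
kubectl get deploy -A | grep -i "eks-operator"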
f
will check... thanks for the response
leaderelection lost for cattle-controllers
Any clue about this error? This is from the Rancher pod logs.
b
That error shouldn't be an issue. That would mean that a Rancher pod was the leader, lost the leader election, and needed to be restarted. The only issue would be if that error was happening continuously.
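(A quick check for whether the restarts are continuous, assuming the same cattle-system namespace and app=rancher label as above:)
# High RESTARTS counts suggest the leader election error keeps recurring
kubectl -n cattle-system get pods -l app=rancher
# Count repeated "leaderelection lost" messages in recent logs
kubectl -n cattle-system logs -l app=rancher --tail=1000 | grep -c "leaderelection lost"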
f
all my pods are in a crashed state
2023/06/09 15:03:20 [INFO] Handling backend connection request [172.27.20.4]
2023/06/09 15:03:44 [ERROR] error syncing 'p-nw7q8/u-b4qkhsnliz-admin-cluster-owner': handler auth-prov-v2-rb: namespaces "p-nw7q8" not found, requeuing
2023/06/09 15:03:44 [ERROR] error syncing 'p-f68k4/u-b4qkhsnliz-admin-cluster-owner': handler auth-prov-v2-rb: namespaces "p-f68k4" not found, requeuing
2023/06/09 15:03:44 [ERROR] error syncing 'p-f68k4/local-fleet-local-owner-cluster-owner': handler auth-prov-v2-rb: namespaces "p-f68k4" not found, requeuing
2023/06/09 15:03:44 [ERROR] error syncing 'p-f68k4/cluster-owner': handler auth-prov-v2-role: namespaces "p-f68k4" not found, requeuing
2023/06/09 15:03:44 [ERROR] error syncing 'p-nw7q8/cluster-owner': handler auth-prov-v2-role: namespaces "p-nw7q8" not found, requeuing
2023/06/09 15:03:44 [ERROR] error syncing 'p-nw7q8/local-fleet-local-owner-cluster-owner': handler auth-prov-v2-rb: namespaces "p-nw7q8" not found, requeuing
2023/06/09 15:03:48 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-5rt7n': %!!(MISSING)w(<nil>), requeuing
these are logs from another pod
Actually, I just deployed the Rancher deployment using the image directly... I mean I am not running "helm install rancher/rancher-stable"
And it looks like the pod is expecting a few namespaces and resources which were deployed with helm install <release>
b
Those namespaces are "project" namespaces. They aren't created when you install Rancher via Helm.
f
okay... but I don't get the errors... what is the pod looking for, like p-nw7q8/local-fleet-local-owner-cluster-owner etc.?
Does Rancher have a dependency on namespaces like "cattle-fleet-clusters-system", "cattle-fleet-local-system", and a few more?
Right now I just have pods, svc, and ingress in my namespace rancher
b
Yes, but Rancher should create the namespaces it needs when it starts up.
f
I believed those namespaces got created when I executed helm install <release>. So I deleted them when I installed Rancher as a deployment manifest
Did I mess things up?
You are right, I can see those namespaces again
b
I don't believe those namespaces get created when you install with Helm. I believe the Rancher pod creates them. You can restart the Rancher pod and they should come back.
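(A sketch of that restart-and-verify step, assuming the deployment is named rancher and lives in the custom rancher namespace mentioned above:)
# Restart the Rancher pods so they recreate the namespaces they manage
kubectl -n rancher rollout restart deployment/rancher
# Once the pods are Running again, the cattle/fleet namespaces should be back
kubectl get namespaces | grep -E "cattle|fleet"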
f
yeah you are right.....
I see those namespaces again
my pods' status is Running now... don't know how, but the logs have many errors
BTW I have deployed my pods in a custom namespace (not in cattle-system), is that OK? I mean I hope it's not mandatory to launch in the cattle-system ns?
b
Last I investigated this, it was only mandatory for Rancher to be running in the cattle-system namespace in one scenario. SUSE calls it the "hosted Rancher" setup. If you aren't running Rancher on a cluster that is managed by another Rancher instance, then I think you're fine.
f
okay... it's on EKS
2023/06/09 15:02:55 [ERROR] Failed to handle tunnel request from remote address x.x.x.x:37578: response 400: cluster not found
2023/06/09 15:03:00 [INFO] Handling backend connection request [172.27.16.140]
2023/06/09 15:03:00 [ERROR] Failed to handle tunnel request from remote address x.x.x.x:37590: response 400: cluster not found
2023/06/09 15:03:05 [ERROR] Failed to handle tunnel request from remote address x.x.x.x:42438: response 400: cluster not found
2023/06/09 15:03:10 [ERROR] Failed to handle tunnel request from remote address x.x.x.x:42440: response 400: cluster not found
2023/06/09 15:03:15 [ERROR] Failed to handle tunnel request from remote address x.x.x.x:41740: response 400: cluster not found
x.x.x.x is in my VPC range
any idea what these ports are?
Do I need to open some ports in the cluster/worker SG?
b
I believe that means there is a cluster that used to be managed by this Rancher instance that it doesn't know about. There is a pod running in a cluster somewhere trying to connect to this Rancher server. Maybe one you previously imported? If so, then you can remove the cluster-agent on that old cluster and these logs should go away.
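(For reference, a hedged sketch of that cleanup, run against the old imported cluster rather than the Rancher cluster; the agent is typically a Deployment named cattle-cluster-agent in cattle-system, but verify the names first:)
# List what the agent left behind in the old cluster
kubectl -n cattle-system get deploy,daemonset
# Deleting the agent deployment stops it from dialing this Rancher server
kubectl -n cattle-system delete deployment cattle-cluster-agent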
f
yes, I did import one cluster, but I deleted the cluster-agent pod a few hours back
Updating TLS secret for cattle-system/serving-cert (count: 8): map[field.cattle.io/projectId:local:p-nlgkz
listener.cattle.io/cn-127.0.0.1:127.0.0.1
listener.cattle.io/cn-x.x.x.x:x.x.x.x
listener.cattle.io/cn-x.x.x.x:x.x.x.x
listener.cattle.io/cn-x.x.x.x:x.x.x.x
listener.cattle.io/cn-localhost:localhost
listener.cattle.io/cn-rancher-poc.sw.abc.com:rancher-poc.sw.abc.com
listener.cattle.io/cn-rancher.cattle-system:rancher.cattle-system
listener.cattle.io/fingerprint:SHA1=dhdhdhdhdhdhdhdhdhdhdhdhd]
this is another error...
rancher-poc.sw.abc.com is the one I have deployed in another namespace
I am curious how my current pod/Rancher server has logs related to another Rancher server in another namespace
though they are on the same EKS cluster
b
You have two Rancher pods running in two different namespaces in the same cluster?
f
yes
b
Oh, that won't work.
f
oops
b
I don't think anything is actually broken. You should just only run one Rancher deployment. You can have multiple replicas in that deployment, but they have to know about and interact with each other. There is leader election, sharding, etc.
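(A quick way to confirm only one Rancher deployment remains after the cleanup, assuming both copies used the app=rancher label:)
# Should return exactly one deployment across all namespaces
kubectl get deploy -A -l app=rancher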
f
let me clean up old namespace and see
I am totally new to Rancher... sorry for the many questions, including the stupid ones
But you are too good, man
Earlier I was able to create Cloud Credentials, but that option is deactivated now, any idea?
BTW thank you for all the help so far... I am thinking the Rancher pod is taking time to set up all the required namespaces and other resources
b
Not sure why you aren't able to create the credential there. But on the left-hand side you should see the "Cloud Credential" tab. You can try there instead.
f
Cluster: cp-qa-ore (Waiting)
Namespace: fleet-default
Age: 4.8 mins
Provisioner: Amazon EKS
Waiting for API to be available
There are 0 nodes available to run the cluster agent. The cluster will not become active until at least one node is available.
You should not import a cluster which has already been connected to another instance of Rancher as it will lead to data corruption.
Run the kubectl command below on an existing Kubernetes cluster running a supported Kubernetes version to import it into Rancher:
kubectl apply -f https://rancher-poc.sw.abc.com/v3/import/brcrsvr54tk88d4qcfjgtmh6vpqq55lmq7ddpfn24zzczs7xvgjvnr_c-zqq45.yaml
While importing a cluster, I don't get why it is showing the old server (rancher-poc.sw.abc.com) which I had launched in another namespace
I have already removed all resources corresponding to the old server (rancher-poc)
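(One hedged thing to check here, not confirmed above: the hostname in the import URL comes from Rancher's server-url setting, which may still hold the old value from the first install:)
# Show the hostname Rancher embeds in registration/import URLs
kubectl get settings.management.cattle.io server-url -o jsonpath='{.value}'
# If it still points at the old hostname, it can be edited in place
kubectl edit settings.management.cattle.io server-url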
b
Do you still have Rancher pods running in different namespaces?
f
no... I cleaned up everything
and now I have launched Rancher in the cattle-system ns to avoid ambiguity
What if I update the cluster-agent YAML and apply it on the imported cluster? Will it work?
kind: Deployment
apiVersion: apps/v1
metadata:
  name: rancher
  namespace: cattle-system
  labels:
    app.kubernetes.io/name: rancher
    app.kubernetes.io/instance: rancher
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rancher
  template:
    metadata:
      labels:
        app: rancher
        release: rancher
    spec:
      serviceAccountName: rancher
      containers:
      - image: rancher/rancher
        imagePullPolicy: IfNotPresent
        name: rancher
        ports:
        - containerPort: 80
          protocol: TCP
        - containerPort: 443
          protocol: TCP
        args:
        - "--http-listen-port=80"
        - "--https-listen-port=443"
        - "--add-local=true"
        env:
        - name: CATTLE_NAMESPACE
          value: cattle-system
        - name: CATTLE_PEER_SERVICE
          value: rancher
        - name: NO_PROXY
          value: "127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local"
        - name: CATTLE_BOOTSTRAP_PASSWORD
          valueFrom:
            secretKeyRef:
              name: "bootstrap-secret"
              key: "bootstrapPassword"
Hey, if you get a few minutes, can you review my Rancher deployment? I removed many things from the GitHub template.
Is nodeAffinity compulsory? I see it in the fleet-agent pod YAML
Sorry I am asking too much, I am in a tough spot at the moment
b
I’m not sure what is happening with your setup. It seems that having two Rancher deployments has caused some issues. I’m not sure how much help I would be with debugging the problems.
f
Fair enough....thank you for all the help
a
While I'm trying to import an existing EKS cluster into Rancher with cloud credentials, it is showing the error below: "There are 0 nodes available to run the cluster agent. The cluster will not become active until at least one node is available." Even though I have a node group in the EKS cluster.
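(A couple of hedged checks for that message, assuming kubectl and AWS CLI access to the EKS cluster; the cluster and node group names below are placeholders:)
# Confirm the cluster actually reports Ready nodes
kubectl get nodes -o wide
# Confirm the node group is ACTIVE with a non-zero desired size (placeholder names)
aws eks describe-nodegroup --cluster-name <cluster-name> --nodegroup-name <nodegroup-name> --query 'nodegroup.{status:status,desired:scalingConfig.desiredSize}'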