adamant-kite-43734
09/15/2024, 3:20 PM

worried-state-78253
09/15/2024, 3:31 PM

worried-state-78253
09/15/2024, 3:38 PM
time="2024-09-15T15:28:42Z" level=debug msg="Expecting bundles from nodes: map[n0: n1: n2: n3: n4: n5:]"
time="2024-09-15T15:29:13Z" level=debug msg="Handle create node bundle for n2"
time="2024-09-15T15:29:13Z" level=debug msg="Complete node n2"
time="2024-09-15T15:29:14Z" level=debug msg="Handle create node bundle for n3"
time="2024-09-15T15:29:14Z" level=debug msg="Complete node n3"
time="2024-09-15T15:29:15Z" level=debug msg="Handle create node bundle for n1"
time="2024-09-15T15:29:15Z" level=debug msg="Complete node n1"
time="2024-09-15T15:29:17Z" level=debug msg="Handle create node bundle for n4"
time="2024-09-15T15:29:17Z" level=debug msg="Complete node n4"
time="2024-09-15T15:29:21Z" level=debug msg="Handle create node bundle for n5"
time="2024-09-15T15:29:21Z" level=debug msg="Complete node n5"
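(The lines above are from the manager side; I'm pulling the per-node agent logs roughly like this — the harvester-system namespace is an assumption on my part, adjust to wherever the supportbundle pods run:)

kubectl get pods -A | grep supportbundle                 # find the manager and the per-node agents
kubectl logs -n harvester-system supportbundle-agent-bundle-XXX --timestamps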
If we compare the logs from n0's supportbundle-agent-bundle-XXX vs the others -
n0 stops here -
+ curl -v -i -H 'Content-Type: application/zip' --data-binary @node_bundle.zip http://10.52.10.15:8080/nodes/n0
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
*   Trying 10.52.10.15:8080...
whereas the others have gone further -
+ curl -v -i -H 'Content-Type: application/zip' --data-binary @node_bundle.zip http://10.52.10.15:8080/nodes/n1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
*   Trying 10.52.10.15:8080...
* Connected to 10.52.10.15 (10.52.10.15) port 8080 (#0)
> POST /nodes/n1 HTTP/1.1
> Host: 10.52.10.15:8080
> User-Agent: curl/8.0.1
> Accept: */*
> Content-Type: application/zip
> Content-Length: 1372048
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
} [65536 bytes data]
* We are completely uploaded and fine
< HTTP/1.1 201 Created
< Date: Sun, 15 Sep 2024 15:29:15 GMT
< Content-Length: 0
< 
1339k    0     0  100 1339k      0   320M --:--:-- --:--:-- --:--:--  327M
* Connection #0 to host 10.52.10.15 left intact
HTTP/1.1 100 Continue
HTTP/1.1 201 Created
Date: Sun, 15 Sep 2024 15:29:15 GMT
Content-Length: 0
+ sleep infinity
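So on n0 the TCP connection to the manager never even opens. A quick check from n0 itself (a sketch; assumes shell access on the node, with the IP/port taken from the curl output above):

curl -v --max-time 5 http://10.52.10.15:8080/   # can n0 reach the manager at all?
nc -vz -w 5 10.52.10.15 8080                    # or test just the TCP handshake, if nc is available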
I’m going to remove n0 completely from the cluster and try again!

worried-state-78253
09/15/2024, 3:44 PM

worried-state-78253
09/16/2024, 9:22 AM
n1:/ # kubectl get bundle -A
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 0/1                       Modified(1) [Cluster fleet-local/local]; storageclass.storage.k8s.io harvester-longhorn missing
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1
Not sure if the above has any significance: "Modified(1) [Cluster fleet-local/local]; storageclass.storage.k8s.io harvester-longhorn missing". This was run on a control plane node.
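In case it's useful, the full state of that bundle can be pulled like this (a sketch; names taken from the table above):

kubectl -n fleet-local describe bundle mcc-harvester       # conditions plus which resources fleet considers modified
kubectl -n fleet-local get bundle mcc-harvester -o yaml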

worried-state-78253
09/16/2024, 9:24 AM

enough-australia-5601
09/16/2024, 11:00 AM
logs/kube-system/rke2-metrics-server-7f745dbddf-crckz/metrics-server.log:2024-09-12T13:11:05.801669145Z E0912 13:11:05.801609       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.122.134:10250/metrics/resource\": dial tcp 192.168.122.134:10250: connect: no route to host" node="n0"
logs/kube-system/rke2-metrics-server-7f745dbddf-crckz/metrics-server.log:2024-09-12T13:11:20.777705961Z E0912 13:11:20.777630       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.122.134:10250/metrics/resource\": dial tcp 192.168.122.134:10250: connect: no route to host" node="n0"
logs/kube-system/rke2-metrics-server-7f745dbddf-crckz/metrics-server.log:2024-09-12T13:11:35.817668323Z E0912 13:11:35.817610       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.122.134:10250/metrics/resource\": dial tcp 192.168.122.134:10250: connect: no route to host" node="n0"
[...]
logs/kube-system/rke2-metrics-server-7f745dbddf-crckz/metrics-server.log:2024-09-15T14:52:20.809677447Z E0915 14:52:20.809617       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.122.134:10250/metrics/resource\": dial tcp 192.168.122.134:10250: connect: no route to host" node="n0"
logs/kube-system/rke2-metrics-server-7f745dbddf-crckz/metrics-server.log:2024-09-15T14:52:35.789668865Z E0915 14:52:35.789605       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.122.134:10250/metrics/resource\": dial tcp 192.168.122.134:10250: connect: no route to host" node="n0"
logs/kube-system/rke2-metrics-server-7f745dbddf-crckz/metrics-server.log:2024-09-15T14:52:50.765663858Z E0915 14:52:50.765595       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.122.134:10250/metrics/resource\": dial tcp 192.168.122.134:10250: connect: no route to host" node="n0"
Seems like your node n0 somehow disappeared at some point on September 12th?
Could you please describe the underlying hardware of your setup, and how the history with n0 as a witness node played out?
I gathered from your earlier messages that you had a two-node setup with n0 as the witness node. Did you first try the upgrade, or first add the other nodes?
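Could you also check what the cluster currently knows about n0? A sketch, from any working control plane node:

kubectl get nodes -o wide    # is n0 still registered, and at which address?
kubectl describe node n0     # the conditions should show when it went NotReady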

worried-state-78253
09/16/2024, 11:11 AM

worried-state-78253
09/16/2024, 11:13 AM

enough-australia-5601
09/16/2024, 11:15 AM

worried-state-78253
09/16/2024, 11:18 AM

enough-australia-5601
09/16/2024, 11:29 AM

worried-state-78253
09/16/2024, 11:32 AM

worried-state-78253
09/16/2024, 11:33 AM

enough-australia-5601
09/16/2024, 11:41 AM
The managed chart (ManagedChart.management.cattle.io/v3) named harvester in the fleet-local namespace is not ready:
status:
    conditions:
    - lastUpdateTime: "2024-09-05T01:42:17Z"
      message: Modified(1) [Cluster fleet-local/local]; storageclass.storage.k8s.io
        harvester-longhorn missing
      status: "False"
      type: Ready
    - lastUpdateTime: "2024-09-05T01:42:17Z"
      status: "True"
      type: Processed
    - lastUpdateTime: "2024-09-15T15:36:20Z"
      status: "True"
      type: Defined
Looks like it expects there to be a storage class named harvester-longhorn (which is there by default, I think), but in your setup that storage class has been removed.
Perhaps you can move the upgrade along by just adding such a storage class?
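For reference, a minimal sketch of what that could look like (the parameter values below are common Longhorn defaults, not necessarily what your cluster originally had, so double-check before applying):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: harvester-longhorn
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  migratable: "true"
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
EOF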

worried-state-78253
09/16/2024, 11:52 AM

worried-state-78253
09/16/2024, 4:16 PM

worried-state-78253
09/16/2024, 4:16 PM

worried-state-78253
09/16/2024, 7:36 PM

enough-australia-5601
09/17/2024, 8:03 AM
status:
    conditions:
    - lastUpdateTime: "2024-09-05T01:42:17Z"
      message: Modified(1) [Cluster fleet-local/local]; storageclass.storage.k8s.io
        harvester-longhorn missing
      status: "False"
      type: Ready
    - lastUpdateTime: "2024-09-05T01:42:17Z"
      status: "True"
      type: Processed
    - lastUpdateTime: "2024-09-16T19:31:20Z"
      status: "True"
      type: Defined
The cluster local in Rancher is the cluster that Rancher runs on. This isn't the same cluster as Harvester. You can add the storage class from the Harvester UI directly. If you access Harvester through Rancher, you can navigate to the Harvester UI via Virtualization Management -> Harvester Clusters -> select your Harvester cluster.
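Once the storage class is back, the chart's Ready condition should clear; something like this should show it (a sketch, run against the Harvester cluster):

# print just the Ready condition of the harvester ManagedChart
kubectl -n fleet-local get managedchart harvester -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'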

worried-state-78253
09/17/2024, 8:50 AM
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    field.cattle.io/description: origional default storage
  creationTimestamp: '2024-09-16T16:01:11Z'
  managedFields:
    - apiVersion: storage.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:allowVolumeExpansion: {}
        f:metadata:
          f:annotations:
            .: {}
            f:field.cattle.io/description: {}
        f:parameters:
          .: {}
          f:diskSelector: {}
          f:migratable: {}
          f:numberOfReplicas: {}
          f:staleReplicaTimeout: {}
        f:provisioner: {}
        f:reclaimPolicy: {}
        f:volumeBindingMode: {}
      manager: harvester
      operation: Update
      time: '2024-09-16T16:01:11Z'
  name: harvester-longhorn
  resourceVersion: '52841563'
  uid: 1c9f79e3-9a75-44d6-bfca-eba4b7ad9d9b
parameters:
  diskSelector: hdd
  migratable: 'true'
  numberOfReplicas: '3'
  staleReplicaTimeout: '30'
provisioner: driver.longhorn.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
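That's the storage class I've added back. To see whether fleet picks it up, I'm watching the bundle (sketch):

# watch the mcc-harvester bundle until BUNDLEDEPLOYMENTS-READY goes 1/1
kubectl -n fleet-local get bundle mcc-harvester -w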

worried-state-78253
09/17/2024, 8:51 AM

worried-state-78253
09/17/2024, 8:53 AM

glamorous-sunset-66832
09/17/2024, 12:13 PM

worried-state-78253
09/17/2024, 1:00 PM

worried-state-78253
09/17/2024, 1:01 PM

worried-state-78253
09/25/2024, 9:57 AM

worried-state-78253
09/25/2024, 9:58 AM

worried-state-78253
09/25/2024, 9:59 AM

worried-state-78253
09/25/2024, 10:05 AM

worried-state-78253
09/25/2024, 11:05 AM