# harvester
w
You can download the version.yaml file and manually apply it; then the upgrade button should show up.
then 🤞 😉
s
Ah yes, thanks.
Copy code
curl https://releases.rancher.com/harvester/v1.2.0/version.yaml | kubectl apply --context=harvester-cluster -f -
Now I have everything crossed 🙂
Harvester upgrade from 1.1.2 to 1.2.0 has got stuck 50% of the way through the "Upgrading System Service" phase, after downloading everything and preloading the images onto the three nodes. Following the advice on the upgrade notes page, I found the hvst-upgrade apply-manifests job is spewing out this message every 5 seconds.
Copy code
$ kubectl --context harvester003 -n harvester-system logs hvst-upgrade-6hp8q-apply-manifests-9j9m6 --tail=10
instance-manager-r pod count is not 1 on node harvester001, will retry...
instance-manager-r pod count is not 1 on node harvester001, will retry...
instance-manager-r pod count is not 1 on node harvester001, will retry...
instance-manager-r pod count is not 1 on node harvester001, will retry...
And it's true - there are two `instance-manager-r` pods on that node - one 11 hours old running `longhorn-instance-manager:v1.4.3` and the other 12 days old running `longhorn-instance-manager:v1_20221003`. I suppose I could delete the old one - but would like a little bit of confidence that this would be the correct remedy.
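(For reference, a quick way to see the two pods side by side - the label selector below is the one Longhorn normally applies to its instance-manager pods, so treat it as an assumption:)

```shell
# List Longhorn instance-manager pods with their node and image, to spot
# the stale one. Assumes the usual longhorn.io/component label is present.
kubectl -n longhorn-system get pods \
  -l longhorn.io/component=instance-manager \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,IMAGE:.spec.containers[0].image
```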
s
Hi @sticky-summer-13450, could you generate the support bundle for it? Generally, if you confirm that the volumes are all healthy (replicas should be more than 2), you can delete the PDB directly. Or you can attach the support bundle and I can double-check for you.
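(Roughly what that PDB check and removal would look like - a sketch only, and the PDB name here is an assumption matched to the stale instance manager:)

```shell
# Longhorn creates a PodDisruptionBudget per instance manager; list them,
# then delete the one tied to the stale pod once all volumes are healthy.
kubectl -n longhorn-system get pdb
# kubectl -n longhorn-system delete pdb instance-manager-r-1503169c   # name assumed
```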
s
There's nothing in Longhorn that is degraded or failed, but here's the support bundle:
s
Thanks, let me check the SB…
Hi @sticky-summer-13450, could you help open an issue for further analysis? Also attach the above support bundle, thanks! I checked the old im-r. The replica instances of this im-r have all been deleted, as the following check shows:
Copy code
$ kubectl get instancemanager instance-manager-r-1503169c -n longhorn-system -o yaml |yq -e ".status.instances" |grep name: > replica-list.txt
$ cat replica-list.txt |awk '{print $2}' |xargs -I {} kubectl get replicas {} -n longhorn-system
Error from server (NotFound): replicas.longhorn.io "pvc-0ca5a4f3-d641-4b31-b33d-96b925d9af04-r-b0367b94" not found
Error from server (NotFound): replicas.longhorn.io "pvc-0ca5a4f3-d641-4b31-b33d-96b925d9af04-r-e861fab9" not found
Error from server (NotFound): replicas.longhorn.io "pvc-3f9a22e4-df30-45fc-b4c7-baed0c4ff217-r-894e8723" not found
Error from server (NotFound): replicas.longhorn.io "pvc-3f9a22e4-df30-45fc-b4c7-baed0c4ff217-r-29032c1b" not found
Error from server (NotFound): replicas.longhorn.io "pvc-3f9a22e4-df30-45fc-b4c7-baed0c4ff217-r-b6ada661" not found
Error from server (NotFound): replicas.longhorn.io "pvc-3f9a22e4-df30-45fc-b4c7-baed0c4ff217-r-cb42c033" not found
Error from server (NotFound): replicas.longhorn.io "pvc-3f9a22e4-df30-45fc-b4c7-baed0c4ff217-r-eb83b435" not found
Error from server (NotFound): replicas.longhorn.io "pvc-67e7a314-7384-469f-9268-bdcd8728e526-r-eaeeb01b" not found
Error from server (NotFound): replicas.longhorn.io "pvc-67e7a314-7384-469f-9268-bdcd8728e526-r-f4fcc792" not found
Error from server (NotFound): replicas.longhorn.io "pvc-160d8d70-01d1-4a13-abd5-11cff2be6071-r-a2afb620" not found
Error from server (NotFound): replicas.longhorn.io "pvc-160d8d70-01d1-4a13-abd5-11cff2be6071-r-e056af48" not found
Error from server (NotFound): replicas.longhorn.io "pvc-a5b5fe4c-eca4-4c97-a3db-f9490980c044-r-a8d11e24" not found
Error from server (NotFound): replicas.longhorn.io "pvc-a5b5fe4c-eca4-4c97-a3db-f9490980c044-r-d66c99b6" not found
Error from server (NotFound): replicas.longhorn.io "pvc-a5b5fe4c-eca4-4c97-a3db-f9490980c044-r-eab71d39" not found
Error from server (NotFound): replicas.longhorn.io "pvc-b5885e18-cc31-4ee1-8c91-afe881e09930-r-df556704" not found
Error from server (NotFound): replicas.longhorn.io "pvc-c4dfa684-2e3a-496f-9396-0e137a8f85e7-r-f1dd92d2" not found
Error from server (NotFound): replicas.longhorn.io "pvc-c108c3a1-bf5c-4d93-bb2b-99f1db4cc11c-r-1e8b6fa3" not found
Error from server (NotFound): replicas.longhorn.io "pvc-d01443ee-14fa-42c0-8721-b08935d5eaae-r-36c82c20" not found
Error from server (NotFound): replicas.longhorn.io "pvc-d01443ee-14fa-42c0-8721-b08935d5eaae-r-79da0593" not found
But somehow, they all still exist in the `instancemanager` status. That's why this im-r could not be deleted. I checked all the attached volumes, and it looks like they are all healthy. So you can directly remove this im-r to let the upgrade continue.
s
Reported in https://github.com/harvester/harvester/issues/4517. I'll go and remove that older `instance-manager-r` pod.
s
Thanks, feel free to update here with your upgrade progress.
s
will do - I'm currently waiting for something to happen after deleting the pod.
I have deleted that instance-manager-r-1503169c pod:
Copy code
$ kubectl delete pod instance-manager-r-1503169c --context harvester003 -n longhorn-system
pod "instance-manager-r-1503169c" deleted
the hvst-upgrade apply-manifests job has moved on:
Copy code
2023-09-12T12:28:34+01:00 instance-manager-r pod count is not 1 on node harvester001, will retry...
2023-09-12T12:28:39+01:00 instance-manager-r pod count is not 1 on node harvester001, will retry...
2023-09-12T12:28:45+01:00 instance-manager-r pod image is not longhornio/longhorn-instance-manager:v1.4.3, will retry...
2023-09-12T12:28:50+01:00 Checking instance-manager-r pod on node harvester001 OK.
2023-09-12T12:28:50+01:00 Checking instance-manager-r pod on node harvester002...
2023-09-12T12:28:51+01:00 Checking instance-manager-r pod on node harvester002 OK.
2023-09-12T12:28:51+01:00 Checking instance-manager-r pod on node harvester003...
2023-09-12T12:28:51+01:00 Checking instance-manager-r pod on node harvester003 OK.
2023-09-12T12:28:51+01:00 Upgrading Managedchart rancher-monitoring-crd to 102.0.0+up40.1.2
2023-09-12T12:28:54+01:00 managedchart.management.cattle.io/rancher-monitoring-crd patched
2023-09-12T12:28:55+01:00 managedchart.management.cattle.io/rancher-monitoring-crd patched
2023-09-12T12:28:55+01:00 Waiting for ManagedChart fleet-local/rancher-monitoring-crd from generation 15
2023-09-12T12:28:55+01:00 Target version: 102.0.0+up40.1.2, Target state: ready
2023-09-12T12:28:56+01:00 Current version: 102.0.0+up40.1.2, Current state: OutOfSync, Current generation: 15
2023-09-12T12:29:01+01:00 Sleep for 5 seconds to retry
2023-09-12T12:29:02+01:00 Current version: 102.0.0+up40.1.2, Current state: WaitApplied, Current generation: 17
2023-09-12T12:29:07+01:00 Sleep for 5 seconds to retry
2023-09-12T12:29:08+01:00 Current version: 102.0.0+up40.1.2, Current state: WaitApplied, Current generation: 17
2023-09-12T12:29:13+01:00 Sleep for 5 seconds to retry
but appears to be stuck again.
s
cc @ancient-pizza-13099 could you help check this?
p
@sticky-summer-13450 can you download helm and try to get the history of the `rancher-monitoring-crd` chart? Thanks:
Copy code
helm history rancher-monitoring-crd -n cattle-monitoring-system
s
Hi @prehistoric-balloon-31801 - sure:
Copy code
$ helm history rancher-monitoring-crd --kube-context harvester003 -n cattle-monitoring-system
REVISION	UPDATED                 	STATUS         	CHART                                  	APP VERSION	DESCRIPTION      
1174    	Sun Jun 26 07:17:18 2022	superseded     	rancher-monitoring-crd-100.1.0+up19.0.3	           	Upgrade complete 
1175    	Sun Jun 26 17:17:18 2022	superseded     	rancher-monitoring-crd-100.1.0+up19.0.3	           	Upgrade complete 
1176    	Sun Jun 26 17:17:33 2022	superseded     	rancher-monitoring-crd-100.1.0+up19.0.3	           	Upgrade complete 
1177    	Sun Jun 26 17:19:08 2022	superseded     	rancher-monitoring-crd-100.1.0+up19.0.3	           	Upgrade complete 
1178    	Sun Jun 26 17:19:24 2022	superseded     	rancher-monitoring-crd-100.1.0+up19.0.3	           	Upgrade complete 
1179    	Mon Jun 27 03:17:18 2022	superseded     	rancher-monitoring-crd-100.1.0+up19.0.3	           	Upgrade complete 
1180    	Mon Jun 27 03:17:34 2022	superseded     	rancher-monitoring-crd-100.1.0+up19.0.3	           	Upgrade complete 
1181    	Mon Jun 27 03:20:29 2022	superseded     	rancher-monitoring-crd-100.1.0+up19.0.3	           	Upgrade complete 
1182    	Mon Jun 27 03:20:44 2022	deployed       	rancher-monitoring-crd-100.1.0+up19.0.3	           	Upgrade complete 
1183    	Mon Jun 27 04:03:08 2022	pending-upgrade	rancher-monitoring-crd-100.1.0+up19.0.3	           	Preparing upgrade
p
Thank you. Can you get a support bundle? I think the upgrade will continue if we roll back the chart, but I'd like to check why fleet doesn't upgrade it.
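(For the record, such a rollback would look something like the following - a sketch only, not something to run unadvised. Revision 1182 is the last revision with STATUS "deployed" in the history shown earlier; rolling back to it clears the pending-upgrade state so fleet can retry:)

```shell
# Roll rancher-monitoring-crd back to its last successfully deployed
# revision (1182), clearing the stuck pending-upgrade revision.
helm rollback rancher-monitoring-crd 1182 -n cattle-monitoring-system
```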
s
On the ticket, Jain Wang is suggesting it could be because the cluster is given a Let's Encrypt TLS certificate without the IP as a SAN. So I'll follow that suggestion (also in this ticket) and report the results - but I'll also start creating another support bundle.
Latest SupportBundle (before I try to work-around the TLS issue).
p
Thanks Mark! cc @red-king-19196
@sticky-summer-13450 I guess you deleted the "Upgrade" resource, right?
The one that represents the v1.2.0 upgrade.
s
Yes I did - in my attempts to restart the upgrade. Sorry 😞
p
We are the ones who should say sorry for the experience. Let us check; there should be a way to bypass the check. But please note it's near Friday EOB for me and Zespre 🙂
s
Thanks.
r
Hi Mark, sorry for the slow reply. We'd like to bring fleet-agent back on the right track first, to see if that relieves the whole situation and allows us to upgrade the cluster again. From the support bundle, it seems the communication between fleet-agent and the API server has some issues that cause multiple bundle deployments to fall out of sync:
Copy code
W0922 09:01:43.872649       1 reflector.go:442] pkg/mod/github.com/rancher/client-go@v0.24.0-fleet1/tools/cache/reflector.go:167: watch of *v1alpha1.BundleDeployment ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 51; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
Since we don’t collect all of the secret objects on the users’ cluster with support-bundle-kit, could you help us check the content of the `fleet-agent` secret?
Copy code
kubectl -n cattle-fleet-local-system get secret fleet-agent -o jsonpath='{.data.kubeconfig}' | base64 -
There might be an inconsistency in the CA and the URL. Thank you for bearing with us!
s
I assume you mean `... | base64 -d -` to decode the data rather than encode it twice. I think I can share that data - there is a token but I don't know how secure it needs to stay ...
Copy code
$ kubectl -n cattle-fleet-local-system --context=harvester003 get secret fleet-agent -o jsonpath='{.data.kubeconfig}' | base64 -d -
apiVersion: v1
clusters:
- cluster:
    server: https://harvester-cluster.lan.lxiv.uk
  name: cluster
contexts:
- context:
    cluster: cluster
    namespace: cluster-fleet-local-local-1a3d67d0a899
    user: user
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: user
  user:
    token: eyJhbGciOiJSUzI1NiIsImtpZCI6Im1ZeEFtYXppNnVjbzBoV3BxNFE0YmdKWHFQa1c4STVtVE5aeDdUNFplQ0UifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJjbHVzdGVyLWZsZWV0LWxvY2FsLWxvY2FsLTFhM2Q2N2QwYTg5OSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJyZXF1ZXN0LXptNTI0LWIzZWFmY2Q5LTIxNWItNDU1Zi04YjQ3LTFlN2Q1ZDhmNzk0Ny10b2tlbiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJyZXF1ZXN0LXptNTI0LWIzZWFmY2Q5LTIxNWItNDU1Zi04YjQ3LTFlN2Q1ZDhmNzk0NyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImJmZDlkZjI3LWUxYzctNDUyMC1hMTc1LWY4NzI1OTE1ZmZjZCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpjbHVzdGVyLWZsZWV0LWxvY2FsLWxvY2FsLTFhM2Q2N2QwYTg5OTpyZXF1ZXN0LXptNTI0LWIzZWFmY2Q5LTIxNWItNDU1Zi04YjQ3LTFlN2Q1ZDhmNzk0NyJ9.mwEmXPlhkZbn_g0XCVwBOjO34dw0MkWw0-fLtCanJWZlbPi-cQUdDoMP3kUTqNu6KYfew-kTgA2THyVtpTVtnnYWe1gXnz4GXCqXrCNT7qLHg7zJzV0y4-2eaiM_1hJ9XToLodIMsHq7tNObDvc12fLLm91fnf17KkCuTdfYEbq9DlQi3_h2BEFCZLfN2R5T2VjBqKMJAujqZGlTmLAIYuPe4ITCk5F8dGbWfJyIOySsns9iEd8URQtSz3x44aLL37YhyMDfq-9sDiVTiw0dcG9IF2OZBRdy4vnj7ipYJyTYLxB-m7F4J7y9gS5Xb_sn8_cUS21sQ7idHds_I8Mfkw
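(Side note on the flag mix-up here: with GNU coreutils, plain `base64` encodes and `base64 -d` decodes - a quick sanity check:)

```shell
# GNU coreutils base64: encoding then decoding round-trips the input.
echo -n 'hello' | base64        # -> aGVsbG8=
echo -n 'aGVsbG8=' | base64 -d  # -> hello
```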
r
oops, sorry. yeah need to decode it.
hmm, so there’s no `certificate-authority-data` field under the `cluster` section, just `server`. Normally, it should be something like this:
Copy code
# kubectl -n cattle-fleet-local-system get secret fleet-agent -o jsonpath='{.data.kubeconfig}' | base64 -d
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJlVENDQVIrZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQWtNU0l3SUFZRFZRUUREQmx5YTJVeUxYTmwKY25abGNpMWpZVUF4TmpVMk5qQTVOalUyTUI0WERUSXNRFl6TURFM01qQTFObG9YRFRNeU1EWXlOekUzTWpBMQpObG93SkRFaU1DQUdBMVVFQXd3WmNtdGxNaTF6WlhKMlpYSXRZMkZBTVRZMU5qWXdPVFkxTmpCWk1CTUdCeXFHClNNNDlBZ0VHQ0NxR1NNNDlBd0VIQTBJQUJOeHEvdU9NMS8wTlJ2eTlFM0Y0eis3NXZaVXFSNXI3REFTb3hwTWIKYzR6STZuV3VoU2grZVNqTG85SzYzOHN1TE9tZDhhM2tvMTZUT0dOZWNueEJCZE9qUWpCQU1BNEdBMVVkRHdFQgovd1FFQXdJQ3BEQVBCZ05WSFJNQkFmOEVCVEFEQVFIL01CMEdBMVVkRGdRV0JCU1Q1ckE1TXhxZnkwc21kaG1zCjZRbmN0d3RwQpBS0JnZ3Foa2pPUFFRREFnTklBREJGQWlBUTZqWUorQkFwMVh2RnRLQ0llVkVXaEc2akZiZmcKcUQ3U2J4UkQwd2tNVXdJaEFLV0RaVnZuKzF4d3IwRTliTnNqVERlVDdINEFYTGhlbHZxeCtmTTREeDl4Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
    server: https://10.53.0.1:443
  name: cluster
<redacted>
But I remember you’re using a Let’s Encrypt certificate, so the CA should already be there 🤔
s
I don't know if it makes any difference, but since I'm using a full Let's Encrypt certificate I have not needed to push the `CA` into the `ssl-certificates` setting (harvesterhci.io/v1beta1); I have only needed to push the `publicCertificate` (the `fullchain.pem` from LE) and the `privateKey` (the `privkey.pem` from LE). Is that a problem / an assumption somewhere? My browser already trusts the certificate so I haven't needed to push a CA into Harvester. Do the rest of the components in Harvester also trust the certificate?
r
It should be okay not to provide the CA in the `ssl-certificates` setting since it’s a well-known CA. In fact, I’d suggest removing the FQDN from the `server-url` setting, because all of these fleet-agent exchanges are meant to be internal communications from Harvester’s point of view; by design, user-designated domain names and certificates shouldn’t interfere with the internal communication. We’re also reviewing a fix that removes the updating of `server-url` so that it won’t change the value to the VIP address during the Harvester upgrade. In addition to that, as a workaround please change the `apiServerURL` in the `fleet-controller` ConfigMap to point to the internal IP address of the `rancher` Service - I think, in your case, it’s `10.64.0.19`. Also, fill in `apiServerCA` with the value of the `internal-cacerts` setting. Finally, restart the `fleet-controller` deployment after applying the changes.
Copy code
$ kubectl -n cattle-fleet-system get cm fleet-controller -o yaml
apiVersion: v1
data:
  config: |
    {
      "systemDefaultRegistry": "",
      "agentImage": "rancher/fleet-agent:v0.7.0",
      "agentImagePullPolicy": "IfNotPresent",
      "apiServerURL": "https://10.53.138.254", # <-- please update this field with the value from rancher service's ip address
      "apiServerCA": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJ2VENDQVdPZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQkdNUnd3R2dZRFZRUUtFeE5rZVc1aGJXbGoKYkdsemRHVnVaWE
l0YjNKbk1TWXdKQVlEVlFRRERCMWtlVzVoYldsamJHbHpkR1Z1WlhJdFkyRkFNVFk1TlRZeQpNekUxTlRBZUZ3MHlNekE1TWpVd05qSTFOVFZhRncwek16QTVNakl3TmpJMU5UVmFNRVl4SERBYUJnTlZCQW9UCk
UyUjVibUZ0YVdOc2FYTjBaVzVsY2kxdmNtY3hKakFrQmdOVkJBTU1IV1I1Ym1GdGFXTnNhWE4wWlc1bGNpMWoKWVVBeE5qazFOakl6TVRVMU1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRU1QVE
lFUDV6TjBISwpVWmtwbkNXN0xwN0JoOC9TRlEwbzU3UFFQNUdzQ0l1RlhhaENpekxKWHpKbkZhRi9qTmpTSEhXUmFkaGV5YXlBCks1TzlERTZVcUtOQ01FQXdEZ1lEVlIwUEFRSC9CQVFEQWdLa01BOEdBMVVkRX
dFQi93UUZNQU1CQWY4d0hRWUQKVlIwT0JCWUVGQmY3cVpMNlhQcEFkUjJSaWd4OUNoOVh5ejJBTUFvR0NDcUdTTTQ5QkFNQ0EwZ0FNRVVDSURGWgp2MzdzaW9wUElwR2tBcjJ2MzR0bDB0Q0g3S0d0cjhEZkttUT
BTNWVnQWlFQTlxSHE4M0RNZlh3YzdObURWd3U3Cm80enJQNUJBUVU2MEpLVFFxR3piRVNVPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==", # <-- please update this field with the value from intenral-cacerts setting
      "agentCheckinInterval": "15m",
      "ignoreClusterRegistrationLabels": false,
s
Okay... I did:
Copy code
$ kubectl -n cattle-fleet-local-system --context=harvester003 get setting.management.cattle.io internal-cacerts -o jsonpath='{.value}' | base64 -w0 -
to get the `internal-cacerts` (took a while to get that to work 😵‍💫). I updated the `fleet-controller` ConfigMap with `apiServerURL` as `https://10.64.0.19` - which is the HA Harvester API (this IP address is available on my LAN, not just internal to Harvester) - and `apiServerCA` as the base64 obtained above. Then I deleted the current `fleet-controller-*` pod. The deployment created a new pod which logged some startup stuff ending with `msg="Cluster import for 'fleet-local/local'. Deployed new agent"`. Um - what now? 🙂
r
Hmmm, it should redeploy the fleet agent with the new URL and CA. What do you see in fleet agent’s log now?
s
Yes, the fleet agent was restarted and is repeatedly logging this, again.
Copy code
time="2023-09-26T07:14:31Z" level=error msg="Failed to register agent: looking up secret cattle-fleet-local-system/fleet-agent-bootstrap: Post \"https://10.64.0.19/apis/fleet.cattle.io/v1alpha1/namespaces/fleet-local/clusterregistrations\": tls: failed to verify certificate: x509: cannot validate certificate for 10.64.0.19 because it doesn't contain any IP SANs"
r
It must be getting the wrong CA for verifying the internal endpoint… Do you see any `fleet-agent` and `fleet-agent-bootstrap` secrets in your cluster? We could check their content.
s
I was told to update fleet-agent-bootstrap in this thread: https://github.com/harvester/harvester/issues/4517#issuecomment-1729311789
r
The secret might be removed and re-created during fleet-controller restarts. So we might have to check the content again.
s
Ah yes - the `fleet-agent-bootstrap` secret is the same age as the `fleet-agent-*` and `fleet-controller-*` pods. The CA in the `fleet-agent-bootstrap` secret is the same as the value from
$ kubectl -n cattle-fleet-local-system --context=harvester003 get setting.management.cattle.io internal-cacerts -o jsonpath='{.value}'
There is no `fleet-agent` secret.
r
cool, that looks good. have you emptied the `server-url` setting already?
s
Ah - I'd missed that one, no I haven't. I should make that `''` now?
r
yep, `value: ""`
s
done
r
and check if anything happens in fleet-agent’s log
s
still logging the same as before, every minute. It hasn't restarted though, so I guess the pod might need deleting?
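(Recreating the pod can be done by deleting it and letting its owning controller replace it - the label selector here is an assumption about how the fleet-agent pod is labelled:)

```shell
# Delete the fleet-agent pod; its controller in cattle-fleet-local-system
# should spin up a replacement that reads the fresh bootstrap config.
kubectl -n cattle-fleet-local-system delete pod -l app=fleet-agent
```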
r
yeah, we can give it a try
s
same error.
I have a thought - should the `fleet-controller` ConfigMap `apiServerURL` be the same as the `management` `internal-server-url` value? Might that have the internal CA instead of the external certificate I've pushed in?
r
wait.. aren’t they the same value now? should be `https://10.64.0.19`
s
`internal-server-url` is `https://10.53.15.158`, whereas `apiServerURL` is the external IP address on my LAN, which is `https://10.64.0.19`. I think I was clear above that 10.64.0.19 is my LAN IP address, not an IP address internal to K8s in Harvester.
r
ah, my fault, i thought `10.64.0.19` was the internal VIP 🤯 let me think twice. so `10.64.0.19` is the management address (the IP address you filled in during Harvester installation), right?
s
yes - the HA (High Availability) address
r
and `10.53.15.158` is the internal cluster-ip of the `rancher` service object. my bad lol, i mixed them up
s
I assume so!
🙂
r
thanks for the notice!
so we have to change that IP back to `10.53.15.158` and try again!
s
on it 🙂
`fleet-controller` ConfigMap updated, `fleet-controller-*` pod deleted, and the `fleet-agent-*` pod logged this:
Copy code
I0926 08:05:39.425969       1 leaderelection.go:248] attempting to acquire leader lease cattle-fleet-local-system/fleet-agent-lock...
I0926 08:05:42.421333       1 leaderelection.go:258] successfully acquired lease cattle-fleet-local-system/fleet-agent-lock
time="2023-09-26T08:05:44Z" level=info msg="Starting /v1, Kind=ConfigMap controller"
time="2023-09-26T08:05:44Z" level=info msg="Starting /v1, Kind=ServiceAccount controller"
time="2023-09-26T08:05:44Z" level=info msg="Starting /v1, Kind=Node controller"
time="2023-09-26T08:05:44Z" level=info msg="Starting /v1, Kind=Secret controller"
E0926 08:05:44.512487       1 memcache.go:206] couldn't get resource list for management.cattle.io/v3: 
time="2023-09-26T08:05:44Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=BundleDeployment controller"
time="2023-09-26T08:05:44Z" level=info msg="getting history for release mcc-local-managed-system-upgrade-controller"
time="2023-09-26T08:05:44Z" level=info msg="getting history for release fleet-agent-local"
time="2023-09-26T08:05:44Z" level=info msg="getting history for release local-managed-system-agent"
W0926 08:05:44.797582       1 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0926 08:05:44.884917       1 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0926 08:05:46.942805       1 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0926 08:05:46.953756       1 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0926 08:05:47.017596       1 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0926 08:05:47.069599       1 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0926 08:05:47.109470       1 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
time="2023-09-26T08:05:49Z" level=info msg="Deleting orphan bundle ID rke2, release kube-system/rke2-canal"
r
Could you check the following?
Copy code
kubectl -n fleet-local get bundles
The communication between the agent and rancher seems to be fixed
s
👍
Copy code
$ kubectl -n fleet-local --context=harvester003 get bundles
NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-agent-local                             1/1                       
local-managed-system-agent                    1/1                       
mcc-harvester                                 0/1                       NotReady(1) [Cluster fleet-local/local]; daemonset.apps kube-system/harvester-whereabouts [progressing] Updated: 2/3
mcc-harvester-crd                             1/1                       
mcc-local-managed-system-upgrade-controller   1/1                       
mcc-rancher-logging                           0/1                       OutOfSync(1) [Cluster fleet-local/local]
mcc-rancher-logging-crd                       0/1                       OutOfSync(1) [Cluster fleet-local/local]
mcc-rancher-monitoring                        0/1                       OutOfSync(1) [Cluster fleet-local/local]
mcc-rancher-monitoring-crd                    0/1                       WaitApplied(1) [Cluster fleet-local/local]
r
hmmm, maybe wait a bit then check again. It takes time for fleet to sync and reflect the changes.
s
I have to start thinking about work for a short while too...
r
if the statuses are still the same, might need to check fleet-agent’s log again 👀
s
The `fleet-agent-*` pod logs are just ticking along, logging this:
Copy code
time="2023-09-26T08:11:23Z" level=info msg="getting history for release local-managed-system-agent"
time="2023-09-26T08:16:25Z" level=info msg="getting history for release local-managed-system-agent"
time="2023-09-26T08:21:33Z" level=info msg="getting history for release local-managed-system-agent"
and so far the bundles have not changed.
r
Need to check the status of the charts. Do you have the `helm` command at your disposal?
Copy code
helm history local-managed-system-agent -n cattle-system
Maybe also generate a support bundle again since it’s been a while, and we changed the configuration.
s
Copy code
$ helm history local-managed-system-agent -n cattle-system --kube-context harvester003
REVISION	UPDATED                 	STATUS    	CHART                                                                                            	APP VERSION	DESCRIPTION     
4953    	Mon Sep 11 07:25:32 2023	superseded	local-managed-system-agent-v0.0.0+s-e6e150e25f6da0b545e400e61d9ee74f561acf20cb9ba33fcbdc3352724f1	           	Upgrade complete
4954    	Mon Sep 11 07:25:39 2023	superseded	local-managed-system-agent-v0.0.0+s-e6e150e25f6da0b545e400e61d9ee74f561acf20cb9ba33fcbdc3352724f1	           	Upgrade complete
4955    	Mon Sep 11 17:18:56 2023	superseded	local-managed-system-agent-v0.0.0+s-e6e150e25f6da0b545e400e61d9ee74f561acf20cb9ba33fcbdc3352724f1	           	Upgrade complete
4956    	Mon Sep 11 17:19:01 2023	superseded	local-managed-system-agent-v0.0.0+s-e6e150e25f6da0b545e400e61d9ee74f561acf20cb9ba33fcbdc3352724f1	           	Upgrade complete
4957    	Mon Sep 11 17:27:18 2023	superseded	local-managed-system-agent-v0.0.0+s-e6e150e25f6da0b545e400e61d9ee74f561acf20cb9ba33fcbdc3352724f1	           	Upgrade complete
4958    	Mon Sep 11 20:44:10 2023	superseded	local-managed-system-agent-v0.0.0+s-e6e150e25f6da0b545e400e61d9ee74f561acf20cb9ba33fcbdc3352724f1	           	Upgrade complete
4959    	Mon Sep 11 20:44:16 2023	superseded	local-managed-system-agent-v0.0.0+s-d3cb9a953dd679240b86c15757006baeaa3a5072a70879194e5abbb003513	           	Upgrade complete
4960    	Mon Sep 11 20:44:27 2023	superseded	local-managed-system-agent-v0.0.0+s-d3cb9a953dd679240b86c15757006baeaa3a5072a70879194e5abbb003513	           	Upgrade complete
4961    	Mon Sep 11 20:44:31 2023	superseded	local-managed-system-agent-v0.0.0+s-d3cb9a953dd679240b86c15757006baeaa3a5072a70879194e5abbb003513	           	Upgrade complete
4962    	Mon Sep 11 20:44:33 2023	deployed  	local-managed-system-agent-v0.0.0+s-d3cb9a953dd679240b86c15757006baeaa3a5072a70879194e5abbb003513	           	Upgrade complete
Support bundle is cooking...
r
from `cattle-system/rancher-576cf5cc45-4pv96`'s log:
Copy code
2023-09-26T13:05:13.972634730Z 2023/09/26 13:05:13 [ERROR] error syncing 'fleet-local/rancher-logging-crd': handler mcc-bundle: no chart version found for rancher-logging-crd-100.1.3+up3.17.7, requeuing
2023-09-26T13:05:19.226766849Z 2023/09/26 13:05:19 [ERROR] error syncing 'fleet-local/rancher-logging': handler mcc-bundle: no chart version found for rancher-logging-100.1.3+up3.17.7, requeuing
2023-09-26T13:05:21.936231014Z 2023/09/26 13:05:21 [ERROR] error syncing 'fleet-local/rancher-monitoring': handler mcc-bundle: no chart version found for rancher-monitoring-100.1.0+up19.0.3, requeuing
2023-09-26T13:06:01.406798090Z 2023/09/26 13:06:01 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2023-09-26T13:06:09.134873483Z 2023/09/26 13:06:09 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
rancher couldn’t find the charts for those bundles because the harvester cluster repo was already upgraded - it now serves only the new versions of the charts. The cluster is in a middle state due to the previous unsuccessful upgrade. I have an idea: since the `harvester-cluster-repo` pod is just an HTTP server serving chart files, maybe we could temporarily swap the container image tag from `v1.2.0` to `v1.1.2` for rancher and fleet to complete their jobs. Once the status of the bundles is sorted out, we could change the image back and kick-start the upgrade again.
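(A sketch of that image swap, assuming `harvester-cluster-repo` is a Deployment in `cattle-system` whose container shares its name - both names are assumptions here:)

```shell
# Temporarily serve the v1.1.2 charts so rancher/fleet can reconcile the
# old bundles; swap back to v1.2.0 afterwards.
kubectl -n cattle-system set image deployment/harvester-cluster-repo \
  harvester-cluster-repo=rancher/harvester-cluster-repo:v1.1.2
```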
A quicker way is to skip the check imposed by the webhook and start a new upgrade directly. Detailed steps are described below: 1. Create a Version object
Copy code
cat <<EOF | kubectl apply -f -
apiVersion: harvesterhci.io/v1beta1
kind: Version
metadata:
  name: v1.2.0
  namespace: harvester-system
spec:
1. Create a customized Upgrade object
Sorry, please ignore the above. A quicker way is to skip the check imposed by the webhook and start a new upgrade directly. Detailed steps are described below. First, create a Version object:
Copy code
cat <<EOF | kubectl apply -f -
apiVersion: harvesterhci.io/v1beta1
kind: Version
metadata:
  name: v1.2.0
  namespace: harvester-system
spec:
  isoChecksum: '267d65117f6d9601383150b4e513065e673cccba86db9a8c6e7d3cb36a04f6202162f1b95c3c545a7389c4f20f82f5fff6c6e498ff74fcb61e8513992b83e1fb'
  isoURL: https://releases.rancher.com/harvester/v1.2.0/harvester-v1.2.0-amd64.iso
  releaseDate: '20230908'
EOF
Then create a customized Upgrade object
Copy code
cat <<EOF | kubectl apply -f -
apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  annotations:
    harvesterhci.io/skipWebhook: "true"
  name: v1-2-0-skip-check
  namespace: harvester-system
spec:
  version: v1.2.0
EOF
s
Thank you @red-king-19196. I can't do anything during the day today (UK time) so I'll try this approach this evening or tomorrow morning.
Thanks @red-king-19196. I have applied those two objects this morning. The upgrade dialogue is now displaying, but after waiting 20 minutes nothing seems to be happening. No new pods have started since the `fleet-agent-*` and `fleet-controller-*` pods 47 hours ago.
r
Could you show us the Upgrade CR?
Copy code
kubectl -n harvester-system get upgrade v1-2-0-skip-check -o yaml
The upgrade log might have difficulty spinning up due to the faulty bundle. We need to confirm it.
s
Copy code
$ kubectl -n harvester-system --context=harvester003 get upgrade v1-2-0-skip-check -o yaml
Error from server (NotFound): plans.upgrade.cattle.io "v1-2-0-skip-check" not found
Copy code
$ kubectl --all-namespaces --context=harvester003 get upgrade
NAMESPACE       NAME                                                   IMAGE                                 CHANNEL   VERSION
cattle-system   hvst-upgrade-6hp8q-skip-restart-rancher-system-agent   registry.suse.com/bci/bci-base:15.4             23a54be8
cattle-system   sync-additional-ca                                     registry.suse.com/bci/bci-base:15.4             v1.1.0
cattle-system   system-agent-upgrader                                  rancher/system-agent                            v0.3.3-suc
cattle-system   system-agent-upgrader-windows                          rancher/wins                                    v0.4.11
r
My bad, I should’ve used the full name of the Upgrade resource. Please try it again with the following:
Copy code
kubectl -n harvester-system get upgrades.harvesterhci.io v1-2-0-skip-check -o yaml
s
Copy code
$ kubectl -n harvester-system --context=harvester003 get upgrades.harvesterhci.io v1-2-0-skip-check -o yaml
apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  annotations:
    harvesterhci.io/skipWebhook: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"harvesterhci.io/v1beta1","kind":"Upgrade","metadata":{"annotations":{"harvesterhci.io/skipWebhook":"true"},"name":"v1-2-0-skip-check","namespace":"harvester-system"},"spec":{"version":"v1.2.0"}}
  creationTimestamp: "2023-09-28T07:21:09Z"
  finalizers:
  - wrangler.cattle.io/harvester-upgrade-controller
  generation: 2
  labels:
    harvesterhci.io/latestUpgrade: "true"
    harvesterhci.io/upgradeState: PreparingLoggingInfra
  name: v1-2-0-skip-check
  namespace: harvester-system
  resourceVersion: "939012383"
  uid: ef543f21-01c4-4256-9be4-76589b878b4d
spec:
  image: ""
  logEnabled: true
  version: v1.2.0
status:
  conditions:
  - status: Unknown
    type: Completed
  - status: Unknown
    type: LogReady
  previousVersion: v1.2.0
  upgradeLog: v1-2-0-skip-check-upgradelog
r
Looks like the faulty rancher-logging bundle was causing the upgrade log feature not to work, so the entire upgrade process was stuck at the very beginning. We could start the upgrade again with the
logEnabled: false
to prevent this from happening:
Copy code
# remove the stuck upgrade resource
kubectl -n harvester-system delete upgrades v1-2-0-skip-check

# create the version resource if it's missing
cat <<EOF | kubectl apply -f -
apiVersion: harvesterhci.io/v1beta1
kind: Version
metadata:
  name: v1.2.0
  namespace: harvester-system
spec:
  isoChecksum: '267d65117f6d9601383150b4e513065e673cccba86db9a8c6e7d3cb36a04f6202162f1b95c3c545a7389c4f20f82f5fff6c6e498ff74fcb61e8513992b83e1fb'
  isoURL: https://releases.rancher.com/harvester/v1.2.0/harvester-v1.2.0-amd64.iso
  releaseDate: '20230908'
EOF

# create the upgrade resource w/ the skip-webhook annotation and log-disable toggle
cat <<EOF | kubectl apply -f -
apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  annotations:
    harvesterhci.io/skipWebhook: "true"
  name: v1-2-0-skip-check
  namespace: harvester-system
spec:
  logEnabled: false
  version: v1.2.0
EOF
And see if the upgrade proceeds.
👍 1
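One hedged way to watch the recreated Upgrade move past the point it was stuck at before; the state label and condition names below are taken from the CR dump earlier in this thread:

```shell
# Watch the upgrade-state label change away from PreparingLoggingInfra.
kubectl -n harvester-system get upgrades.harvesterhci.io v1-2-0-skip-check \
  --show-labels -w

# Or poll just the Completed condition from the status block:
kubectl -n harvester-system get upgrades.harvesterhci.io v1-2-0-skip-check \
  -o jsonpath='{.status.conditions[?(@.type=="Completed")].status}{"\n"}'
```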
s
Yes - that really got things moving. Thank you. Currently stuck at the first pre-draining, but I'll have a dig through the usual issues page and see if anything matches.
🙌 1
All I needed to do was to reboot each node when it was at the
Pre-drained
state. I don't know why - I could not find any pod logging that it was waiting for anything specific - although there was a Longhorn volume which was trying to attach in order to do a backup (which I started a week ago) but never managed to complete. I think that might have been me trying to back up my
gparted-live
machine, which is actually just a CD-ROM image. That never managed to attach, so it never got backed up. I don't really care - I can always remake it when I next need to check out a volume. Anyway - everything has completed and I'm a very happy person. I hope this has also helped make Harvester better for the future - which is really all that matters 🙂
👍 2
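For anyone hitting the same pre-drain hang, two hedged checks that may be worth trying before resorting to a reboot; the volume CRD name and status fields are assumed from the Longhorn v1.4.x API:

```shell
# A node stuck in pre-drain is normally cordoned; check scheduling status
# (look for SchedulingDisabled in the STATUS column).
kubectl get nodes -o wide

# Look for volumes stuck attaching/detaching, like the backup volume above.
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns=NAME:.metadata.name,STATE:.status.state,ROBUST:.status.robustness
```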
r
Glad to hear that! Just in case, could you take a look at the bundle status?
Copy code
kubectl -n fleet-local get bundles
See if there are any errors. We would like to ensure everything is fine since we applied many workarounds. Thank you for being so supportive! I’m sure we found many issues and sorted out the workarounds and solutions during the journey 🙌
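To spot only the unhealthy bundles at a glance, a small sketch using a jq filter. The `ready`/`desiredReady` summary field names are assumed from the fleet Bundle status schema; the sample JSON below is hypothetical, shaped like `kubectl -n fleet-local get bundles -o json` output:

```shell
# Sample Bundle list standing in for live cluster output.
cat > /tmp/bundles.json <<'EOF'
{"items":[
 {"metadata":{"name":"mcc-harvester"},"status":{"summary":{"ready":1,"desiredReady":1}}},
 {"metadata":{"name":"mcc-rancher-logging"},"status":{"summary":{"ready":0,"desiredReady":1}}}
]}
EOF

# Print only bundles whose ready count lags the desired count.
jq -r '.items[]
  | select(.status.summary.ready != .status.summary.desiredReady)
  | .metadata.name' /tmp/bundles.json
```

On a live cluster the same filter would be fed from `kubectl -n fleet-local get bundles -o json` instead of the sample file.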
s
Thanks @red-king-19196.
Copy code
$ kubectl -n fleet-local --context=harvester003 get bundles
NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-agent-local                             1/1                       
local-managed-system-agent                    1/1                       
mcc-harvester                                 1/1                       
mcc-harvester-crd                             1/1                       
mcc-local-managed-system-upgrade-controller   1/1                       
mcc-rancher-logging                           0/1                       OutOfSync(1) [Cluster fleet-local/local]
mcc-rancher-logging-crd                       1/1                       
mcc-rancher-monitoring                        0/1                       OutOfSync(1) [Cluster fleet-local/local]
mcc-rancher-monitoring-crd                    1/1
👀 1
r
Could you help generate a new support bundle to let us know the current status of the cluster? Thanks!
s
r
Due to the previous apply-manifest failure, the add-on conversions of
rancher-logging
and
rancher-monitoring
were incomplete. Later rounds of upgrades just skipped the conversions because the Harvester version was already v1.2.0. However, the functionality of both charts seems fine; they still run in the previous versions. cc @ancient-pizza-13099 Do you know if doing a manual conversion for the two charts is possible?
a
converting them manually is possible
(1) copy the existing yaml output of managedchart
rancher-monitoring
and
rancher-logging
(2) delete those 2 managedcharts, wait until all pods are removed; (3) create the addons, making sure to replace some fields of the
rancher-monitoring
addon, e.g. VIP
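A rough sketch of those three steps as commands; the managedchart namespace is taken from earlier in the thread, but treat this as a starting point under those assumptions, not a verified procedure:

```shell
# (1) keep a copy of the existing managedcharts
kubectl -n fleet-local get managedchart rancher-monitoring -o yaml > rancher-monitoring-mc.yaml
kubectl -n fleet-local get managedchart rancher-logging -o yaml > rancher-logging-mc.yaml

# (2) delete them and watch until the workload pods are gone
kubectl -n fleet-local delete managedchart rancher-monitoring rancher-logging
kubectl -n cattle-monitoring-system get pods -w

# (3) recreate as addons, editing cluster-specific fields (e.g. the VIP)
# in the addon manifests before applying; then confirm they show up.
kubectl get addons.harvesterhci.io -A
```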
s
Would the add-on being out of sync be causing extremely high load on the physical servers? My previously stable Harvester cluster has become very unstable - stability appears to have regressed to the level of about 18 months ago. Nodes keep going offline, causing the VMs to pause and the volumes to become degraded in Longhorn; the repair of the volumes then causes extremely high load... and the cycle goes on.
a
Seems like it's in a bad loop. Did you try to stop all VMs, then open the Longhorn UI to check and rebuild volumes/replicas?
s
I stopped all VMs (taking my home offline) and waited for the load averages to reduce to around 1. I started up only the VMs I really need ("prod" k3s cluster, home-automation, Nagios, & VPN termination) and mostly things are stable. I did have one period when VMs went into a 'paused' state, but they recovered.
The load-average on the Harvester nodes is much higher than with v1.1.2 and I cannot run the same number of VMs that I could with v1.1.2.
a
Get the top N processes and the top N pods.
And you may try to stop addons, like monitoring and logging.
s
I have already stopped all addons from the GUI; there is nothing defined in Logging and Monitoring. I do note that
prometheus
is at the top of
top
most of the time on one node. I haven't done the manual conversion yet because I haven't had time to get my head around exactly what I need to do.
An example of the top of
top
is:
Copy code
top - 12:56:03 up 2 days, 23:53,  1 user,  load average: 4.87, 3.61, 3.22
Tasks: 447 total,   1 running, 446 sleeping,   0 stopped,   0 zombie
%Cpu(s): 44.2 us,  5.7 sy,  0.0 ni, 48.9 id,  0.2 wa,  0.0 hi,  1.0 si,  0.0 st
MiB Mem : 63703.08+total, 26951.77+free, 20200.48+used, 17208.80+buff/cache
MiB Swap:    0.000 total,    0.000 free,    0.000 used. 43502.59+avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                                                      
 8824 10001     20   0 1586468 664860  37160 S 107.9 1.019   1655:05 adapter                                                                                                                                                                                      
 6671 1000      20   0 6993456 2.180g 189692 S 101.7 3.505   1269:25 prometheus                                                                                                                                                                                   
 3803 root      20   0 2417388 846736  77560 S 100.3 1.298 726:05.37 rancher                                                                                                                                                                                      
 5066 root      20   0  877820 141512  55484 S 22.85 0.217 281:45.03 containerd                                                                                                                                                                                   
15088 107       20   0 13.054g 4.259g  22036 S 11.59 6.846 194:04.76 qemu-system-x86                                                                                                                                                                              
10765 root      20   0 4950184 2.477g  78820 S 8.278 3.982 914:47.21 kube-apiserver                                                                                                                                                                               
 2372 root      20   0 11.283g 440184 283816 S 7.616 0.675 474:24.04 etcd                                                                                                                                                                                         
 5430 root      20   0  906188 169212  65172 S 6.623 0.259 275:38.97 kubelet                                                                                                                                                                                      
 7661 root      20   0 1263936 417124  49976 S 5.298 0.639 264:06.55 harvester                                                                                                                                                                                    
 7816 root      20   0  968436 203888  41424 S 2.649 0.313  96:55.79 longhorn-manage                                                                                                                                                                              
 3544 root      20   0 1040556 280356  65708 S 1.656 0.430  93:23.73 kube-controller                                                                                                                                                                              
 9524 root      20   0 2271644  37524  13880 S 1.325 0.058  98:39.46 longhorn-instan                                                                                                                                                                              
    1 root      20   0  205468  17684   9632 S 0.993 0.027  46:17.48 systemd                                                                                                                                                                                      
10135 root      20   0 1903568  42744  13508 S 0.993 0.066  48:40.50 longhorn                                                                                                                                                                                     
14177 root      20   0 1829900  37756  13572 S 0.993 0.058   8:55.57 longhorn                                                                                                                                                                                     
16269 root      20   0 2181364  84836  46736 S 0.993 0.130  65:37.15 calico-node                                                                                                                                                                                  
 6897 root      20   0  723560  18196   9420 S 0.662 0.028   1:49.64 containerd-shim                                                                                                                                                                              
 7161 1001      20   0 1794104 209180  37784 S 0.662 0.321  23:58.47 virt-controller                                                                                                                                                                              
 9559 root      20   0  755312  60740  32916 S 0.662 0.093   3:33.47 harvester-netwo                                                                                                                                                                              
   21 root      20   0       0      0      0 S 0.331 0.000   1:03.77 ksoftirqd/1                                                                                                                                                                                  
   27 root      20   0       0      0      0 S 0.331 0.000  62:42.22 ksoftirqd/2                                                                                                                                                                                  
 2807 root      20   0  723560  16936   9328 S 0.331 0.026   1:39.28 containerd-shim                                                                                                                                                                              
 2865 root      20   0  723700  18396   9612 S 0.331 0.028   1:38.47 containerd-shim                                                                                                                                                                              
 3306 root      20   0  765904  76648  37272 S 0.331 0.118  12:31.75 kube-scheduler
An example of top pods is:
Copy code
$ kubectl top pods --sort-by='cpu' --context=harvester003 --all-namespaces
NAMESPACE                   NAME                                                     CPU(cores)   MEMORY(bytes)   
harvester-system            harvester-77c7bdd669-c8cxb                               727m         1411Mi          
kube-system                 kube-apiserver-harvester002                              515m         3914Mi          
kube-system                 kube-apiserver-harvester003                              332m         3619Mi          
cattle-monitoring-system    prometheus-rancher-monitoring-prometheus-0               235m         2063Mi          
default                     virt-launcher-kube002-zpg6s                              206m         7777Mi          
kube-system                 kube-apiserver-harvester001                              204m         1757Mi          
cattle-fleet-local-system   fleet-agent-75f5945649-8f6fp                             199m         450Mi           
default                     virt-launcher-nagioskube002-th6kz                        183m         2631Mi          
default                     virt-launcher-kube003-286s2                              139m         5331Mi          
kube-system                 etcd-harvester002                                        125m         567Mi           
default                     virt-launcher-kube004-kw769                              114m         4530Mi          
default                     virt-launcher-home-assistant-jwmw5                       112m         2579Mi          
longhorn-system             instance-manager-e-1041bf96596625fc7adf7838a77ad238      91m          234Mi           
kube-system                 etcd-harvester003                                        86m          582Mi           
kube-system                 etcd-harvester001                                        83m          522Mi           
harvester-system            harvester-77c7bdd669-jwfvl                               63m          522Mi           
harvester-system            harvester-77c7bdd669-fb2wd                               60m          592Mi           
kube-system                 rke2-canal-gl52z                                         52m          243Mi           
kube-system                 rke2-canal-v2vpf                                         52m          190Mi           
cattle-monitoring-system    rancher-monitoring-operator-559767d69b-lxkkp             39m          228Mi           
longhorn-system             longhorn-manager-8pkmn                                   39m          261Mi           
kube-system                 rke2-canal-gkkw4                                         38m          191Mi           
cattle-monitoring-system    rancher-monitoring-prometheus-adapter-8846d4757-bp2gj    37m          737Mi           
longhorn-system             instance-manager-e-1e123034ec30fd9c07f37ce7446d272b      37m          117Mi           
longhorn-system             instance-manager-e-65edf6e430281d7f0bb5498a3eac3469      33m          82Mi            
longhorn-system             longhorn-manager-cbshl                                   30m          236Mi           
default                     virt-launcher-wstunnel-wireguard-95pxq                   29m          1172Mi          
cattle-system               rancher-576cf5cc45-j6kfn                                 26m          1649Mi          
kube-system                 kube-controller-manager-harvester003                     26m          264Mi           
longhorn-system             instance-manager-r-65edf6e430281d7f0bb5498a3eac3469      26m          774Mi           
longhorn-system             instance-manager-r-1041bf96596625fc7adf7838a77ad238      23m          814Mi           
longhorn-system             longhorn-manager-7ft4z                                   22m          216Mi           
longhorn-system             instance-manager-r-1e123034ec30fd9c07f37ce7446d272b      21m          838Mi           
cattle-system               rancher-576cf5cc45-5vvmr                                 19m          1012Mi          
longhorn-system             engine-image-ei-1d169b76-z8tds                           18m          19Mi            
cattle-monitoring-system    rancher-monitoring-prometheus-node-exporter-7c7xf        17m          28Mi            
longhorn-system             engine-image-ei-1d169b76-mlkrd                           15m          24Mi            
longhorn-system             engine-image-ei-1d169b76-dczbc                           14m          20Mi            
cattle-monitoring-system    rancher-monitoring-prometheus-node-exporter-4sghm        11m          27Mi            
kube-system                 rke2-metrics-server-74f878b999-gknt2                     10m          40Mi            
harvester-system            harvester-webhook-7df6c7df75-44t7r                       9m           250Mi           
harvester-system            harvester-webhook-7df6c7df75-n92ht                       8m           182Mi           
longhorn-system             longhorn-recovery-backend-fb89c6ddd-6tvz6                8m           195Mi           
harvester-system            virt-api-6dc9cc7654-sxclk                                7m           257Mi           
harvester-system            virt-controller-7468cc6d9-vq4gc                          7m           133Mi           
longhorn-system             longhorn-admission-webhook-57cf4f4689-9gkpg              7m           300Mi           
harvester-system            virt-api-6dc9cc7654-wfdsf                                7m           198Mi           
kube-system                 kube-scheduler-harvester003                              6m           50Mi            
kube-system                 kube-scheduler-harvester002                              6m           53Mi            
kube-system                 rke2-ingress-nginx-controller-x579x                      5m           201Mi           
cattle-fleet-system         fleet-controller-56786984f4-tctds                        5m           157Mi           
kube-system                 kube-scheduler-harvester001                              5m           72Mi            
cattle-logging-system       rancher-logging-root-fluentbit-xkgq6                     5m           43Mi            
harvester-system            virt-operator-77c86586f6-m8sss                           5m           210Mi           
harvester-system            harvester-network-controller-manager-68fd49b88f-gkpz4    5m           49Mi            
longhorn-system             longhorn-admission-webhook-57cf4f4689-kp7fm              5m           256Mi           
harvester-system            harvester-load-balancer-6d89b964bb-ts8sp                 5m           55Mi            
kube-system                 rke2-ingress-nginx-controller-2dv6m                      5m           260Mi           
cattle-logging-system       rancher-logging-root-fluentbit-jmmlp                     5m           42Mi            
harvester-system            virt-controller-7468cc6d9-qw6nh                          5m           211Mi           
kube-system                 rke2-ingress-nginx-controller-2rvsr                      4m           233Mi           
kube-system                 kube-controller-manager-harvester001                     4m           32Mi            
kube-system                 kube-controller-manager-harvester002                     4m           32Mi            
harvester-system            kube-vip-mhld7                                           4m           24Mi            
cattle-logging-system       rancher-logging-574448c578-sx2l2                         4m           126Mi           
longhorn-system             longhorn-recovery-backend-fb89c6ddd-mdprq                4m           246Mi           
kube-system                 cloud-controller-manager-harvester002                    4m           32Mi            
harvester-system            harvester-webhook-7df6c7df75-kj66q                       3m           153Mi           
harvester-system            harvester-network-webhook-697c754ffb-dn8x6               3m           154Mi           
kube-system                 cloud-controller-manager-harvester003                    3m           22Mi            
harvester-system            virt-handler-x4m7x                                       3m           252Mi           
cattle-monitoring-system    rancher-monitoring-kube-state-metrics-5bc8bb48bd-df45p   3m           44Mi            
cattle-fleet-system         gitjob-845b9dcc47-jzkvt                                  3m           99Mi            
cattle-logging-system       rancher-logging-root-fluentd-0                           3m           319Mi           
longhorn-system             longhorn-loop-device-cleaner-7twf6                       3m           3Mi             
harvester-system            harvester-load-balancer-webhook-6dd77c56bf-k4fgn         3m           153Mi           
harvester-system            harvester-network-controller-8kgnm                       2m           67Mi            
longhorn-system             csi-provisioner-9674b9b-rc4tk                            2m           17Mi            
kube-system                 rke2-coredns-rke2-coredns-7f75564ff4-b4gmb               2m           37Mi            
longhorn-system             csi-provisioner-9674b9b-c8q8p                            2m           22Mi            
kube-system                 cloud-controller-manager-harvester001                    2m           25Mi            
harvester-system            harvester-network-controller-2fvvc                       2m           42Mi            
longhorn-system             longhorn-conversion-webhook-678ddcc967-kwrxg             2m           202Mi           
harvester-system            kube-vip-jc5cj                                           2m           19Mi            
longhorn-system             longhorn-conversion-webhook-678ddcc967-prg2x             2m           146Mi           
longhorn-system             backing-image-manager-36c7-45eb                          2m           24Mi            
kube-system                 rke2-coredns-rke2-coredns-7f75564ff4-b7hlj               2m           33Mi            
cattle-logging-system       rancher-logging-root-fluentbit-xjgwr                     2m           51Mi            
longhorn-system             csi-resizer-76f769988f-kdmlb                             2m           20Mi            
harvester-system            virt-handler-4hnzl                                       2m           225Mi           
cattle-system               system-upgrade-controller-5685d568ff-f76b8               2m           77Mi            
cattle-system               rancher-webhook-67bd6cf65d-6zd6s                         2m           171Mi           
harvester-system            virt-handler-4tzdl                                       2m           238Mi           
harvester-system            harvester-node-disk-manager-xnzwp                        1m           31Mi            
harvester-system            kube-vip-q5jbn                                           1m           19Mi            
kube-system                 rke2-coredns-rke2-coredns-autoscaler-84d67b7c48-g79nd    1m           18Mi            
kube-system                 kube-proxy-harvester002                                  1m           31Mi            
kube-system                 kube-proxy-harvester001                                  1m           28Mi            
kube-system                 harvester-whereabouts-bsc27                              1m           28Mi            
cattle-logging-system       harvester-default-event-tailer-0                         1m           14Mi
(limited by the amount I can post in a message)
a
Without comparing to v1.1.2, it is difficult to say where the extra resources are being used 😂
s
> (1) copy the existing yaml output of managedchart
rancher-monitoring
and
rancher-logging
> (2) delete those 2 managedcharts, wait until all PODs are removed
> (3) create addon, note to replace some fields of
rancher-monitoring
addon, e.g. VIP Could you help me a bit more with this, please? I've copied the existing yaml for the ManagedCharts - here's one example.
Copy code
kubectl get managedchart rancher-monitoring -n fleet-local --context=harvester003 -o yaml > tmp/managedchart_rancher-monitoring.yaml
I've removed the managed chart - but I don't know which Pods would be removed. For rancher-monitoring I guessed that
prometheus-rancher-monitoring-prometheus-0
would go, but it hasn't. So my guess is wrong 😞
a
kubectl get managedchart -A
kubectl get addons.harvesterhci.io -A
, check if
rancher-monitoring
addon is enabled
if
rancher-monitoring
is not there, then
kubectl get deployment -n cattle-monitoring-system
, and get replicaset, then kubectl delete them
just wait
kubectl get addons.harvesterhci.io -A
, check if
rancher-monitoring
addon is enabled
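Putting those last few messages together as one hedged sequence; the resource names come from earlier in the thread, and the workload deletion only applies if neither a managedchart nor an addon still owns them:

```shell
# Confirm the managedchart is gone and no addon has taken over.
kubectl get managedchart -A
kubectl get addons.harvesterhci.io -A

# If rancher-monitoring is absent from both, inspect the leftover workloads
# before deleting anything.
kubectl -n cattle-monitoring-system get deployments,statefulsets,replicasets

# Then delete the leftovers by name (safer than --all) and wait for the
# pods, including prometheus-rancher-monitoring-prometheus-0, to terminate.
kubectl -n cattle-monitoring-system get pods -w
```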