# harvester
b
Does Harvester have enough resources to allocate for the VM created by Rancher?
From what I understand from the logs, Rancher can't set up the VM in Harvester.
p
Yes, it creates the VM, and I can access it via SSH using a password that I set manually in the `sshPassword` field of the `HarvesterMachineTemplate` resource.
After 10 minutes the machine gets deleted and it tries to create a new one, which obviously fails, and so on 🙂
In the Harvester machine logs (viewed from the GUI), a working machine creation (new cluster) shows these lines:
{"component":"virt-launcher","level":"info","msg":"Starting agent poller with commands: [guest-info]","pos":"agent_poller.go:334","timestamp":"2023-08-03T12:39:08.894228Z"}
{"component":"virt-launcher","level":"info","msg":"Starting agent poller with commands: [guest-network-get-interfaces guest-get-osinfo guest-get-timezone guest-get-host-name]","pos":"agent_poller.go:334","timestamp":"2023-08-03T12:39:08.894265Z"}
{"component":"virt-launcher","level":"info","msg":"Starting agent poller with commands: [guest-get-fsinfo]","pos":"agent_poller.go:334","timestamp":"2023-08-03T12:39:08.894290Z"}
{"component":"virt-launcher","level":"info","msg":"Starting agent poller with commands: [guest-get-users]","pos":"agent_poller.go:334","timestamp":"2023-08-03T12:39:08.894310Z"}
{"component":"virt-launcher","level":"info","msg":"Starting agent poller with commands: [guest-fsfreeze-status]","pos":"agent_poller.go:334","timestamp":"2023-08-03T12:39:08.894328Z"}
In the logs of a machine that gets stuck, those lines are missing, and the last lines I see are:
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(1)","pos":"client.go:288","timestamp":"2023-08-03T12:37:28.313715Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: shared-ext-tds_shared-ext-tds-pool1-da40e9b0-gzx96","pos":"client.go:413","timestamp":"2023-08-03T12:37:28.315997Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(1)","pos":"client.go:288","timestamp":"2023-08-03T12:37:28.318869Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: shared-ext-tds_shared-ext-tds-pool1-da40e9b0-gzx96","pos":"client.go:413","timestamp":"2023-08-03T12:37:28.320368Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"shared-ext-tds-pool1-da40e9b0-gzx96","namespace":"shared-ext-tds","pos":"server.go:190","timestamp":"2023-08-03T12:37:28.356111Z","uid":"4d794644-91d5-45a1-a574-fa7f9f97bb72"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"shared-ext-tds-pool1-da40e9b0-gzx96","namespace":"shared-ext-tds","pos":"server.go:190","timestamp":"2023-08-03T12:37:28.378283Z","uid":"4d794644-91d5-45a1-a574-fa7f9f97bb72"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"shared-ext-tds-pool1-da40e9b0-gzx96","namespace":"shared-ext-tds","pos":"server.go:190","timestamp":"2023-08-03T12:37:28.389796Z","uid":"4d794644-91d5-45a1-a574-fa7f9f97bb72"}
{"component":"virt-launcher","level":"info","msg":"Found PID for shared-ext-tds_shared-ext-tds-pool1-da40e9b0-gzx96: 86","pos":"monitor.go:141","timestamp":"2023-08-03T12:37:28.599482Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"shared-ext-tds-pool1-da40e9b0-gzx96","namespace":"shared-ext-tds","pos":"server.go:190","timestamp":"2023-08-03T12:38:40.786832Z","uid":"4d794644-91d5-45a1-a574-fa7f9f97bb72"}
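The comparison between the two logs can be scripted. This is a minimal sketch, assuming the virt-launcher log is saved as one JSON object per line and that the "Starting agent poller" messages are a usable sign that the guest agent inside the VM answered (that reading comes from this thread, not from KubeVirt documentation). The function name `guest_agent_seen` is made up for illustration.

```python
import json

# Message prefix seen only in the log of the working machine in this thread.
AGENT_POLLER_PREFIX = "Starting agent poller"


def guest_agent_seen(log_lines):
    """Return True if any virt-launcher line reports the agent poller starting."""
    for raw in log_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        if (entry.get("component") == "virt-launcher"
                and str(entry.get("msg", "")).startswith(AGENT_POLLER_PREFIX)):
            return True
    return False


# One line from each of the two logs pasted above.
working = [
    '{"component":"virt-launcher","level":"info","msg":"Starting agent poller with commands: [guest-info]","pos":"agent_poller.go:334","timestamp":"2023-08-03T12:39:08.894228Z"}',
]
stuck = [
    '{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(1)","pos":"client.go:288","timestamp":"2023-08-03T12:37:28.313715Z"}',
]

print("working machine:", guest_agent_seen(working))
print("stuck machine:", guest_agent_seen(stuck))
```

If the check comes back false for a VM that is otherwise running, the guest agent (or the userdata that installs it) is the first thing worth looking at.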
The userdata is missing in the `HarvesterMachineTemplate`! I added it back with:
#cloud-config
package_update: true
packages:
- qemu-guest-agent
runcmd:
- sh /usr/local/custom_script/install.sh
- - systemctl
  - enable
  - --now
  - qemu-guest-agent.service
This at least makes the machine report its IP in Harvester and launch the script that bootstraps the node. However, instead of adding a new node to the cluster, it bootstraps a new cluster 🤔:
root@shared-ext-tds-pool1-da40e9b0-m5fjl:~# /var/lib/rancher/rke2/data/v1.24.9-rke2r2-154c18a3ccf5/bin/kubectl get node
NAME                                  STATUS   ROLES                       AGE   VERSION
shared-ext-tds-pool1-da40e9b0-m5fjl   Ready    control-plane,etcd,master   12m   v1.24.9+rke2r2
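The same check can be done without reading the table by eye. This is a rough sketch, assuming the default `kubectl get node` column layout shown above; `parse_node_table` and `looks_like_fresh_cluster` are hypothetical helper names, and "only this host is listed" is used here as a heuristic for "the node bootstrapped its own cluster", not as a definitive test.

```python
def parse_node_table(text):
    """Turn the whitespace-separated `kubectl get node` table into dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    nodes = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip rows that do not match the header layout
        nodes.append(dict(zip(header, fields)))
    return nodes


def looks_like_fresh_cluster(nodes, own_hostname):
    """Heuristic: the only node listed is the machine we are logged in to."""
    return len(nodes) == 1 and nodes[0]["NAME"] == own_hostname


# Output pasted above.
SAMPLE = """\
NAME                                  STATUS   ROLES                       AGE   VERSION
shared-ext-tds-pool1-da40e9b0-m5fjl   Ready    control-plane,etcd,master   12m   v1.24.9+rke2r2
"""

nodes = parse_node_table(SAMPLE)
print(nodes[0]["NAME"], nodes[0]["ROLES"])
print(looks_like_fresh_cluster(nodes, "shared-ext-tds-pool1-da40e9b0-m5fjl"))
```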
r
Could you show us the content of the cluster CR?
kubectl -n fleet-default get clusters.provisioning.cattle.io <cluster-name> -o yaml
p
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  annotations:
    field.cattle.io/description: TDS cluster for CSCS applications
  creationTimestamp: "2023-02-17T11:24:58Z"
  finalizers:
  - wrangler.cattle.io/cloud-config-secret-remover
  - wrangler.cattle.io/provisioning-cluster-remove
  - wrangler.cattle.io/rke-cluster-remove
  generation: 17
  name: shared-ext-tds
  namespace: fleet-default
  resourceVersion: "413132426"
  uid: 0c87c767-a8e1-42c3-8265-53241710ad2e
spec:
  cloudCredentialSecretName: cattle-global-data:cc-jnjk4
  kubernetesVersion: v1.24.9+rke2r2
  localClusterAuthEndpoint:
    enabled: true
  rkeConfig:
    chartValues:
      harvester-cloud-provider:
        cloudConfigPath: /var/lib/rancher/rke2/etc/config-files/cloud-provider-config
        clusterName: shared-ext-tds
      rke2-cilium: {}
    etcd:
      s3:
        bucket: shared-ext-tds-backup
        cloudCredentialName: cattle-global-data:cc-687vs
        endpoint: rgw.cscs.ch:443
        region: cscs-zonegroup
      snapshotRetention: 7
      snapshotScheduleCron: 0 5 * * *
    machineGlobalConfig:
      cni: cilium
      disable:
      - rke2-ingress-nginx
      disable-kube-proxy: false
      etcd-expose-metrics: false
    machinePools:
    - controlPlaneRole: true
      dynamicSchemaSpec: '{"resourceFields":{"cloudConfig":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"just
        keep it empty, this value will be filled by rancher-machine"},"clusterId":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"harvester
        cluster id"},"clusterType":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"harvester
        cluster type"},"cpuCount":{"type":"string","default":{"stringValue":"2","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"number
        of CPUs for machine"},"diskBus":{"type":"string","default":{"stringValue":"virtio","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"bus
        of disk for machine"},"diskSize":{"type":"string","default":{"stringValue":"40","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"size
        of disk for machine (in GiB)"},"imageName":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"harvester
        image name"},"keyPairName":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"harvester
        key pair name"},"kubeconfigContent":{"type":"password","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"contents
        of kubeconfig file for harvester cluster, base64 is supported"},"memorySize":{"type":"string","default":{"stringValue":"4","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"size
        of memory for machine (in GiB)"},"networkData":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"networkData
        content of cloud-init for machine, base64 is supported"},"networkModel":{"type":"string","default":{"stringValue":"virtio","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"harvester
        network model"},"networkName":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"harvester
        network name"},"networkType":{"type":"string","default":{"stringValue":"dhcp","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"harvester
        network type"},"sshPassword":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"SSH
        password"},"sshPort":{"type":"string","default":{"stringValue":"22","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"SSH
        port"},"sshPrivateKeyPath":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"SSH
        private key path "},"sshUser":{"type":"string","default":{"stringValue":"root","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"SSH
        username"},"userData":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"userData
        content of cloud-init for machine, base64 is supported"},"vmAffinity":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"harvester
        vm affinity, base64 is supported"},"vmNamespace":{"type":"string","default":{"stringValue":"default","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"harvester
        vm namespace"}}}'
      etcdRole: true
      machineConfigRef:
        kind: HarvesterConfig
        name: nc-shared-ext-tds-pool1-9tv2h
      name: pool1
      quantity: 3
      unhealthyNodeTimeout: 0s
      workerRole: true
    machineSelectorConfig:
    - config:
        cloud-provider-config: secret://fleet-default:harvesterconfigrmzc8
        cloud-provider-name: harvester
        protect-kernel-defaults: false
    registries: {}
    upgradeStrategy:
      controlPlaneConcurrency: "1"
      controlPlaneDrainOptions:
        deleteEmptyDirData: true
        disableEviction: false
        enabled: false
        force: false
        gracePeriod: -1
        ignoreDaemonSets: true
        ignoreErrors: false
        postDrainHooks: null
        preDrainHooks: null
        skipWaitForDeleteTimeoutSeconds: 0
        timeout: 120
      workerConcurrency: "1"
      workerDrainOptions:
        deleteEmptyDirData: true
        disableEviction: false
        enabled: false
        force: false
        gracePeriod: -1
        ignoreDaemonSets: true
        ignoreErrors: false
        postDrainHooks: null
        preDrainHooks: null
        skipWaitForDeleteTimeoutSeconds: 0
        timeout: 120
status:
  agentDeployed: true
  clientSecretName: shared-ext-tds-kubeconfig
  clusterName: c-m-nfgr6mjv
  conditions:
  - lastUpdateTime: "2023-02-17T11:24:59Z"
    status: "True"
    type: HarvesterCloudProviderConfigMigrated
  - lastUpdateTime: "2023-02-17T11:29:34Z"
    status: "False"
    type: Reconciling
  - lastUpdateTime: "2023-02-17T11:24:59Z"
    status: "False"
    type: Stalled
  - lastUpdateTime: "2023-08-07T01:14:30Z"
    status: "True"
    type: Created
  - lastUpdateTime: "2023-08-07T01:14:31Z"
    status: "True"
    type: RKECluster
  - lastUpdateTime: "2023-02-17T11:24:59Z"
    status: "True"
    type: BackingNamespaceCreated
  - lastUpdateTime: "2023-02-17T11:24:59Z"
    status: "True"
    type: DefaultProjectCreated
  - lastUpdateTime: "2023-02-17T11:24:59Z"
    status: "True"
    type: SystemProjectCreated
  - lastUpdateTime: "2023-02-17T11:24:59Z"
    status: "True"
    type: InitialRolesPopulated
  - lastUpdateTime: "2023-02-17T11:25:01Z"
    status: "True"
    type: CreatorMadeOwner
  - lastUpdateTime: "2023-02-17T11:26:24Z"
    status: "True"
    type: Provisioned
  - lastUpdateTime: "2023-02-17T11:25:02Z"
    status: "True"
    type: NoDiskPressure
  - lastUpdateTime: "2023-02-17T11:25:02Z"
    status: "True"
    type: NoMemoryPressure
  - lastUpdateTime: "2023-02-17T11:25:03Z"
    status: "True"
    type: SecretsMigrated
  - lastUpdateTime: "2023-02-17T11:25:03Z"
    status: "True"
    type: ServiceAccountSecretsMigrated
  - lastUpdateTime: "2023-08-07T01:14:29Z"
    status: "True"
    type: Connected
  - lastUpdateTime: "2023-08-07T01:14:31Z"
    message: non-ready bootstrap machine(s) shared-ext-tds-pool1-77d55f5877-qc58f
      and join url to be available on bootstrap node
    reason: Waiting
    status: Unknown
    type: Updated
  - lastUpdateTime: "2023-08-07T01:14:04Z"
    message: Cluster agent is not connected
    reason: Disconnected
    status: "False"
    type: Ready
  - lastUpdateTime: "2023-02-17T11:29:06Z"
    status: "True"
    type: GlobalAdminsSynced
  - lastUpdateTime: "2023-02-17T11:29:07Z"
    status: "True"
    type: SystemAccountCreated
  - lastUpdateTime: "2023-02-17T11:29:10Z"
    status: "True"
    type: AgentDeployed
  - lastUpdateTime: "2023-02-17T11:29:34Z"
    status: "True"
    type: Waiting
  - lastUpdateTime: "2023-04-17T12:54:00Z"
    status: "True"
    type: RKESecretsMigrated
  observedGeneration: 17
  ready: true
here it is
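With this many conditions it is easy to miss the two that carry a message. This is a small sketch, assuming the CR was fetched with `-o json` instead of `-o yaml` and loaded into a dict; `conditions_with_messages` is a made-up helper, and the sample below contains only three of the conditions from the dump above.

```python
import json


def conditions_with_messages(cluster):
    """Return (type, status, reason, message) for conditions that carry a message."""
    found = []
    for cond in cluster.get("status", {}).get("conditions", []):
        if cond.get("message"):
            found.append((cond.get("type"), cond.get("status"),
                          cond.get("reason", ""), cond["message"]))
    return found


# Excerpt of the status above, as it would look in JSON form.
SAMPLE = json.loads("""
{
  "status": {
    "conditions": [
      {"lastUpdateTime": "2023-02-17T11:26:24Z", "status": "True", "type": "Provisioned"},
      {"lastUpdateTime": "2023-08-07T01:14:31Z",
       "message": "non-ready bootstrap machine(s) shared-ext-tds-pool1-77d55f5877-qc58f and join url to be available on bootstrap node",
       "reason": "Waiting", "status": "Unknown", "type": "Updated"},
      {"lastUpdateTime": "2023-08-07T01:14:04Z",
       "message": "Cluster agent is not connected",
       "reason": "Disconnected", "status": "False", "type": "Ready"}
    ]
  }
}
""")

flagged = conditions_with_messages(SAMPLE)
for ctype, status, reason, message in flagged:
    print(f"{ctype}: status={status} reason={reason} message={message}")
```

Filtering on the presence of `message` rather than on `status` is deliberate: some conditions here, such as `Reconciling` and `Stalled`, appear to be healthy when they are "False".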