# general
dry-father-91460:
What requests are being sent? Most likely you have some problem with machineConfigSchema or providerConfig.
gifted-breakfast-73755:
@dry-father-91460 Not sure what you mean by "what requests are being sent" but here is what I see in the rancher log when I click `Create`:
```
2024/08/23 22:00:20 [ERROR] [rkecluster] fleet-default/chad-test-6: error getting CAPI cluster no matching controller owner ref
2024/08/23 22:00:20 [ERROR] error syncing 'fleet-default/chad-test-6': handler rke-cluster: no matching controller owner ref, requeuing
2024/08/23 22:00:20 [ERROR] [planner] rkecluster fleet-default/chad-test-6: error during plan processing: no matching controller owner ref
2024/08/23 22:00:20 [ERROR] error syncing 'fleet-default/chad-test-6': handler planner: no matching controller owner ref, requeuing
[the four lines above repeat several times]
2024/08/23 22:00:21 [INFO] [planner] rkecluster fleet-default/chad-test-6: waiting for infrastructure ready
2024/08/23 22:00:21 [ERROR] [machineprovision] fleet-default/chad-test-6-pool1-caddf3b5-9fnk4: error getting machine by owner reference: no matching controller owner ref
2024/08/23 22:00:21 [ERROR] error syncing 'fleet-default/chad-test-6-pool1-caddf3b5-9fnk4': handler machine-provision: no matching controller owner ref, requeuing
2024/08/23 22:00:21 [ERROR] [rkebootstrap] fleet-default/chad-test-6-bootstrap-template-c2zln: error getting machine by owner reference no matching controller owner ref
2024/08/23 22:00:21 [ERROR] error syncing 'fleet-default/chad-test-6-bootstrap-template-c2zln': handler rke-bootstrap: no matching controller owner ref, requeuing
2024/08/23 22:00:22 [INFO] [planner] rkecluster fleet-default/chad-test-6: waiting for at least one control plane, etcd, and worker node to be registered
2024/08/23 22:00:22 [INFO] [planner] rkecluster fleet-default/chad-test-6: waiting for viable init node
2024/08/23 22:00:22 [INFO] EnsureSecretForServiceAccount: waiting for secret [chad-test-6-bootstrap-template-c2zln-machine-bootstrap-tokqh8n5] to be populated with token
[the "waiting for viable init node" line repeats through 22:00:29]
```
and then I get this over and over:
```
2024/08/23 22:00:32 [INFO] [machineprovision] fleet-default/chad-test-6-pool1-caddf3b5-9fnk4: reconciling machine job
2024/08/23 22:00:32 [ERROR] error syncing 'fleet-default/chad-test-6-pool1-caddf3b5-9fnk4': handler machine-provision: nodedrivers.management.cattle.io "triton" not found, requeuing
```
@dry-father-91460
> most likely you have some problem with machineConfigSchema or providerConfig
Where can I check this? The background is that I have a custom node driver named `triton`, and a Rancher extension named `triton` that implements `cloud-credential/triton.vue` and `machine-config/triton.vue`. Both the cloud credential and machine config UI show up correctly in the Rancher UI...
Does providerConfig apply to a custom node driver or just machineConfigSchema? I don't have a cluster driver, just a node driver.
dry-father-91460:
I meant requests in the network tab. I believe providerConfig will apply to a node driver as well
gifted-breakfast-73755:
Ok, I'll post the requests shortly.
Where do I define providerConfig in my extension?
I see `/v3/schemas/tritoncredentialconfig` and `/v3/schemas/tritonconfig` in the Rancher API via the UI, and they both look correct. But I'm not seeing one for machine or provider.
When clicking `Create` I see two network requests:
1. POST `/v1/rke-machine-config.cattle.io.tritonconfigs/fleet-default`
2. POST `/v1/provisioning.cattle.io.clusters`
dry-father-91460:
Do you have a GET request to `/v1/schemaDefinitions/rke-machine-config.cattle.io.tritonconfig` as well?
gifted-breakfast-73755:
Not after clicking `Create`, no. Should I check for that when loading the screen, before filling it out?
I'm able to manually access `/v1/schemas/rke-machine-config.cattle.io.tritonconfig`... but it's not called when loading the create cluster screen or when saving it. Does that indicate a problem with my extension?
dry-father-91460:
I would expect it to be called when the create screen is loading. If you have the dashboard cloned locally, it might be easier to troubleshoot using it.
gifted-breakfast-73755:
I do have it cloned locally. Can you point me to where I would look to debug?
Do you have any guesses as to what’s wrong with my extension?
I've gone through https://extensions.rancher.io/extensions/introduction many times and looked at the OpenStack plugin example but don't see what's wrong yet.
dry-father-91460:
I'd suggest yarn linking your extension to the local dashboard and then checking what is happening in `edit/provisioning.cattle.io.cluster/rke2.vue`.
gifted-breakfast-73755:
Ok, I’m new to vue and yarn but I’ll look that up. Thanks.
dry-father-91460:
But don't use the latest master at the moment; pick a branch that matches your Rancher version.
gifted-breakfast-73755:
I'm on v2.8.5 so I'll use that branch.
Do you have any idea where the repeating errors in the log I posted are coming from? I searched through all the Rancher repos and didn't find much.
Would the branch to check out be `origin/release-2.8`?
@dry-father-91460 it looks like I may be experiencing https://github.com/rancher/rancher/issues/37074#issuecomment-1664722305, where the issue may be that Rancher expects the node driver `id` to be the same as the `name`, but Rancher doesn't allow you to specify the `id` of custom node drivers. I'm going to try the workaround of setting the `id` and see if that clears up the issue.
That got me past the `handler machine-provision: nodedrivers.management.cattle.io "triton" not found, requeuing` error, so it seems to be finding the driver now. But now I'm onto a new error:
```
2024/08/26 00:13:17 [INFO] [planner] rkecluster fleet-default/chad-test-1: waiting for at least one control plane, etcd, and worker node to be registered
2024/08/26 00:13:17 [ERROR] [rkebootstrap] fleet-default/chad-test-1-bootstrap-template-swwsw: error getting machine by owner reference no matching controller owner ref
```
dry-father-91460:
The first info message just shows that not all nodes came up; the second one seems to explain why. It is outside of my domain, and I would suggest posting this error as a separate question.
gifted-breakfast-73755:
@dry-father-91460 Ok, will do. One more quick question before I do that... I see that my node driver automatically got the `lifecycle.cattle.io/create.node-driver-controller=true` annotation on it. I only have a custom node driver, not a custom cluster driver, so could that be related? I tried removing it via the v3 API but it doesn't let me change it for some reason.
dry-father-91460:
I can't find docs explaining that annotation, but it looks like other examples of node drivers also have it, so I doubt it.
gifted-breakfast-73755:
Ok
Here's something else I just found:
```
6de70361494d:/var/lib/rancher # kubectl -n fleet-default get machines chad-test1-1-pool1-776db456dfxhqd5h-tzcrg
NAME                                        CLUSTER        NODENAME   PROVIDERID   PHASE      AGE   VERSION
chad-test1-1-pool1-776db456dfxhqd5h-tzcrg   chad-test1-1                           Deleting   61s
```
```
6de70361494d:/var/lib/rancher # kubectl -n fleet-default get machines chad-test1-1-pool1-776db456dfxhqd5h-tzcrg -o yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.cluster.x-k8s.io/exclude-node-draining: "true"
  creationTimestamp: "2024-08-26T16:51:50Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-08-26T16:52:04Z"
  finalizers:
  - machine.cluster.x-k8s.io
  generation: 3
  labels:
    cattle.io/os: linux
    cluster.x-k8s.io/cluster-name: chad-test1-1
    cluster.x-k8s.io/control-plane: "true"
    cluster.x-k8s.io/deployment-name: chad-test1-1-pool1
    cluster.x-k8s.io/set-name: chad-test1-1-pool1-776db456dfxhqd5h
    machine-template-hash: 3328601289-qx8gq
    rke.cattle.io/cluster-name: chad-test1-1
    rke.cattle.io/control-plane-role: "true"
    rke.cattle.io/etcd-role: "true"
    rke.cattle.io/rke-machine-pool-name: pool1
    rke.cattle.io/worker-role: "true"
  name: chad-test1-1-pool1-776db456dfxhqd5h-tzcrg
  namespace: fleet-default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: chad-test1-1-pool1-776db456dfxhqd5h
    uid: 1c4a3771-ddd0-413a-97e7-422ad54f1db1
  resourceVersion: "659213"
  uid: 93845d75-a31f-4eb4-a5ed-8696efc06c18
spec:
  bootstrap:
    configRef:
      apiVersion: rke.cattle.io/v1
      kind: RKEBootstrap
      name: chad-test1-1-bootstrap-template-7tzb8
      namespace: fleet-default
      uid: 1774b18e-0bc2-4f27-902a-972800222f10
    dataSecretName: chad-test1-1-bootstrap-template-7tzb8-machine-bootstrap
  clusterName: chad-test1-1
  infrastructureRef:
    apiVersion: rke-machine.cattle.io/v1
    kind: TritonMachine
    name: chad-test1-1-pool1-021dfc9a-jhwsz
    namespace: fleet-default
    uid: 821870ba-b2dc-46b2-9abe-4092e2fd6fbe
  nodeDeletionTimeout: 10s
status:
  bootstrapReady: true
  conditions:
  - lastTransitionTime: "2024-08-26T16:52:00Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-08-26T16:51:50Z"
    status: "True"
    type: BootstrapReady
  - lastTransitionTime: "2024-08-26T16:52:04Z"
    message: deleting server [fleet-default/chad-test1-1-pool1-021dfc9a-jhwsz] of
      kind (TritonMachine) for machine chad-test1-1-pool1-776db456dfxhqd5h-tzcrg in
      infrastructure provider
    status: "False"
    type: InfrastructureReady
  - lastTransitionTime: "2024-08-26T16:52:04Z"
    reason: Deleting
    severity: Info
    status: "False"
    type: NodeHealthy
  - lastTransitionTime: "2024-08-26T16:52:04Z"
    status: "True"
    type: PreTerminateDeleteHookSucceeded
  failureMessage: |-
    Failure detected from referenced resource rke-machine.cattle.io/v1, Kind=TritonMachine with name "chad-test1-1-pool1-021dfc9a-jhwsz": Downloading driver from https://localhost/assets/docker-machine-driver-triton
    Doing /etc/rancher/ssl
    ls: cannot access 'docker-machine-driver-*': No such file or directory
    downloaded file  failed sha256 checksum
    download of driver from https://localhost/assets/docker-machine-driver-triton failed
  failureReason: CreateError
  lastUpdated: "2024-08-26T16:52:04Z"
  observedGeneration: 3
  phase: Deleting
```
Last part seems problematic:
```
  failureMessage: |-
    Failure detected from referenced resource rke-machine.cattle.io/v1, Kind=TritonMachine with name "chad-test1-1-pool1-021dfc9a-jhwsz": Downloading driver from https://localhost/assets/docker-machine-driver-triton
    Doing /etc/rancher/ssl
    ls: cannot access 'docker-machine-driver-*': No such file or directory
    downloaded file  failed sha256 checksum
    download of driver from https://localhost/assets/docker-machine-driver-triton failed
  failureReason: CreateError
  lastUpdated: "2024-08-26T16:52:04Z"
  observedGeneration: 3
  phase: Deleting
```
I did the workaround I linked before (https://github.com/rancher/rancher/issues/37074#issuecomment-1664722305) and then deleted the original node driver I cloned. Maybe I'll try changing the driver hash and then back again to see if forcing Rancher to download it again will work or not
```
2024/08/26 17:04:37 [INFO] Copying management-state/machine-drivers/96df121f28c0069ca6c1dbc7beb14b8cf1163f9b660f06c332ac2f307ec6c732-docker-machine-driver-triton => /opt/drivers/management-state/bin/docker-machine-driver-triton-tmp
```
Seems to download it fine:
```
6de70361494d:/var/lib/rancher # find / -name docker-machine-driver-triton -ls
  1966823  11656 -rwxr-xr-x   1 root     root     11932403 Aug 26 17:04 /opt/drivers/management-state/bin/docker-machine-driver-triton
  1966825  11656 -rwxr-xr-x   1 root     root     11932403 Aug 26 17:04 /usr/share/rancher/ui/assets/docker-machine-driver-triton
```
```
6de70361494d:/usr/share/rancher/ui/assets # ls -lh docker-machine-driver-triton
-rwxr-xr-x 1 root root 12M Aug 26 17:04 docker-machine-driver-triton
```
Maybe it's because I'm running Rancher locally using the skeleton app for extension development, so https://localhost/assets/docker-machine-driver-triton does not have a valid SSL certificate. When I `docker exec` into the Rancher container, `curl -O https://localhost/assets/docker-machine-driver-triton` fails but `curl -kO https://localhost/assets/docker-machine-driver-triton` works...
dry-father-91460:
Are you able to create a cluster using your driver without the UI?
gifted-breakfast-73755:
I've never tried that. How can I do that?
I'm hoping there's something I can set to not validate the cert when I'm running Rancher locally? Maybe setting the `CURL_INSECURE=true` environment variable on the Rancher docker container?
dry-father-91460:
https://www.suse.com/support/kb/doc/?id=000020121 but I'm not sure if it will make things harder since you are already working with the UI. Sorry, I am not a driver expert.
gifted-breakfast-73755:
That's ok, thanks. I'm going to try restarting the Rancher docker container with that env var, because the driver download is a shell script using curl, so it should work. I just found the code here: https://github.com/rancher/machine/blob/9183b3ff738e16ece4391a2e6bcc8ef88889e8ae/package/download_driver.sh#L15
@dry-father-91460 separate topic, but since you know the dashboard well, do you know how I can set the icon for my custom node driver extension in the create cloud credentials and create cluster screens where it lists the providers? I see `openstack.svg` in the ui-plugin-examples repo but I don't see where it's configured…
busy-ability-54059:
Gm @gifted-breakfast-73755 👋 There's a channel called #C04PPGWAWNR, which is better suited for this whole thread. I think you got a bit lucky that someone "found" your post here and replied. 😅 As for the `openstack.svg` icon there: if I am not mistaken, it is the actual icon of the extension card (although nothing is connected 😂) and not the node driver icon. The node driver icon part is undocumented, but it should be something like:
```
plugin.register('image', 'providers/YOUR_PROVIDER_NAME.svg', require('./PATH_TO_YOUR_ICON/WHATEVER_ICON_NAME.svg'));
```
as per https://github.com/rancher/dashboard/pull/10312. I'll update the docs for the node driver example 🙏
One thing though… I think this will only work with Rancher 2.9, because I think we've only added this feature to shell `2.0.1`, which is only compliant with Rancher 2.9: https://extensions.rancher.io/extensions/support-matrix
But you'll need to give it a test in order to confirm this.
gifted-breakfast-73755:
Good morning @busy-ability-54059. Ah ok, sorry, I didn't know about the #C04PPGWAWNR channel, so I will definitely join that channel and create future extension-related posts in there, thank you.
> As for the `openstack.svg` icon there: if I am not mistaken, it is the actual icon of the extension card
Yea, that matches what I'm seeing as well. I placed a `pkg/triton/triton.svg` icon in my extension app and then added `metadata.icon = require('./triton.svg');` in `pkg/triton/index.ts`, and that does show the icon on the Installed Extensions page, but not on the create cloud credentials and create cluster pages, like you said.
I'll try `plugin.register()` as you suggested and see if it requires 2.9.
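If it works, the whole `pkg/triton/index.ts` would look roughly like this (a sketch; only the two icon lines come from this thread, the rest is the standard extension-skeleton boilerplate):
```
// pkg/triton/index.ts -- rough sketch; only the icon registrations are from this thread
import { importTypes } from '@rancher/auto-import';
import { IPlugin } from '@shell/core/types';

// Entry point the dashboard calls when it loads the extension
export default function(plugin: IPlugin): void {
  // Auto-registers the components under pkg/triton (cloud-credential/, machine-config/, ...)
  importTypes(plugin);

  // Extension metadata; metadata.icon is what the Installed Extensions card shows
  plugin.metadata = require('./package.json');
  plugin.metadata.icon = require('./triton.svg');

  // Provider icon for the provider lists (needs shell >= 2.0.1, i.e. Rancher 2.9)
  plugin.register('image', 'providers/triton.svg', require('./triton.svg'));
}
```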
I'm still curious to know how the openstack node driver example sets the icon of the extension card though since I don't see any references to it...
@busy-ability-54059 `plugin.register('image', 'providers/triton.svg', require('./triton.svg'));` worked on Rancher v2.8.5 on the create cluster screen, but it did not work on the create cloud credential screen. Any idea how to get the same icon to show on that screen? Looking at the dashboard code, it looks like it's looking for `~shell/assets/images/providers/${ id }.svg` (https://github.com/rancher/dashboard/blob/master/shell/edit/cloudcredential.vue#L158). Maybe cloudcredential.vue doesn't support it, since that does a direct require whereas index.vue does `plugin.getDynamic('image', ...)` (https://github.com/rancher/dashboard/blob/master/shell/edit/provisioning.cattle.io.cluster/index.vue#L415).
Seems like `cloudcredential.vue` should do the same call before falling back on require.
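Something like this hypothetical sketch (the method and property names here are my guesses, not actual dashboard code):
```
// Hypothetical change to shell/edit/cloudcredential.vue -- names are guesses,
// modeled on how index.vue resolves extension-provided images.
driverIcon(driverName) {
  // Ask loaded extensions for a registered image first...
  const fromExtension = this.$plugin?.getDynamic('image', `providers/${ driverName }.svg`);

  if (fromExtension) {
    return fromExtension;
  }

  // ...then fall back to the shell's bundled provider icons, as the code does today
  try {
    return require(`~shell/assets/images/providers/${ driverName }.svg`);
  } catch (e) {
    return null; // no icon available
  }
}
```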
busy-ability-54059:
Don't forget to test this with a local build + developer load, as that's the mechanism that exactly mimics importing a published extension rather than just trying it locally: https://extensions.rancher.io/extensions/extensions-getting-started#building-the-extension + https://extensions.rancher.io/extensions/extensions-getting-started#test-built-extension-by-doing-a-developer-load. This is in relation to your statement that the cluster provisioning icon override is working on 2.8.5 🙏
gifted-breakfast-73755:
@busy-ability-54059 I think it only worked on v2.8.5 because I'm running the skeleton app, which uses version `^2.0.1` of `@rancher/shell`, so you're probably right about it only working on v2.9.
Thank you