brave-rose-20023
10/07/2023, 12:00 PM
adamant-kite-43734
10/18/2023, 9:52 AM
nutritious-television-77271
10/25/2023, 3:40 PM
swift-sunset-4572
10/27/2023, 4:38 PM
bitter-shoe-85930
10/31/2023, 12:55 PM
adventurous-address-26812
11/10/2023, 3:47 PM
adventurous-address-26812
11/10/2023, 3:55 PM
adventurous-address-26812
11/10/2023, 6:25 PM
flat-table-34962
11/16/2023, 3:52 AM
adamant-kite-43734
11/16/2023, 3:53 AM
adamant-kite-43734
11/20/2023, 5:44 PM
crooked-vase-48008
12/20/2023, 2:49 PM
journalctl -ef -u rancher-system-agent | grep "error"
I am able to deploy 3 new Ubuntu 22.04 VMs manually and create a K3s cluster, but I want to be able to do this via Rancher, so I'd like to get this working. What else can I provide to get to the bottom of this?
plain-planet-80115
12/29/2023, 6:35 AM
waiting to apply plan. There are no rancher-system-agent and rke2-server services running on the VM either.
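Building on the journalctl command earlier in the thread: a plain grep for "error" is case-sensitive and misses level=fatal lines. A minimal sketch of a broader filter, run here against a canned sample line so it is runnable without a node (the sample text is invented; on the affected VM you would point the same filter at the real journal):

```shell
# Invented stand-in for one line of rancher-system-agent journal output:
sample='Dec 29 06:35:01 node1 rancher-system-agent[812]: time="2023-12-29T06:35:01Z" level=error msg="error applying plan"'

# Case-insensitive filter that also catches "fail"/"fatal":
printf '%s\n' "$sample" | grep -iE "error|fail|fatal"

# On the node itself (assumes systemd units with the names from this thread):
#   journalctl -u rancher-system-agent --no-pager | grep -iE "error|fail|fatal"
#   systemctl list-unit-files 'rancher-system-agent*' 'rke2-*'
```

The second commented command is also a quick way to tell "service installed but failing" apart from "service never installed", which is what the message above describes.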
I'm using a SLES 15 SP4 template to create the nodes. These are the steps I performed to create the template:
1. Downloaded the full package ISO from the official channels.
2. Uploaded it to my vCenter environment.
3. Started a virtual machine from the ISO file.
4. Tried installing the required packages on the machine. I was not able to install cloud-image-utils, cloud-guest-utils, cloud-initramfs-growroot and open-iscsi; the rest installed successfully.
5. Cloned the VM to a template and used that template to provision a vSphere cluster from the Rancher UI.
Can somebody please let me know if there is something I missed? I did the same steps for Ubuntu 22.04 and all the packages installed properly, and the cluster provisioned successfully. I'm facing this issue only with SLES.
cool-tailor-91264
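One possible explanation for step 4: the four failing names are Debian/Ubuntu package names, and SLES ships the same functionality under different names and modules. A hedged sketch of the package step on a registered SLES 15 SP4 VM (the module identifier and the SLES-side package mapping are my assumptions, not from the thread; verify with `zypper search` before relying on them):

```shell
# Enable the Public Cloud Module, which carries cloud-init on SLES
# (assumes the system is already registered via SUSEConnect):
SUSEConnect -p sle-module-public-cloud/15.4/x86_64
zypper refresh

# Assumed SLES equivalents of the Ubuntu package list:
#   cloud-image-utils / cloud-guest-utils -> cloud-init + growpart
#   cloud-initramfs-growroot              -> root resize handled by cloud-init/growpart
#   open-iscsi                            -> same name, base repos
zypper --non-interactive install cloud-init growpart open-iscsi

# Make sure the services come up in the cloned template:
systemctl enable cloud-init.service iscsid.service
```

If open-iscsi itself failed to install, that may simply indicate the VM was not registered, so no repositories resolved at all.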
01/12/2024, 2:44 AM
echoing-butcher-60732
01/18/2024, 9:54 PM
late-crayon-77645
01/19/2024, 8:21 PM
late-crayon-77645
01/19/2024, 10:41 PM
late-crayon-77645
01/19/2024, 10:41 PM
dazzling-kitchen-36068
01/22/2024, 1:54 AM
quick-queen-85877
01/23/2024, 7:23 AM
glamorous-lighter-5580
01/25/2024, 6:16 AM
late-crayon-77645
02/01/2024, 1:59 PM
Non-ready bootstrap machine(s) xxx and join url to be available on bootstrap node
I looked at the Rancher server agent service log on the node and I had the same issue mhsraft has here: "Non-ready bootstrap machine(s) xxx and join url to be available on bootstrap node". It appears to be some issue with certs. I eventually resolved it by just creating new Rancher instances (wiping the persistent storage). These were just for testing, but if this were production that wouldn't be a good solution. I'm brand new to Rancher, so I'm not sure how to go about troubleshooting this error.
salmon-noon-33588
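A less destructive first step than wiping the instance is to inspect the provisioning objects on the Rancher management (local) cluster. A sketch, assuming Rancher v2.6+ defaults (the fleet-default namespace, CRD names, and the machine-plan secret type are the usual defaults and may differ in other setups):

```shell
# The provisioning-v2 cluster object and its conditions:
kubectl -n fleet-default get clusters.provisioning.cattle.io

# The CAPI machines; the bootstrap machine should reach Running first,
# which is what the "Non-ready bootstrap machine(s)" message waits on:
kubectl -n fleet-default get machines.cluster.x-k8s.io

# The plan secrets carrying the join URL / CA checksum handed to each node:
kubectl -n fleet-default get secrets --field-selector type=rke.cattle.io/machine-plan
```

Comparing the CA checksum in the plan against the certificate Rancher actually serves is one way to confirm the cert mismatch suspected above.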
02/13/2024, 4:00 PMFailed to create govmomi client.err: ServerFaultCode: Cannot complete login due to an incorrect user name or password
adventurous-address-26812
02/15/2024, 7:24 PM
bored-nest-98612
02/21/2024, 2:45 PM
dry-rose-72561
03/07/2024, 12:22 AM
vsphere.csi-controller pod:
2024-03-05T15:03:51.933610102Z {"level":"info","time":"2024-03-05T15:03:51.933519537Z","caller":"vanilla/controller.go:1805","msg":"CreateVolume: called with args {Name:pvc-92a703ce-2341-43ae-a572-af4838234308 CapacityRange:required_bytes:5368709120 VolumeCapabilities:[mount:<fs_type:\"ext4\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > ] Parameters:map[storagepolicyname:K8s-clusters-storage-policy] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"5b7272e0-7229-4546-8dbc-3cb6e1318fa4"}
2024-03-05T15:03:51.934092725Z {"level":"info","time":"2024-03-05T15:03:51.933878029Z","caller":"vanilla/controller.go:1805","msg":"CreateVolume: called with args {Name:pvc-a065755c-db26-4a1b-9498-6654e1ab2551 CapacityRange:required_bytes:5368709120 VolumeCapabilities:[mount:<fs_type:\"ext4\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"66a7bd8c-8047-496a-a88c-e56732f3d4e3"}
2024-03-05T15:03:51.937904672Z {"level":"info","time":"2024-03-05T15:03:51.937767865Z","caller":"vanilla/controller.go:1805","msg":"CreateVolume: called with args {Name:pvc-0680d14c-eca2-44e5-b43d-7c1818d6c41b CapacityRange:required_bytes:10737418240 VolumeCapabilities:[mount:<fs_type:\"ext4\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"fd77419b-5da9-472c-bad9-112a2780e501"}
2024-03-05T15:03:51.943866111Z panic: runtime error: invalid memory address or nil pointer dereference
2024-03-05T15:03:51.943898566Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1b22a25]
2024-03-05T15:03:51.943901505Z
2024-03-05T15:03:51.943904260Z goroutine 270 [running]:
2024-03-05T15:03:51.943907040Z sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).createFileVolume(0xc000034000, {0x26af658, 0xc00093a8d0}, 0xc000486770)
2024-03-05T15:03:51.943927864Z /build/pkg/csi/service/vanilla/controller.go:1736 +0xd05
2024-03-05T15:03:51.943943618Z sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).CreateVolume.func1()
2024-03-05T15:03:51.943968699Z /build/pkg/csi/service/vanilla/controller.go:1848 +0x3d7
2024-03-05T15:03:51.943974291Z sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).CreateVolume(0xc000034000, {0x26af658, 0xc00093a750}, 0xc000486770)
2024-03-05T15:03:51.943977838Z /build/pkg/csi/service/vanilla/controller.go:1858 +0x1bb
2024-03-05T15:03:51.944082915Z github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler({0x229bba0?, 0xc000034000}, {0x26af658, 0xc00093a750}, 0xc00092e420, 0x0)
2024-03-05T15:03:51.944089917Z /go/pkg/mod/github.com/container-storage-interface/spec@v1.7.0/lib/go/csi/csi.pb.go:5671 +0x170
2024-03-05T15:03:51.944346148Z google.golang.org/grpc.(*Server).processUnaryRPC(0xc000366e00, {0x26b6218, 0xc000109040}, 0xc00054c120, 0xc0008bb050, 0x38db8a0, 0x0)
2024-03-05T15:03:51.944354703Z /go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283 +0xcfe
2024-03-05T15:03:51.944357688Z google.golang.org/grpc.(*Server).handleStream(0xc000366e00, {0x26b6218, 0xc000109040}, 0xc00054c120, 0x0)
2024-03-05T15:03:51.944361009Z /go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620 +0xa2f
2024-03-05T15:03:51.944364862Z google.golang.org/grpc.(*Server).serveStreams.func1.2()
Regards,
Vishesh
bland-painting-61617
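A note on the trace above: the panic happens inside createFileVolume, the code path the vSphere CSI driver takes for volumes requested with access_mode MULTI_NODE_MULTI_WRITER (i.e. ReadWriteMany), which is visible in the CreateVolume args in the log. A quick way to list the PVCs that request RWX and would therefore exercise that file-volume path (a diagnostic sketch, assuming kubectl access to the affected cluster):

```shell
# List PVCs asking for ReadWriteMany -- the vSphere CSI driver treats these
# as file volumes, which require vSAN File Services to be configured:
kubectl get pvc -A \
  -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,MODES:.spec.accessModes \
  | grep -E 'ReadWriteMany|MODES'
```

If the workloads only need single-node access, requesting ReadWriteOnce instead avoids the file-volume path entirely.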
03/09/2024, 6:49 PM
The machines get stuck in waitingfornoderef, sit there until the unhealthynodetimeout, and get recreated.
The last time a node was created was around a year ago; all nodes have been self-updating in the meantime, including bumping the version of RKE2.
Recently, I had to add a new node and noticed the above issue. This is now causing some havoc, as I needed to reprovision one of the nodes from the 4-node worker pool and that was never successful.
During that time, no configuration of the cluster was changed other than the version of RKE2 in the cluster YAML; vCenter 7 got patches installed, and Rancher itself was updated to the latest versions along with it, currently 2.8.2.
I'm wondering what could be wrong. The providerID is clearly not being populated, and I'm watching the logs from the Rancher pod as well as the CAPI controller and do not see any errors, just a lot of:
I0309 18:50:33.563422 1 machine_controller_noderef.go:54] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/storclus-ashford-c8f88ffdbxng69l-vn96s" namespace="fleet-default" name="storclus-ashford-c8f88ffdbxng69l-vn96s" reconcileID=b4bd169d-c928-4ed6-ac11-fca499dc5aee MachineSet="fleet-default/storclus-ashford-c8f88ffdbxng69l" MachineDeployment="fleet-default/storclus-ashford" Cluster="fleet-default/storclus" VmwarevsphereMachine="fleet-default/storclus-ashford-d00461a9-hjxqt"
I0309 18:50:33.563206 1 machine_controller_phases.go:286] "Waiting for infrastructure provider to create machine infrastructure and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/storclus-ashford-c8f88ffdbxng69l-vn96s" namespace="fleet-default" name="storclus-ashford-c8f88ffdbxng69l-vn96s" reconcileID=b4bd169d-c928-4ed6-ac11-fca499dc5aee MachineSet="fleet-default/storclus-ashford-c8f88ffdbxng69l" MachineDeployment="fleet-default/storclus-ashford" Cluster="fleet-default/storclus" VmwarevsphereMachine="fleet-default/storclus-ashford-d00461a9-hjxqt"
I presume the infrastructure provider is embedded in the Rancher pod, since the CAPI controller manager seems to run in its own pod.
I also see the warning below, but I don't think it's related. However, I've checked the logs from when the last node deployed successfully (almost a year ago) and that warning wasn't there:
W0309 18:49:31.253790 33 warnings.go:80] unknown field "spec.cloud-config"
I've also tried moving the base image from Ubuntu 20.04 to 22.04, increasing disk space on the new node, and reviewing logs on the node; however, no errors are found in rancher-system-agent, and rke2-server has no logs at the time.
polite-piano-74233
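For the providerID question above, one way to see both sides of the hand-off (a sketch assuming kubectl access; fleet-default is Rancher's default namespace for provisioned clusters):

```shell
# On the downstream cluster: has the cloud provider set spec.providerID
# on the nodes? Empty PROVIDER_ID columns match the log messages above.
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID

# On the Rancher management (local) cluster: the CAPI machines stuck
# waiting for that providerID to be reported:
kubectl -n fleet-default get machines.cluster.x-k8s.io -o wide
kubectl -n fleet-default describe machines.cluster.x-k8s.io | grep -iA3 provider
```

If the node objects exist but spec.providerID stays empty, the gap is on the cloud-provider side of the contract rather than in CAPI itself.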
03/11/2024, 5:07 AM
adventurous-school-78304
03/15/2024, 6:57 PM
shy-artist-94999
03/18/2024, 5:55 PM
*Initial connection to kubernetes cluster failed with error get "https://prod-k8s-zone1.io/version": x509: certificate signed by unknown authority, removing ca data.*
*memcache.go:206 couldn't get resource list for management.cattle.io/v3*
Any advice would be helpful.
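For an x509 "unknown authority" error like the one above, it can help to look at which authority actually signs the endpoint named in the message (hostname taken from the error; port 443 is an assumption):

```shell
# Show subject, issuer and validity of the certificate presented by the
# cluster endpoint, so it can be compared against the CA Rancher trusts:
openssl s_client -connect prod-k8s-zone1.io:443 -servername prod-k8s-zone1.io \
  -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates
```

If the issuer is a private CA, that CA bundle typically has to be supplied to Rancher rather than stripped, which is what "removing ca data" in the message hints it fell back to.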