
modern-dress-80156

09/08/2022, 8:40 PM
Hi, we are trying to back up and restore our Rancher (RKE2) + Longhorn cluster using Velero. During restore, everything except rancher-server comes up fine; rancher-server always crashes with the error "invalid memory address or nil pointer dereference". We have tried various restore combinations, but nothing has worked so far. We wrote a script to uninstall and reinstall Rancher, but Rancher comes bundled with a large number of CRDs, and the cleanup script alone takes 20 minutes to remove them all. We would like to avoid having to plug in the cleanup script. Any idea how we can get rid of this error, or what is causing it?
2022/06/20 16:15:13 [INFO] Starting cluster.x-k8s.io/v1alpha3, Kind=Cluster controller
2022/06/20 16:15:13 [INFO] Watching metadata for cluster.x-k8s.io/v1alpha3, Kind=MachineSet
2022/06/20 16:15:13 [INFO] Watching metadata for cluster.x-k8s.io/v1alpha3, Kind=MachineHealthCheck
2022/06/20 16:15:13 [INFO] Starting cluster.x-k8s.io/v1alpha3, Kind=MachineHealthCheck controller
2022/06/20 16:15:13 [INFO] Starting cluster.x-k8s.io/v1alpha3, Kind=MachineSet controller
2022/06/20 16:15:14 [ERROR] error syncing 'git-webhook': handler apiservice: failed to update cattle-system/git-webhook-api-service /v1, Kind=ServiceAccount for apiservice git-webhook: ServiceAccount "git-webhook-api-service" is invalid: [metadata.ownerReferences.apiVersion: Invalid value: "": version must not be empty, metadata.ownerReferences.kind: Invalid value: "": kind must not be empty, metadata.ownerReferences.name: Invalid value: "": name must not be empty], requeuing
2022/06/20 16:15:14 [INFO] namespaceHandler: addProjectIDLabelToNamespace: adding label field.cattle.io/projectId=p-bcdbv to namespace=kube-system
2022/06/20 16:15:15 [INFO] Updating global catalog library
2022/06/20 16:15:15 [ERROR] error syncing 'git-webhook': handler apiservice: failed to update cattle-system/git-webhook-api-service /v1, Kind=ServiceAccount for apiservice git-webhook: ServiceAccount "git-webhook-api-service" is invalid: [metadata.ownerReferences.apiVersion: Invalid value: "": version must not be empty, metadata.ownerReferences.kind: Invalid value: "": kind must not be empty, metadata.ownerReferences.name: Invalid value: "": name must not be empty], requeuing
2022/06/20 16:15:17 [INFO] kontainerdriver azurekubernetesservice listening on address 127.0.0.1:35675
2022/06/20 16:15:17 [INFO] kontainerdriver googlekubernetesengine listening on address 127.0.0.1:41777
2022/06/20 16:15:17 [INFO] kontainerdriver amazonelasticcontainerservice listening on address 127.0.0.1:45839
E0620 16:15:17.118103      34 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 3105 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x39b9940, 0x6920620)
	/go/pkg/mod/k8s.io/apimachinery@v0.21.0/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/pkg/mod/k8s.io/apimachinery@v0.21.0/pkg/util/runtime/runtime.go:48 +0x86
panic(0x39b9940, 0x6920620)
	/usr/local/go/src/runtime/panic.go:965 +0x1b9
google.golang.org/grpc.(*Server).Stop(0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1482 +0x4a
github.com/rancher/rancher/pkg/kontainer-engine/types.(*GrpcServer).Stop(...)
	/go/src/github.com/rancher/rancher/pkg/kontainer-engine/types/rpc_server.go:157
github.com/rancher/rancher/pkg/kontainer-engine/service.(*RunningDriver).Stop(0xc00bafcce0)
	/go/src/github.com/rancher/rancher/pkg/kontainer-engine/service/service.go:491 +0x45
github.com/rancher/rancher/pkg/kontainer-engine/service.(*EngineService).GetDriverCreateOptions(0xc00bafcf90, 0x48ec250, 0xc000060060, 0xc00e1badf8, 0x16, 0xc012c5a000, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/rancher/rancher/pkg/kontainer-engine/service/service.go:331 +0x277
github.com/rancher/rancher/pkg/controllers/management/drivers/kontainerdriver.(*Lifecycle).getResourceFields(0xc00cf15e80, 0xc012c5a000, 0x0, 0x0, 0x0)
	/go/src/github.com/rancher/rancher/pkg/controllers/management/drivers/kontainerdriver/kontainerdriver.go:182 +0x245
github.com/rancher/rancher/pkg/controllers/management/drivers/kontainerdriver.(*Lifecycle).createDynamicSchema(0xc00cf15e80, 0xc012c5a000, 0x418235f, 0x8)
	/go/src/github.com/rancher/rancher/pkg/controllers/management/drivers/kontainerdriver/kontainerdriver.go:115 +0x3f
github.com/rancher/rancher/pkg/controllers/management/drivers/kontainerdriver.(*Lifecycle).createOrUpdateDynamicSchema(0xc00cf15e80, 0xc012c5a000, 0x6, 0x417fb4d)
	/go/src/github.com/rancher/rancher/pkg/controllers/management/drivers/kontainerdriver/kontainerdriver.go:149 +0x1a5
github.com/rancher/rancher/pkg/controllers/management/drivers/kontainerdriver.(*Lifecycle).Updated.func2(0x40d9ee0, 0xc012c5a000, 0x417cb5d, 0x6)
	/go/src/github.com/rancher/rancher/pkg/controllers/management/drivers/kontainerdriver/kontainerdriver.go:404 +0x3b
github.com/rancher/norman/condition.Cond.doInternal(0x417cb5d, 0x6, 0x489fe01, 0x489fee0, 0xc012c5a000, 0xc00b37b6c0, 0x197, 0x417d1db, 0x6, 0x4187626)
	/go/pkg/mod/github.com/rancher/norman@v0.0.0-20210608202517-59b3523c3133/condition/condition.go:212 +0x9d
github.com/rancher/norman/condition.Cond.do2(0x417cb5d, 0x6, 0x417cb01, 0x489fee0, 0xc012c5a000, 0xc00b37b6c0, 0x378e9c0, 0xc001c08010, 0x198, 0x417d1db, ...)
	/go/pkg/mod/github.com/rancher/norman@v0.0.0-20210608202517-59b3523c3133/condition/condition.go:181 +0x1dd
github.com/rancher/norman/condition.Cond.do(...)
Bumping this thread once again: any pointers on how to debug this? We also logged a ticket with Rancher premium support, but support closed it saying that Velero-based restore is not supported. We still want to get to the root cause so that we can figure out a workaround, because we want a vendor-agnostic backup and restore solution.

bored-farmer-36655

09/10/2022, 4:35 PM
Hi, maybe raise an issue with Velero. I see this old one: https://github.com/vmware-tanzu/velero/issues/3081 and googling velero+runtime.go+Observed a panic+"invalid memory address or nil pointer" produces additional hits...

creamy-pencil-82913

09/10/2022, 7:41 PM
It looks to me like Velero isn’t restoring some of the resource metadata, so when Rancher goes looking for related objects to track their relationships it crashes because some of the required fields are missing. We have our own backup-restore operator that’s intended to be used for backing up Rancher Manager state, and we also support etcd level backup/restore. Unless the Velero folks want to dig into it to figure out what they’re missing, I don’t think you’re going to get too much help from our side.

full-painter-23916

09/11/2022, 4:32 AM
As far as I remember (former employee...), pieces of Rancher refer to and store things by UUID, for... some possibly good reason. Velero doesn't/can't restore those UUIDs, because they're generated automatically by the k8s API and can't be set without going down to etcd directly. So new objects get new UUIDs, and the stored references to the old ones no longer find the expected resource. Anyway, yes, Rancher has its own backup and restore mechanism for a reason; use it.
👍 1
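
To illustrate the UUID point above, here is a rough sketch (not Rancher or Velero code) of how one might look for references that no longer resolve after a restore. It only inspects `metadata.ownerReferences`; whether Rancher also stores UIDs in other fields is not covered here. The function name `find_dangling_owner_refs` and the sample objects are hypothetical, and the input is assumed to be a list of Kubernetes objects as plain dicts, e.g. the `items` of `kubectl get ... -o json` output.

```python
def find_dangling_owner_refs(objects):
    """Return (object name, owner kind, owner name, owner uid) for every
    ownerReference whose uid matches no object in `objects`."""
    # Collect the UIDs the API server assigned to the objects we know about.
    known_uids = {obj.get("metadata", {}).get("uid") for obj in objects}
    known_uids.discard(None)

    dangling = []
    for obj in objects:
        meta = obj.get("metadata", {})
        for ref in meta.get("ownerReferences") or []:
            if ref.get("uid") not in known_uids:
                dangling.append(
                    (meta.get("name"), ref.get("kind"), ref.get("name"), ref.get("uid"))
                )
    return dangling


# Hypothetical sample: the owner was recreated with a new UID during restore,
# but the child still points at the UID from before the backup.
sample_objects = [
    {"kind": "Owner", "metadata": {"name": "owner", "uid": "new-uid"}},
    {
        "kind": "Child",
        "metadata": {
            "name": "child",
            "uid": "child-uid",
            "ownerReferences": [{"kind": "Owner", "name": "owner", "uid": "old-uid"}],
        },
    },
]

for entry in find_dangling_owner_refs(sample_objects):
    print("dangling ownerReference:", entry)
```

The list passed in has to contain every object that could be an owner; an owner that simply was not included in the input will be reported as dangling too.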

modern-dress-80156

09/12/2022, 4:39 AM
By the way, thank you brandon, vincent and malcolm lewis for taking the time to reply on this thread.
@creamy-pencil-82913 can you please let us know the names of the resources I need to look at, where there is a suspicion that some fields aren't currently being restored by Velero?

creamy-pencil-82913

09/12/2022, 6:03 PM
it says in the error:
2022/06/20 16:15:14 [ERROR] error syncing 'git-webhook': handler apiservice: failed to update cattle-system/git-webhook-api-service /v1, Kind=ServiceAccount for apiservice git-webhook: ServiceAccount "git-webhook-api-service" is invalid: [metadata.ownerReferences.apiVersion: Invalid value: "": version must not be empty, metadata.ownerReferences.kind: Invalid value: "": kind must not be empty, metadata.ownerReferences.name: Invalid value: "": name must not be empty], requeuing
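
The error above complains about `ownerReferences` entries with an empty `apiVersion`, `kind`, and `name`. A minimal sketch for finding other restored objects with the same problem might look like the following. It assumes the input is the parsed JSON of a `kubectl get <resource> -o json` list (a dict with an `items` array); the function name `find_incomplete_owner_refs` and the sample data are made up for illustration.

```python
# Fields the API server rejected as empty in the error message above.
REQUIRED_FIELDS = ("apiVersion", "kind", "name")


def find_incomplete_owner_refs(k8s_list):
    """Return (namespace, kind, name, missing_fields) for every
    ownerReference that lacks one of REQUIRED_FIELDS."""
    problems = []
    for item in k8s_list.get("items", []):
        meta = item.get("metadata", {})
        for ref in meta.get("ownerReferences") or []:
            # Treat absent and empty-string values the same way.
            missing = [field for field in REQUIRED_FIELDS if not ref.get(field)]
            if missing:
                problems.append(
                    (meta.get("namespace", ""), item.get("kind", ""), meta.get("name", ""), missing)
                )
    return problems


# Hypothetical sample shaped like the ServiceAccount from the log line.
sample_list = {
    "items": [
        {
            "kind": "ServiceAccount",
            "metadata": {
                "name": "git-webhook-api-service",
                "namespace": "cattle-system",
                "ownerReferences": [{"apiVersion": "", "kind": "", "name": "", "uid": "some-uid"}],
            },
        }
    ]
}

for problem in find_incomplete_owner_refs(sample_list):
    print("incomplete ownerReference:", problem)
```

To run it against a real cluster, the list could be loaded with `json.load` from the saved output of something like `kubectl get serviceaccounts -n cattle-system -o json`, repeated for each resource type of interest.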