
magnificent-yak-50132

05/08/2023, 1:14 PM
Hello. I am looking for some help with the PCI devices. Yesterday I installed three nodes and enabled the PCI device controller. On two nodes it worked fine, but on one the pod constantly restarts with an error I don't really understand. If I run lspci in the pod it works perfectly fine. The log of the pod:
time="2023-05-08T08:44:02Z" level=info msg="Applying CRD pcidevices.devices.harvesterhci.io"
time="2023-05-08T08:44:02Z" level=info msg="Applying CRD pcideviceclaims.devices.harvesterhci.io"
time="2023-05-08T08:44:11Z" level=info msg="Registering PCI Device Claims controller"
time="2023-05-08T08:44:12Z" level=info msg="Loading driver vfio-pci"
time="2023-05-08T08:44:12Z" level=info msg="Loading driver vfio_iommu_type1"
time="2023-05-08T08:44:12Z" level=info msg="Registering PCI Devices controller"
time="2023-05-08T08:44:12Z" level=info msg="add mutation handler for [pods]. (Pod)"
time="2023-05-08T08:44:12Z" level=info msg="add mutation handler for [virtualmachines].kubevirt.io (VirtualMachine)"
time="2023-05-08T08:44:12Z" level=info msg="Active TLS secret harvester-system/pcidevices-webhook-tls (ver=25321) (count 1): map[listener.cattle.io/cn-pcidevices-webhook.harvester-system.svc:pcidevices-webhook.harvester-system.svc listener.cattle.io/fingerprint:SHA1=06E69757A946BC22422ACA20E153E1DBBCBCF903]"
time="2023-05-08T08:44:12Z" level=info msg="Listening on :8443"
I0508 08:44:13.628762       1 trace.go:205] Trace[911902081]: "DeltaFIFO Pop Process" ID:harv3-0000ff0c0,Depth:30,Reason:slow event handlers blocking the queue (08-May-2023 08:44:13.430) (total time: 197ms):
Trace[911902081]: [197.83893ms] [197.83893ms] END
time="2023-05-08T08:44:20Z" level=info msg="Starting /v1, Kind=Node controller"
time="2023-05-08T08:44:20Z" level=info msg="Starting network.harvesterhci.io/v1beta1, Kind=VlanConfig controller"
time="2023-05-08T08:44:20Z" level=info msg="Starting devices.harvesterhci.io/v1beta1, Kind=PCIDevice controller"
time="2023-05-08T08:44:20Z" level=info msg="Starting devices.harvesterhci.io/v1beta1, Kind=PCIDeviceClaim controller"
time="2023-05-08T08:44:33Z" level=info msg="Reconciling PCI Devices list"
I0508 08:44:42.430276       1 trace.go:205] Trace[646203300]: "Reflector ListAndWatch" name:github.com/harvester/pcidevices/go/pkg/mod/k8s.io/client-go@v0.23.7/tools/cache/reflector.go:167 (08-May-2023 08:44:17.628) (total time: 24801ms):
Trace[646203300]: ---"Objects listed" error:<nil> 24801ms (08:44:42.429)
Trace[646203300]: [24.801374748s] [24.801374748s] END
I0508 08:44:42.628726       1 trace.go:205] Trace[1747278511]: "DeltaFIFO Pop Process" ID:recurringjobs.longhorn.io,Depth:43,Reason:slow event handlers blocking the queue (08-May-2023 08:44:42.528) (total time: 100ms):
Trace[1747278511]: [100.017068ms] [100.017068ms] END
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1344034]

goroutine 31 [running]:
github.com/harvester/pcidevices/pkg/util/nichelper.IdentifyHarvesterManagedNIC({0xc00005a00a, 0x5}, {0x1f0c1f0, 0xc001cb64b0}, {0x1f0bc40, 0xc00279bb30})
	/go/src/github.com/harvester/pcidevices/pkg/util/nichelper/helper.go:52 +0x454
github.com/harvester/pcidevices/pkg/controller/pcidevice.Register({0x1f0b968, 0xc002719fc0}, {0x7f1403b36f28, 0xc0004222a0}, 0xc00050c0d8, 0xc00050c0e0)
	/go/src/github.com/harvester/pcidevices/pkg/controller/pcidevice/pcidevice_controller.go:69 +0x345
main.run.func2()
	/go/src/github.com/harvester/pcidevices/main.go:148 +0x72
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/src/github.com/harvester/pcidevices/go/pkg/mod/golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/src/github.com/harvester/pcidevices/go/pkg/mod/golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:72 +0xa5
time="2023-05-08T08:45:18Z" level=info msg="Applying CRD pcidevices.devices.harvesterhci.io"
time="2023-05-08T08:45:19Z" level=info msg="Applying CRD pcideviceclaims.devices.harvesterhci.io"
time="2023-05-08T08:45:29Z" level=info msg="Registering PCI Device Claims controller"
time="2023-05-08T08:45:29Z" level=info msg="Loading driver vfio-pci"
time="2023-05-08T08:45:30Z" level=info msg="Loading driver vfio_iommu_type1"
time="2023-05-08T08:45:30Z" level=info msg="Registering PCI Devices controller"
time="2023-05-08T08:45:30Z" level=info msg="add mutation handler for [pods]. (Pod)"
time="2023-05-08T08:45:30Z" level=info msg="add mutation handler for [virtualmachines].kubevirt.io (VirtualMachine)"
time="2023-05-08T08:45:30Z" level=info msg="Active TLS secret harvester-system/pcidevices-webhook-tls (ver=25321) (count 1): map[listener.cattle.io/cn-pcidevices-webhook.harvester-system.svc:pcidevices-webhook.harvester-system.svc listener.cattle.io/fingerprint:SHA1=06E69757A946BC22422ACA20E153E1DBBCBCF903]"
time="2023-05-08T08:45:30Z" level=info msg="Listening on :8443"
time="2023-05-08T08:45:36Z" level=info msg="Starting network.harvesterhci.io/v1beta1, Kind=VlanConfig controller"
time="2023-05-08T08:45:36Z" level=info msg="Starting devices.harvesterhci.io/v1beta1, Kind=PCIDevice controller"
time="2023-05-08T08:45:36Z" level=info msg="Starting /v1, Kind=Node controller"
time="2023-05-08T08:45:36Z" level=info msg="Starting devices.harvesterhci.io/v1beta1, Kind=PCIDeviceClaim controller"
I0508 08:45:36.429320       1 trace.go:205] Trace[208240456]: "DeltaFIFO Pop Process" ID:harv3-000000043,Depth:108,Reason:slow event handlers blocking the queue (08-May-2023 08:45:36.229) (total time: 200ms):
Trace[208240456]: [200.154497ms] [200.154497ms] END
time="2023-05-08T08:45:51Z" level=info msg="Reconciling PCI Devices list"
I0508 08:45:56.328655       1 trace.go:205] Trace[646203300]: "Reflector ListAndWatch" name:github.com/harvester/pcidevices/go/pkg/mod/k8s.io/client-go@v0.23.7/tools/cache/reflector.go:167 (08-May-2023 08:45:35.629) (total time: 20601ms):
Trace[646203300]: ---"Objects listed" error:<nil> 20601ms (08:45:56.230)
Trace[646203300]: [20.601417428s] [20.601417428s] END
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1344034]

goroutine 121 [running]:
github.com/harvester/pcidevices/pkg/util/nichelper.IdentifyHarvesterManagedNIC({0xc00005a00a, 0x5}, {0x1f0c1f0, 0xc0028381b0}, {0x1f0bc40, 0xc0032552c0})
	/go/src/github.com/harvester/pcidevices/pkg/util/nichelper/helper.go:52 +0x454
github.com/harvester/pcidevices/pkg/controller/pcidevice.Register({0x1f0b968, 0xc002f81d40}, {0x7fdb05fbb640, 0xc00029fe30}, 0xc000704650, 0xc000704658)
	/go/src/github.com/harvester/pcidevices/pkg/controller/pcidevice/pcidevice_controller.go:69 +0x345
main.run.func2()
	/go/src/github.com/harvester/pcidevices/main.go:148 +0x72
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/src/github.com/harvester/pcidevices/go/pkg/mod/golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/src/github.com/harvester/pcidevices/go/pkg/mod/golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:72 +0xa5
I downgraded to 0.2.3 and it found all devices.

ancient-pizza-13099

05/08/2023, 7:47 PM
cc @great-bear-19718

great-bear-19718

05/08/2023, 11:01 PM
What type of node is the 3rd one?
I suspect the NICs in this node are not PCI devices.