# harvester
l
Here are the logs:
/var/lib/rancher/rke2/bin/kubectl apply --validate=false -f /var/lib/rancher/rancherd/bootstrapmanifests/rancherd.yaml
deployment.apps/harvester-cluster-repo unchanged
service/harvester-cluster-repo unchanged
clusterrepo.catalog.cattle.io/harvester-charts unchanged
namespace/cattle-monitoring-system unchanged
namespace/cattle-logging-system unchanged
serviceaccount/rancher-monitoring-operator unchanged
helmchartconfig.helm.cattle.io/rke2-ingress-nginx unchanged
secret/tls-ingress unchanged
ingress.networking.k8s.io/rancher-expose unchanged
managedchart.management.cattle.io/harvester unchanged
managedchart.management.cattle.io/harvester-crd unchanged
managedchart.management.cattle.io/rancher-monitoring-crd unchanged
namespace/cattle-dashboards unchanged
configmap/harvester-vm-dashboard unchanged
configmap/harvester-vm-detail-dashboard unchanged
managedchart.management.cattle.io/rancher-monitoring unchanged
managedchart.management.cattle.io/rancher-logging-crd unchanged
managedchart.management.cattle.io/rancher-logging unchanged
node/bdc-harvester-01 unchanged
namespace/fleet-local unchanged
cluster.provisioning.cattle.io/local unchanged
secret/local-rke-state unchanged
clusterregistrationtoken.management.cattle.io/default-token unchanged
clusterrepo.catalog.cattle.io/rancher-stable unchanged
resource mapping not found for name: "harvester-default" namespace: "cattle-logging-system" from "/var/lib/rancher/rancherd/bootstrapmanifests/rancherd.yaml": no matches for kind "EventTailer" in version "logging-extensions.banzaicloud.io/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "auto-disk-provision-paths" namespace: "" from "/var/lib/rancher/rancherd/bootstrapmanifests/rancherd.yaml": no matches for kind "Setting" in version "harvesterhci.io/v1beta1"
ensure CRDs are installed first
resource mapping not found for name: "vm-import-controller" namespace: "harvester-system" from "/var/lib/rancher/rancherd/bootstrapmanifests/rancherd.yaml": no matches for kind "Addon" in version "harvesterhci.io/v1beta1"
ensure CRDs are installed first
resource mapping not found for name: "pcidevices-controller" namespace: "harvester-system" from "/var/lib/rancher/rancherd/bootstrapmanifests/rancherd.yaml": no matches for kind "Addon" in version "harvesterhci.io/v1beta1"
ensure CRDs are installed first
Also, I am seeing that the harvester-cluster-repo image is not available to pull:
Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       14m                   default-scheduler  Successfully assigned cattle-system/harvester-cluster-repo-67ddddf8d7-r522f to x.x.x.x
  Normal   AddedInterface  14m                   multus             Add eth0 [10.52.0.19/32] from k8s-pod-network
  Normal   Pulling         13m (x3 over 14m)     kubelet            Pulling image "rancher/harvester-cluster-repo:v1.1.2"
  Warning  Failed          13m (x3 over 14m)     kubelet            Failed to pull image "rancher/harvester-cluster-repo:v1.1.2": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/rancher/harvester-cluster-repo:v1.1.2": failed to resolve reference "docker.io/rancher/harvester-cluster-repo:v1.1.2": docker.io/rancher/harvester-cluster-repo:v1.1.2: not found
  Warning  Failed          13m (x3 over 14m)     kubelet            Error: ErrImagePull
  Normal   BackOff         13m (x3 over 14m)     kubelet            Back-off pulling image "rancher/harvester-cluster-repo:v1.1.2"
  Warning  Failed          13m (x3 over 14m)     kubelet            Error: ImagePullBackOff
  Normal   Pulling         11m (x4 over 13m)     kubelet            Pulling image "rancher/harvester-cluster-repo:v1.1.2"
  Warning  Failed          11m (x4 over 13m)     kubelet            Failed to pull image "rancher/harvester-cluster-repo:v1.1.2": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/rancher/harvester-cluster-repo:v1.1.2": failed to resolve reference "docker.io/rancher/harvester-cluster-repo:v1.1.2": docker.io/rancher/harvester-cluster-repo:v1.1.2: not found
  Warning  Failed          11m (x4 over 13m)     kubelet            Error: ErrImagePull
  Warning  Failed          10m (x6 over 12m)     kubelet            Error: ImagePullBackOff
  Normal   BackOff         2m54s (x39 over 12m)  kubelet            Back-off pulling image "rancher/harvester-cluster-repo:v1.1.2"
w
Any particular reason why version 1.1.2? 1.2.0 is now the latest version.
l
I have Rancher 2.7.4 running, with other clusters managed by it. Rancher 2.7.4 supports Harvester 1.1.1 or 1.1.2.
a
I left some steps to follow in the GitHub issue.
crictl image ls / docker image ls
to list all container images in the node
kubectl get pods -A
list all running /pending PODs
journalctl --unit=rancherd
to list rancherd log
l
@ancient-pizza-13099 Give me some time, I have to redeploy everything. I am seeing problems in 1.2.0 as well.
a
What do you mean by redeploy? Harvester needs to be installed via ISO or PXE; the helm mode is not supported in v1.1.2.
l
pxe
a
OK, I will wait. The harvester-repo image is bundled in the ISO.
l
wait then why was it trying to pull from docker?
a
the v1.2.0 related docker image is also available in docker hub; but v1.1.2 is not available https://registry.hub.docker.com/r/rancher/harvester-cluster-repo/tags
l
@ancient-pizza-13099 if you look at the logs above in this thread, when I described the pod it was trying to pull 1.1.2 from Docker Hub:
rancher/harvester-cluster-repo:v1.1.2": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/rancher/harvester-cluster-repo:v1.1.2": failed to resolve reference "docker.io/rancher/harvester-cluster-repo:v1.1.2": docker.io/rancher/harvester-cluster-repo:v1.1.2: not found
a
the image is bundled into ISO, and installed into NODE
only when it is not in the NODE, it will pull from remote
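A minimal sketch of how one might check whether the repo image is already present on the node, so that no remote pull is needed. The helper name `has_image` is hypothetical; it only assumes that `crictl image ls` prints the repository in the first column and the tag in the second, as in the listings in this thread.

```shell
#!/bin/sh
# has_image REPO TAG (hypothetical helper): reads `crictl image ls`-style
# text on stdin and succeeds only if a row has exactly that repository
# (column 1) and tag (column 2).
has_image() {
    awk -v repo="$1" -v tag="$2" '
        $1 == repo && $2 == tag { found = 1 }
        END { exit !found }
    '
}

# Example use on a node (assumes crictl is configured for the node's runtime):
#   crictl image ls | has_image docker.io/rancher/harvester-cluster-repo v1.1.2 \
#       && echo "present locally" \
#       || echo "missing locally, a remote pull will be attempted"
```

If the check fails for the version the cluster expects, the pull falls through to the remote registry, which is where the `not found` error in this thread comes from.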
l
So that means it was not there in 1.1.2 iso image
a
no, it is in the ISO, we have already so many v1.1.2 installed, if the repo image is not in ISO, all of the installation will fail 🙂
l
Wait i have other clusters running 1.1.2
a
the issue here is, the image is not correctly/successfully added to your NODE in the installation
try
docker image ls | grep repo
in your node
l
ok
a
l
Can you tell me which logs you need, and how to get them? Also, if I provide the support logs generated with the bundle, will that suffice?
a
A workaround: we have a solution to build the repo image locally and add it to the node https://github.com/harvester/harvester/issues/4537#issuecomment-1730230548
As your cluster is stuck in a pending state, it is not possible to generate the support bundle now.
We need to troubleshoot step by step.
l
ok when will you be available next?
a
Right now the `repo` image is missing, so we can build one and add it to the cluster.
l
i will take an hour
Hello @ancient-pizza-13099
I have added the details you have asked in github
a
@loud-nest-3266: per your posted text, is it from v112 cluster or v120 ?
l
V112
a
Did you try to pull this image manually?
rancher/harvester-cluster-repo v1.2.0 f611f167f97b 22 minutes ago linux/amd64 0.0 B 178.9 MiB
l
Yes and in docker hub also I checked it wasn’t there
a
try
docker pull rancher/harvester-cluster-repo:v1.2.0
and post the output of the run, and also:
harv41:~ # crictl image ls | grep repo
docker.io/rancher/harvester-cluster-repo                                    master                                     1a46af9f7aa39       188MB
docker.io/rancher/harvester-cluster-repo                                    master-head                                81322bcd9d722       65.1MB
docker.io/rancher/harvester-cluster-repo                                    v1.2.0                                     bfa840fe131b6       65.1MB
harv41:~ #
Now, because the v112 cluster-repo image does not exist on the node, the further deployments are blocked.
l
dc-harvester-01:~ # docker pull rancher/harvester-cluster-repo:v1.2.0
docker.io/rancher/harvester-cluster-repo:v1.2.0: resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:28b6e0a76551b629ba6315a4619b1389d470120e3a508b54d509312219a87714 done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:3cd1d15f55c5050e3a15d1d329166066f0d97225e87205cadab86502fb33e96f done |++++++++++++++++++++++++++++++++++++++|
config-sha256:bfa840fe131b6b7318f27e4cb37acbffec734bd410f442a2c89875519da98e0b done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d32297b4427e4ed638304598545eb44bd3c9563167787e37f6942f86a6bc17b8 done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:ed6550064d4ef7ef6e8bc6081e3db04c4398ca7fad59c20f07fd4662eba9b27c done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:9f1ebb7d8303b2c6f7fd31bc75ca5c45ec6067a6a18917cfd8111853464a88ba done |++++++++++++++++++++++++++++++++++++++|
elapsed: 14.1s total: 62.1 M (4.4 MiB/s)
bdc-harvester-01:~ # crictl image ls | grep repo
docker.io/rancher/harvester-cluster-repo v1.2.0 bfa840fe131b6 65.1MB
bdc-harvester-01:~ #harvester
a
When you have a shareable link with write access, I can upload the image; or you can clone the source code to build an image.
l
1.2.0 it pulls
but we require 1.1.2
a
unfortunately, the v110 repo-image was not published to docker, and I also have no credential to publish it
Another user built one locally and added it to a v120 cluster: https://github.com/harvester/harvester/issues/4537#issuecomment-1730230548
l
Ok so what options do we have now? i understand this docker image needs to be added
Or i can install v1.2.0 but i have issues in 1.2.0 as well and i have submitted a bug for that
a
Yes. If you continue with v112, find a Linux machine or VM and run the related make target there to build the repo image.
l
@ancient-pizza-13099 we have multiple clusters, so we need a permanent solution. May I know why the image was deleted in the first place?
a
it was not deleted, I suspect in certain cases, the image was not successfully added to local
the possible cause, I am investigating
l
Ok, Please let me know if you need any help from my side i am happy to help it resolve asap
a
Just keep this environment for some hours, maybe we will need further information.
harv41:~ # journalctl --unit=rancherd
Sep 11 09:15:50 harv41 systemd[1]: Starting Rancher Bootstrap...
Do you have a first line like this?
l
one sec
No i dont see that specific log
a
it seems to be overflowed
l
All i see is error from start
a
loop log of ...
l
sorry my mistake i was seeing at the end
here it is
journalctl -u rancherd
Sep 29 11:03:54 bdc-harvester-01 systemd[1]: Starting Rancher Bootstrap...
Sep 29 11:03:54 bdc-harvester-01 rancherd[3715]: time="2023-09-29T11:03:54Z"
a
save it to a file and add it here journalctl --unit=rancherd > /tmp/a1.txt
l
done
you need that file?
a
yes
add it here
l
ok
a
please also list all the current images:
crictl image ls
l
harvester 1.1.2 is looking for cluster repo 1.1.2 but it is not available instead 1.2.0 is available
either add the cluster image into docker or change the image in iso so it points to 1.2.0 (If it doesn't break anything).
a
Please also attach this file: /oem/install/console.log
no, the repo v120 can't be used in v112
the console.log is important
Normally it has output like this; check the last few lines of the grep output:
grep repo /oem/install/console.log
Why is it saying:
SubnetMask:255.255.255.0 Gateway:gateway-ip DefaultRoute:false BondOptions:map[miimon:100 mode:balance-tlb] MTU:0 VlanID:0} Vip:vip-ip-address VipHwAddr: VipMode:static ForceEFI:false Device:/dev/disk/by-path/pci-0000:ca:00.0-scsi-0:2:0:0 ConfigURL:http://pxe-server/bootcfg/generic?mac=mac-address Silent:false ISOURL:http://pxe-server/bootcfg/assets/opensuse/harvester-1.2.0/harvester-v1.2.0-amd64.iso PowerOff:false NoFormat:false
this looks to be an v1.2.0 cluster
l
cat /etc/os-release
NAME="SLE Micro"
VERSION="5.3"
VERSION_ID="5.3"
PRETTY_NAME="Harvester v1.1.2"
ID="sle-micro-rancher"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sle-micro-rancher:5.3"
VARIANT="Harvester"
VARIANT_ID="Harvester-20230420"
GRUB_ENTRY_NAME="Harvester v1.1.2"
in os release it says 1.1.2
seems like the new iso build was a mess
a
but your PXE iso_url is v120
l
That's in name only; I have changed the profiles on my server.
a
time="2023-09-29T10:58:13Z" level=info msg="[stdout]: Checking 42 images in /tmp/images-lists/harvester-images-v1.2.0.txt..."
time="2023-09-29T10:59:16Z" level=info msg="[stdout]: done"
time="2023-09-29T10:59:16Z" level=info msg="[stdout]: Checking 1 images in /tmp/images-lists/harvester-repo-images-v1.2.0.txt..."
time="2023-09-29T10:59:16Z" level=info msg="[stdout]: done"
time="2023-09-29T10:59:16Z" level=info msg="[stdout]: Checking 29 images in /tmp/images-lists/rancher-images-v2.7.5.txt..."
no, the installer did not agree it is v112
but this line of log also confused me
time="2023-09-29T10:48:34Z" level=info msg="[stdout]: \x1b[36mINFO\x1b[0m[2023-09-29T10:48:34Z] Setting default grub entry to Harvester v1.1.2 "
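The mismatch above (image lists for v1.2.0, grub entry for v1.1.2) can be surfaced quickly by extracting the versions the installer logged. This is a rough sketch: the function name `installer_image_versions` is hypothetical, and it assumes the image-list file names keep the `harvester-images-<version>.txt` pattern seen in the log lines in this thread.

```shell
#!/bin/sh
# installer_image_versions (hypothetical helper): reads console.log text on
# stdin and prints each distinct Harvester version that appears in an
# image-list file name of the form harvester-images-<version>.txt.
installer_image_versions() {
    sed -n 's/.*harvester-images-\(v[0-9][0-9.]*\)\.txt.*/\1/p' | sort -u
}

# Example use on a node:
#   installer_image_versions < /oem/install/console.log
#   grep PRETTY_NAME /etc/os-release
# If the two report different versions, the boot artifacts and the ISO
# used for installation probably came from different releases.
```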
l
do you want me to do fresh install ?
manually i can do that in 15 mins
a
kubernetesVersion: v1.24.11+rke2r1\n rancherVersion: v2.6.11\n
I guess the re-naming may cause some kind of bug
l
yes
a
Both ISO and manual installations are used by us daily.
I would suggest re-installing with the correct naming.
l
Ok
a
The v112 and v120 ISOs have different sizes; double-check against our release to avoid mixing them up.
harvester v112: ISO ~4.9GiB
-rw-rw-r-- 1 libvirt-qemu kvm  4.9G May 18 23:48 harvester-v1.1.2-amd64.iso
harvester v120: ISO ~5.4GiB
-rw-r--r-- 1 libvirt-qemu kvm  5.4G Sep 11 10:56 h-m-v120.iso
l
Ok let me re-check
I am trying to install manually
a
The ISO installer shows the version at the very beginning; it should not depend on the ISO file name.
l
It is same problem again
with manual also
a
please pack those files:
mkdir /tmp/debug
cp /oem/install/console.log /tmp/debug/console.log
journalctl --unit=rancherd > /tmp/debug/rancherd.log
kubectl get pods -A > /tmp/debug/pods.log
crictl image ls > /tmp/debug/images.log
zip all those files and upload the zip file here
cc @prehistoric-balloon-31801 @red-king-19196
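For the packing step, a small sketch that bundles the debug directory into a single archive. It uses `tar` with gzip rather than `zip`, on the assumption that `tar` is more likely to be present on the node; the function name `pack_debug` and the output path are hypothetical.

```shell
#!/bin/sh
# pack_debug SRC_DIR OUT_FILE (hypothetical helper): creates a gzip-compressed
# tar archive of SRC_DIR, storing paths relative to its parent directory.
pack_debug() {
    src_dir=$1
    out_file=$2
    tar -czf "$out_file" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example use after collecting the files listed above:
#   pack_debug /tmp/debug /tmp/debug.tar.gz
```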
r
Are those ISO images downloaded from the GitHub release page? We could do a quick checksum verification. Weirdly, the console.txt shows both v1.1.2 and v1.2.0 information. Could it be the PXE booting from the v1.1.2 kernel, initrd, and rootfs, but installing with the v1.2.0 ISO image?
The container images preloaded are all from Harvester v1.2.0
@loud-nest-3266 could you help double-check that the artifacts (kernel, initrd, and rootfs) served by the PXE server are the correct version? Thanks!
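The checksum verification mentioned above could look roughly like this. The function name `verify_sha256` is hypothetical, and the expected digest has to come from the checksum file published alongside the release artifacts; no real digest is shown here.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX (hypothetical helper): succeeds only if the
# SHA-256 digest of FILE equals EXPECTED_HEX (lowercase hex, as printed by
# sha256sum).
verify_sha256() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}

# Example use on the PXE server, once per served artifact (ISO, kernel,
# initrd, rootfs); <expected-digest> is a placeholder for the published value:
#   verify_sha256 harvester-v1.1.2-amd64.iso "<expected-digest>" \
#       && echo "checksum OK" || echo "checksum MISMATCH"
```

Running it against every artifact the PXE server serves, not only the ISO, helps rule out a mix of files from two releases.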
a
@loud-nest-3266 later did a manual installation via v112 ISO, but said issue was still there, we are waiting for those related log files cc @red-king-19196
l
Hello Zespre I had downloaded from GitHub the image
@ancient-pizza-13099 i was trying something else sorry couldn’t respond
First I tried with manual installation using same image as pxe I saw same results
Then I used the image from my share which was downloaded before and I succeeded in installation
a
No worries, when you are free, please collect the files and upload them.
l
Right now the cluster is 5 nodes cluster up and running
a
sounds like something went wrong with your previous ISO
l
Yes I came to same conclusion
a
that's wonderful
I checked the source code, and found no reason the repo image was lost during new installation
l
But the only thing I am not sure about is after downloading from GitHub I verified checksums and it was fine
a
if you have the related logs from the failed installation with that said ISO, we can investigate further
l
All the logs I had added in this thread
I don’t have any other logs right now
a
if you encounter it again, please contact us, thanks.
l
Sure thing