# k3s
c
did you also disable the local storage provisioner?
b
No, I want to use that.
c
Or is your app asking for a specific class or type of PVC that local storage does not support (such as rwx)?
b
I don't know how to confirm that.
I was able to guess a pv though and it seems like it took it?
c
what do you mean by “guess a PV”
If your app uses a PVC the PV will be created for it
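with k3s’s bundled local-path provisioner, a bare PVC like this is all a chart should need (a rough sketch, assuming the default `local-path` StorageClass; the name is just for illustration):
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim      # hypothetical name, for illustration
spec:
  accessModes:
    - ReadWriteOnce        # local-path only supports RWO
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi
```
the provisioner creates the backing PV on the node’s disk once a pod that uses the claim gets scheduled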
b
```
$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
pvc-0c4f3669-dd28-4397-abeb-28d8c535a6dd   1Gi        RWO            Delete           Bound    dhub/hub-db-dir    local-path              81m
jupyter-admin                              10Gi       RWO            Delete           Bound    dhub/claim-admin   local-path              40m
```
and
```
$ kubectl get pvc --all-namespaces
NAMESPACE   NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dhub        hub-db-dir    Bound    pvc-0c4f3669-dd28-4397-abeb-28d8c535a6dd   1Gi        RWO            local-path     81m
dhub        claim-admin   Bound    jupyter-admin                              10Gi       RWO            local-path     80m
```
No pv got created for it.
c
did you manually create that PV? or did you let the PVC create it for you?
b
I manually created that pv.
c
you shouldn’t need to do that, that’s probably why it’s not working
b
This is happening after the helm chart is run, so I think the dhub stuff is trying to create a new pod and not trying to create the PV at this point.
c
If the Pod has a volume from a PVC, the PVC should create a PV automatically.
What does the PVC from the chart look like?
b
That doesn't happen. The pvc just sits waiting for first consumer.
```
$ kubectl describe pvc claim-admin -n dhub
Name:          claim-admin
Namespace:     dhub
StorageClass:  local-path
Status:        Bound
Volume:        jupyter-admin
Labels:        app=jupyterhub
               chart=jupyterhub-2.0.0
               component=singleuser-storage
               heritage=jupyterhub
               hub.jupyter.org/username=admin
               release=dhub
Annotations:   hub.jupyter.org/username: admin
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type    Reason                Age                  From                         Message
  ----    ------                ----                 ----                         -------
  Normal  WaitForPodScheduled   48m (x25 over 83m)   persistentvolume-controller  waiting for pod jupyter-admin to be scheduled
  Normal  WaitForFirstConsumer  43m (x123 over 83m)  persistentvolume-controller  waiting for first consumer to be created before binding
```
c
waiting for consumer indicates that the pod isn’t ready to run for some reason. it’s waiting for the pod to actually be scheduled to a node before it creates the PV
I think you’ve got the cart before the horse. Figure out why the pod isn’t being scheduled; once you fix that, the PV will be created for you.
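the pod’s events are usually the quickest way to see what the scheduler is unhappy about, something like:
```
kubectl describe pod jupyter-admin -n dhub
kubectl get events -n dhub --sort-by=.lastTimestamp
```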
b
I looked at the pod as well. No events, the only thing I can go on is these snips from `kubectl describe pod jupyter-admin -n dhub`:
```
Service Account:  default
Node:             <none>
```
and
```
Status:           Pending
IP:
IPs:              <none>
```
It looks like it requests:
```
Requests:
      memory:  1073741824
```
Is that 1G? (1073741824 = 1024³ bytes, so exactly 1 GiB.) Even if it's 10G, the instance can support it.
The docs on troubleshooting a pending pod with no events lead nowhere.
c
what does the full describe output look like
b
```
Name:             jupyter-admin
Namespace:        dhub
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=jupyterhub
                  chart=jupyterhub-2.0.0
                  component=singleuser-server
                  heritage=jupyterhub
                  hub.jupyter.org/network-access-hub=true
                  hub.jupyter.org/servername=
                  hub.jupyter.org/username=admin
                  release=dhub
Annotations:      hub.jupyter.org/username: admin
Status:           Pending
IP:               
IPs:              <none>
Init Containers:
  block-cloud-metadata:
    Image:      jupyterhub/k8s-network-tools:2.0.0
    Port:       <none>
    Host Port:  <none>
    Command:
      iptables
      -A
      OUTPUT
      -d
      169.254.169.254
      -j
      DROP
    Environment:  <none>
    Mounts:       <none>
Containers:
  notebook:
    Image:      pangeo/base-notebook:2023.01.13
    Port:       8888/TCP
    Host Port:  0/TCP
    Args:
      jupyterhub-singleuser
    Requests:
      memory:  1073741824
    Environment:
      DASK_GATEWAY__ADDRESS:                   http://proxy-public/services/dask-gateway
      DASK_GATEWAY__AUTH__TYPE:                jupyterhub
      DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE:   {JUPYTER_IMAGE_SPEC}
      DASK_GATEWAY__PROXY_ADDRESS:             gateway://traefik-dhub-dask-gateway.dhub:80
      DASK_GATEWAY__PUBLIC_ADDRESS:            /services/dask-gateway/
      JPY_API_TOKEN:                           ee37e7379c7542daaabcf50d350254e2
      JUPYTERHUB_ACTIVITY_URL:                 http://hub:8081/hub/api/users/admin/activity
      JUPYTERHUB_ADMIN_ACCESS:                 1
      JUPYTERHUB_API_TOKEN:                    ee37e7379c7542daaabcf50d350254e2
      JUPYTERHUB_API_URL:                      http://hub:8081/hub/api
      JUPYTERHUB_BASE_URL:                     /
      JUPYTERHUB_CLIENT_ID:                    jupyterhub-user-admin
      JUPYTERHUB_DEFAULT_URL:                  /lab
      JUPYTERHUB_HOST:                         
      JUPYTERHUB_OAUTH_ACCESS_SCOPES:          ["access:servers!server=admin/", "access:servers!user=admin"]
      JUPYTERHUB_OAUTH_CALLBACK_URL:           /user/admin/oauth_callback
      JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES:  []
      JUPYTERHUB_OAUTH_SCOPES:                 ["access:servers!server=admin/", "access:servers!user=admin"]
      JUPYTERHUB_SERVER_NAME:                  
      JUPYTERHUB_SERVICE_PREFIX:               /user/admin/
      JUPYTERHUB_SERVICE_URL:                  http://0.0.0.0:8888/user/admin/
      JUPYTERHUB_USER:                         admin
      JUPYTER_IMAGE:                           pangeo/base-notebook:2023.01.13
      JUPYTER_IMAGE_SPEC:                      pangeo/base-notebook:2023.01.13
      MEM_GUARANTEE:                           1073741824
    Mounts:
      /home/jovyan from volume-admin (rw)
Volumes:
  volume-admin:
    Type:        PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:   claim-admin
    ReadOnly:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     hub.jupyter.org/dedicated=user:NoSchedule
                 hub.jupyter.org_dedicated=user:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
```
Is there a better describe?
c
kinda looks like the scheduler just hasn’t picked it up yet
do you have a ready node in the cluster?
b
It's a single node cluster.
c
is it ready though? and is the scheduler running?
it is odd that it would remain pending and unscheduled without any errors in the events
b
How do I verify the scheduler?
```
$ kubectl get nodes
NAME                     STATUS   ROLES                  AGE    VERSION
test                     Ready    control-plane,master   116m   v1.27.6+k3s1
```
```
$ kubectl get pods -n kube-system
NAME                                     READY   STATUS    RESTARTS   AGE
local-path-provisioner-957fdf8bc-z87f2   1/1     Running   0          117m
coredns-77ccd57875-7d2l4                 1/1     Running   0          117m
metrics-server-648b5df564-952gk          1/1     Running   0          117m
svclb-proxy-public-e40c84ce-fgptz        1/1     Running   0          113m
```
c
can you `get -o yaml` that pod?
are there any messages in the k3s service log that might indicate what’s going on?
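(on k3s the scheduler isn’t a separate kube-system pod, it’s embedded in the k3s server process, which is also why it doesn’t show up in your pod list; the service log is where it talks) e.g., assuming the default systemd install:
```
kubectl get pod jupyter-admin -n dhub -o yaml
sudo journalctl -u k3s --since "15 minutes ago"
```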
b
```
apiVersion: v1
kind: Pod
metadata:
  annotations:
    hub.jupyter.org/username: admin
  creationTimestamp: "2023-10-26T20:43:01Z"
  labels:
    app: jupyterhub
    chart: jupyterhub-2.0.0
    component: singleuser-server
    heritage: jupyterhub
    hub.jupyter.org/network-access-hub: "true"
    hub.jupyter.org/servername: ""
    hub.jupyter.org/username: admin
    release: dhub
  name: jupyter-admin
  namespace: dhub
  resourceVersion: "3312"
  uid: ac270b74-5eab-4d54-851e-d304c2f06454
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: hub.jupyter.org/node-purpose
            operator: In
            values:
            - user
        weight: 100
  automountServiceAccountToken: false
  containers:
  - args:
    - jupyterhub-singleuser
    env:
    - name: DASK_GATEWAY__ADDRESS
      value: http://proxy-public/services/dask-gateway
    - name: DASK_GATEWAY__AUTH__TYPE
      value: jupyterhub
    - name: DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE
      value: '{JUPYTER_IMAGE_SPEC}'
    - name: DASK_GATEWAY__PROXY_ADDRESS
      value: gateway://traefik-dhub-dask-gateway.dhub:80
    - name: DASK_GATEWAY__PUBLIC_ADDRESS
      value: /services/dask-gateway/
    - name: JPY_API_TOKEN
      value: f247c69ba2c14f38a664ab2d4cd5a448
    - name: JUPYTERHUB_ACTIVITY_URL
      value: http://hub:8081/hub/api/users/admin/activity
    - name: JUPYTERHUB_ADMIN_ACCESS
      value: "1"
    - name: JUPYTERHUB_API_TOKEN
      value: f247c69ba2c14f38a664ab2d4cd5a448
    - name: JUPYTERHUB_API_URL
      value: http://hub:8081/hub/api
    - name: JUPYTERHUB_BASE_URL
      value: /
    - name: JUPYTERHUB_CLIENT_ID
      value: jupyterhub-user-admin
    - name: JUPYTERHUB_DEFAULT_URL
      value: /lab
    - name: JUPYTERHUB_HOST
    - name: JUPYTERHUB_OAUTH_ACCESS_SCOPES
      value: '["access:servers!server=admin/", "access:servers!user=admin"]'
    - name: JUPYTERHUB_OAUTH_CALLBACK_URL
      value: /user/admin/oauth_callback
    - name: JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES
      value: '[]'
    - name: JUPYTERHUB_OAUTH_SCOPES
      value: '["access:servers!server=admin/", "access:servers!user=admin"]'
    - name: JUPYTERHUB_SERVER_NAME
    - name: JUPYTERHUB_SERVICE_PREFIX
      value: /user/admin/
    - name: JUPYTERHUB_SERVICE_URL
      value: http://0.0.0.0:8888/user/admin/
    - name: JUPYTERHUB_USER
      value: admin
    - name: JUPYTER_IMAGE
      value: pangeo/base-notebook:2023.01.13
    - name: JUPYTER_IMAGE_SPEC
      value: pangeo/base-notebook:2023.01.13
    - name: MEM_GUARANTEE
      value: "1073741824"
    image: pangeo/base-notebook:2023.01.13
    imagePullPolicy: IfNotPresent
    lifecycle: {}
    name: notebook
    ports:
    - containerPort: 8888
      name: notebook-port
      protocol: TCP
    resources:
      requests:
        memory: "1073741824"
    securityContext:
      allowPrivilegeEscalation: false
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /home/jovyan
      name: volume-admin
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - command:
    - iptables
    - -A
    - OUTPUT
    - -d
    - 169.254.169.254
    - -j
    - DROP
    image: jupyterhub/k8s-network-tools:2.0.0
    imagePullPolicy: IfNotPresent
    name: block-cloud-metadata
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
      privileged: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: OnFailure
  schedulerName: dhub-user-scheduler
  securityContext:
    fsGroup: 100
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: hub.jupyter.org/dedicated
    operator: Equal
    value: user
  - effect: NoSchedule
    key: hub.jupyter.org_dedicated
    operator: Equal
    value: user
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: volume-admin
    persistentVolumeClaim:
      claimName: claim-admin
status:
  phase: Pending
  qosClass: Burstable
```
c
oh well yeah
```
schedulerName: dhub-user-scheduler
```
that’s a custom scheduler
did you deploy a custom scheduler to your cluster?
b
No, that must've come from the helm charts.
There are two of them running:
c
if you put that on there, the default scheduler won’t pick up the pod
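if you don’t want the chart’s scheduler at all, the jupyterhub chart has a toggle for it; a sketch, assuming daskhub passes values through to the jupyterhub subchart under the `jupyterhub:` key:
```
jupyterhub:
  scheduling:
    userScheduler:
      enabled: false    # fall back to the default kube-scheduler
```
with that off, user pods should be picked up by the default scheduler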
b
```
user-scheduler-5f67977fb6-mjsc7                 1/1     Running   0               118m
user-scheduler-5f67977fb6-f6k5b                 1/1     Running   0               118m
```
c
hmm. you didn’t show those in your pod list from earlier…
b
Those aren't in kube-system.
```
$ kubectl get pods -n dhub
NAME                                            READY   STATUS    RESTARTS        AGE
continuous-image-puller-4jjfw                   1/1     Running   0               118m
user-scheduler-5f67977fb6-mjsc7                 1/1     Running   0               118m
user-scheduler-5f67977fb6-f6k5b                 1/1     Running   0               118m
controller-dhub-dask-gateway-6f56889d79-tv8rc   1/1     Running   0               118m
api-dhub-dask-gateway-bd6bf97f-c9mzl            1/1     Running   0               118m
traefik-dhub-dask-gateway-7ddf6cdb7c-hhmfm      1/1     Running   0               118m
proxy-6c5694c86b-fmw8r                          1/1     Running   0               102m
hub-6c65dcc7d6-cv8sf                            1/1     Running   1 (2m49s ago)   102m
jupyter-admin                                   0/1     Pending   0               2m38s
```
c
anyways, the pod is waiting to be scheduled by your custom scheduler. You’d need to look at the scheduler logs to see why it’s not doing it.
b
The dhub scheduler?
c
yes
probably one of those user-scheduler pods says something about it
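e.g., using the pod names from your listing:
```
kubectl logs -n dhub user-scheduler-5f67977fb6-mjsc7
kubectl logs -n dhub user-scheduler-5f67977fb6-f6k5b
```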
I kind of suspect the `hub.jupyter.org/node-purpose` nodeAffinity has something to do with it. Is the custom scheduler expecting you to have put labels or annotations on your nodes that they don’t have?
b
I don't think so. The up-and-running docs are very straightforward: https://artifacthub.io/packages/helm/dask/daskhub?modal=template&template=dask-kubernetes-rbac.yaml
Just copy/paste into config.yaml and run the helm command.
The schedulers repeat this in between a bunch of health checks:
```
W1026 20:51:15.186967       1 reflector.go:324] k8s.io/client-go/informers/factory.go:134: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E1026 20:51:15.186984       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
```
More specifically:
```
I1026 20:43:01.719126       1 eventhandlers.go:118] "Add event for unscheduled pod" pod="dhub/jupyter-admin"
I1026 20:43:14.922133       1 reflector.go:536] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.PersistentVolumeClaim total 0 items received
I1026 20:43:55.230428       1 reflector.go:255] Listing and watching *v1beta1.CSIStorageCapacity from k8s.io/client-go/informers/factory.go:134
W1026 20:43:55.231755       1 reflector.go:324] k8s.io/client-go/informers/factory.go:134: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E1026 20:43:55.231772       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
```
c
I suspect the scheduler does not support Kubernetes 1.27+: the `storage.k8s.io/v1beta1` CSIStorageCapacity API was removed in 1.27, which is why it can’t list that resource.
You might try this again on 1.26, or use an updated version of the chart that supports 1.27
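you can see what chart versions are published with:
```
helm repo update
helm search repo dask/daskhub --versions
```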
b
Can I just do:
```
k3s-uninstall.sh
```
and then do:
```
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=1.26 sh -
```
And then redeploy the chart?
c
yep
well no
b
Or do I have to back out of the chart first?
c
that will get you the latest 1.26 release
uninstall and reinstall will wipe everything, no need to clean out the current cluster
b
That was it!
It works now.
c
cool
b
Thank you so much!
c
does the chart itself have any notes about kubernetes version compatibility that might have saved some time?
b
I install the chart with:
```
helm upgrade --wait --install --render-subchart-notes \
    dhub dask/daskhub \
    --namespace=dhub \
    --values=config.yaml
```
So I never see it.
How do I see it?
```
$ helm show chart dask/daskhub
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/ubuntu/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/ubuntu/.kube/config
apiVersion: v2
appVersion: jh2.0.0-dg2023.1.1
dependencies:
- import-values:
  - child: rbac
    parent: rbac
  name: jupyterhub
  repository: https://jupyterhub.github.io/helm-chart/
  version: 2.0.0
- name: dask-gateway
  repository: https://helm.dask.org/
  version: 2023.1.1
description: Multi-user JupyterHub and Dask deployment.
icon: https://avatars3.githubusercontent.com/u/17131925?v=3&s=200
maintainers:
- email: jtomlinson@nvidia.com
  name: Jacob Tomlinson (Nvidia)
- email: jhamman@ucar.edu
  name: Joe Hamman (NCAR)
- email: erik@sundellopensource.se
  name: Erik Sundell
- email: tom.w.augspurger@gmail.com
  name: Tom Augspurger
name: daskhub
version: 2023.1.0
```
lol ignore those warnings. :3
c
yeah it’d be nice if their docs listed supported Kubernetes versions anywhere, but I’m not seeing it (charts can declare a `kubeVersion` constraint in Chart.yaml, but this one doesn’t set one)
oh, but also: you installed version 2.0.0, which is a year old
so, no surprise that it only supports old versions of Kubernetes
b
I expected not specifying a version to give me latest. 😞
c
Why are you installing an ancient version?
oh that’s not where you got it from
you got it from https://helm.dask.org/, which does not appear to be the official chart and seems to be woefully out of date
b
Well now I'm just more confused.
c
or wait no, that’s the dask subchart
You installed “Multi-user JupyterHub and Dask deployment” chart which appears to have very old versions of both JupyterHub and Dask
b
Yeah.
c
Is that something you needed, or did you just want JH?
anyways, I think that’s why you’re stuck on old versions: whoever is maintaining your chart isn’t keeping up with either of the projects they’re bundling
b
My objective is “Multi-user JupyterHub and Dask deployment on k3s”, which this meets, as non-ideal as it is.
So thanks again a ton for all the help!