bug: Fail to start jupyterhub during deployment #33

Open
yuanminhui opened this issue Nov 21, 2023 · 0 comments

Describe the bug

JupyterHub fails to start during deployment.

To Reproduce

Follow the deployment guide at https://bio-os.gitbook.io/userguide/bu-shu/getting-set-up/bu-shu-bioos or https://github.com/Bio-OS/helm-charts/blob/main/README.md.

$ helm install jupyterhub bioos/jupyterhub \
        --namespace bioos \
        --create-namespace \
        --set hub.db.url=mysql+pymysql://root:[email protected]:3306/bioos \
        --set hub.db.password=Bytedance2023
"bioos" has been added to your repositories
NAME: jupyterhub
LAST DEPLOYED: Sun Nov 19 10:37:46 2023
NAMESPACE: bioos
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
.      __                          __                  __  __          __
      / / __  __  ____    __  __  / /_  ___    _____  / / / / __  __  / /_
 __  / / / / / / / __ \  / / / / / __/ / _ \  / ___/ / /_/ / / / / / / __ \
/ /_/ / / /_/ / / /_/ / / /_/ / / /_  /  __/ / /    / __  / / /_/ / / /_/ /
\____/  \__,_/ / .___/  \__, /  \__/  \___/ /_/    /_/ /_/  \__,_/ /_.___/
              /_/      /____/

       You have successfully installed the official JupyterHub Helm chart!

### Installation info

  - Kubernetes namespace: bioos
  - Helm release name:    jupyterhub
  - Helm chart version:   2.0.0
  - JupyterHub version:   3.0.0
  - Hub pod packages:     See https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/2.0.0/images/hub/requirements.txt

Nothing seems wrong at this point, but port-forwarding to the hub service fails:

$ kubectl -n bioos port-forward --address 0.0.0.0 service/hub 8081:8081
error: unable to forward port because pod is not running. Current status=Pending

The JupyterHub pod is not running, so a long debugging session began.


$ kubectl -n bioos get pods -o wide
NAME                   READY   STATUS             RESTARTS     AGE   IP           NODE           NOMINATED NODE   READINESS GATES
hub-5f57d5bd65-wlw6r   0/1     CrashLoopBackOff   6 (4m ago)   10m   10.244.1.3   minikube-m02   <none>           <none>
mysql-0                1/1     Running            0            47m   10.244.3.3   minikube-m04   <none>           <none>
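
The listing above shows the hub pod in CrashLoopBackOff, meaning its container starts and then exits, so the pod's own logs and events are a natural next check. The following is a minimal sketch, assuming a POSIX shell with awk; not_running is a hypothetical helper name and not part of kubectl, and the commented kubectl commands were not run as part of this report.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# "NAME STATUS" for every pod that is neither Running nor Completed.
# Assumes the default column order, where STATUS is the third column.
not_running() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage against a live cluster (not executed here):
#   kubectl -n bioos get pods | not_running
#   kubectl -n bioos logs hub-5f57d5bd65-wlw6r --previous   # logs of the previously crashed container
#   kubectl -n bioos describe pod hub-5f57d5bd65-wlw6r      # events and last container state
```

The filter only narrows the list; the reason for the crash would have to come from the logs or the events of the pod itself.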

$ kubectl -n bioos get svc -o wide
NAME                             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE   SELECTOR
hub                              NodePort       10.255.159.107   <none>        8081:32450/TCP   34m   app=jupyterhub,component=hub,release=jupyterhub
jupyter--2fjupyterhub-2f-route   ExternalName   <none>           hub.bioos     8081/TCP         67m   <none>
mysql                            ClusterIP      10.255.80.216    <none>        3306/TCP         72m   app.kubernetes.io/component=primary,app.kubernetes.io/instance=mysql,app.kubernetes.io/name=mysql
mysql-headless                   ClusterIP      None             <none>        3306/TCP         72m   app.kubernetes.io/component=primary,app.kubernetes.io/instance=mysql,app.kubernetes.io/name=mysql

$ kubectl -n kube-system get pods,svc
NAME                                       READY   STATUS    RESTARTS         AGE
pod/coredns-7f74c56694-z87cm               1/1     Running   6 (3h40m ago)    25h
pod/csi-nfs-controller-74f4f8484-jmqnw     4/4     Running   0                3h25m
pod/csi-nfs-node-h97cs                     3/3     Running   0                3h25m
pod/csi-nfs-node-kg7j8                     3/3     Running   0                3h25m
pod/csi-nfs-node-rwbs4                     3/3     Running   0                3h25m
pod/csi-nfs-node-z4d89                     3/3     Running   0                3h25m
pod/etcd-minikube                          1/1     Running   2 (3h40m ago)    25h
pod/kube-apiserver-minikube                1/1     Running   2 (3h40m ago)    25h
pod/kube-controller-manager-minikube       1/1     Running   2 (3h40m ago)    25h
pod/kube-proxy-8dl6v                       1/1     Running   2 (3h33m ago)    25h
pod/kube-proxy-cx85g                       1/1     Running   2 (3h33m ago)    25h
pod/kube-proxy-nbvf4                       1/1     Running   2 (3h33m ago)    25h
pod/kube-proxy-sqcwc                       1/1     Running   2 (3h40m ago)    25h
pod/kube-scheduler-minikube                1/1     Running   2 (3h40m ago)    25h
pod/snapshot-controller-66746ffc86-r4w6k   1/1     Running   0                3h25m
pod/storage-provisioner                    1/1     Running   17 (3h40m ago)   25h

NAME               TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns   ClusterIP   10.255.0.10   <none>        53/UDP,53/TCP,9153/TCP   25h

$ kubectl -n ingress-nginx get pods,svc  -o wide
NAME                                            READY   STATUS              RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
pod/ingress-nginx-admission-create-z48gm        0/1     ImagePullBackOff    0          25h   10.244.0.8   minikube   <none>           <none>
pod/ingress-nginx-admission-patch-2mdgb         0/1     ImagePullBackOff    0          25h   10.244.0.9   minikube   <none>           <none>
pod/ingress-nginx-controller-684c54767f-gwtk9   0/1     ContainerCreating   0          25h   <none>       minikube   <none>           <none>

NAME                                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
service/ingress-nginx-controller             NodePort    10.255.48.183   <none>        80:30710/TCP,443:30659/TCP   25h   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-admission   ClusterIP   10.255.61.48    <none>        443/TCP                      25h   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

$ kubectl get pods -n ingress-nginx
NAME                                        READY   STATUS              RESTARTS   AGE
ingress-nginx-admission-create-z48gm        0/1     ImagePullBackOff    0          26h
ingress-nginx-admission-patch-2mdgb         0/1     ErrImagePull        0          26h
ingress-nginx-controller-684c54767f-gwtk9   0/1     ContainerCreating   0          26h

Looking deeper into one of the failing ingress-nginx pods:

$ kubectl describe pod ingress-nginx-admission-create-z48gm -n ingress-nginx
Name:         ingress-nginx-admission-create-z48gm
Namespace:    ingress-nginx
Priority:     0
Node:         minikube/192.168.49.2
Start Time:   Sat, 18 Nov 2023 09:51:46 +0800
Labels:       app.kubernetes.io/component=admission-webhook
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              controller-uid=e2b83263-5092-4c77-8296-b777ed5d9705
              job-name=ingress-nginx-admission-create
Annotations:  <none>
Status:       Pending
IP:           10.244.0.8
IPs:
  IP:           10.244.0.8
Controlled By:  Job/ingress-nginx-admission-create
Containers:
  create:
    Container ID:
    Image:         registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v20231011-8b53cabe0@sha256:a7943503b45d552785aa3b5e457f169a5661fb94d82b8a3373bcd9ebaf9aac80
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      create
      --host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc
      --namespace=$(POD_NAMESPACE)
      --secret-name=ingress-nginx-admission
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wgpk6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-wgpk6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
                             minikube.k8s.io/primary=true
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age                    From     Message
  ----    ------   ----                   ----     -------
  Normal  BackOff  4s (x1095 over 4h10m)  kubelet  Back-off pulling image "registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v20231011-8b53cabe0@sha256:a7943503b45d552785aa3b5e457f169a5661fb94d82b8a3373bcd9ebaf9aac80"

It seems the image cannot be pulled. I tried pulling it manually to confirm:

$ docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v20231011-8b53cabe0@sha256:a7943503b45d552785aa3b5e457f169a5661fb94d82b8a3373bcd9ebaf9aac80
Error response from daemon: manifest for registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen@sha256:a7943503b45d552785aa3b5e457f169a5661fb94d82b8a3373bcd9ebaf9aac80 not found: manifest unknown: manifest unknown

The registry reports that no manifest exists for that digest (manifest unknown). When I removed the @sha256 digest suffix and pulled by tag only:

$ docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v20231011-8b53cabe0
v20231011-8b53cabe0: Pulling from google_containers/kube-webhook-certgen
07a64a71e011: Pulling fs layer
fe5ca62666f0: Pulling fs layer
b02a7525f878: Pulling fs layer
fcb6f6d2c998: Waiting
e8c73c638ae9: Waiting
1e3d9b7d1452: Waiting
4aa0ea1413d3: Waiting
7c881f9ab25e: Waiting
5627a970d25e: Waiting
2c4dd5b46232: Waiting

It works.
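
The manual step above, dropping the digest and pulling by tag, can be sketched as a small shell helper. This is only an illustration under stated assumptions: strip_digest is a hypothetical name, and it relies on POSIX parameter expansion rather than any docker or kubectl feature.

```shell
# Hypothetical helper: drop a trailing "@sha256:<digest>" from an image
# reference, leaving "registry/repository:tag". References without a
# digest are printed unchanged.
strip_digest() {
  printf '%s\n' "${1%@sha256:*}"
}

image='registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v20231011-8b53cabe0@sha256:a7943503b45d552785aa3b5e457f169a5661fb94d82b8a3373bcd9ebaf9aac80'

# Prints the tag-only reference that was pulled manually above.
strip_digest "$image"

# Usage with docker (not executed here):
#   docker pull "$(strip_digest "$image")"
```

Pulling by tag gives up the digest pin: a tag can be moved to different content, so the image obtained this way is not guaranteed to be byte-identical to the one the digest named.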

So the problem lies in how the image is specified: the reference pins a digest that the registry does not serve. A procedure is needed to recover the deployment manually. I will provide the fix in a PR so that others can install this program successfully. Please review and merge it.

Expected behavior

Successful deployment of jupyterhub & bioos.

Screenshots

None.
