Troubleshooting install - okteto-buildkit-xxx pod stuck in pending

I’m seeing a couple of issues after running through the okteto self-hosted install.

  1. okteto-registry-xxx pod is in CrashLoopBackOff. here is a log:
│ Events:                                                                                                                                                                      │
│   Type     Reason             Age                    From               Message                                                                                              │
│   ----     ------             ----                   ----               -------                                                                                              │
│   Normal   Scheduled          10m                    default-scheduler  Successfully assigned okteto/okteto-registry-64d597d87b-6h7bg to ip-192-168-38-69.us-west-2.compute. │
│ internal                                                                                                                                                                     │
│   Normal   Pulling            10m                    kubelet            Pulling image "okteto/registry:b5ca020"                                                              │
│   Normal   Pulling            10m                    kubelet            Pulling image "okteto/registry:b5ca020"                                                              │
│   Normal   Pulled             10m                    kubelet            Successfully pulled image "okteto/registry:b5ca020" in 3.313779269s                                  │
│   Normal   Killing            9m34s                  kubelet            Container registry failed liveness probe, will be restarted                                          │
│   Normal   Created            9m4s (x2 over 10m)     kubelet            Created container registry                                                                           │
│   Normal   Started            9m4s (x2 over 10m)     kubelet            Started container registry                                                                           │
│   Warning  FailedPreStopHook  9m4s                   kubelet            Exec lifecycle hook ([/bin/sleep 45]) for Container "registry" in Pod "okteto-registry-64d597d87b-6h │
│ 7bg_okteto(6ac0f1db-d40f-403d-ae8e-41987566ae77)" failed - error: command '/bin/sleep 45' exited with 137: , message: ""                                                     │
│   Normal   Pulled             9m4s                   kubelet            Container image "okteto/registry:b5ca020" already present on machine                                 │
│   Warning  Unhealthy          8m14s (x5 over 9m54s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 503                                        │
│   Warning  Unhealthy          40s (x51 over 10m)     kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503      
  1. okteto-buildkit-xxx-0 is stuck in pending status, here is the event in the log:
│ Events:                                                                                                                                                                      │
│   Type     Reason            Age    From               Message                                                                                                               │
│   ----     ------            ----   ----               -------                                                                                                               │
│   Warning  FailedScheduling  2m10s  default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition    
  1. storage-okteto-buildkit-e142dcd56f-0 is stuck in status pending. logs:
   Type    Reason                Age                   From                         Message                                                                                   │
│   ----    ------                ----                  ----                         -------                                                                                   │
│   Normal  WaitForFirstConsumer  12m                   persistentvolume-controller  waiting for first consumer to be created before binding                                   │
│   Normal  ExternalProvisioning  2m40s (x42 over 12m)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or m │
│ anually created by system administrator                                                                                                                                      │
│                                             

feels like maybe the s3 bucket isn’t configured correctly?

Looks like I needed to install the csi driver. This feels like an amazon EKS thing. I just need to sharpen up my k8s-fu here.

Howdy, I can confirm that you are likely on the correct path that this is an ebs csi issue. You can check the pvc that buildkit is trying to use with kubectl get pvc -nokteto and also with kubectl describe pvc <pvc-name> should give some details. This is required by default it seems since kubernetes 1.23 on EKS. Amazon EBS CSI driver - Amazon EKS has some more info with a couple of install options listed. Reach out if you hit any further issues.

Thanks Jacob :slight_smile: I ended up getting it all working. Honestly I can’t even remember what I did. CSI driver, and then an eksctl command to set the service account correctly. Thanks for following up!