Helm install fails: "buildkit cert is not available"

Hello. I had a functioning install of Okteto self-hosted, running in a 1.26 EKS cluster. I tore down and rebuilt the cluster today, and attempted to reinstall Okteto using the same configuration as before (using Helm via Terraform), but the Helm installation doesn’t complete. All pods seem to come up successfully, but I see jobs named “okteto-installer-checker-xxxxxx” running repeatedly. The logs of each job look like this:

{"timestamp":"2023-07-13T04:40:00Z","message":"'OKTETO_INGRESS_INTERNAL_IP' environment variable is not available","severity":"WARNING","context":{"data":{"instance.id":"okteto-installer-checker-28153720-2qf62"}}}
{"timestamp":"2023-07-13T04:40:00Z","message":"'OKTETO_BUILDKIT_INTERNAL_IP' environment variable is not available","severity":"WARNING","context":{"data":{"instance.id":"okteto-installer-checker-28153720-2qf62"}}}
{"timestamp":"2023-07-13T04:40:00Z","message":"loading buildkit certificate","severity":"INFO","context":{"data":{"instance.id":"okteto-installer-checker-28153720-2qf62"}}}
{"timestamp":"2023-07-13T04:40:00Z","message":"buildkit cert is not available yet, retrying","severity":"INFO","context":{"data":{"error":{},"instance.id":"okteto-installer-checker-28153720-2qf62"}}}
{"timestamp":"2023-07-13T04:40:00Z","message":"initialized kubernetes client with InClusterConfig","severity":"INFO","context":{"data":{"instance.id":"okteto-installer-checker-28153720-2qf62"}}}
{"timestamp":"2023-07-13T04:40:00Z","message":"retrieved k8s control plane version \"v1.26.0\"","severity":"INFO","context":{"data":{"instance.id":"okteto-installer-checker-28153720-2qf62"}}}

In particular, the line stating “buildkit cert is not available yet” jumps out at me, but I’m not sure how to troubleshoot it. Any ideas? Thanks.

Hi @dmccorma,

What Okteto version are you installing? Is it the same version you were installing before? Are you also installing it on EKS 1.26?

Those jobs are not related to the installation. They come from a cronjob that checks internal installer jobs when development environments are being deployed. That message ("buildkit cert is not available yet") is not a blocking error for those components, so it shouldn’t cause the issue you are mentioning.

Do you get an error, or just a timeout waiting for the installation to complete? Could you run Helm with the --debug flag from Terraform?

One thing that might be happening is that the ingress-controller service is not getting an external IP. Could you verify that by running kubectl get service -n okteto and checking the service whose name ends with ingress-nginx-controller?
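If it helps, the same check can be scripted. The following is only a minimal sketch, assuming kubectl is on your PATH and already pointed at the cluster. The function names (pending_load_balancers, check_namespace) are made up for illustration and are not part of Okteto or kubectl. It reads the JSON output of kubectl get service and lists every LoadBalancer service that has no entry under status.loadBalancer.ingress yet, which is what kubectl shows as <pending> in the EXTERNAL-IP column.

```python
import json
import subprocess
from typing import List


def pending_load_balancers(service_list: dict) -> List[str]:
    """Return names of LoadBalancer services that have no external address yet.

    `service_list` is the parsed output of `kubectl get service -o json`.
    """
    pending = []
    for svc in service_list.get("items", []):
        # Only services of type LoadBalancer are expected to get an external address.
        if svc.get("spec", {}).get("type") != "LoadBalancer":
            continue
        ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress")
        if not ingress:  # missing or empty list: nothing provisioned so far
            pending.append(svc["metadata"]["name"])
    return pending


def check_namespace(namespace: str = "okteto") -> List[str]:
    """Hypothetical helper: run kubectl and report pending services in a namespace."""
    out = subprocess.run(
        ["kubectl", "get", "service", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return pending_load_balancers(json.loads(out))


if __name__ == "__main__":
    for name in check_namespace():
        print(f"{name}: no external address assigned yet")
```

If a service shows up in that list for more than a few minutes, kubectl describe service on it usually shows events explaining why the load balancer could not be provisioned.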

Aha. Thanks for pointing me in the right direction. The issue ended up not being related to Okteto itself. You’re right, the LoadBalancer services were not getting a public IP, which I tracked down to a configuration issue with the AWS Load Balancer Controller (I had not updated the tags on my VPC subnets to reflect the new cluster name, so the controller was not able to discover them). Upon correcting the tagging issue, the Load Balancer Controller was able to provision ALBs for the LoadBalancer services, and the Helm Chart completed installation successfully. Thanks!
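For anyone who hits the same thing, here is a rough sketch of how the subnet tags could be checked. It assumes the tag in question follows the kubernetes.io/cluster/<cluster-name> convention used for subnet discovery. That key is my assumption about the setup rather than something confirmed above, so adjust it to whatever your controller version expects. The function name subnets_missing_cluster_tag and the command-line usage are hypothetical. The input is the JSON printed by aws ec2 describe-subnets.

```python
import json
import sys
from typing import List


def subnets_missing_cluster_tag(describe_subnets: dict, cluster_name: str) -> List[str]:
    """Return IDs of subnets that do not carry the cluster tag.

    `describe_subnets` is the parsed JSON from `aws ec2 describe-subnets`.
    The tag key below is an assumed convention, not taken from the thread.
    """
    wanted = f"kubernetes.io/cluster/{cluster_name}"
    missing = []
    for subnet in describe_subnets.get("Subnets", []):
        # "Tags" is absent when a subnet has no tags at all.
        keys = {tag["Key"] for tag in subnet.get("Tags", [])}
        if wanted not in keys:
            missing.append(subnet["SubnetId"])
    return missing


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   aws ec2 describe-subnets --output json | python check_tags.py <cluster-name>
    data = json.load(sys.stdin)
    for subnet_id in subnets_missing_cluster_tag(data, sys.argv[1]):
        print(f"{subnet_id}: missing tag for cluster {sys.argv[1]}")
```

Subnets still tagged with the old cluster name will show up in the output when you pass the new name.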

Glad to hear you were able to solve the issue!

Happy coding!