Hi! Been a minute. We somehow horked our self-hosted installation during an upgrade, and now we’re seeing some errors like this in our preview environments:
container 'migrate' image 'registry.dev.upshift.earth/pr-81/fuji-service-fuji-service:okteto' not found or it is private and 'imagePullSecrets' is not properly configured.
Curiously, the dev envs work okay. I don’t think it’s an issue with pull secrets - we’re just using the standard okteto registry inside the namespace for the preview env.
another error:
2023-07-29 17:08:48.00 UTC  fuji-service-6b655f4c94-kct4z  [pod-event]  Error: ImagePullBackOff
2023-07-29 17:22:17.00 UTC  fuji-service-6bd9d5d58f-wwbf6  [pod-event]  Pulling image "registry.dev.upshift.earth/pr-81/fuji-service-fuji-service:okteto"
2023-07-29 17:22:17.00 UTC  fuji-service-6bd9d5d58f-wwbf6  [pod-event]  Failed to pull image "registry.dev.upshift.earth/pr-81/fuji-service-fuji-service:okteto": rpc error: code = NotFound desc = failed to pull and unpack image "registry.dev.upshift.earth/pr-81/fuji-service-fuji-service:okteto": failed to resolve reference "registry.dev.upshift.earth/pr-81/fuji-service-fuji-service:okteto": registry.dev.upshift.earth/pr-81/fuji-service-fuji-service:okteto: not found
Is there a way we can list the catalogs? My k8s fu just isn’t quite what I wish it were to be able to suss this out.
Yes, I don’t think this is an issue with secrets. In the latest versions of Okteto, we introduced a cache to improve the build and image pull speed. I think you might be hitting something related to how we internally name images.
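In the meantime, there isn't a dedicated Okteto command to list the catalog, but the Okteto registry implements the standard Docker Registry HTTP API v2, so you can query it directly. A rough sketch, assuming the registry hostname from your errors and that your Okteto credentials are accepted as basic auth (the `$OKTETO_USER`/`$OKTETO_TOKEN` variables are placeholders; use whatever you `docker login` with):

```shell
# List all repositories in the registry (Docker Registry HTTP API v2)
curl -s -u "$OKTETO_USER:$OKTETO_TOKEN" \
  "https://registry.dev.upshift.earth/v2/_catalog"

# List the tags of one repository, e.g. the one from the failing pull
curl -s -u "$OKTETO_USER:$OKTETO_TOKEN" \
  "https://registry.dev.upshift.earth/v2/pr-81/fuji-service-fuji-service/tags/list"
```

That should be enough to see where images are actually landing.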
No worries about the catalog cmd. That would be useful, but I’m sure we can muddle through.
Here is the manifest:
❯ cat okteto.yml
name: fuji-service

# The build section defines how to build the images of your development environment
# More info: https://www.okteto.com/docs/reference/manifest/#build
build:
  # You can use the following env vars to refer to this image in your deploy commands:
  #  - OKTETO_BUILD_HELLO_ROCKET_REGISTRY: image registry
  #  - OKTETO_BUILD_HELLO_ROCKET_REPOSITORY: image repo
  #  - OKTETO_BUILD_HELLO_ROCKET_IMAGE: image name
  #  - OKTETO_BUILD_HELLO_ROCKET_SHA: image tag sha256
  fuji-service:
    context: .
    dockerfile: Dockerfile

# The deploy section defines how to deploy your development environment
# More info: https://www.okteto.com/docs/reference/manifest/#deploy
deploy:
  - name: Deploy the helms
    # if SYSTEM_TOKEN is supplied, we assume we have other preview env vars for now.
    command: |
      if [ -f ".env" ]; then
        source .env
        helm upgrade --install chart ./helm \
          --set db_url="${DATABASE_URL}" \
          --set oso_api_key="${OSO_API_KEY}"
      else
        helm upgrade --install chart ./helm \
          --set db_url="postgresql://fuji@db:26257/defaultdb" \
          --set oso_api_key="${OSO_API_KEY}" \
          --set cockroachdb.enabled=true
      fi

# The dependencies section defines other git repositories to be deployed as part of your development environment
# More info: https://www.okteto.com/docs/reference/manifest/#dependencies
# dependencies:
#   - https://github.com/okteto/sample

# The dev section defines how to activate a development container
# More info: https://www.okteto.com/docs/reference/manifest/#dev
dev:
  fuji-service:
    selector:
      app: fuji-service
    command: bash
    workdir: /usr/src/app
    sync:
      - .:/usr/src/app
    volumes:
      - /usr/local/cargo/registry
      - /home/root/app/target
    persistentVolume:
      enabled: true
      #storageClass: okteto-standard
      size: 10Gi
Another thing that seems to have stopped working: I can no longer build locally:
❯ okteto build
i Building 'Dockerfile' in tcp://buildkit.dev.upshift.earth:443...
[+] Building 0.0s (0/0)
x Error building service 'fuji-service': error building image 'registry.dev.upshift.earth/bennidhamma/fuji-service-fuji-service:okteto': build failed: failed to get status: rpc error: code = Unavailable desc = connection closed
Looking at the last error you showed, it seems like BuildKit stopped functioning after the upgrade. Could you start a separate topic for this? It’ll be useful if you mention which versions you upgraded from and to, the error you are seeing on the build pods, and your values.yaml (the one used to install Okteto).
It does seem to build, though, from inside of Okteto (see attached). I just can’t seem to connect to BuildKit from within my Okteto context on my dev machine… ?
Note: I fixed the BuildKit issue by adding a separate Route 53 entry for buildkit. It seems like some versions of Okteto use the shared NLB, and some put BuildKit on a different load balancer, maybe? I forget…
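For anyone who hits the same thing, the fix amounted to pointing a new DNS name at whichever load balancer the BuildKit service ends up on. Roughly (the hostname is ours, and the record target is illustrative — use whatever external hostname your BuildKit LoadBalancer service actually reports):

```
; find the BuildKit service's external hostname first, e.g.:
;   kubectl get svc --all-namespaces | grep -i buildkit
buildkit.dev.upshift.earth.  300  IN  CNAME  <buildkit-loadbalancer-hostname>
```

Once the record propagated, `okteto build` could reach tcp://buildkit.dev.upshift.earth:443 again.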
I have some questions to better understand the issue.
Regarding the pull error you are getting: how do you refer to the image generated in the build phase within your Helm manifest? I don’t see it being set in the deploy command, so I assume it is somehow hardcoded in your application’s Helm manifest?
Regarding the BuildKit error: was BuildKit running OK before the upgrade to 1.10, or were you getting the “connection closed” errors then too?
With that change, BuildKit is exposed on its own LoadBalancer instead of behind Okteto’s main ingress controller. That’s why you needed to add the separate Route 53 entry for BuildKit. Reading your previous messages, it’s not clear to me whether the BuildKit error started after the upgrade to 1.10 or after this change in the chart values, so I’d like to understand the issue and the timeline better.
If you are running an NLB for the Okteto ingress controller, we recommend running BuildKit on its own LoadBalancer. This post explains how to configure it, which is mainly what you already did.
Regarding the BuildKit values, I’m not sure when they were added. We’ve been using Okteto for the last six months or so, but a few of us have been jumping in and changing things / fixing it / upgrading it. My apologies. In any case, BuildKit does seem to be working okay now. It was definitely working before the 1.10 upgrade; I suspect I horked something when I was reinstalling the Okteto chart, and I ended up mashing up some Route 53 settings. Not my forte, lol.
Regarding the issue with the images to be pulled: the error you are getting is caused by the cache system Ramiro mentioned. When an image is built and the working directory is clean, images are pushed to <registry-url>/okteto/<image-name> (shortcut: okteto.global/<image>) instead of <registry-url>/<namespace>/<image-name> (shortcut: okteto.dev/<image>). The reason is that an image pushed to okteto.global is accessible to all users, so they don’t have to build it again.
So the issue seems to be that the image is being pushed to okteto.global, but the manifest is pulling from okteto.dev, where it doesn’t exist. You can confirm this by checking where the image was pushed. The recommended way to achieve what you want is to do something similar to what we have in our movies example: use the environment variables we generate (OKTETO_BUILD_<service-name>_XXXX) and pass them to helm, so it deploys the exact image built during the process.
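For instance, adapting the deploy command from your manifest, something like this should make Helm receive whichever reference the build actually produced (`image` here is a hypothetical chart value; use whatever your chart actually calls it):

```yaml
deploy:
  - name: Deploy the helms
    command: |
      helm upgrade --install chart ./helm \
        --set image="${OKTETO_BUILD_FUJI_SERVICE_IMAGE}" \
        --set db_url="postgresql://fuji@db:26257/defaultdb" \
        --set oso_api_key="${OSO_API_KEY}" \
        --set cockroachdb.enabled=true
```

The `OKTETO_BUILD_FUJI_SERVICE_IMAGE` variable comes from the `fuji-service` entry in your build section: the service name is uppercased and hyphens become underscores.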
On the other hand, regarding the BuildKit issue: I wanted to know whether BuildKit stopped working because, during the upgrade to 1.10, you also changed the BuildKit values, in which case the failure was expected while the Route 53 entry was missing. Or whether, on the contrary, BuildKit stopped working right after the upgrade and you then changed the config values to try to fix it. Either way, we will try to reproduce the issue and see if anything in the upgrade breaks BuildKit. Thank you so much for the information.
Let us know if you have any other questions or if the proposed solution doesn’t work for the image issue.