I’m seeing some strange behaviour with our self-hosted buildkit pod. When we try to build one specific namespace of ours, buildkit logs the following error:
time="2022-11-11T13:32:30Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = unexpected status: 500 Internal Server Error\n"
time="2022-11-11T13:55:56Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = unexpected status: 500 Internal Server Error\n"
time="2022-11-11T14:58:57Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = unexpected status: 500 Internal Server Error\n"
It’s fairly reproducible, and it then seems to break builds for all other namespaces. After a quick restart of the pod, the other namespaces work again.
Any ideas? Debugging steps?
Thanks!
NB: I’m still learning k8s, so if you would like logs etc., please be specific about what you need.
If a restart solves it, it could mean that the pod is running out of space (on restart, the cache is cleaned). Would you mind sharing the logs from the buildkit pods, and your config.yaml file?
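If it helps when collecting the logs, here is a minimal sketch for pulling the error entries out of a saved log file. It assumes the logs were saved to a file first (for example with `kubectl logs <buildkit-pod> > buildkit.log`, where the pod name is a placeholder) and that the lines look like the ones posted above. The names `parse_errors` and `SAMPLE` are made up for illustration and are not part of buildkit.

```python
import re
from datetime import datetime

# Matches lines shaped like the ones in this thread:
#   time="2022-11-11T13:32:30Z" level=error msg="..."
# The msg group tolerates escaped characters such as \n or \".
LINE_RE = re.compile(
    r'time="(?P<time>[^"]+)"\s+level=(?P<level>\w+)\s+msg="(?P<msg>(?:[^"\\]|\\.)*)"'
)


def parse_errors(text):
    """Return a list of (timestamp, message) for every level=error line."""
    errors = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match and match.group("level") == "error":
            # Assumes second-resolution UTC timestamps, as in the posted lines.
            stamp = datetime.strptime(match.group("time"), "%Y-%m-%dT%H:%M:%SZ")
            errors.append((stamp, match.group("msg")))
    return errors


# Two of the lines from the original post, used as sample input.
SAMPLE = r'''
time="2022-11-11T13:32:30Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = unexpected status: 500 Internal Server Error\n"
time="2022-11-11T13:55:56Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = unexpected status: 500 Internal Server Error\n"
'''

for stamp, message in parse_errors(SAMPLE):
    print(stamp.isoformat(), message[:60])
```

To run it against a real dump, replace `SAMPLE` with the contents of the saved file, e.g. `parse_errors(open("buildkit.log").read())`. The timestamps of the first errors are usually the most useful part to share, along with the lines just before them.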
Hi @ramiro, thank you for getting back to me - I had previously run the above commands and couldn’t see anything too out of the ordinary. Unfortunately/fortunately the issue isn’t occurring anymore so I can’t get a log dump for you to use.
It appears one of our PRs was causing the outage: rebuilding its namespace would cause buildkit to freeze, and then no other namespace could build. Since re-deploying the pod fixed it, the pod was most likely running out of some resource (our environment is quite restrictive with regard to resources).
Again, thanks for the support, and if we ever enter this state again I’ll get some logs for you!