So I followed all the instructions step by step on my already deployed EKS cluster. I added the CSI driver, the AWS controller and all of that. But then everything else I tried was problematic.
When I tried the Helm install, it kept failing and complaining about some values for Nginx, so I learnt the hard way that I had to add ‘progressDeadlineSeconds’. After a lot of back and forth, I managed to install it.
Then I went to the ‘deploy app’ section, and almost nothing worked because the tutorial assumes all the security groups are already set up perfectly! For instance, we have to do a hairpin NAT, but there is no mention of that in the tutorial. Now I am stuck on Okteto build because it wants to go through TCP!
okteto build
 ! Insecure mode enabled
 i Using okteto-admin @ okteto.okteto.scholarspark.ai as context
 i Building ‘/Users/pouyaataei/Desktop/projects/okteto-demo-project/getting-started/api/Dockerfile’ in tcp://buildkit.okteto.scholarspark.ai:443…
 x Error building service ‘api’: buildkit service not available for 10m0s
Is the documentation really lacking? I am just 3 days in, and I really cannot believe how painful it has been so far. Am I missing something here?
Btw the versions I am using match the ones stated in the docs.
I’m so sorry that you are having a hard time installing Okteto. It seems that your cluster has some restrictions that the docs do not account for, as we have installed Okteto several times following the docs without any problems.
But let us help you get to a functional installation by understanding your infrastructure. Then we can evaluate adding more information to the docs for future installations that match your case.
Can you tell us which AWS components you are using? Also, can you share the Helm values you are using for the Okteto installation, with any sensitive information removed?
From what you mention, needing to add progressDeadlineSeconds could be due to an admission control or policy issue on the cluster (OPA/Gatekeeper, Kyverno, or custom Pod Security/PodDefaults) that requires certain fields in Deployments. The upstream NGINX chart doesn’t require that field by default.
It could be that your cluster is stricter than a vanilla one.
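One way to check whether something in the cluster is enforcing extra fields is to list the admission webhook configurations registered with the API server. This is only a sketch: the function name `list_admission_webhooks` is made up for illustration, and an empty result would not rule out other policy mechanisms.

```shell
# Hypothetical helper: list validating and mutating admission webhooks.
# Entries from OPA/Gatekeeper, Kyverno or similar tools would show up here.
list_admission_webhooks() {
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
}
```

Calling `list_admission_webhooks` against the affected cluster and sharing the names it prints would help narrow this down.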
Regarding the Buildkit issue, could you first check that the pod is healthy? If it is not, its status may give us a hint about the error.
Regarding the NAT thing, how are you accessing the platform?
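To see how the endpoint looks from your machine, a quick DNS and TCP check can help. This is a rough sketch that assumes `dig` and a netcat variant supporting `-z` and `-w` are installed locally; the function name `check_endpoint` is made up for illustration.

```shell
# Hypothetical helper: resolve a hostname, then try a TCP connection to it.
# Usage: check_endpoint <host> [port]   (port defaults to 443)
check_endpoint() {
  host="$1"
  port="${2:-443}"
  # Show the addresses the name resolves to (these should belong to the NLB).
  dig +short "$host"
  # Attempt a TCP connection without sending data, with a 5 second timeout.
  nc -vz -w 5 "$host" "$port"
}
```

For example, `check_endpoint buildkit.okteto.scholarspark.ai` would test the address shown in the build output above. Running it both from your laptop and from inside the VPC may show whether the hairpin path is the part that fails.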
I do not have any policy manager installed, such as OPA, Gatekeeper or anything else. Regarding Buildkit, do you mean the pod in the Okteto namespace or the one in the Okteto-admin namespace? There are many pods and services in each namespace.
I am accessing the platform in the EXACT way discussed in the documentation, nothing special really. I just pointed my DNS record at the NLB’s address and then tried to access the admin dashboard.
Looking into your support bundle, we found this error in cluster-resources/events/okteto.json:
0/5 nodes are available: 3 Too many pods, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
Based on this, the Buildkit pods cannot be scheduled due to a lack of resources in your cluster. Do you have enough node capacity? Please also review the maximum number of pods per node, as you might be hitting this limit.
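As a rough way to check both limits, the allocatable pod count and memory of each node can be compared with what is currently pending. This is a sketch only: the function names `node_capacity` and `pending_pods` are made up for illustration.

```shell
# Hypothetical helper: show allocatable pod slots and memory for every node.
node_capacity() {
  kubectl get nodes -o custom-columns='NODE:.metadata.name,MAX_PODS:.status.allocatable.pods,MEMORY:.status.allocatable.memory'
}

# Hypothetical helper: list pods in any namespace that are still Pending.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}
```

If `MAX_PODS` is low, note that on EKS with the default VPC CNI the per-node pod limit usually depends on the instance type, so larger or additional nodes may be needed.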
The following command can help us get a better understanding of the state of the Buildkit pods:
# your pod name might be different
kubectl get pod -n=okteto -oyaml okteto-buildkit-7c80387317-0
We also recommend that you run the following command to understand the state of the rest of the Okteto components. Any component in a failed or pending state should be investigated.
kubectl get pod -n=okteto
Are you installing in a fresh cluster? Which exact guide were you following?
Okteto is installed on a test cluster which only had an API Gateway controller installed, using Envoy. Given that you guys are using the traditional ingress and have the AWS controller, there should not be any problem at all.
Plus, the documentation itself says that if Okteto is being installed on an existing cluster, we should just make sure the CSI driver and controller are installed and a few other things are in place.
It would be really unrealistic for documentation to assume that people are going to spin up a completely new cluster just for a test tool, especially in our case, where we were testing several other tools.
Yeah, we did fix that, since it took you around 2-3 days to get back to us. I am still really baffled that there is no mention of security groups in the docs for EKS. I am not sure how a node is supposed to receive traffic if the security group is not set up. We applied a massive YAML that was very generous in granting access, but the networking issues remained.
There were lots of issues with the NLB too. We are also not sure why you guys have not adopted the Gateway API yet, or why there are around 20 objects just to run a testing tool.
Given all this, we will not continue this experiment. Thanks for your time and thanks for trying to help.