Multiple K8s Issues relating to Webhooks, Logs & I/O Timeouts
I have a really weird issue with one of my Linode K8s clusters running 1.23. There are multiple issues occurring and I can't quite pinpoint the root cause.
Linode have let me know it is not an issue with the master and nothing on their end. Let me highlight all the identified problems to start.
Logs not Working
When trying to pull logs from any pods I get this error (which makes it very hard to troubleshoot):
root@aidan:~# kubectl logs <pod-name> -n revwhois-subdomain-enum
Error from server: Get "https://192.168.150.102:10250/containerLogs/revwhois-subdomain-enum/tldbrr-revwhois-worker12-twppv/tldbrr-revwhois-worker12": dial tcp 192.168.150.102:10250: i/o timeout
Metrics not Working
root@m0chan:~# kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
Pod Deletion not Working
When deleting a pod with kubectl delete pod <pod-name> -n <namespace>, the pod gets stuck in a Terminating state: the old pod is never removed and a new pod is not launched.
Errors Editing Ingress
Error from server (InternalError): error when creating "yaml/xxx/xxx-ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: Temporary Redirect
I also see errors in the Metrics and Cert-Manager logs relating to "failed calling webhook".
This is all for now and I would really appreciate some help resolving this.
If you are unable to pull logs from your pods, that certainly speaks to an issue with the health of your cluster. You said,
"Linode have let me know it is not an issue with the master and nothing on their end. Let me highlight all the identified problems to start."
I suggest reaching back out to the Support Team to let them know the LKE cluster isn't functioning as expected. Generally speaking, they can take another look and escalate the issue to the Administrators.
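Before you reach out, it may help to gather a few read-only details to include in the ticket. The log error shows a timeout dialing a node on port 10250 (the kubelet port), so the failed logs, metrics and webhook calls may share one cause: the control plane not being able to reach your nodes. That is only a guess on my part. Here is a minimal sketch of the checks I would run; the function name cluster_health is just a placeholder I made up, and the commands assume kubectl is already pointed at the affected cluster.

```shell
# Sketch only: read-only checks, nothing here modifies the cluster.
# cluster_health is a hypothetical helper name, not part of kubectl.
cluster_health() {
  # Are all nodes Ready, and what are their internal IPs?
  kubectl get nodes -o wide
  # Are the system pods (DNS, CNI, kube-proxy) running?
  kubectl get pods -n kube-system -o wide
  # Is the metrics API registered and reported as Available?
  kubectl get apiservice v1beta1.metrics.k8s.io
  # Which validating webhooks can block create/update requests?
  kubectl get validatingwebhookconfigurations
  # Most recent events across all namespaces, oldest first
  kubectl get events -A --sort-by=.lastTimestamp
}
```

Paste the function into your shell, run cluster_health, and attach the output to the ticket. If a node shows NotReady, or the metrics APIService is not Available, that gives Support something concrete to look at.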