[Solved-Partially] Kubernetes HTTP01 Challenge Separate Namespace
Hey folks, I am trying to get SSL working on my Kubernetes cluster.
*ingress controller is deployed on namespace default
*application is installed in namespace app01
*ingress object is deployed to namespace app01
*confirmed that HTTP traffic works when the TLS settings and cert-manager are left out
Relevant ingress portions:
metadata:
  ...
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    cert-manager.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - redact.com
    - redact.com
    secretName: redact-tls
However, when I run kubectl describe challenges in the app01 namespace, I see that they are all failing HTTP-01 challenge propagation.
I think it may be similar to what is described here: https://www.digitalocean.com/community/questions/how-do-i-correct-a-connection-timed-out-error-during-http-01-challenge-propagation-with-cert-manager
controller on default namespace
application on app01 namespace
ingress object with TLS info deployed to app01 namespace
http-01 challenge failing
I did some testing of this myself, but I was unable to recreate the connection timeout issues you're seeing. I configured my environment like this:
- My Nginx Ingress Controller was deployed in the default namespace
- I deployed the Nginx demo application included in the guide you mentioned to the app01 Namespace.
- I deployed the Ingress to the app01 Namespace.
I don't believe the DigitalOcean post is related, as there's no need to set a hostname on an LKE Cluster's NodeBalancer. If possible, it might be helpful to post some of the errors from your cert-manager logs so we can get a better idea of the root cause.
Linode Support Staff
Solved, partially (I don't have a clear answer as to how to fix the original approach).
So, the root cause here was that I was using Nginx's Ingress controller rather than the Kubernetes-maintained Nginx controller.
The Kubernetes-maintained version creates a default service that routes HTTP traffic appropriately to the challenges that are deployed. Nginx's version does not.
If I deploy another cluster at some point, I'll look into this more. I think what's happening is that with Nginx's controller the default backend is essentially HTTPS, and something odd in the interaction with cert-manager causes the TLS handshake to fail on the GET, so certs are never generated.
I think something similar to what is being done on the ingress here would need to be done to handle the port 80 call: https://medium.com/containerum/how-to-launch-nginx-ingress-and-cert-manager-in-kubernetes-55b182a80c8f
Thanks @rl0nergan for the reply. Hopefully I am not doing anything stupid here (which I wouldn't put past me ;)). Let me know if there are other logs I can provide that may be helpful.
As a note, my domains are managed via Namecheap and not imported to the DNS manager on Linode (I am assuming this isn't an issue here). My A records point to the NodeBalancer external IP. Additionally, I am using the latest cert-manager referenced here https://cert-manager.io/docs/installation/kubernetes/ rather than the 0.15 referenced in the Linode article.
From my understanding of the error, the challenge is served by the solver pods and external traffic can't reach the temporary pod (so we fail and don't proceed to hit Let's Encrypt's servers) [https://cert-manager.io/docs/faq/acme/]
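The self check described above can be approximated by hand. What follows is a minimal sketch, assuming Python 3 with only the standard library. The names challenge_url, probe and NoRedirect are hypothetical helpers, not part of cert-manager, and the host and token are placeholders you would copy from the Challenge resource. It requests the HTTP-01 path on port 80 without following redirects, so a redirect to HTTPS would show up as a 3xx status plus a Location header rather than as a TLS error.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def challenge_url(host, token):
    """Build the plain-HTTP URL that an HTTP-01 check requests."""
    return f"http://{host}/.well-known/acme-challenge/{token}"


def probe(host, token, timeout=10.0):
    """GET the challenge URL once; return (status, Location header or None)."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(challenge_url(host, token), timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        # Non-2xx responses land here, including 3xx, since redirects are not followed.
        return err.code, err.headers.get("Location")
```

If probe("www.redact.com", "<token>") run from outside the cluster returns a 3xx status with an https:// Location, that would suggest the request is being redirected before it reaches the solver pod. A 200 would suggest the HTTP path is fine and the problem lies elsewhere.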
kubectl get pods -n app01
NAME                        READY   STATUS    RESTARTS   AGE
app01core                   1/1     Running   0          21h
cm-acme-http-solver-29sqz   1/1     Running   0          18h
cm-acme-http-solver-cgds7   1/1     Running   0          18h
cm-acme-http-solver-lt7q7   1/1     Running   0          18h
cm-acme-http-solver-zqd86   1/1     Running   0          18h
kubectl describe ingress -n app01
I noticed there is no nginx-ingress-default-backend among the services in my default namespace. I installed the Nginx controller via https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-helm/ rather than the Kubernetes-maintained Nginx Ingress (the helm command was helm install nginx-ingress stable/nginx-ingress --set controller.publishService.enabled=true). I am exploring this because I think it might be part of the root cause. I saw another website mention handshake issues, for a different problem, that were caused by the default backend coming in over HTTPS or something like that.
... Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>) ...
kubectl describe challenges -n app01
...
Reason: Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://redact/.well-known/acme-challenge/redact': Get "https://redact:443/.well-known/acme-challenge/redact": remote error: tls: handshake failure
State: pending
...
kubectl logs cert-manager-5bc6c5cb94-22hfb -n cert-manager
E0909 01:12:54.634692 1 sync.go:183] cert-manager/controller/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://www.redact.com/.well-known/acme-challenge/redact': Get \"https://www.redact.com:443/.well-known/acme-challenge/redact\": remote error: tls: handshake failure" "dnsName"="www.redact.com" "resource_kind"="Challenge" "resource_name"="app01-tls-7l84x-1528030512-2246324353" "resource_namespace"="app01" "resource_version"="v1" "type"="HTTP-01"
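One detail in this error is worth pulling out: the self check asks for an http:// URL, but the GET that fails targets https:// on port 443, which suggests the request was redirected to HTTPS somewhere along the way. Below is a small sketch, assuming Python 3, that extracts both URLs from a message of this shape and flags the mismatch. The regular expression and the name redirected_to_https are my own and are based only on the message format shown in this thread, not on any cert-manager API.

```python
import re

# Captures the URL the self check asked for and the URL the failing GET hit.
# The optional backslashes cover the escaped quotes seen in the controller log.
SELF_CHECK_RE = re.compile(
    r"self check GET request '(?P<requested>[^']+)': "
    r'Get \\?"(?P<fetched>[^"\\]+)\\?"'
)


def redirected_to_https(message):
    """Return True if an http:// self check ended up fetching an https:// URL."""
    match = SELF_CHECK_RE.search(message)
    if match is None:
        return False
    return (
        match.group("requested").startswith("http://")
        and match.group("fetched").startswith("https://")
    )
```

Running this against both the describe output and the controller log line in this post returns True, which fits the theory that port 80 traffic is being bounced to HTTPS before the solver pod can answer.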
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: app01
  namespace: app01
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    cert-manager.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - redact.video
    - www.redact.video
    - redact.com
    - www.redact.com
    secretName: redact-tls
  rules:
  - host: redact.video
    http:
      paths:
      - backend:
          serviceName: app01-core
          servicePort: 8000
  - host: www.redact.video
    http:
      paths:
      - backend:
          serviceName: app01-core
          servicePort: 8000
  - host: redact.com
    http:
      paths:
      - backend:
          serviceName: app01-core
          servicePort: 8000
  - host: www.redact.com
    http:
      paths:
      - backend:
          serviceName: app01-core
          servicePort: 8000
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: [email protected]
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-secret-prod
    solvers:
    - http01:
        ingress:
          class: nginx
And the actual pod being deployed to app01 (a containerized Flask app with Gunicorn):
apiVersion: v1
kind: Pod
metadata:
  name: app01core
  namespace: app01
  labels:
    app: app01core
spec:
  containers:
  - name: main-app-container
    image: redact.azurecr.io/redact/core_app:latest
    imagePullPolicy: IfNotPresent
    env:
    - name: SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: environment
          key: SECRET_KEY
    - name: RECAPTCHA_PUB
      valueFrom:
        secretKeyRef:
          name: environment
          key: RECAPTCHA_PUB
    - name: RECAPTCHA_PRV
      valueFrom:
        secretKeyRef:
          name: environment
          key: RECAPTCHA_PRV
    - name: SENDGRID_KEY
      valueFrom:
        secretKeyRef:
          name: environment
          key: SENDGRID_KEY
    - name: SENDGRID_SENDER
      valueFrom:
        secretKeyRef:
          name: environment
          key: SENDGRID_SENDER
    ports:
    - containerPort: 8000
  imagePullSecrets:
  - name: acr-secret