CoreDNS configured incorrectly in LKE beta?

I've been bashing my head against a number of issues while trying out the LKE beta. It seems like DNS is once again the root cause… twice over: the first time was my fault, but the second time (the point of this question) looks like it may be a configuration issue in LKE.

As an aside, it would be great if you could publish a guide on setting up LKE with an ingress controller (e.g. nginx-ingress or Traefik), external-dns (so that you can manage the external hostnames), and cert-manager with ACME via Let's Encrypt. This would seem to be the basic minimum that most people using LKE for hosting would want. (Also, is there anything you can do to speed up your hosted DNS updates? 30 minutes is a long time to wait to see a new hostname.)

Anyway, after much diagnosis (and finally trying helm upgrade --namespace nginx nginx stable/nginx-ingress --set controller.publishService.enabled=true) I was able to get the Ingress external IP to be that of the LoadBalancer assigned to nginx-ingress, which cleared the biggest stumbling block I had with cert-manager… namely that it was failing the self-check:
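For reference, the same setting can live in a values file instead of a --set flag (a sketch for the stable/nginx-ingress chart; the release and namespace names match the command above):

```yaml
# values.yaml (sketch) -- equivalent to
#   --set controller.publishService.enabled=true
# Makes the controller publish the LoadBalancer's address as the
# Ingress status IP, instead of the node addresses.
controller:
  publishService:
    enabled: true
```

Applied with: helm upgrade --namespace nginx nginx stable/nginx-ingress -f values.yaml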

Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://...redacted.../.well-known/acme-challenge/bD...redacted...j0': Get http://...redacted.../.well-known/acme-challenge/bD...redacted...j0: dial tcp: lookup ...redacted... on 10.128.0.10:53: no such host
  State:       pending

Given that external-dns had been pointing the hostname at the Nodes rather than the LoadBalancer, I had believed the issue to be DNS… (plus, it's always DNS).

Once I configured nginx-ingress to publish the service endpoint, the DNS entries were updated to the correct IP… of course, then I had to wait for Linode's DNS servers to update (30 minutes, WAT!).

Same error!!!

So I said, let's go digging into DNS:

apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always

followed by a kubectl apply and then:

$ kubectl exec -ti dnsutils -- nslookup kubernetes.default
Server:        10.128.0.10
Address:    10.128.0.10#53

Name:    kubernetes.default.svc.cluster.local
Address: 10.128.0.1

That looks ok… This however…

$ kubectl exec -ti dnsutils -- nslookup ...redacted...
Server:        10.128.0.10
Address:    10.128.0.10#53

** server can't find ...redacted...: NXDOMAIN

command terminated with exit code 1

That's not great…
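One extra check that would narrow this down (a sketch; the placeholder stands in for the redacted hostname, which cannot run as-is) is to query a public resolver directly from the same pod, bypassing CoreDNS entirely:

```shell
# Query Google's public resolver directly from the dnsutils pod.
# If this succeeds while the default lookup returns NXDOMAIN, the
# failure is upstream of CoreDNS, i.e. in whatever the node's
# /etc/resolv.conf points at rather than in CoreDNS itself.
kubectl exec -ti dnsutils -- nslookup <redacted-hostname> 8.8.8.8
```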

So I tried to brute-force it and changed the CoreDNS ConfigMap from

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap

To

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . 8.8.8.8 8.8.4.4
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap

I restarted the CoreDNS pods and, w00t, DNS was working… but this feels wrong.
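If replacing the global forward feels too blunt, a narrower variant (a sketch; example.com stands in for the redacted zone, the resolvers are illustrative) would scope the override to just the affected zone and leave everything else on the defaults:

```
# Corefile fragment (sketch): extra server block for one zone only.
# Queries for example.com go to public resolvers; all other queries
# still use the original .:53 block with forward . /etc/resolv.conf.
example.com:53 {
    errors
    forward . 8.8.8.8 8.8.4.4
    cache 30
}
```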

Why was DNS not working with forward . /etc/resolv.conf?

(I have the history of all the setup commands I have run on the cluster as I was recording them as notes… but I'm not comfortable making those notes public)

LKE using Kubernetes 1.17

4 Replies

Hello stephenc,

Thank you for reaching out to us about this and for providing the output. I think the best course of action here is to escalate this to members of our LKE team. I went ahead and brought your query to their attention. You'll receive a follow-up response once we've determined the root cause.

We'll be in touch asap.

Kind Regards,
Christopher

Hi @stephenc!

Can I confirm that you are using the External DNS controller (https://github.com/kubernetes-sigs/external-dns) configured to use Linode DNS?

Currently, I see no configuration issues with CoreDNS:

/ # dig duckduckgo.com

; <<>> DiG 9.14.8 <<>> duckduckgo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22397
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;duckduckgo.com.            IN  A

;; ANSWER SECTION:
duckduckgo.com.        30  IN  A   54.241.2.241

;; Query time: 1 msec
;; SERVER: 10.128.0.10#53(10.128.0.10)
;; WHEN: Thu Mar 05 18:36:44 UTC 2020
;; MSG SIZE  rcvd: 73

The above DNS resolution was carried out inside an Alpine container running on an LKE cluster.

You can reproduce this with

$ kubectl run alpine-foo -ti --image=alpine --restart=Never /bin/sh
# apk update
# apk add bind-tools
# dig duckduckgo.com

Note that the DNS server used is CoreDNS, at 10.128.0.10.

I ran into a similar situation with cert-manager (using the default configuration from the GitLab Helm docs): the logs indicated the pod couldn't resolve the name I'd configured external-dns to create in Linode DNS. The record was indeed created, and I could resolve it from my own connection, but the cluster couldn't (I waited 1 hour). After replacing the CoreDNS forward directive as the OP did, the certificates verified almost immediately.

Also k8s v1.17, CoreDNS 1.6.5.

This thread saved me a bunch of time. Thanks OP! o/

p.s. I've got a fully helm-ified Traefik 2 ingress setup using IngressRoute CRDs plus a Let's Encrypt configuration that I'd be happy to share. It's a nice, scrappy way to manage everything with a minimum of load balancers. I figure this isn't the place to post that, so hit me up if you're interested.

If somebody else stumbles into this: CoreDNS is configured to use Linode's nameservers in its resolv.conf.

The default TTL for those is 24 hours.

You can specify a shorter TTL for the external-dns records by adding the external-dns.alpha.kubernetes.io/ttl annotation to your services / ingress.

See https://github.com/kubernetes-sigs/external-dns/blob/master/docs/ttl.md
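As a sketch, the annotation looks like this (the service name and TTL value are illustrative):

```yaml
# Sketch: ask external-dns to create the record with a shorter TTL.
# "my-service" and the 120-second TTL are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    external-dns.alpha.kubernetes.io/ttl: "120"  # seconds
spec:
  type: LoadBalancer
  ports:
    - port: 80
```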

That was enough for me.
