Flaky DNS resolution diagnosis

Hi

I suspect a DNS resolution issue on my Linode is making it intermittently flaky. I have Webmin monitoring a couple of external websites such as Google: one check uses a direct IP address and the other uses a DNS name. The DNS-name check sometimes fails, while the direct-IP check never does.

Does anyone have an idea how I could diagnose this?

I ran mtr for a few hours and saw no issues. I know that isn't related to DNS, but it at least rules out intermittent network problems.

6 Replies

@jonny5alive:

Does anyone have an idea how I could diagnose this?
What sort of errors do you get when the Webmin check fails by name (e.g., do they point to the name not resolving)?

When I was having some issues in the past with the local resolvers in one of the DCs, I used a script to poll the local resolvers (with dig) for a known name. After a day or so it was easy enough to see periods of time when neither resolver was answering the request, and being able to quote those logs was helpful in submitting the trouble ticket.

Alternatively, you could switch your Linode to a public DNS resolver (say, Google's) for a while and see whether the behavior changes. If it does, open a more general ticket asking about any known issues with the local resolvers.
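Before editing resolv.conf, a quick side-by-side check of the resolvers may already be telling. The following is only a rough sketch: the function name compare_resolvers and the DIG override variable are made up for illustration, 8.8.8.8 and 8.8.4.4 are Google Public DNS, and x.y.z.w stands for your own local resolver addresses. It treats an empty answer, or any dig status line starting with ";;", as a failure, which also covers a name that simply does not exist.

```shell
#!/bin/bash
# Sketch: ask several resolvers for the same name and report which ones answer.
# DIG can be overridden (hypothetical knob) to substitute another command.
DIG=${DIG:-dig}

compare_resolvers() {
    # $1 = name to look up, remaining arguments = resolver addresses
    name=$1
    shift
    for resolver in "$@"; do
        # +short prints only the answer data; +time/+tries keep a dead
        # resolver from stalling the loop for long.
        answer=$("$DIG" +short +time=2 +tries=1 "@$resolver" "$name" 2>/dev/null)
        case $answer in
            ''|*';;'*) echo "$resolver FAIL" ;;   # empty or dig status message
            *)         echo "$resolver OK" ;;
        esac
    done
}

# Example call (replace x.y.z.w with a local resolver from /etc/resolv.conf):
#   compare_resolvers example.com 8.8.8.8 8.8.4.4 x.y.z.w
```

If the public resolvers consistently answer while the local ones do not, that is the kind of evidence worth quoting in a ticket.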

Or, if your current testing is clearly showing name resolution errors, you could just open a ticket anyway. Best case it's something Linode is already aware of, or worst case you just fall back on one of the prior approaches to gather more data.

– David

Hey db3l

The Webmin check only reports UP or DOWN; the DNS one goes DOWN while the direct-IP one stays UP.

What script did you use with dig?

My tests aren't conclusive yet, so it's hard to open a ticket.

@jonny5alive:

What script did you use with dig?
Just a couple of lines of a bash script I wrote for the occasion. I doubt I kept it, but it would have just been an infinite while loop with two dig calls (one for each resolver) and a sleep. Oh, and probably a date in there somewhere for logging, so probably something like (untested):

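#!/bin/bash

# Poll each resolver for a known name and log the answer under a timestamp.
while true; do
   date
   # One of the following for each resolver address.
   # The substitution is left unquoted so each answer stays on one line.
   echo "x.y.z.w:" $(dig +noall +answer @x.y.z.w domain)
   sleep 30
done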

Replace addresses, domain name and sleep delay as needed. In my case I used a domain of mine hosted on Linode's servers, to minimize the risk of introducing a remote DNS server issue into the testing. Then, nohup that into the background while redirecting the output to some file, e.g.:

nohup script >script.out 2>&1 &

Wait a bit, then review the output for dig errors.
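Scanning a day's worth of output by eye gets tedious, so something like the following could pull out just the failed polls. It is an untested-in-production sketch that assumes the log format of the loop above (a date line followed by one "address: answer" line per resolver) and treats an empty answer, or one containing a dig ";;" status message, as a failure. The function name find_failures is made up for illustration.

```shell
#!/bin/bash
# Sketch: list the polls in the log where a resolver gave no answer.
find_failures() {
    # $1 = log file written by the polling loop (e.g. script.out)
    awk '
        NF == 0 { next }                      # skip blank lines
        $1 ~ /:$/ {                           # resolver line: "x.y.z.w: answer"
            if (NF == 1 || index($0, ";;") > 0) {
                r = $1
                sub(/:$/, "", r)              # drop the trailing colon
                print stamp " -> " r " no answer"
            }
            next
        }
        { stamp = $0 }                        # any other line is a timestamp
    ' "$1"
}

# Example call:
#   find_failures script.out
```

Each output line pairs the timestamp of the poll with the resolver that failed, which makes it easy to spot periods when no resolver was answering.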

– David

So far I've found:

  • A very simple cURL script in PHP won't resolve a DNS name when served by Apache, yet the same script works fine when run from the CLI as the Apache user

  • No problems with DNS resolution when using dig, ping

  • Tried using Google's DNS servers in resolv.conf

  • Disabled IPv6

  • Disabled firewall

  • Only works when hardwiring the name-to-IP mapping in /etc/hosts

At a loss to understand what is wrong. Looks like I will have to build a new Linode and set up everything again :(

Thanks everyone in IRC today.

Seems like I fell foul of this unresolved bug:

https://bugs.php.net/bug.php?id=11058
