ndots:5 and the DNS tax in Kubernetes
If you have ever looked at /etc/resolv.conf in a Kubernetes pod, you have seen options ndots:5. It is one of those things that is correct for the cluster’s own DNS model and actively hostile to anything resolving external names. We hit this when an application was making external HTTPS calls and our P99 connect latency was atrocious.
What ndots does
ndots:5 means “if the name you’re resolving has fewer than 5 dots, try it with each of the search domains first before treating it as an absolute name”. On a pod, the search path is typically:
search mynamespace.svc.cluster.local svc.cluster.local cluster.local example.internal
options ndots:5 edns0 trust-ad
So if your code resolves api.stripe.com, which has 2 dots, the resolver first tries:
api.stripe.com.mynamespace.svc.cluster.local
api.stripe.com.svc.cluster.local
api.stripe.com.cluster.local
api.stripe.com.example.internal
All four return NXDOMAIN. Only then does the resolver try api.stripe.com as an absolute name and get the real answer. That is five DNS round trips where one would suffice, and with glibc issuing parallel A and AAAA queries, each attempt is really two queries on the wire.
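The candidate order is mechanical enough to reproduce by hand. A quick sketch (search list taken from the resolv.conf above) of the names a glibc-style resolver walks through when the query has fewer dots than ndots:

```shell
# Emit the names the resolver tries, in order: when the query has fewer
# dots than ndots, every search domain is appended first, bare name last.
name=api.stripe.com
search="mynamespace.svc.cluster.local svc.cluster.local cluster.local example.internal"
for domain in $search; do
  echo "$name.$domain"
done
echo "$name"
```

Running it prints the four doomed expansions followed by the one name that actually answers.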
What we saw
Our service was slow to call an external API. curl -w '%{time_namelookup}\n' from inside a pod:
curl -w 'lookup=%{time_namelookup}s connect=%{time_connect}s total=%{time_total}s\n' \
  -o /dev/null -s https://api.example-partner.com/health
# lookup=0.138s connect=0.156s total=0.287s
138 ms just for DNS. From outside the cluster, same query:
dig api.example-partner.com +short
# ... 14 ms ...
So 120 ms of Kubernetes DNS tax. On a service that fires this call on every request, we were paying dearly.
The tax is visible in CoreDNS metrics
If you run CoreDNS with the Prometheus metrics endpoint enabled, the pattern is obvious:
curl -s http://coredns:9153/metrics | grep coredns_forward_request_count
# coredns_forward_request_count_total{to="kube-dns",rcode="NXDOMAIN"} 2841721
# coredns_forward_request_count_total{to="kube-dns",rcode="NOERROR"} 812409
A 3.5:1 ratio of NXDOMAIN to NOERROR is diagnostic. Healthy clusters have more NOERROR than NXDOMAIN; ours had more NXDOMAIN because of all the search-path expansions failing.
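As a sanity check, the ratio can be computed straight from the scraped lines. A throwaway sketch, using the two counters above as canned input; in a real cluster you would pipe a live metrics scrape through the same awk:

```shell
# Compute the NXDOMAIN:NOERROR ratio from CoreDNS forward counters.
ratio=$(awk '
  /rcode="NXDOMAIN"/ { nx = $2 }   # failed search-path expansions land here
  /rcode="NOERROR"/  { ok = $2 }   # successful answers
  END { printf "%.1f", nx / ok }
' <<'EOF'
coredns_forward_request_count_total{to="kube-dns",rcode="NXDOMAIN"} 2841721
coredns_forward_request_count_total{to="kube-dns",rcode="NOERROR"} 812409
EOF
)
echo "NXDOMAIN:NOERROR = ${ratio}:1"
# NXDOMAIN:NOERROR = 3.5:1
```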
The fixes
There are several. We used a combination.
Per-pod dnsConfig
You can override ndots at the pod level:
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "1"
For pods that almost always do external lookups, this is perfect. Any name with at least one dot now gets an initial absolute query. A bare api (zero dots) still resolves via the search path, and with glibc a name like api.othernamespace still falls back to the search list after the absolute query returns NXDOMAIN; it just costs one extra round trip. That is usually fine because we use FQDNs internally anyway.
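For reference, here is where that stanza sits in a full Deployment. A sketch; partner-client and its image are placeholder names, not from our cluster:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: partner-client              # placeholder
spec:
  replicas: 2
  selector:
    matchLabels:
      app: partner-client
  template:
    metadata:
      labels:
        app: partner-client
    spec:
      dnsPolicy: ClusterFirst       # the default; dnsConfig merges on top of it
      dnsConfig:
        options:
          - name: ndots
            value: "1"
      containers:
        - name: app
          image: partner-client:latest   # placeholder
```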
Use FQDNs in code
If your code is doing http.Get("https://api.stripe.com"), that hostname has 2 dots, fewer than 5, so the search path fires. With a trailing dot, http.Get("https://api.stripe.com."), the resolver treats the name as absolute and skips the search path entirely. Beware, though: the dot also ends up in the Host header and TLS SNI, which some servers and certificate checks reject, and some libraries strip the trailing dot before resolving. Results vary; test your client.
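If you go this route, a tiny guard makes the intent explicit before a hostname reaches a client. A sketch; fqdn is my name for the helper, not a standard utility:

```shell
# Append a trailing dot unless one is already present, so the stub
# resolver treats the name as absolute and skips the search path.
fqdn() {
  case "$1" in
    *.) printf '%s\n' "$1" ;;
    *)  printf '%s.\n' "$1" ;;
  esac
}

fqdn api.stripe.com     # -> api.stripe.com.
fqdn api.stripe.com.    # -> api.stripe.com.  (idempotent)
```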
Use a local resolver cache
The real Kubernetes pattern for DNS at scale is NodeLocal DNSCache. It runs a DNS cache on every node and short-circuits the search path dance. We already had this deployed, and it helped, but the search path still fired; the cache just made each failed lookup fast. The combined result was NodeLocal + ndots:2 for most apps. External calls got a ~15 ms lookup time on cache miss, and sub-ms on cache hit.
Use a dedicated Service for hot external endpoints
For one service that called the same external partner 100 times per second, we created an ExternalName Service:
apiVersion: v1
kind: Service
metadata:
  name: partner-api
  namespace: app
spec:
  type: ExternalName
  externalName: api.example-partner.com.
Code calls partner-api.app.svc.cluster.local, which resolves to a CNAME to api.example-partner.com, which resolves to an IP, all cached locally. We also got the side benefit of centralizing the partner name: if the partner ever renames their API, we change one YAML file.
Measuring the improvement
After we rolled out ndots:2 to the worst offenders, the CoreDNS NXDOMAIN rate dropped by 80% cluster-wide. P99 latency on the affected service went from 680 ms to 420 ms. Simple win.
# requires the CoreDNS log plugin to be enabled
kubectl logs -n kube-system coredns-... | grep -c NXDOMAIN
# 2841721 -> 540311 over equivalent window
Reflection
The default ndots:5 exists because Kubernetes wants kubectl exec -- curl http://my-service to work, where my-service has zero dots and the search path is the whole story. If your workloads talk to the outside world a lot, you are paying for a default that was not designed for you.
The simplest first-pass fix is dnsConfig.options.ndots: 2 on the Deployments that do external calls. If you haven’t deployed NodeLocal DNSCache, do that next. If you’re still feeling pain, ExternalName Services or caching in the application are options. Related: see my post on Debugging DNS in a kind cluster for the local-cluster version of this pain.