ndots:5 and the DNS tax in Kubernetes
If you have ever looked at /etc/resolv.conf in a Kubernetes pod, you have seen options ndots:5. It is one of those things that is correct for the cluster’s own DNS model and actively hostile to anything resolving external names. We hit this when an application was making external HTTPS calls and our P99 connect latency was atrocious.
What ndots does
ndots:5 means “if the name you’re resolving has fewer than 5 dots, try it with each of the search domains first before treating it as an absolute name”. On a pod, the search path is typically:
search mynamespace.svc.cluster.local svc.cluster.local cluster.local example.internal
options ndots:5 edns0 trust-ad
So if your code resolves api.stripe.com, which has 2 dots, the resolver first tries:
api.stripe.com.mynamespace.svc.cluster.local
api.stripe.com.svc.cluster.local
api.stripe.com.cluster.local
api.stripe.com.example.internal
All four return NXDOMAIN. Only then does the resolver try api.stripe.com as an absolute name and get the real answer. That is five DNS round trips where one would suffice, and with glibc issuing parallel A and AAAA queries, each attempt is really two queries on the wire.
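The candidate order is mechanical enough to reproduce by hand. A quick sketch (search list taken from the resolv.conf above) of the names a glibc-style resolver walks through when the query has fewer dots than ndots:

```shell
# Emit the names the resolver tries, in order: when the query has fewer
# dots than ndots, every search domain is appended first, bare name last.
name=api.stripe.com
search="mynamespace.svc.cluster.local svc.cluster.local cluster.local example.internal"
for domain in $search; do
  echo "$name.$domain"
done
echo "$name"
```

Running it prints the four doomed expansions followed by the one name that actually answers.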
What we saw
Our service was slow to call an external API. curl -w '%{time_namelookup}\n' from inside a pod:
curl -w 'lookup=%{time_namelookup}s connect=%{time_connect}s total=%{time_total}s\n' \
  -o /dev/null -s https://api.example-partner.com/health
# lookup=0.138s connect=0.156s total=0.287s
138 ms just for DNS. From outside the cluster, same query:
dig api.example-partner.com +short
# ... 14 ms ...
So 120 ms of Kubernetes DNS tax. On a service that fires this call on every request, we were paying dearly.
The tax is visible in CoreDNS metrics
If you run CoreDNS with the Prometheus metrics endpoint enabled, the pattern is obvious:
curl -s http://coredns:9153/metrics | grep coredns_forward_request_count
# coredns_forward_request_count_total{to="kube-dns",rcode="NXDOMAIN"} 2841721
# coredns_forward_request_count_total{to="kube-dns",rcode="NOERROR"} 812409
A 3.5:1 ratio of NXDOMAIN to NOERROR is diagnostic. Healthy clusters have more NOERROR than NXDOMAIN; ours had more NXDOMAIN because of all the search-path expansions failing.
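As a sanity check, the ratio can be computed straight from the scraped lines. A throwaway sketch, using the two counters above as canned input; in a real cluster you would pipe a live metrics scrape through the same awk:

```shell
# Compute the NXDOMAIN:NOERROR ratio from CoreDNS forward counters.
ratio=$(awk '
  /rcode="NXDOMAIN"/ { nx = $2 }   # failed search-path expansions land here
  /rcode="NOERROR"/  { ok = $2 }   # successful answers
  END { printf "%.1f", nx / ok }
' <<'EOF'
coredns_forward_request_count_total{to="kube-dns",rcode="NXDOMAIN"} 2841721
coredns_forward_request_count_total{to="kube-dns",rcode="NOERROR"} 812409
EOF
)
echo "NXDOMAIN:NOERROR = ${ratio}:1"
# NXDOMAIN:NOERROR = 3.5:1
```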
The fixes
There are several. We used a combination.
Per-pod dnsConfig
You can override ndots at the pod level:
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "1"
For pods that almost always do external lookups, this is perfect. Any name with at least one dot now gets an initial absolute query. A bare api (zero dots) still resolves via the search path, and with glibc a name like api.othernamespace still falls back to the search list after the absolute query returns NXDOMAIN; it just costs one extra round trip. That is usually fine because we use FQDNs internally anyway.
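For reference, here is where that stanza sits in a full Deployment. A sketch; partner-client and its image are placeholder names, not from our cluster:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: partner-client              # placeholder
spec:
  replicas: 2
  selector:
    matchLabels:
      app: partner-client
  template:
    metadata:
      labels:
        app: partner-client
    spec:
      dnsPolicy: ClusterFirst       # the default; dnsConfig merges on top of it
      dnsConfig:
        options:
          - name: ndots
            value: "1"
      containers:
        - name: app
          image: partner-client:latest   # placeholder
```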
Use FQDNs in code
If your code is doing http.Get("https://api.stripe.com"), that hostname has 2 dots, fewer than 5, so the search path fires. With a trailing dot, http.Get("https://api.stripe.com."), the resolver treats the name as absolute and skips the search path entirely. Beware, though: the dot also ends up in the Host header and TLS SNI, which some servers and certificate checks reject, and some libraries strip the trailing dot before resolving. Results vary; test your client.
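If you go this route, a tiny guard makes the intent explicit before a hostname reaches a client. A sketch; fqdn is my name for the helper, not a standard utility:

```shell
# Append a trailing dot unless one is already present, so the stub
# resolver treats the name as absolute and skips the search path.
fqdn() {
  case "$1" in
    *.) printf '%s\n' "$1" ;;
    *)  printf '%s.\n' "$1" ;;
  esac
}

fqdn api.stripe.com     # -> api.stripe.com.
fqdn api.stripe.com.    # -> api.stripe.com.  (idempotent)
```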
Use a local resolver cache
The real Kubernetes pattern for DNS at scale is NodeLocal DNSCache. It runs a DNS cache on every node and short-circuits the search path dance. We already had this deployed, and it helped, but the search path still fired; the cache just made each failed lookup fast. The combined result was NodeLocal + ndots:2 for most apps. External calls got a ~15 ms lookup time on cache miss, and sub-ms on cache hit.
Use a dedicated Service for hot external endpoints
For one service that called the same external partner 100 times per second, we created an ExternalName Service:
apiVersion: v1
kind: Service
metadata:
  name: partner-api
  namespace: app
spec:
  type: ExternalName
  externalName: api.example-partner.com.
Code calls partner-api.app.svc.cluster.local, which resolves to a CNAME to api.example-partner.com, which resolves to an IP, all cached locally. We also got the side benefit of centralizing the partner name: if the partner ever renames their API, we change one YAML file.
Measuring the improvement
After we rolled out ndots:2 to the worst offenders, the CoreDNS NXDOMAIN rate dropped by 80% cluster-wide. P99 latency on the affected service went from 680 ms to 420 ms. Simple win.
# requires the CoreDNS log plugin to be enabled
kubectl logs -n kube-system coredns-... | grep -c NXDOMAIN
# 2841721 -> 540311 over equivalent window
Reflection
The default ndots:5 exists because Kubernetes wants kubectl exec -- curl http://my-service to work, where my-service has zero dots and the search path is the whole story. If your workloads talk to the outside world a lot, you are paying for a default that was not designed for you.
The simplest first-pass fix is dnsConfig.options.ndots: 2 on the Deployments that do external calls. If you haven’t deployed NodeLocal DNSCache, do that next. If you’re still feeling pain, ExternalName Services or caching in the application are options. Related: see my post on Debugging DNS in a kind cluster for the local-cluster version of this pain.