This is a post about Nginx’s DNS resolution behavior I didn’t know about but wish I did before I started using Kubernetes (K8s).
Nginx caches statically configured domains once
Symptoms
I moved a backend service foo from running on a virtual machine to K8s. Foo’s clients include an Nginx instance running outside K8s, configured with this upstream block.
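The upstream block itself was lost from this copy of the post; a minimal sketch of what such a block looks like (the domain name is from the post, the port and surrounding server block are my assumptions):

```nginx
upstream foo {
    # Nginx resolves this name once, at startup or reload,
    # and caches the resulting IPs until the next reload.
    server foo.example.com:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://foo;
    }
}
```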
K8s Pods can be rescheduled at any time, so their IPs aren’t stable. I’m supposed to use K8s Services to avoid caching these ephemeral Pod IPs, but in my case, for interoperability reasons, I was registering Pod IPs directly as A records for foo.example.com. I started noticing that after my Pod IPs changed, either because of rescheduling or because of updating the Deployment, Nginx started throwing 502 Bad Gateway errors.
Root Problem
Nginx resolves statically configured domain names only once, at startup or configuration reload time. So Nginx resolved foo.example.com. once at startup to several Pod IPs and cached them forever.
Solution
Using a variable for the domain name will make Nginx resolve it periodically and cache the result for the TTL of the DNS response. So replace the upstream block with a variable. I have no idea why it has to be a variable to make Nginx resolve the domain periodically.
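The one-line replacement didn’t survive in this copy of the post, but based on the surrounding text it was presumably a set directive along these lines (the variable name $foo_backend is my guess):

```nginx
# Inside the server block: store the domain in a variable so that
# proxy_pass re-resolves it instead of using a cached upstream.
set $foo_backend foo.example.com.;
```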
And replace the proxy_pass line with
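The replacement proxy_pass lines are also missing here. A sketch consistent with the text, assuming the domain was stored in a variable as described above (variable name and port are assumptions):

```nginx
location / {
    # Because proxy_pass uses a variable, Nginx re-resolves the
    # domain via the configured resolver and honors the DNS TTL
    # instead of caching the IPs forever.
    proxy_pass http://$foo_backend:8080;
}
```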
This behavior isn’t documented but has been observed empirically and discussed here, here, and here. I also learned that this setup requires me to define a resolver in the Nginx configs.
For some reason, Nginx resolves statically configured domains by querying the nameservers specified in /etc/resolv.conf, but periodically resolved domains require a completely different config setting. I would love to know why.
The VM on which Nginx was running ran a BIND DNS server locally, so I set resolver 127.0.0.1.
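For reference, the resolver directive also accepts a valid= parameter that overrides the TTL of the DNS response; the IP below is the local BIND server mentioned above:

```nginx
resolver 127.0.0.1;              # query the local BIND server
# resolver 127.0.0.1 valid=30s;  # optionally override the record TTL
```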
I triggered the code path that made Nginx send requests to foo and saw periodic DNS queries occurring with sudo tcpdump -i lo -n dst port 53 | grep foo.
What if that Nginx is also running on K8s?
Problem
I had another Nginx instance that also made requests to foo. This Nginx was running on K8s too. It was created with this Deployment YAML.
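The Deployment YAML was lost from this copy of the post. A sketch of what it plausibly looked like; the image tag, labels, and mount path are my assumptions, while the nginx-config ConfigMap name comes from the post:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/conf.d
      volumes:
        - name: nginx-config
          configMap:
            name: nginx-config
```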
The nginx-config ConfigMap was
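The ConfigMap body is also missing. A sketch, assuming it held a conf.d file with the same static upstream pattern described earlier (the file name, port, and server block details are my inventions):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  default.conf: |
    upstream foo {
      server foo.example.com:8080;
    }
    server {
      listen 80;
      location / {
        proxy_pass http://foo;
      }
    }
```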
I replaced upstream with the same pattern above, but in this case, when I needed to define resolver, I couldn’t use 127.0.0.1 because there’s no BIND running locally. And I can’t hardcode the resolver IP because it might change.
Solution: run Nginx and foo on the same K8s cluster and use the cluster-local Service DNS record
If Nginx and foo run on the same K8s cluster, I can use the cluster-local DNS record created by a K8s Service matching the foo Pods. A Service like this
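The Service manifest didn’t survive extraction. A sketch matching the DNS name in the post (Service foo in namespace bar; the selector and port are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: foo
  namespace: bar
spec:
  selector:
    app: foo
  ports:
    - port: 8080
      targetPort: 8080
```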
will create a DNS A record foo.bar.svc.cluster.local. pointing to the K8s Service’s IP.
Since this Service’s IP is stable and it load-balances requests to the underlying Pods, there’s no need for Nginx to periodically look up the Pod IPs. I can keep the upstream block like so.
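The three-line block is missing here; presumably it was an upstream pointing at the cluster-local name (the port is my assumption):

```nginx
upstream foo {
    # The Service IP is stable, so resolving this once at startup is fine.
    server foo.bar.svc.cluster.local:8080;
}
```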
As its name implies, foo.bar.svc.cluster.local. is only resolvable within the cluster. So Nginx has to be running on the same cluster as foo.
Solution: dynamically set the Nginx resolver equal to the system’s when the Pod starts
Disclaimer: This “solution” is more of an ugly, brittle hack that should only be used as a last resort.
What if Nginx is on another K8s cluster? Then I can set resolver to the IP of one of the nameservers in /etc/resolv.conf. After a bunch of tinkering, I came up with this way to dynamically set the Nginx resolver when the Pod starts: a placeholder for resolver is set in the Nginx ConfigMap, and a command at Pod startup copies over the templated config and replaces the placeholder with a nameserver IP from /etc/resolv.conf.
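The substitution step can be sketched on its own. Here /tmp paths stand in for the Pod’s real /etc/resolv.conf and config files, and the placeholder name is my invention:

```shell
# Stand-in for the Pod's /etc/resolv.conf.
printf 'nameserver 10.96.0.10\nsearch bar.svc.cluster.local\n' > /tmp/resolv.conf

# A templated config line with a placeholder where the resolver IP goes.
printf 'resolver RESOLVER_IP_PLACEHOLDER;\n' > /tmp/nginx.conf.template

# Take the first nameserver and substitute it into the template.
ns=$(awk '/^nameserver/ {print $2; exit}' /tmp/resolv.conf)
sed "s/RESOLVER_IP_PLACEHOLDER/$ns/" /tmp/nginx.conf.template > /tmp/nginx.conf

cat /tmp/nginx.conf   # prints: resolver 10.96.0.10;
```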
Change the nginx-config ConfigMap to
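The templated ConfigMap was lost from this copy. A sketch combining the placeholder idea with the variable-based proxy_pass pattern from earlier (the placeholder string, variable name, and port are my assumptions):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  default.conf: |
    server {
      listen 80;
      # Replaced at Pod startup with a nameserver from /etc/resolv.conf.
      resolver RESOLVER_IP_PLACEHOLDER;
      location / {
        set $foo_backend foo.example.com.;
        proxy_pass http://$foo_backend:8080;
      }
    }
```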
Deployment YAML then becomes (note the added command, args, and new volume and volumeMount).
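The modified Deployment is also missing. A sketch of the mechanism the text describes, with the ConfigMap mounted as a template, the rendered config written to a writable emptyDir, and the placeholder substituted at startup (names, paths, and the placeholder string are my assumptions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          command: ["/bin/sh", "-c"]
          args:
            - |
              # Grab the first nameserver IP from resolv.conf, render the
              # templated config into the writable dir, then start Nginx.
              ns=$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf)
              sed "s/RESOLVER_IP_PLACEHOLDER/$ns/" \
                /etc/nginx/templates/default.conf \
                > /etc/nginx/conf.d/default.conf
              exec nginx -g 'daemon off;'
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-config-template
              mountPath: /etc/nginx/templates
            - name: nginx-config-rendered
              mountPath: /etc/nginx/conf.d
      volumes:
        - name: nginx-config-template
          configMap:
            name: nginx-config
        - name: nginx-config-rendered
          emptyDir: {}
```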
A volume of type emptyDir is needed because recent versions of K8s made configMap volumes read-only; emptyDir volumes are writable.
Hopefully this helps some people out there who don’t want to spend as much time as I did Googling obscure Nginx behavior.