In my previous post I wrote about how to load test GKE Workload Identity. In this post I’ll
describe how to get metrics from gke-metadata-server, the part of Workload Identity that runs on
your GKE clusters’ nodes. This is a temporary workaround until GKE provides a better way to expose gke-metadata-server metrics.
gke-metadata-server runs as a K8s DaemonSet. It exposes metrics about itself in the Prometheus text-based format, and I want an external scraper to make periodic HTTP requests to collect them. Unfortunately, the metrics HTTP server only listens on the container’s localhost interface. So how can we expose these metrics, i.e., make the HTTP endpoint reachable from outside?
tl;dr lessons learned
socat is awesome.
If something you need is running on a computer you control, you can always find a way to extract info from it if you’re resourceful enough.
My specific GKE cluster configuration
GKE masters and nodes running version 1.15.9-gke.22
regional cluster in Google Cloud Platform (GCP) (not on-premise)
6 GKE nodes that are n1-standard-32 GCE instances in one node pool
each node is configured to have a maximum of 32 Pods
cluster and node pool have Workload Identity enabled
Notice that the gke-metadata-server DaemonSet is configured with .spec.template.spec.hostNetwork: true. This means the HTTP server is also listening on the GKE node’s localhost interface.
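You can confirm this on your own cluster (assuming the DaemonSet keeps its default name gke-metadata-server in the kube-system namespace):

kubectl --context [CONTEXT] -n kube-system get daemonset gke-metadata-server \
  -o jsonpath='{.spec.template.spec.hostNetwork}'
# prints: true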
We can run a separate workload on this cluster that uses socat to proxy HTTP requests to gke-metadata-server. socat stands for socket cat and is a multipurpose relay. It’s netcat on steroids: it can relay between many kinds of endpoints, not just TCP and UDP sockets.
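As a quick standalone illustration of the relay idea (assuming socat is installed and something is already listening on 127.0.0.1:8080), this one-liner makes a localhost-only service reachable on every interface:

# Accept connections on 0.0.0.0:9090 and relay each one to 127.0.0.1:8080.
# fork handles every connection in a child process so the listener keeps running.
socat TCP-LISTEN:9090,reuseaddr,fork TCP:127.0.0.1:8080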
The proxy is deployed as a DaemonSet so that there is a one-to-one correspondence with each node-local gke-metadata-server Pod. The proxy DaemonSet also needs .spec.template.spec.hostNetwork: true so that it shares the node’s network namespace with gke-metadata-server.
Here’s how the proxy DaemonSet is put together. I use the Docker image alpine/socat:1.7.3.4-r0, which is a tiny 3.61 MB. The arguments ["TCP-LISTEN:54899,reuseaddr,fork", "TCP:127.0.0.1:54898"] tell socat to forward traffic from 0.0.0.0:54899 to 127.0.0.1:54898, which is where the Prometheus metrics are served. The fork option is described in the socat man page as follows:

After establishing a connection, handles its channel in a child process and keeps the parent process attempting to produce more connections, either by listening or by connecting in a loop
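A minimal manifest along those lines looks like the sketch below; the name, namespace, and app label are assumptions chosen to match the kubectl output that follows.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gke-metadata-server-metrics-proxy
  namespace: monitoring
  labels:
    app: gke-metadata-server-metrics-proxy
spec:
  selector:
    matchLabels:
      app: gke-metadata-server-metrics-proxy
  template:
    metadata:
      labels:
        app: gke-metadata-server-metrics-proxy
    spec:
      # Share the node's network namespace so that 127.0.0.1:54898 is gke-metadata-server.
      hostNetwork: true
      containers:
      - name: socat
        image: alpine/socat:1.7.3.4-r0
        # Listen on 0.0.0.0:54899 and relay every connection to the
        # node-local Prometheus endpoint on 127.0.0.1:54898.
        args: ["TCP-LISTEN:54899,reuseaddr,fork", "TCP:127.0.0.1:54898"]

Once it’s rolled out, every node has a proxy Pod whose IP answers on port 54899: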
kubectl --context [CONTEXT] -n monitoring get pods --selector app=gke-metadata-server-metrics-proxy -o wide

NAME                                      READY   STATUS    RESTARTS   AGE     IP              NODE                               NOMINATED NODE   READINESS GATES
gke-metadata-server-metrics-proxy-dvlpg   1/1     Running   0          4d19h   10.200.208.6    my-cluster-n1-s-32-dfabe6b6-38px   <none>           <none>
gke-metadata-server-metrics-proxy-dx4lq   1/1     Running   0          4d19h   10.200.208.8    my-cluster-n1-s-32-dfabe6b6-mnlg   <none>           <none>
gke-metadata-server-metrics-proxy-j9p49   1/1     Running   0          4d19h   10.200.208.7    my-cluster-n1-s-32-dfabe6b6-vv9s   <none>           <none>
gke-metadata-server-metrics-proxy-jvvjw   1/1     Running   0          4d19h   10.200.208.12   my-cluster-n1-s-32-192fa3d9-wb2c   <none>           <none>
gke-metadata-server-metrics-proxy-k5sqd   1/1     Running   0          4d19h   10.200.208.10   my-cluster-n1-s-32-55dd75ff-6l40   <none>           <none>
gke-metadata-server-metrics-proxy-tdhkn   1/1     Running   0          4d19h   10.200.208.9    my-cluster-n1-s-32-55dd75ff-jqgk   <none>           <none>

http GET '10.200.208.6:54899/metricz' | head -n 20

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.8295e-05
go_gc_duration_seconds{quantile="0.25"} 3.6269e-05
go_gc_duration_seconds{quantile="0.5"} 5.2122e-05
go_gc_duration_seconds{quantile="0.75"} 7.585e-05
go_gc_duration_seconds{quantile="1"} 0.099987877
go_gc_duration_seconds_sum 7.738486774
go_gc_duration_seconds_count 6809
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 47
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.14rc1"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 2.4743056e+07
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
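Any external scraper that can reach the Pod IPs will do. If that scraper is Prometheus, a scrape job along the lines of the sketch below can discover the proxy Pods automatically; the job name is arbitrary, and the namespace and label are assumed to match the proxy DaemonSet above.

scrape_configs:
- job_name: gke-metadata-server
  metrics_path: /metricz
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only the socat proxy Pods.
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app]
    regex: monitoring;gke-metadata-server-metrics-proxy
    action: keep
  # Scrape each Pod on the socat listen port.
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.+)
    replacement: "${1}:54899"
    target_label: __address__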
Here are the recording rules I use to pre-compute request rates and latency percentiles from these metrics:

groups:
- name: gke-metadata-server
  rules:
  # Compute a 5-minute rate for the counter `metadata_server_request_count`.
  - record: metadata_server_request_count:rate5m
    expr: rate(metadata_server_request_count[5m])
  # Compute latency percentiles for the histogram metric
  # `metadata_server_request_durations_bucket` over 5-minute increments for each label
  # combination.
  - record: metadata_server_request_duration:p99
    expr: histogram_quantile(0.99, rate(metadata_server_request_durations_bucket[5m]))
  - record: metadata_server_request_duration:p95
    expr: histogram_quantile(0.95, rate(metadata_server_request_durations_bucket[5m]))
  - record: metadata_server_request_duration:p90
    expr: histogram_quantile(0.90, rate(metadata_server_request_durations_bucket[5m]))
  - record: metadata_server_request_duration:p50
    expr: histogram_quantile(0.50, rate(metadata_server_request_durations_bucket[5m]))
  - record: metadata_server_request_duration:mean
    expr: rate(metadata_server_request_durations_sum[5m]) / rate(metadata_server_request_durations_count[5m])
  # Compute latency percentiles for the histogram metric
  # `metadata_server_request_durations_bucket` over 5-minute increments and aggregate all
  # labels. We must aggregate here instead of in Grafana because averaging percentiles doesn’t
  # work. To compute a percentile, you need the original population of events. The math is just
  # broken. An average of a percentile is meaningless.
  - record: metadata_server_all_request_duration:p99
    expr: histogram_quantile(0.99, sum(rate(metadata_server_request_durations_bucket[5m])) by (le))
  - record: metadata_server_all_request_duration:p95
    expr: histogram_quantile(0.95, sum(rate(metadata_server_request_durations_bucket[5m])) by (le))
  - record: metadata_server_all_request_duration:p90
    expr: histogram_quantile(0.90, sum(rate(metadata_server_request_durations_bucket[5m])) by (le))
  - record: metadata_server_all_request_duration:p50
    expr: histogram_quantile(0.50, sum(rate(metadata_server_request_durations_bucket[5m])) by (le))
  - record: metadata_server_all_request_duration:mean
    expr: rate(metadata_server_request_durations_sum[5m]) / rate(metadata_server_request_durations_count[5m])
  # Compute latency percentiles for the histogram metric `outgoing_request_latency_bucket` over
  # 5-minute increments for each label combination.
  - record: outgoing_request_latency:p99
    expr: histogram_quantile(0.99, rate(outgoing_request_latency_bucket[5m]))
  - record: outgoing_request_latency:p95
    expr: histogram_quantile(0.95, rate(outgoing_request_latency_bucket[5m]))
  - record: outgoing_request_latency:p90
    expr: histogram_quantile(0.90, rate(outgoing_request_latency_bucket[5m]))
  - record: outgoing_request_latency:p50
    expr: histogram_quantile(0.50, rate(outgoing_request_latency_bucket[5m]))
  - record: outgoing_request_latency:mean
    expr: rate(outgoing_request_latency_sum[5m]) / rate(outgoing_request_latency_count[5m])
  # Compute latency percentiles for the histogram metric `outgoing_request_latency_bucket` over
  # 5-minute increments and aggregate all labels. We must aggregate here instead of in Grafana
  # because averaging percentiles doesn’t work. To compute a percentile, you need the original
  # population of events. The math is just broken. An average of a percentile is meaningless.
  - record: outgoing_all_request_latency:p99
    expr: histogram_quantile(0.99, sum(rate(outgoing_request_latency_bucket[5m])) by (le))
  - record: outgoing_all_request_latency:p95
    expr: histogram_quantile(0.95, sum(rate(outgoing_request_latency_bucket[5m])) by (le))
  - record: outgoing_all_request_latency:p90
    expr: histogram_quantile(0.90, sum(rate(outgoing_request_latency_bucket[5m])) by (le))
  - record: outgoing_all_request_latency:p50
    expr: histogram_quantile(0.50, sum(rate(outgoing_request_latency_bucket[5m])) by (le))
  - record: outgoing_all_request_latency:mean
    expr: rate(outgoing_request_latency_sum[5m]) / rate(outgoing_request_latency_count[5m])
Thanks to @mikedanese for the initial idea of using socat.