Exporting all Executor Metrics in Prometheus format in K8s cluster

Dávid Szakállas
I've been trying to set up monitoring for our Spark 3.0.1 cluster running in K8s. We use Prometheus as our monitoring system, and we need both executor and driver metrics. My initial approach was to use the following configuration to expose both sets of metrics on the Spark UI:

{
    'spark.ui.prometheus.enabled': 'true'
}

I was able to scrape http://<driver_hostname>:4040/metrics/prometheus/ for driver metrics and http://<driver_hostname>:4040/metrics/executors/prometheus/ for executor metrics. However, the executor endpoint only contains the metrics defined here: https://spark.apache.org/docs/latest/monitoring.html#executor-metrics, which are referred to as ExecutorSummary. What I would like instead is to get all metrics from the Executor instance of the metrics system: https://spark.apache.org/docs/latest/monitoring.html#component-instance--executor.
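To compare what the two endpoints actually return, a small helper along these lines could be used. This is only a sketch: the function names `scrape_urls` and `metric_names` are made up for illustration, port 4040 is the one from the URLs above, and the parsing only looks at the metric name at the start of each sample line in the Prometheus text format.

```python
import re

# Metric name at the start of a sample line in the Prometheus text format.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def scrape_urls(driver_host, port=4040):
    """Build the two Spark UI scrape URLs mentioned above (hypothetical helper)."""
    base = f"http://{driver_host}:{port}/metrics"
    return {
        "driver": f"{base}/prometheus/",
        "executors": f"{base}/executors/prometheus/",
    }


def metric_names(exposition_text):
    """Return the sorted, de-duplicated metric names found in a text payload."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)
```

Fetching each URL (for example with `urllib.request.urlopen`) and passing the decoded body to `metric_names` gives two lists that can be diffed to see which executor metrics are missing from the driver-side endpoint.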

I am not sure whether these are available on the driver at all, so I've been thinking of scraping the executors directly instead. PrometheusServlet seems to be meant for this purpose, but the executors aren't running web servers. I also can't find a configuration setting that opens a port on the executor container so that it can be scraped. So what I have in mind right now is writing a custom sink that exports the metrics in the Prometheus format to a local file, and running a sidecar container with an nginx that serves that static file. The nginx endpoint can then be scraped by Prometheus. Am I overcomplicating this? Is there a simpler approach?
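To make the file-based idea concrete, here is a minimal sketch of the "render and write" half only. It is written in Python purely for illustration; an actual Spark sink would live on the JVM, and nothing here is Spark's API. The function names, the metric names, and the label are all hypothetical, and label values are assumed to contain no quotes or backslashes.

```python
import os
import tempfile


def render_prometheus(metrics, labels=None):
    """Render {name: numeric value} as Prometheus text-format sample lines."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    lines = [f"{name}{label_str} {value}" for name, value in sorted(metrics.items())]
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write via a temp file plus rename so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        # mkstemp creates the file as 0600; relax it so another process
        # (e.g. the web server in the sidecar) can read it.
        os.chmod(tmp, 0o644)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

The rename step matters because the sidecar could serve the file at any moment; writing in place would occasionally expose a half-written payload to the scraper. The file would have to sit on a volume shared between the executor container and the sidecar.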

Thanks,
David Szakallas