[Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

christinegong
What should I do to expose my own custom Prometheus metrics from a Spark Streaming job running in cluster mode?

I want to run a Spark Streaming job that reads from Kafka, does some calculations, and writes the results to Prometheus on localhost port 9111 (https://github.com/jaegertracing/jaeger-analytics-java/blob/master/spark/src/main/java/io/jaegertracing/analytics/spark/SparkRunner.java#L47). Is it possible to have Prometheus available in the executors? I tried both an EMR cluster and Kubernetes, and only local mode works (the metrics are available only on the driver's port 9111).
Is the Prometheus servlet sink my best option? Any advice would be much appreciated!
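A likely reason for "metrics only on the driver": if the metrics server is started in the job's `main()`, it exists only in the driver JVM. Code inside `map`/`foreachPartition` runs in separate executor JVMs, and the counters it updates live in each executor's own (static, per-JVM) registry, which no HTTP server exposes. A minimal sketch, assuming you want one scrape endpoint per executor JVM: a lazily started, JVM-wide singleton that executor-side code touches before updating a counter. To stay self-contained it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` in place of `client_java`'s `HTTPServer`; the class `ExecutorMetrics` and the metric name are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-JVM metrics holder. A static field exists once per JVM, so on a
// cluster each executor JVM lazily starts its own /metrics endpoint on first use.
final class ExecutorMetrics {
    private static volatile ExecutorMetrics instance;

    private final AtomicLong recordsProcessed = new AtomicLong();
    private final HttpServer server;

    private ExecutorMetrics(int port) throws IOException {
        // Binds on all interfaces so a Prometheus server on another host can scrape it.
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start(); // no executor set: requests run on the thread created by start()
    }

    // Double-checked locking: the first caller in this JVM starts the server and
    // later callers reuse it. The port argument only matters on the first call.
    static ExecutorMetrics get(int port) {
        ExecutorMetrics m = instance;
        if (m == null) {
            synchronized (ExecutorMetrics.class) {
                m = instance;
                if (m == null) {
                    try {
                        m = new ExecutorMetrics(port);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    instance = m;
                }
            }
        }
        return m;
    }

    void incRecords(long n) {
        recordsProcessed.addAndGet(n);
    }

    // One counter rendered in the Prometheus text exposition format.
    String render() {
        return "# HELP records_processed_total Records processed in this JVM.\n"
                + "# TYPE records_processed_total counter\n"
                + "records_processed_total " + recordsProcessed.get() + "\n";
    }

    int boundPort() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }
}
```

In the Spark job the call would go inside executor-side code, e.g. `ExecutorMetrics.get(9111).incRecords(n)` within `foreachPartition`, so it runs in each executor JVM rather than only in `main()`. Prometheus then needs one scrape target per executor (on Kubernetes, typically via pod service discovery), and two executors sharing a host would collide on a fixed port.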

Thanks,
Christine

Re: [Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

ArtemisDev
I am confused by your question.  Are you running the Spark cluster
on AWS EMR and trying to output the results to a Prometheus instance
running on your localhost?  Isn't your localhost behind a firewall
and not accessible from AWS?  What do you mean by "have Prometheus
available in executors"?  It seems you would need a Prometheus instance
running on AWS that your EMR cluster can access easily.

Directing Spark output/sink to Prometheus would be difficult. The ideal
integration scenario would be to write a custom Spark connector that
uses the Prometheus client library to populate your Spark processing
results directly in Prometheus' database.  Hope this helps...
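Whichever transport such a connector uses, the scrape and push paths discussed in this thread (`client_java`'s HTTP server, the Pushgateway) typically carry samples in Prometheus's text exposition format, which `client_java` normally generates for you. As a rough illustration of what the connector ultimately emits, here is a sketch of a hypothetical `ExpositionFormat.formatSample` helper that renders one labelled sample, including the escaping the format requires for label values (backslash, double quote, line feed). Class, metric and label names are made up; a real job would more likely let the client library do this.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper that renders one sample line of the Prometheus text format.
final class ExpositionFormat {
    private ExpositionFormat() {}

    // Label values must escape backslash, double quote and line feed.
    // Backslash is replaced first so the later replacements are not double-escaped.
    static String escapeLabelValue(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // e.g. spans_total{service="frontend"} 42.0 (labels sorted by name for stable output)
    static String formatSample(String name, Map<String, String> labels, double value) {
        StringBuilder sb = new StringBuilder(name);
        if (!labels.isEmpty()) {
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, String> e : new TreeMap<>(labels).entrySet()) {
                joiner.add(e.getKey() + "=\"" + escapeLabelValue(e.getValue()) + "\"");
            }
            sb.append(joiner);
        }
        // Double.toString formatting, e.g. 3.0; Prometheus parses it as a float.
        return sb.append(' ').append(value).toString();
    }
}
```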

-- ND

On 9/28/20 3:21 AM, Christine Gong wrote:

> What should I do to expose my own custom Prometheus metrics from a
> Spark Streaming job running in cluster mode?
>
> I want to run a Spark Streaming job that reads from Kafka, does some
> calculations, and writes the results to Prometheus on localhost port 9111.
> https://github.com/jaegertracing/jaeger-analytics-java/blob/master/spark/src/main/java/io/jaegertracing/analytics/spark/SparkRunner.java#L47 
> Is it possible to have Prometheus available in the executors? I tried
> both an EMR cluster and Kubernetes, and only local mode works (the
> metrics are available only on the driver's port 9111).
> Is the Prometheus servlet sink my best option? Any advice would be
> much appreciated!
>
> Thanks,
> Christine

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: [Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

christinegong
Hi,
In the Spark job, the metrics are exported to a Prometheus HTTP server on
localhost, to be scraped later by the Prometheus service
(https://github.com/prometheus/client_java#http). The problem is that when I
ssh into the EMR instances themselves, I can only see the metrics (e.g. with
curl localhost:9111) on the driver in local mode. If I run the Spark job in
cluster mode, localhost:9111 still responds to curl but returns no data, and
on the executors I cannot curl it at all. The scenario is the same when
running on Kubernetes, where I also check whether the metrics are there by
exec-ing into the containers.
Hope that makes the question clear.
Prometheus supports the Pushgateway, but that is meant for short batch jobs,
so I'm not sure it is applicable to my long-running streaming job.
Thanks!
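On the Pushgateway question: a long-running job can still push, for example once per micro-batch under a grouping key per executor, but the Pushgateway keeps the last pushed series until they are deleted through its API, so series from executors that have gone away linger with stale values. The Prometheus documentation presents it mainly for service-level batch jobs, which is why per-executor scrape endpoints are usually preferred for streaming. A minimal sketch using Java 11's `java.net.http`, assuming a Pushgateway at its default port 9091; the helper `PushgatewaySketch` and the job/instance names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for pushing one micro-batch's metrics to a Pushgateway.
final class PushgatewaySketch {
    private PushgatewaySketch() {}

    // Grouping key /metrics/job/<job>/instance/<instance>.
    // Assumes job and instance contain only URL-safe characters (no '/', spaces, etc.).
    static URI groupUri(String baseUrl, String job, String instance) {
        return URI.create(baseUrl + "/metrics/job/" + job + "/instance/" + instance);
    }

    // PUT replaces all metrics previously pushed under this grouping key.
    static HttpRequest buildPush(URI group, String exposition) {
        return HttpRequest.newBuilder(group)
                .header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
                .PUT(HttpRequest.BodyPublishers.ofString(exposition, StandardCharsets.UTF_8))
                .build();
    }

    // Sends the request and returns the HTTP status code (2xx on success).
    static int send(HttpClient client, HttpRequest request) throws Exception {
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }
}
```

The body is any text-format payload such as `records_processed_total 3\n`; at the end of each micro-batch you would call `send(HttpClient.newHttpClient(), buildPush(...))`, and delete the group when an executor shuts down if stale series matter.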



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
