How to monitor the throughput and latency of the combineByKey transformation in Spark 3?

felipe.o.gutierrez
Hi community,

I built a simple count-and-sum Spark application that uses the
combineByKey transformation [1]. I would like to monitor the
throughput into and out of this transformation, and the latency that
combineByKey spends pre-aggregating tuples. Ideally, I would report
the latency over the last 30 seconds as a histogram, with the average
and the 99th percentile.

I was thinking of adding a Dropwizard metric [2] to the combiner
function that I pass to combineByKey. But I am confused, because
there are two more functions that I must also pass to combineByKey.
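
To make the idea concrete, here is a minimal sketch in plain Python,
with no Spark or Dropwizard dependency. It wraps each of the three
functions in its own timer and keeps only the samples from the last 30
seconds. Every name in it (WindowedTimer, timed, local_combine_by_key)
is hypothetical, and local_combine_by_key is only a single-process
stand-in for combineByKey, not the Spark implementation.

```python
import time
from collections import deque


class WindowedTimer:
    """Hypothetical helper: keeps (timestamp, duration) samples from the last window_s seconds."""

    def __init__(self, window_s=30.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.samples = deque()
        self.count = 0  # total calls ever recorded, usable for throughput

    def record(self, duration):
        now = self.clock()
        self.samples.append((now, duration))
        self.count += 1
        self._evict(now)

    def _evict(self, now):
        # Drop samples that are older than the window.
        while self.samples and now - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def snapshot(self):
        self._evict(self.clock())
        durations = sorted(d for _, d in self.samples)
        n = len(durations)
        if n == 0:
            return {"n": 0, "mean": None, "p99": None}
        rank = (99 * n + 99) // 100  # nearest-rank 99th percentile, ceil(0.99 * n)
        return {"n": n, "mean": sum(durations) / n, "p99": durations[rank - 1]}


def timed(fn, timer):
    """Wrap fn so that every call records its wall-clock duration in timer."""
    def wrapper(*args):
        start = time.perf_counter()
        try:
            return fn(*args)
        finally:
            timer.record(time.perf_counter() - start)
    return wrapper


# The three functions that combineByKey expects, here for a (count, sum) combiner.
def create_combiner(value):
    return (1, value)


def merge_value(acc, value):
    return (acc[0] + 1, acc[1] + value)


def merge_combiners(a, b):
    return (a[0] + b[0], a[1] + b[1])


def local_combine_by_key(partitions, create, merge_val, merge_comb):
    """Single-process stand-in for combineByKey (an assumption, not Spark code)."""
    partials = []
    for part in partitions:  # per-partition pre-aggregation
        acc = {}
        for key, value in part:
            acc[key] = merge_val(acc[key], value) if key in acc else create(value)
        partials.append(acc)
    result = {}
    for acc in partials:  # merge the partial results across partitions
        for key, comb in acc.items():
            result[key] = merge_comb(result[key], comb) if key in result else comb
    return result


timers = {name: WindowedTimer() for name in ("create", "merge_value", "merge_combiners")}
partitions = [[("a", 1.0), ("b", 2.0), ("a", 3.0)], [("a", 4.0), ("b", 5.0)]]
result = local_combine_by_key(
    partitions,
    timed(create_combiner, timers["create"]),
    timed(merge_value, timers["merge_value"]),
    timed(merge_combiners, timers["merge_combiners"]),
)
for name, timer in timers.items():
    print(name, timer.count, timer.snapshot())
```

In a real job the three functions run on the executors, so the timers
would have to live there and be reported from there; the sketch does
not cover that part.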

How would you suggest I implement this monitoring strategy?

Thanks,
Felipe
[1] https://github.com/felipegutierrez/explore-spark/blob/master/src/main/scala/org/sense/spark/app/combiners/TaxiRideCountCombineByKey.scala#L40
[2] https://metrics.dropwizard.io/4.1.2/getting-started.html

--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez
-- https://felipeogutierrez.blogspot.com