I'm setting up an Apache Spark cluster to perform real-time streaming computations and would like to monitor the performance of the deployment by tracking various metrics, such as batch sizes, batch processing times, etc. From the Spark documentation, I saw that Spark supports various types of metric sinks.
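For context, here is a sketch of how the built-in sinks are wired up in conf/metrics.properties (the Graphite host below is a hypothetical placeholder):

```properties
# conf/metrics.properties on each node
# (start from conf/metrics.properties.template shipped with Spark)

# Spark has no HTTP *push* sink out of the box, but two built-in
# options expose metrics over the network:
#
# 1. MetricsServlet: serves driver/executor metrics as JSON over HTTP
#    at /metrics/json on the web UI port (enabled by default).
#
# 2. GraphiteSink: periodically pushes metrics to a Graphite endpoint:
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
```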
1. Is it possible to configure Spark to send its metrics to an HTTP server (as a sink)?
2. Is there a good tutorial/blog post that illustrates how the REST API of the Spark components can be used to track such performance metrics?
3. The REST API description lists the various endpoints available. Is there a way to get a list of all the Spark batches that have been run for an application, along with other per-batch details (e.g. number of events, processing time, etc.)?
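Regarding question 3, my understanding is that the monitoring REST API does expose per-batch information for streaming applications under /applications/[app-id]/streaming/batches. A minimal sketch of polling it (the host, port, and application id are hypothetical; the UI normally listens on port 4040 of the driver):

```python
import json
from urllib.request import urlopen

# Hypothetical base URL -- replace with your driver host and UI port.
UI_BASE = "http://localhost:4040/api/v1"

def batches_url(app_id, base=UI_BASE):
    """Endpoint listing the retained batches of a streaming application."""
    return f"{base}/applications/{app_id}/streaming/batches"

def summarize_batches(batches):
    """Pull per-batch details (event counts, processing times) out of the
    JSON array the endpoint returns."""
    return [
        {
            "batchId": b.get("batchId"),
            "inputSize": b.get("inputSize"),            # records in the batch
            "processingTime": b.get("processingTime"),  # ms; absent until the batch completes
            "status": b.get("status"),
        }
        for b in batches
    ]

def fetch_batches(app_id):
    """Query a running Spark UI for the application's batch history."""
    with urlopen(batches_url(app_id)) as resp:
        return summarize_batches(json.load(resp))
```

This only covers batches retained by the UI (bounded by spark.streaming.ui.retainedBatches), so it is a window into recent history rather than a complete log.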