Is there a way to get the final web URL from an active Spark context


Jeff Evans
Given a session/context, we can get the UI web URL like this:

sparkSession.sparkContext.uiWebUrl

This gives me something like http://node-name.cluster-name:4040.  When
I open this from outside the cluster (e.g., from my laptop), it redirects
via HTTP 302 to something like
http://node-name.cluster-name:8088/proxy/redirect/application_1579210019853_0023/.
For discussion purposes, call the latter the "final web URL".
Critically, this final URL remains active even after the application
terminates.  The original uiWebUrl
(http://node-name.cluster-name:4040) is not available after the
application terminates, so the redirect has to be captured while the
application is still running in order to provide a persistent link to
that history server UI entry (e.g., for debugging purposes).

Is there a way, other than using some HTTP client, to detect what this
final URL will be directly from the SparkContext?
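
For reference, the HTTP-client approach that the question hopes to avoid
could look roughly like the sketch below. It uses only the JDK's
HttpURLConnection, issues a single request without following redirects,
and returns the Location header of the 3xx response. The object name
FinalUiUrl, the method name redirectTarget, and the timeout default are
made up for illustration; none of this is part of Spark's API.

```scala
import java.net.{HttpURLConnection, URI}

// Hypothetical helper (not a Spark API): captures the redirect target of a UI URL.
object FinalUiUrl {

  /** Sends one GET without following redirects and returns the Location
    * header if the server answered with a 3xx status, otherwise None. */
  def redirectTarget(uiWebUrl: String, timeoutMs: Int = 5000): Option[String] = {
    val conn = URI.create(uiWebUrl).toURL
      .openConnection()
      .asInstanceOf[HttpURLConnection]
    try {
      // We want the redirect response itself, not the page behind it.
      conn.setInstanceFollowRedirects(false)
      conn.setRequestMethod("GET")
      conn.setConnectTimeout(timeoutMs)
      conn.setReadTimeout(timeoutMs)
      val status = conn.getResponseCode
      if (status >= 300 && status < 400) Option(conn.getHeaderField("Location"))
      else None
    } finally {
      conn.disconnect()
    }
  }
}
```

In the Scala API, uiWebUrl is an Option[String], so a call might look like
sparkSession.sparkContext.uiWebUrl.flatMap(FinalUiUrl.redirectTarget(_)).
This only works while the application is still running and the caller
receives the redirect.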

Re: Is there a way to get the final web URL from an active Spark context

Jeff Evans
To answer my own question, it turns out what I was after is the YARN ResourceManager URL for the Spark application.  As alluded to in SPARK-20458, it's possible to use the YARN API client to get this value.  Here is a gist that shows how it can be done (given an instance of the Hadoop Configuration object): https://gist.github.com/jeff303/8dab0e52dc227741b6605f576a317798
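
The gist is not reproduced here, but the general idea can be sketched as follows, assuming the Hadoop YARN client library is on the classpath and the application runs on YARN. The object name YarnTrackingUrl and the method name lookup are hypothetical, and ApplicationId.fromString requires a reasonably recent Hadoop version (2.8 or later). Whether the tracking URL reported by YARN is identical to the redirect target described in the original message is an assumption that should be checked on the cluster in question.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records.ApplicationId
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Hypothetical helper: asks the YARN ResourceManager for an application's tracking URL.
object YarnTrackingUrl {

  def lookup(hadoopConf: Configuration, applicationId: String): String = {
    val client = YarnClient.createYarnClient()
    // Wrapping in YarnConfiguration makes sure yarn-site.xml settings are loaded.
    client.init(new YarnConfiguration(hadoopConf))
    client.start()
    try {
      val appId = ApplicationId.fromString(applicationId)
      client.getApplicationReport(appId).getTrackingUrl
    } finally {
      client.stop()
    }
  }
}
```

With an active context sc, a call might look like YarnTrackingUrl.lookup(sc.hadoopConfiguration, sc.applicationId).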


In reply to the original message from Jeff Evans, sent Fri, Jan 17, 2020 at 4:09 PM.