Is there a way to see the 'Application Detail UI' page (at master:4040) for completed applications? Currently I can see that page only for running applications; I would like to see the various metrics for an application after it has completed.
This will be a feature in Spark 1.0, which is not yet released. In 1.0, Spark applications can persist their state so that the UI can be reloaded after they have completed.
On Sun, Mar 30, 2014 at 10:30 AM, David Thomas <[hidden email]> wrote:
I am using Spark 1.0.1, but I am still only able to see the stats for running apps on port 4040, not for completed ones. Is this feature supported, or is there a way to log this info to a file? I am interested in stats such as the total number of executors, total runtime, and total memory used by my Spark program.
If I understand you correctly, setting event logging in $SPARK_JAVA_OPTS in conf/spark-env.sh should achieve what you want. I'm logging to HDFS, but according to the configuration page, a regular folder on the local filesystem should work as well.
Example with all other settings removed:
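The example itself was not preserved in this archive; the following is a hypothetical reconstruction of what such a spark-env.sh line might have looked like in Spark 0.9/1.0, logging to HDFS (the NameNode host, port, and path are made up):

```shell
# conf/spark-env.sh -- sketch only; host, port, and path are hypothetical
export SPARK_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.eventLog.dir=hdfs://namenode:8020/spark-events"
```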
This works with the Spark shell, I haven't tested other applications though.
Note that the completed applications will disappear from the list if you restart Spark completely, even though they'll still be stored in the log folder.
As Simon explained, you need to set "spark.eventLog.enabled" to true.
I'd like to add that using SPARK_JAVA_OPTS to set Spark configurations is deprecated, as I'm sure many of you have noticed from the scary warning message we print out. :) The recommended and supported way is to add the line "spark.eventLog.enabled true" to $SPARK_HOME/conf/spark-defaults.conf. This will be picked up by spark-submit and passed to your application.
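For reference, a minimal spark-defaults.conf sketch (the directory value here is illustrative; /tmp/spark-events is the default anyway):

```
# $SPARK_HOME/conf/spark-defaults.conf
spark.eventLog.enabled   true
# optional: where the event logs are written (defaults to /tmp/spark-events)
spark.eventLog.dir       /tmp/spark-events
```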
2014-08-14 15:45 GMT-07:00 durin <[hidden email]>:
I set "spark.eventLog.enabled" to true in $SPARK_HOME/conf/spark-defaults.conf and also configured logging to both a file and the console in log4j.properties, but I am not able to get the statistics logged to a file. On the console there are a lot of log messages mixed in with the stats, so they are hard to separate. I prefer the format that appears on localhost:4040; it is clearer. I am running the job in standalone mode on my local machine. Is there some way to recreate the stats page after the job has completed?
More specifically, as Patrick indicated above, in 1.0+ apps will have persistent state so that the UI can be reloaded. Is there a way to enable this feature in 1.0.1?
Not sure if I understand you correctly, but here is how the event logging functionality is normally used:
After setting "spark.eventLog.enabled" (and optionally "spark.eventLog.dir"), the user runs his/her Spark application and calls sc.stop() at the end of it. Then he/she goes to the standalone Master UI (http://<master-url>:8080 by default) and clicks on the application under the Completed Applications table. This links to the Spark UI of the finished application in its completed state, under a path that looks like "http://<master-url>:8080/history/<app-id>". It won't be on "http://localhost:4040" anymore, because that port is now freed for new applications to bind their SparkUIs to. To access the file that stores the raw statistics, go to the directory specified in "spark.eventLog.dir". This is "/tmp/spark-events" by default, though in Spark 1.0.1 it may be in HDFS under the same path.
I could be misunderstanding what you mean by the stats being buried in the console output, because the events are not logged to the console but to a file in "spark.eventLog.dir". For all of this to work, of course, you have to run Spark in standalone mode (i.e. with the master set to spark://<master-url>:7077). In other modes, you will need to use the history server instead.
Does this make sense?
2014-08-14 18:08 GMT-07:00 SK <[hidden email]>:
I'm running something close to the current master (I compiled it several days ago) but am having some trouble viewing history.
I set "spark.eventLog.enabled" to true, but I continually receive the error message (via the web UI) "Application history not found...No event logs found for application ml-pipeline in file:/tmp/spark-events/ml-pipeline-1408117588599". I tried two fixes:
-I manually set "spark.eventLog.dir" to a path beginning with "file:///", believing that perhaps the problem was an invalid protocol specification.
-I inspected /tmp/spark-events manually and noticed that each job directory (and the files therein) was owned by the user who launched the job and was not world readable. Since I run Spark from a dedicated Spark user, I made the files world readable, but I still receive the same "Application history not found" error.
Is there a configuration step I may be missing?
On Thu, Aug 14, 2014 at 7:33 PM, Andrew Or <[hidden email]> wrote:
OK, I was specifying --master local. I changed that to --master spark://<localhostname>:7077 and am now able to see the completed applications. It provides summary stats about runtime and memory usage, which is sufficient for me at this time.
However, it doesn't seem to archive the info in the "application detail UI" that lists detailed stats about the completed stages of the application, which would be useful for identifying bottleneck steps in a large application. I guess we need to capture the "application detail UI" screen before the app run completes, or find a way to extract this info by parsing the JSON log file in /tmp/spark-events.
Your configuration looks alright to me. We parse both "file:/" and "file:///" the same way, so that shouldn't matter. I just tried this on the latest master and verified that it works for me. Can you dig into the directory "/tmp/spark-events/ml-pipeline-1408117588599" to make sure it's not empty? In particular, look for a file named something like "EVENT_LOG_0" and check its contents. The last event (on the last line) of the file should be an "Application Complete" event. If it isn't, it's likely that your application did not call sc.stop(), though the logs should still show up in spite of that. If all of that fails, try logging to a more accessible place by setting "spark.eventLog.dir". Let me know if that helps.
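As a quick way to run the check described above, here is a small sketch (not from the thread; the directory name in the error message above is an example of what you would pass in). In Spark 1.0.x the event log is a file of JSON events, one per line, and the final event for a cleanly stopped application is "SparkListenerApplicationEnd":

```python
# Sketch: check whether a Spark 1.0.x event log ends with an
# application-end event (i.e. the application called sc.stop()).
# Assumes the one-JSON-event-per-line EVENT_LOG_0 format.
import json
import os

def log_looks_complete(log_dir):
    """Return True if the last event in EVENT_LOG_0 is SparkListenerApplicationEnd."""
    path = os.path.join(log_dir, "EVENT_LOG_0")
    with open(path) as f:
        lines = [line for line in f if line.strip()]
    if not lines:
        return False
    return json.loads(lines[-1]).get("Event") == "SparkListenerApplicationEnd"
```

If this returns False for your log directory, the application most likely exited without calling sc.stop().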
You shouldn't need to capture the screen before the app finishes; the whole point of the event logging functionality is that the user doesn't have to do that themselves. What happens when you click into the "application detail UI"? In Spark 1.0.1, if Spark can't find the logs it may just refresh the page instead of printing a more explicit message. From your configuration, however, you should be able to see the detailed stage information in the UI in addition to the summary statistics under "Completed Applications". I listed a few debugging steps in the paragraph above; maybe they're applicable to you as well.
Let me know if that works,
2014-08-15 11:07 GMT-07:00 SK <[hidden email]>:
I am able to access the Application details web page from the master UI page when I run Spark in standalone mode on my local machine. However, I am not able to access it when I run Spark on our private cluster. The Spark master runs on one of the nodes in the cluster. I am able to access the Spark master UI at http://<master-url>:8080. It shows the listing of all the running and completed apps. When I click on a completed app and access the Application details link, the link points to:
When I view the page's HTML source, the href portion is blank ("").
However, on my local machine, when I click the Application detail link for a completed app, it correctly points to
and when I view the page's HTML source, the href portion points to "/history/<app-id>".
On the cluster, I have set spark.eventLog.enabled to true in $SPARK_HOME/conf/spark-defaults.conf on the master node as well as on all the slave nodes. I am using Spark 1.0.1 on the cluster.
I am not sure why I am able to access the application details for completed apps when the app runs on my local machine, but not for apps that run on our cluster, although in both cases I am using Spark 1.0.1 in standalone mode. Do I need to do any additional configuration to enable this history on the cluster?
Have a look at the history server; it looks like you have enabled the history server locally but not on the remote server.
On Tue, Aug 26, 2014 at 7:01 AM, SK <[hidden email]> wrote:
I have already tried starting the history server and accessing it at <master-url>:18080 as per the link, but the page does not list any completed applications. As I mentioned in my previous mail, I am running Spark in standalone mode on the cluster (as well as on my local machine). According to the link, it appears that the history server is required only in Mesos or YARN mode, not in standalone mode.
Thanks for the tips. I just built the master branch of Spark last night, but am still having problems viewing history through the standalone UI. I dug into the Spark job event directories as you suggested, and I see at a minimum 'SPARK_VERSION_1.0.0' and 'EVENT_LOG_1'; for applications that call 'sc.stop()' I also see 'APPLICATION_COMPLETE'. The version and application-complete files are empty; the event log file contains the information one would need to repopulate the web UI.
The following may be helpful in debugging this:
-Each job directory (e.g. '/tmp/spark-events/testhistoryjob-1409246088110') and the files within are owned by the user who ran the job, with permissions 770. This prevents the 'spark' user from accessing the contents.
-When I make a directory and its contents accessible to the spark user, the history server (invoked as 'sbin/start-history-server.sh /tmp/spark-events') is able to display the history, but the standalone web UI still produces the following error: 'No event logs found for application HappyFunTimes in file:///tmp/spark-events/testhistoryjob-1409246088110. Did you specify the correct logging directory?'
-In case it matters, I'm running pyspark.
Do you know what may be causing this? When you attempt to reproduce locally, who do you observe owns the files in /tmp/spark-events?
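For the permissions issue described here, a small sketch (not from the thread) of loosening an event-log tree so another local user, such as the 'spark' user, can read it; this is the Python equivalent of `chmod -R o+rX`:

```python
# Sketch: add world-read (and world-execute on directories) to an
# event-log tree; equivalent to `chmod -R o+rX <dir>`.
# The path you pass in is up to you; /tmp/spark-events is Spark's default.
import os
import stat

def make_world_readable(root):
    for dirpath, _dirnames, filenames in os.walk(root):
        # directories need o+rx so other users can traverse and list them
        os.chmod(dirpath, os.stat(dirpath).st_mode | stat.S_IROTH | stat.S_IXOTH)
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.chmod(path, os.stat(path).st_mode | stat.S_IROTH)
```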
On Tue, Aug 26, 2014 at 8:51 AM, SK <[hidden email]> wrote:
I was recently able to solve this problem for standalone mode. For this mode, I did not use a history server. Instead, I set spark.eventLog.dir (in conf/spark-defaults.conf) to a directory in HDFS (basically, this directory should be in a place that is writable by the master and globally accessible to all the nodes).
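A sketch of what that configuration might look like (the NameNode host and port here are hypothetical; use whatever your shared filesystem exposes):

```
# conf/spark-defaults.conf -- NameNode host/port are hypothetical
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://namenode:8020/spark-events
```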
How did you specify the HDFS path? When I put
in my spark-defaults.conf file, I receive the following error:
An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.IOException: Call to crosby.research.intel-research.net/10.212.84.53:54310 failed on local exception: java.io.EOFException
On Thu, Aug 28, 2014 at 12:26 PM, SK <[hidden email]> wrote:
I specified as follows:
We use MapR FS for sharing files. I did not provide an IP address or port number, just the directory name on the shared filesystem.
On Aug 29, 2014 8:28 AM, "Brad Miller" <[hidden email]> wrote:
I have a similar problem. I want to see the detailed logs of Completed Applications, so I've set the following in my program:
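(The snippet itself was not preserved in the archive; programmatically, enabling event logging usually looks something like the following sketch. The property names are real Spark 1.x settings; the app name and directory are made up, and this of course needs a Spark installation to run.)

```python
# Hypothetical sketch, not the poster's actual code.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("my-app")
        .set("spark.eventLog.enabled", "true")
        .set("spark.eventLog.dir", "/tmp/spark-events"))
sc = SparkContext(conf=conf)
# ... job ...
sc.stop()  # needed so the log ends with an application-end event
```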
but when I click on the application in the web UI, I get a page with the message:
Application history not found (app-20150126000651-0331)
No event logs found for application xxx$ in file:/tmp/spark-events/xxx-1422227211500. Did you specify the correct logging directory?
despite the fact that the directory exists and contains 3 files:
I use Spark 1.1.0 on a standalone cluster with 3 nodes.
Any suggestions for solving this problem?
Where is the history server running? Is it running on the same node as the logs directory?