I'm a bit confused about web UI access for a standalone Spark app.
- When running a Spark app, a web server is launched at localhost:4040. When the standalone app finishes executing, the web server is shut down. What's the use of this web server? There is no way of reviewing the data once the standalone app exits.
- Creating a SparkContext at spark://localhost:7077 creates another web UI. Is this web UI supposed to be used alongside localhost:4040, or is it a replacement for it?
- When creating a context with spark://localhost:7077 after running ./bin/start-all.sh, I get the warning below (a minimal sketch of this setup follows it):
WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
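For reference, the setup is roughly this (a minimal sketch; the application name and the job are just placeholders):

import org.apache.spark.SparkContext

object MyStandaloneApp {
  def main(args: Array[String]): Unit = {
    // Connect to the standalone master started by ./bin/start-all.sh
    // ("MyStandaloneApp" is a placeholder application name).
    val sc = new SparkContext("spark://localhost:7077", "MyStandaloneApp")

    // A trivial job, only so that something shows up in the web UIs.
    println("even count = " + sc.parallelize(1 to 1000).filter(_ % 2 == 0).count())

    sc.stop() // as soon as this runs (or main returns), localhost:4040 is gone
  }
}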
A Spark Application is defined by everything executed within a given SparkContext. The application's web server runs on port 4040 of the machine where the driver of the application is executing. An example of a driver of a Spark Application is a single instance of the Spark Shell. This web UI, on port 4040, displays statistics about the Application such as the stages being executed, the number of tasks per stage, and the progress of the tasks within a stage. Other Application statistics include the caching locations and percentages of the RDDs used within the Application (and across its stages) and the garbage collection times of the tasks that have completed.
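To make that concrete, here is a rough sketch (not from my actual code; the names are made up) of a driver whose work shows up on the port 4040 UI while it runs: each action becomes one or more stages with per-task progress and GC times, and the cached RDD appears on the Storage page:

import org.apache.spark.SparkContext

object Port4040Demo { // hypothetical example name
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("spark://localhost:7077", "Port4040Demo")

    // Caching an RDD makes it appear on the Storage page of the :4040 UI.
    val nums = sc.parallelize(1 to 1000000, 8).cache()

    // Each action below shows up as one or more stages on the Stages page.
    println("count = " + nums.count())
    println("sum   = " + nums.map(_.toLong).reduce(_ + _))

    sc.stop() // once the application ends, the :4040 UI goes away with it
  }
}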
A Spark Cluster is defined by all Applications executing on top of the resources provisioned to your particular deployment of Spark. These resources are managed by a Spark Master, which contains the task scheduler and the cluster manager (unless you're using YARN or Mesos, in which case they provide the cluster manager). The UI on port 8080 is the UI of the Spark Master, and it is accessible on whichever node is currently running the Spark Master. This UI displays cluster statistics such as the number of available worker nodes, the number of JVM executor processes per worker node, the number of running Applications using the Cluster, et cetera.
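Relating this back to the warning you saw: an application only gets executors (and shows useful numbers on the 8080 page) if the workers can satisfy what it asks for. As a hedged sketch, assuming the SparkConf-based API (Spark 0.9 and later; on 0.8 the same settings are passed as system properties), you can cap what one application requests:

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only; they have to fit within what your workers
// advertise on the master's :8080 page, otherwise you get the
// "Initial job has not accepted any resources" warning.
val conf = new SparkConf()
  .setMaster("spark://localhost:7077")
  .setAppName("ResourceCappedApp")      // hypothetical name
  .set("spark.executor.memory", "512m") // heap per executor
  .set("spark.cores.max", "2")          // total cores this app may take
val sc = new SparkContext(conf)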
In short, shutting down a Spark Application kills the UI on port 4040 because the application has terminated, so there are no running statistics left to collect about it. The UI on port 8080, however, stays up and keeps reporting cluster-wide statistics until you shut down the cluster by killing the Spark Master.
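One practical consequence: if you want to look at the port 4040 UI after a short job has done its work, the driver has to be kept alive before the context is stopped. A rough workaround sketch (nothing official, just blocking the driver):

// ... run your jobs on sc as usual ...

// Block the driver so the :4040 UI stays reachable until Enter is pressed;
// only then is the SparkContext stopped and the UI torn down.
println("Jobs done; UI still at http://localhost:4040, press Enter to exit")
System.in.read()
sc.stop()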
Hope that long-winded explanation made sense!
On Fri, Dec 27, 2013 at 9:23 AM, Aureliano Buendia <[hidden email]> wrote:
Horia, thanks for the detailed explanation. The concept of a development workflow in Spark is still a blurry subject to me.
My Spark application is a Scala class with a main function, nothing special: I edit the code in my IDE, compile it, run it, go back to my IDE to make more changes, compile, run it again, and so on. The UI on 4040 has no chance of staying up all the time. The spark-shell application you mentioned is a special case of a long-running application. How do you develop while keeping the UI on 4040 up all the time?
On Fri, Dec 27, 2013 at 7:50 PM, Horia <[hidden email]> wrote: