I am trying to use Spark for my own applications, and I am currently profiling performance in local mode. I have a couple of questions:
1. When I set spark.master to local[N], Spark will use up to N worker *threads* on a single machine. Is this equivalent to saying there are N worker *nodes* as described in http://spark.apache.org/docs/latest/cluster-overview.html? (That is, is each worker thread viewed as a separate node that can have its own executor for each application?)
2. Is there any way to set the maximum memory used by each worker thread/node? I can only find a setting for the memory of each executor (spark.executor.memory).
In Spark, the sample programs use local[N], which means N threads run to complete the job (a job is divided into smaller tasks that run on executors). Worker nodes are machines that run application code in a cluster, and threads run on worker nodes.
(N threads != N worker nodes)
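To make the distinction concrete, here is a minimal sketch of launching local mode from the command line. The thread counts are arbitrary examples; no worker nodes are started, since everything runs inside one JVM:

```shell
# Run the shell with 4 worker threads in a single local JVM.
# All 4 threads share that one process; there are no separate worker nodes.
spark-shell --master "local[4]"

# Use as many threads as there are cores on the machine.
spark-shell --master "local[*]"
```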
The second question is not clear, but in local mode spark.executor.memory won't have any effect, because the executor lives within the driver JVM process that you start when you start spark-shell, and the default memory for that process is 512M. You can increase it by setting spark.driver.memory to something higher, for example 2g.
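As a hedged sketch, assuming a standard Spark installation, the driver heap for local mode can be raised in either of these ways (the 2g value is just an example):

```shell
# Option 1: pass the driver heap size when launching the shell.
spark-shell --master "local[4]" --driver-memory 2g

# Option 2: set it once in conf/spark-defaults.conf:
#   spark.driver.memory   2g
```

Note that spark.driver.memory must be set before the JVM starts (via the command line or the config file); setting it programmatically after the SparkContext exists has no effect on the already-launched process.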