Help on Spark/Shark in AWS EMR

Hi, I am following these instructions to set up a Spark/Shark cluster on EMR:

The instructions say to run the following command to start Shark:
SPARK_MEM="2g" /home/hadoop/shark/bin/shark

What does SPARK_MEM="2g" actually do?

Does it mean we allocate only 2 GB per node to Spark? And how is that setting related to Shark?

I know Shark runs on top of Spark. If I want to process a very large data set, say 30 GB, entirely in memory, do I need to set SPARK_MEM to at least 15g? (Assume I have 3 nodes: 1 master and 2 slaves.)
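The 15 GB figure in the question comes from splitting 30 GB across the two slave nodes. A rough Python sketch of that arithmetic follows. It assumes the data is spread evenly over the two slaves, that the master does not cache any data, and that sizes use JVM-style suffixes such as 2g or 512m. It ignores JVM overhead and the extra space deserialized objects usually take in memory, so the real per-node requirement is likely higher than this estimate. The helpers parse_mem and per_worker_bytes are hypothetical and are not part of Spark or Shark.

```python
# Hypothetical back-of-the-envelope estimate: assumes an even split across
# worker (slave) nodes, no caching on the master, and ignores JVM overhead
# and in-memory object expansion.

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_mem(size: str) -> int:
    """Convert a JVM-style size string such as '2g' or '512m' to bytes (assumed format)."""
    size = size.strip().lower()
    if size and size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)  # plain number: treat as bytes


def per_worker_bytes(dataset_bytes: int, num_workers: int) -> int:
    """Bytes each worker must hold if the data set is split evenly (rounded up)."""
    return -(-dataset_bytes // num_workers)


dataset = parse_mem("30g")
needed = per_worker_bytes(dataset, num_workers=2)  # 1 master + 2 slaves -> 2 workers
print(needed // 1024**3)            # 15 (GB per slave, before any overhead)
print(needed <= parse_mem("2g"))    # False: 2g per node is far too small
```

Under these assumptions 15g per slave is a lower bound rather than a safe setting; in practice some headroom would be needed on top of it.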