Help on Spark/Shark in AWS EMR

li
This post has NOT been accepted by the mailing list yet.
Hi, I am following these instructions to set up a Spark/Shark cluster on EMR:
http://aws.amazon.com/articles/4926593393724923

The instructions say to start Shark with the following command:
SPARK_MEM="2g" /home/hadoop/shark/bin/shark

What does SPARK_MEM="2g" actually control?

Does it mean Spark gets only 2 GB per node? And what does it have to do with Shark?
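For reference, the shell side of the command is standard: prefixing a command with NAME=value puts that variable into the environment of that one command only, and the calling shell is left unchanged. A minimal check, using `sh -c` as a stand-in for the Shark launcher (the stand-in is only for illustration):

```shell
# Make sure the variable is not already set in this shell.
unset SPARK_MEM

# The prefix assignment is visible inside the launched command...
inside=$(SPARK_MEM="2g" sh -c 'printf %s "$SPARK_MEM"')
echo "inside the command: $inside"

# ...but does not persist in the calling shell afterwards.
echo "afterwards: ${SPARK_MEM:-<unset>}"
```

So the value only reaches whatever /home/hadoop/shark/bin/shark (and the Spark code it launches) reads from its environment; what Spark then does with it is the part I am unsure about.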

I know Shark runs on top of Spark. So if I want to work with a very large dataset, say 30 GB, entirely in memory, do I need to set SPARK_MEM to at least 15g? (Assume I have 3 nodes: 1 master and 2 slaves.)
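The 15g figure comes from splitting 30 GB evenly over the 2 slaves. A rough sketch of that arithmetic follows. CACHE_PERCENT (share of each worker's memory usable for caching) and EXPANSION (how much larger the cached form is than the on-disk data) are hypothetical knobs that I have not checked against Spark's real settings; at 100 and 1 they reduce to the naive 30 / 2 estimate:

```shell
# Rough per-worker memory estimate for caching a dataset on the worker nodes.
# Assumptions (hypothetical, not taken from the EMR article):
#   - data is spread evenly over the worker (slave) nodes;
#   - only CACHE_PERCENT of each worker's memory can hold cached data;
#   - cached data is EXPANSION times its on-disk size (whole number).
DATA_GB=${DATA_GB:-30}              # dataset size from the question
WORKERS=${WORKERS:-2}               # 2 slaves; the master holds no cached data
CACHE_PERCENT=${CACHE_PERCENT:-100} # 100 = naive estimate
EXPANSION=${EXPANSION:-1}           # 1 = naive estimate

# Data each worker must hold, in GB, rounded up.
per_worker=$(( (DATA_GB * EXPANSION + WORKERS - 1) / WORKERS ))
# Memory needed so that the cacheable share fits that data, rounded up.
heap=$(( (per_worker * 100 + CACHE_PERCENT - 1) / CACHE_PERCENT ))

echo "data per worker: ${per_worker}g"
echo "estimated memory per worker: ${heap}g"
```

With the defaults this prints 15g for both lines. Setting CACHE_PERCENT=66 (an illustrative value, not a verified Spark default) raises the estimate to 23g, which is why I am not sure 15g would really be enough.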

Thanks!