Spark-AMI version compatibility table

Nick Chammas
Howdy folks,

I'm working through the Spark on EMR tutorial. The attraction of running Spark on EMR is that it is probably the fastest and easiest way to get Spark running and doing something useful.

I had a lot of trouble finding the right combination of Spark install script and EMR AMI that would give me a working cluster and Spark shell.

The tutorial points to a 0.8.1 version of the bootstrap script and doesn't specify an AMI version. If you use Python/boto to complete the tutorial, this means EMR will default to a 1.0 AMI. This doesn't work and leads to errors about a missing core-site.xml file, among other things.

Here are some other combinations I tried (up to the point of seeing if the Spark shell starts up successfully):

Bootstrap script                                                 | AMI version | Result

Spark shell doesn't start:
s3://elasticmapreduce/samples/spark/0.8.1/install-spark-shark.sh | 1.0         | bootstrap times out; missing core-site.xml
s3://elasticmapreduce/samples/spark/0.8.1/install-spark-shark.sh | 2.0         | bootstrap times out; missing EmrMetrics*.jar
s3://elasticmapreduce/samples/spark/0.8.1/install-spark-shark.sh | 2.1         | bootstrap times out
s3://elasticmapreduce/samples/spark/0.8.1/install-spark-shark.sh | 2.2         | bootstrap times out
s3://elasticmapreduce/samples/spark/0.8.1/install-spark-shark.sh | 2.3         | bootstrap times out
s3://elasticmapreduce/samples/spark/0.8.1/install-spark-shark.sh | 3.0         | bootstrap times out
s3://elasticmapreduce/samples/spark/install-spark-shark.sh       | 1.0         | Spark shell fails to initialize; failure to "load native Mesos library"
s3://elasticmapreduce/samples/spark/install-spark-shark.sh       | 2.4         | Spark shell hangs on initialization
s3://elasticmapreduce/samples/spark/install-spark-shark.sh       | 3.0         | bootstrap fails; missing dpkg

Spark shell starts:
s3://elasticmapreduce/samples/spark/0.8.1/install-spark-shark.sh | 2.4         | success; log4j warnings
s3://elasticmapreduce/samples/spark/install-spark-shark.sh       | 2.0         | success
s3://elasticmapreduce/samples/spark/install-spark-shark.sh       | 2.1         | success
s3://elasticmapreduce/samples/spark/install-spark-shark.sh       | 2.2         | success
s3://elasticmapreduce/samples/spark/install-spark-shark.sh       | 2.3         | success

Do y'all "own" these EMR bootstrap scripts, or are they provided by Amazon? It would be helpful if

  1. the install script explicitly checked for a compatible AMI version, and/or
  2. there was an official compatibility table up somewhere, preferably linked to from that EMR tutorial (which has high visibility on Google)
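
To make item 1 concrete, the check could be as small as a lookup against the combinations tested above. What follows is only a rough Python sketch, assuming the results are hard-coded. The names KNOWN_GOOD, is_known_good, and require_known_good are hypothetical and are not part of the actual install script or of EMR, and the data only records whether the Spark shell started in these particular tests.

```python
# Hypothetical sketch: the script/AMI pairs come from the table in this post.
V081 = "s3://elasticmapreduce/samples/spark/0.8.1/install-spark-shark.sh"
UNVERSIONED = "s3://elasticmapreduce/samples/spark/install-spark-shark.sh"

# Only the combinations where the Spark shell started successfully.
# Anything not listed here either failed or was never tested.
KNOWN_GOOD = {
    V081: {"2.4"},
    UNVERSIONED: {"2.0", "2.1", "2.2", "2.3"},
}


def is_known_good(script, ami_version):
    """Return True if this script/AMI pair produced a working Spark shell."""
    return ami_version in KNOWN_GOOD.get(script, set())


def require_known_good(script, ami_version):
    """Raise ValueError early instead of waiting for a bootstrap timeout."""
    if not is_known_good(script, ami_version):
        tested = ", ".join(sorted(KNOWN_GOOD.get(script, set()))) or "none"
        raise ValueError(
            "AMI %s is not known to work with %s (known good: %s)"
            % (ami_version, script, tested)
        )


if __name__ == "__main__":
    require_known_good(UNVERSIONED, "2.3")  # passes silently
    print(is_known_good(V081, "1.0"))
```

Calling something like require_known_good before submitting the job flow would fail right away with a readable message, rather than after the bootstrap action times out on the cluster.
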

I'm new both to Spark and to AWS in general. Forgive me if I'm barking up the wrong tree here.

Nick


Re: Spark-AMI version compatibility table

Nick Chammas
The table formatting shows up garbled on the user list web page. You can see that same Spark-AMI compatibility table on Google Docs.


On Fri, Feb 21, 2014 at 11:43 PM, nicholas.chammas <[hidden email]> wrote:
Sent from the Apache Spark User List mailing list archive at Nabble.com.