A single build.sbt file to start Spark REPL?


A single build.sbt file to start Spark REPL?

Alexy Khrabrov
The usual way to use Spark with SBT is to package a Spark project with `sbt package` (e.g. per the Quick Start guide) and submit it to Spark using the bin/ scripts from the Spark distribution.  For a plain Scala project you don't need to download anything: you can just write a build.sbt file with the dependencies and run `console`, which starts a Scala REPL with those dependencies on the classpath.  Is there a way to avoid downloading the Spark tarball entirely, by declaring the spark-core dependency in build.sbt and using `run` or `console` to launch the Spark REPL from sbt?  In other words, the goal is a single build.sbt file such that if you run sbt in its directory and then say run/console (with optional parameters), it downloads all Spark dependencies and starts the REPL.  It should work on a fresh machine where the Spark tarball has never been untarred.
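
For concreteness, a minimal build.sbt of the kind I have in mind might look something like the sketch below (the Scala and Spark versions are just placeholders, and I haven't verified that this alone is enough to get a fully working Spark REPL):

  name := "spark-sbt-repl"

  scalaVersion := "2.10.4"

  // declaring spark-core here should make sbt download Spark and its
  // transitive dependencies, with no tarball involved
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

  // optional: have `sbt console` create a local SparkContext on startup
  initialCommands in console := """
    import org.apache.spark.{SparkConf, SparkContext}
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("sbt-console"))
  """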

A+

Re: A single build.sbt file to start Spark REPL?

Tobias Pfeiffer
Hi,

I guess it should be possible to dig through the scripts
bin/spark-shell, bin/spark-submit, etc., and convert them into a long
sbt command that you can run. I just tried

  sbt "run-main org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main"

but that fails with

Failed to initialize compiler: object scala.runtime in compiler mirror
not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
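
One untested idea: that error suggests the REPL launched via `run` cannot find the Scala runtime on the classpath it expects, so forking the run and passing the usejavacp system property might help. A sketch of the sbt settings I have in mind (sbt 0.13 syntax, not verified against this particular setup):

  // fork a separate JVM for `run` and tell the Scala REPL to use the java classpath
  fork in run := true
  javaOptions in run += "-Dscala.usejavacp=true"

  // forward stdin/stdout so the REPL is actually interactive
  connectInput in run := true
  outputStrategy := Some(StdoutOutput)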

Would be happy to learn about a way to do that, too.

Tobias
