Spark Core - Embed in other application


Spark Core - Embed in other application

sparkuser99
Hi,

I have a use case to process simple ETL-like jobs. The data volume is small
(less than a few GB) and fits easily in my running Java application's memory.
I would like to take advantage of the Spark Dataset API, but I don't need any
Spark setup (standalone / cluster). Can I embed Spark in an existing Java
application and still use it?

I have heard that Spark's local mode is only for testing. For small data sets
like this, can it still be used in production? Please advise on any
disadvantages.

Regards
Reddy





Re: Spark Core - Embed in other application

Andres Ivaldi
Hi, yes you can; I've built an engine to perform ETL this way myself.

I built a REST service with Akka, with a method called "execute" that receives a JSON structure describing the ETL; a rough sketch of the route is below.
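Something along these lines (a minimal sketch, assuming Akka HTTP 10.2+; the object name, port, and handler body are placeholders, and a real "execute" would hand the JSON to the embedded Spark session shown further down):

import akka.actor.typed.ActorSystem
import akka.actor.typed.scaladsl.Behaviors
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._

object EtlService extends App {
  implicit val system: ActorSystem[Nothing] =
    ActorSystem(Behaviors.empty, "etl-service")

  // Placeholder handler: the real one would parse the JSON ETL
  // definition and run it against the embedded Spark session.
  def execute(etlJson: String): String =
    s"received ${etlJson.length} bytes"

  val route =
    path("execute") {
      post {
        entity(as[String]) { json =>
          complete(execute(json))
        }
      }
    }

  Http().newServerAt("0.0.0.0", 8080).bind(route)
}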
You just need to configure an embedded standalone Spark. I did something like this, in Scala:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
      .builder()
      .master("local[8]")                          // embedded local mode, 8 worker threads
      .appName("xxx")
      .config("spark.sql.warehouse.dir", config.getString("cache.directory"))
      .config("spark.driver.memory", "11g")        // only effective if set before the driver JVM starts
      .config("spark.executor.cores", "8")
      .config("spark.shuffle.compress", "false")
      .config("spark.executor.memory", "11g")
      .config("spark.cores.max", "12")
      .config("spark.deploy.defaultCores", "3")
      .config("spark.driver.maxResultSize", "0")   // 0 = unlimited
      .config("spark.default.parallelism", "9")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryoserializer.buffer", "1024k")
      .config("spark.sql.shuffle.partitions", "1") // small data: avoid the default 200 shuffle partitions
      .config("spark.kryo.unsafe", "true")
      .config("spark.ui.port", "4041")
      .enableHiveSupport()
      .getOrCreate()

(This will crash if you already have another Spark instance running in the same JVM ...)

Then use the spark variable as you wish.
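For example, a minimal Dataset sketch (the paths and column names here are made up for illustration; the same builder and Dataset APIs are also available from Java, which fits your use case):

import spark.implicits._

// Read a small CSV, run a trivial aggregation, and write the result.
val input = spark.read
  .option("header", "true")
  .csv("/tmp/input.csv")                 // hypothetical input path

val result = input
  .filter($"amount".isNotNull)           // hypothetical columns
  .groupBy($"category")
  .count()

result.write.mode("overwrite").parquet("/tmp/output")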







--
Ing. Ivaldi Andres