Submitting Spark Job thru REST API?

Submitting Spark Job thru REST API?

Eric Beabes
Under Spark 2.4, is it possible to submit a Spark job thru a REST API, just like a Flink job?

Here's the use case: we need to submit a Spark job to an EMR cluster, but our security team is not allowing us to submit jobs from the master node or thru the UI. They want us to create a "Docker Container" to submit jobs.

If it's possible to submit the Spark job thru REST, then we don't need to install the Spark/Hadoop JARs on the container. If it's not possible to use a REST API, can we do something like this?

spark-2.4.6-bin-hadoop2.7/bin/spark-submit \
  --class myclass --master "yarn url" --deploy-mode cluster \
  <application-jar> [application-args]

In other words, instead of --master yarn, specify a URL. Would this still work the same way?
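
(As far as I understand, with --master yarn, spark-submit locates the ResourceManager through the cluster's Hadoop config files rather than through a URL, so perhaps the container would need something like the sketch below; the config path and JAR name are made up:)

export HADOOP_CONF_DIR=/etc/emr-conf   # copies of the cluster's core-site.xml and yarn-site.xml
spark-2.4.6-bin-hadoop2.7/bin/spark-submit \
  --class myclass --master yarn --deploy-mode cluster \
  myapp.jar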

Re: Submitting Spark Job thru REST API?

Amit Joshi
Hi,
Another option is Apache Livy, which lets you submit the job using a REST API; a minimal sketch is below.
Yet another option is AWS Data Pipeline: configure your job as an EMR activity. To activate the pipeline, you need the console or a program.
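
A bare-bones Livy batch submission might look like this (host, bucket, and class name are placeholders; Livy listens on port 8998 by default):

curl -H 'Content-Type: application/json' -X POST http://livy-host:8998/batches -d '{
  "file": "s3://my-bucket/jars/myapp.jar",
  "className": "com.example.MyApp"
}'

Activating a pipeline from a program can be as simple as the AWS CLI call: aws datapipeline activate-pipeline --pipeline-id <pipeline-id>.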

Regards
Amit


Re: Submitting Spark Job thru REST API?

tianlangstudio
In reply to this post by Eric Beabes
Hello, Eric
Maybe you can use Spark JobServer 0.10.0: https://github.com/spark-jobserver/spark-jobserver
We used it with Spark 1.6 and it was awesome. The project is still very active, so I highly recommend it; a rough sketch of the flow is below.
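
The flow is basically two steps: upload the JAR under an app name, then start a job against it. Something like this (host, app name, and class path are placeholders):

# upload the application JAR under the app name "myapp"
curl --data-binary @myapp.jar http://jobserver-host:8090/jars/myapp
# start a job from that JAR (the POST body carries the job's config; empty here)
curl -d '' 'http://jobserver-host:8090/jobs?appName=myapp&classPath=com.example.MyJob'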



Re: Submitting Spark Job thru REST API?

Eric Beabes
Thank you all for your responses. I will try them out.


Re: Submitting Spark Job thru REST API?

Eric Beabes
Livy is working fairly well for submitting a job. One question: at present I am using it like this:

curl -H 'Content-Type: application/json' -X POST "http://$LIVY_URL/batches" -d @- <<EOF
{
  "name": "$JOB_NAME",
  "className": "$CLASS_NAME",
  "conf": {
    "spark.yarn.app.container.log.dir": "$LOG_DIR",
    "spark.executor.heartbeatInterval": "$HEART_BEAT_INTERVAL",
    "spark.driver.memoryOverhead": "$MEMORY_OVERHEAD"
  },
  "file": "$FILE_PATH",
  "proxyUser": "livy",
  "driverMemory": "$DRIVER_MEMORY",
  "driverCores": $DRIVER_CORES,
  "args": $ARGS
}
EOF

This works well, except that the JAR has to be uploaded to S3 or HDFS before running the command.
Is there a way to upload the JAR file first, get an id for it, and then submit the Spark job referencing that id, kinda like how Flink does it?
I realize this is an Apache Livy question, so I will also ask on their mailing list. Thanks.
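
For now I just push the JAR from the container before the POST, e.g. (bucket and key are made up):

aws s3 cp target/myapp.jar s3://my-bucket/jars/myapp.jar
export FILE_PATH=s3://my-bucket/jars/myapp.jar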

