Handling remote dependencies for spark-submit in Spark 2.3 with Kubernetes


purna pradeep

I'm trying to run spark-submit against a Kubernetes cluster with the Spark 2.3 Docker container image.

The challenge I'm facing is that the application has a mainapplication.jar plus other dependency files and jars that live in a remote location such as AWS S3. According to the Spark 2.3 documentation there is a Kubernetes init-container for downloading remote dependencies, but I'm not creating any pod spec in which to declare init-containers; per the documentation, Spark 2.3 on Kubernetes creates the driver and executor pods internally. So I'm not sure how to use the init-container with spark-submit when there are remote dependencies.
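For reference, here is roughly the kind of submission I am attempting (the API server address, registry, bucket, class name, and jar names below are just placeholders):

    bin/spark-submit \
      --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
      --deploy-mode cluster \
      --name my-app \
      --class com.example.MainApplication \
      --conf spark.executor.instances=2 \
      --conf spark.kubernetes.container.image=<my-registry>/spark:2.3.0 \
      --jars s3a://my-bucket/deps/dependency.jar \
      --files s3a://my-bucket/conf/app.conf \
      s3a://my-bucket/jars/mainapplication.jar

(Using s3a:// URIs assumes the hadoop-aws and AWS SDK jars are available in the image.)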

https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-remote-dependencies

Please suggest.

Re: Handling remote dependencies for spark-submit in Spark 2.3 with Kubernetes

Anirudh Ramanathan
You don't need to create the init-container; it's an implementation detail.
If you provide a remote URI and specify spark.kubernetes.container.image=<spark-image>, Spark will internally add the init-container to the pod spec for you.
If for some reason you want to customize the init-container image, you can do that with the dedicated options, but that isn't necessary in most scenarios; by default the init-container, driver, and executor images can all be identical.
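For completeness, the image-related options you would pass to spark-submit look roughly like this (property names as documented for Spark 2.3; the image names are placeholders, and the three overrides are optional since they default to spark.kubernetes.container.image):

    --conf spark.kubernetes.container.image=<my-registry>/spark:2.3.0
    --conf spark.kubernetes.driver.container.image=<my-registry>/spark:2.3.0
    --conf spark.kubernetes.executor.container.image=<my-registry>/spark:2.3.0
    --conf spark.kubernetes.initContainer.image=<my-registry>/spark-init:2.3.0

In most cases only spark.kubernetes.container.image is needed; when remote dependencies are specified, Spark adds the init-container to the driver and executor pod specs itself, so no pod spec has to be written by hand.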




--
Anirudh Ramanathan
Re: Handling remote dependencies for spark-submit in Spark 2.3 with Kubernetes

Yinan Li
One thing to note: unless you use a publicly accessible URL, you may need to make the S3 credentials available to the init-container. If that is the case, you can either create a Kubernetes secret and use the Spark config options for mounting secrets (secrets are mounted into the init-container as well as into the main container), or build a custom init-container image with the credentials baked in.
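As a rough sketch of the secret-based approach (the secret name, key names, and mount path are placeholders; the secret-mounting properties are the ones documented for Spark 2.3):

    # Create a secret holding the S3 credentials
    kubectl create secret generic aws-creds \
      --from-literal=access-key=<AWS_ACCESS_KEY_ID> \
      --from-literal=secret-key=<AWS_SECRET_ACCESS_KEY>

    # Then, on spark-submit, mount it into the driver and executor pods
    # (it is mounted into their init-containers as well)
    --conf spark.kubernetes.driver.secrets.aws-creds=/etc/aws-creds
    --conf spark.kubernetes.executor.secrets.aws-creds=/etc/aws-creds

How the credentials are then picked up from the mount path (for example, wiring them into the fs.s3a.access.key and fs.s3a.secret.key Hadoop properties) depends on how your image is set up.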

Yinan
