[Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?


Zhang, Yuqi

Hello guys,

I am Yuqi from Teradata Tokyo. Sorry to disturb, but I have a problem using the Spark 2.4 client-mode feature on a Kubernetes cluster, and I would like to ask whether there is a solution.

The problem: when I try to run spark-shell against a Kubernetes v1.11.3 cluster on AWS, I cannot successfully run a StatefulSet using the Docker image built from Spark 2.4. The error message is shown below. The version I am using is Spark v2.4.0-rc3.

I also wonder whether there is more documentation on how to use client mode or integrate spark-shell with a Kubernetes cluster. The documentation at https://github.com/apache/spark/blob/v2.4.0-rc3/docs/running-on-kubernetes.md gives only a brief description. I understand it is not an officially released version yet, but if there is more documentation, could you please share it with me?

Thank you very much for your help!

 

 

Error msg:

+ env
+ sed 's/[^=]*=\(.*\)/\1/g'
+ sort -t_ -k4 -n
+ grep SPARK_JAVA_OPT_
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress= --deploy-mode client
Error: Missing application resource.
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
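For context: the trace above is the image's entrypoint script running under `set -x`, taking the driver branch with an empty bind address and an empty `"$@"`. A simplified, hypothetical sketch of that dispatch (not the actual entrypoint.sh) shows why running the driver command with no application arguments ends in "Error: Missing application resource":

```shell
#!/bin/sh
# Simplified, hypothetical sketch of the "driver" branch of the Spark 2.4
# image's entrypoint. The first container arg selects the branch; any
# remaining container args ("$@") become the spark-submit arguments.
# If the pod spec supplies no application jar or main class, spark-submit
# receives no application resource and aborts, as in the trace above.
unset SPARK_DRIVER_BIND_ADDRESS   # unset here to mirror the trace's empty value

build_cmd() {
  SPARK_K8S_CMD="$1"
  shift
  case "$SPARK_K8S_CMD" in
    driver)
      echo "spark-submit --conf spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS --deploy-mode client" "$@"
      ;;
    *)
      echo "unknown command: $SPARK_K8S_CMD"
      ;;
  esac
}

# With no app args: the same bare command seen in the trace.
build_cmd driver
# With an application resource supplied via the container args:
build_cmd driver local:///opt/spark/examples/jars/spark-examples.jar
```

The takeaway (inferred from the trace, not from the official docs): a StatefulSet that runs this image as a driver must pass the application resource and its arguments as container args after the `driver` command.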

 

 

--

Yuqi Zhang

Software Engineer 

m: 090-6725-6573



2 Chome-2-23-1 Akasaka

Minato, Tokyo 107-0052
teradata.com


This e-mail is from Teradata Corporation and may contain information that is confidential or proprietary. If you are not the intended recipient, do not read, copy or distribute the e-mail or any attachments. Instead, please notify the sender and delete the e-mail and any attachments. Thank you.

Please consider the environment before printing.

 

 


Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

Gourav Sengupta
Just out of curiosity, why would you not use Glue (which is Spark on Kubernetes) or EMR?

Regards,
Gourav Sengupta

On Mon, Oct 29, 2018 at 1:29 AM Zhang, Yuqi <[hidden email]> wrote:


Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

Biplob Biswas
Hi Yuqi, 

Just curious, can you share the spark-submit script, and what are you passing as the --master argument?

Thanks & Regards
Biplob Biswas


On Wed, Oct 31, 2018 at 10:34 AM Gourav Sengupta <[hidden email]> wrote:

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

Zhang, Yuqi

Hi Biplob,

 

Thank you very much for your reply.

 

To run spark-shell, I first tried the following command directly from my local machine:

 

bin/spark-submit \
  --master k8s://https://api-xxx.ap-northeast-1.elb.amazonaws.com \
  --deploy-mode client \

 

The master argument is the URL of my k8s cluster, but the command failed. The error message was:

 

Could not load KUBERNETES classes. This copy of Spark may not have been compiled with KUBERNETES support.
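(For anyone hitting the same message: in my experience this particular error usually means the Spark distribution being used was built without the Kubernetes module. A sketch of a build that includes it, assuming a checkout of the v2.4.0-rc3 source tag and the standard Spark build profiles:)

```shell
# Build a distribution with the kubernetes profile enabled
# (assumes a checkout of the v2.4.0-rc3 source tag).
./dev/make-distribution.sh --name k8s --tgz -Phadoop-2.7 -Pkubernetes -DskipTests
```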

 

After that, I checked the documentation at https://github.com/apache/spark/blob/v2.4.0-rc3/docs/running-on-kubernetes.md, which says a headless service is needed to run the Spark driver in a k8s pod, so I tried to create a StatefulSet containing the Spark driver. However, it always fails with the error shown in the picture I sent before.

 

Now I have tried another way, using the following command:

 

bin/spark-shell \
  --master k8s://https://api-xxx.elb.amazonaws.com \

 

There are still some errors, shown in the attached picture local-spark-shell-error.png, but it seems spark-shell can be started this way, as shown in the attached picture local-spark-shell.png. I guess that with this command the Spark driver runs on my local machine, but I don't know how to create the driver pod on the k8s cluster. So I would like to ask whether there is more detailed documentation on using Spark 2.4 client mode on k8s.
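For reference, this is the shape of client-mode invocation I am aiming for, pieced together from the running-on-kubernetes.md page (the image name, addresses, and ports below are placeholders, not a verified working command):

```shell
bin/spark-shell \
  --master k8s://https://api-xxx.elb.amazonaws.com \
  --conf spark.kubernetes.container.image=<spark-2.4-image> \
  --conf spark.kubernetes.namespace=default \
  --conf spark.executor.instances=2 \
  --conf spark.driver.host=<address-executors-can-reach> \
  --conf spark.driver.port=7078
```

In client mode the driver stays wherever spark-shell runs; the executors are created as pods and must be able to connect back to spark.driver.host:spark.driver.port, which is why a routable address (or a headless service, when the driver itself runs in a pod) is needed.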

 

Thank you in advance & Best Regards.

 

 

--

Yuqi Zhang


 

 

 

local-spark-shell.png (2M) Download Attachment
local-spark-shell-error.png (1M) Download Attachment

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

Zhang, Yuqi
In reply to this post by Gourav Sengupta

Hi Gourav,

 

Thank you for your reply.

 

I haven't tried Glue or EMR, but I guess they integrate Kubernetes on AWS instances?

I could set up the k8s cluster on AWS, but my problem is that I don't know how to run spark-shell on Kubernetes.

Since Spark only supports client mode on k8s from version 2.4, which is not officially released yet, I would like to ask whether there is more detailed documentation on running spark-shell on a k8s cluster.

Thank you in advance & best regards!

 

--

Yuqi Zhang


 

 

 



Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

Li Gao-2
Yuqi,

Your error seems unrelated to the headless service config. For Spark 2.4 RC to work in client mode, you need to create a headless service that matches your driver pod name exactly. We have had this running for a while now, using a Jupyter kernel as the driver client.
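A minimal sketch of such a headless service (the names, labels, and ports here are placeholders, not the actual config described above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-driver-svc    # must line up with the driver pod / spark.driver.host
spec:
  clusterIP: None           # headless: DNS resolves directly to the pod IP
  selector:
    app: spark-driver       # label carried by the driver pod
  ports:
    - name: driver-rpc
      port: 7078
    - name: block-manager
      port: 7079
```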

-Li


On Wed, Oct 31, 2018 at 7:30 AM Zhang, Yuqi <[hidden email]> wrote:


Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

Zhang, Yuqi

Hi Li,

 

Thank you for your reply.

Do you mean running a Jupyter client on the k8s cluster to use Spark 2.4? I am also trying to set up JupyterHub on k8s to use Spark, which is why I would like to know how to run Spark client mode on a k8s cluster. If there is any documentation on setting up Jupyter on k8s to use Spark, could you please share it with me?

 

Thank you for your help!

 

Best Regards,

--

Yuqi Zhang


 

 

 



Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

Li Gao-2
Hi Yuqi,

Yes, we are running Jupyter Gateway and kernels on k8s, using Spark 2.4's client mode to launch PySpark. In client mode your driver runs on the same pod where your kernel runs.

I am planning to write a blog post on this at some future date. Did you make the headless service that reflects the driver pod name? That's one of the critical pieces we automated in our custom code to make client mode work.

-Li
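To make the pieces concrete, a hypothetical sketch of the driver-side settings that would go with such a headless service (the service name, namespace, ports, and jar path are placeholders):

```shell
bin/spark-submit \
  --master k8s://https://<api-server> \
  --deploy-mode client \
  --conf spark.kubernetes.container.image=<spark-2.4-image> \
  --conf spark.driver.host=spark-driver-svc.default.svc.cluster.local \
  --conf spark.driver.port=7078 \
  --conf spark.driver.blockManager.port=7079 \
  local:///opt/spark/examples/jars/<app>.jar
```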


On Wed, Oct 31, 2018 at 8:13 AM Zhang, Yuqi <[hidden email]> wrote:

Hi Li,

 

Thank you for your reply.

Do you mean running Jupyter client on k8s cluster to use spark 2.4? Actually I am also trying to set up JupyterHub on k8s to use spark, that’s why I would like to know how to run spark client mode on k8s cluster. If there is any related documentation on how to set up the Jupyter on k8s to use spark, could you please share with me?

 

Thank you for your help!

 

Best Regards,

--

Yuqi Zhang

Software Engineer 

m: 090-6725-6573




2 Chome-2-23-1 Akasaka

Minato, Tokyo 107-0052
teradata.com

This e-mail is from Teradata Corporation and may contain information that is confidential or proprietary. If you are not the intended recipient, do not read, copy or distribute the e-mail or any attachments. Instead, please notify the sender and delete the e-mail and any attachments. Thank you.

Please consider the environment before printing.

 

 

 

From: Li Gao <[hidden email]>
Date: Thursday, November 1, 2018 0:07
To: "Zhang, Yuqi" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>, "Nogami, Masatsugu" <[hidden email]>
Subject: Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

 

Yuqi,

 

Your error seems unrelated to headless service config you need to enable. For headless service you need to create a headless service that matches to your driver pod name exactly in order for spark 2.4 RC to work under client mode. We have this running for a while now using Jupyter kernel as the driver client.

 

-Li

 

 

On Wed, Oct 31, 2018 at 7:30 AM Zhang, Yuqi <[hidden email]> wrote:

Hi Gourav,

 

Thank you for your reply.

 

I haven’t try glue or EMK, but I guess it’s integrating kubernetes on aws instances?

I could set up the k8s cluster on AWS, but my problem is don’t know how to run spark-shell on kubernetes…

Since spark only support client mode on k8s from 2.4 version which is not officially released yet, I would like to ask if there is more detailed documentation regarding the way to run spark-shell on k8s cluster?

 

Thank you in advance & best regards!

 

--

Yuqi Zhang

Software Engineer 

m: 090-6725-6573


signature_147554612

2 Chome-2-23-1 Akasaka

Minato, Tokyo 107-0052
teradata.com

This e-mail is from Teradata Corporation and may contain information that is confidential or proprietary. If you are not the intended recipient, do not read, copy or distribute the e-mail or any attachments. Instead, please notify the sender and delete the e-mail and any attachments. Thank you.

Please consider the environment before printing.

 

 

 

From: Gourav Sengupta <[hidden email]>
Date: Wednesday, October 31, 2018 18:34
To: "Zhang, Yuqi" <[hidden email]>
Cc: user <[hidden email]>, "Nogami, Masatsugu" <[hidden email]>
Subject: Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

 

[External Email]


Just out of curiosity why would you not use Glue (which is Spark on kubernetes) or EMR? 

 

Regards,

Gourav Sengupta

 

On Mon, Oct 29, 2018 at 1:29 AM Zhang, Yuqi <[hidden email]> wrote:

Hello guys,

 

I am Yuqi from Teradata Tokyo. Sorry to disturb but I have some problem regarding using spark 2.4 client mode function on kubernetes cluster, so I would like to ask if there is some solution to my problem.

 

The problem is when I am trying to run spark-shell on kubernetes v1.11.3 cluster on AWS environment, I couldn’t successfully run stateful set using the docker image built from spark 2.4. The error message is showing below. The version I am using is spark v2.4.0-rc3.

 

Also, I wonder if there is more documentation on how to use client-mode or integrate spark-shell on kubernetes cluster. From the documentation on https://github.com/apache/spark/blob/v2.4.0-rc3/docs/running-on-kubernetes.md there is only a brief description. I understand it’s not the official released version yet, but If there is some more documentation, could you please share with me?

 

Thank you very much for your help!

 

 

Error msg:

+ env

+ sed 's/[^=]*=\(.*\)/\1/g'

+ sort -t_ -k4 -n

+ grep SPARK_JAVA_OPT_

+ readarray -t SPARK_EXECUTOR_JAVA_OPTS

+ '[' -n '' ']'

+ '[' -n '' ']'

+ PYSPARK_ARGS=

+ '[' -n '' ']'

+ R_ARGS=

+ '[' -n '' ']'

+ '[' '' == 2 ']'

+ '[' '' == 3 ']'

+ case "$SPARK_K8S_CMD" in

+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")

+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress= --deploy-mode client

Error: Missing application resource.

Usage: spark-submit [options] <app jar | python file | R file> [app arguments]

Usage: spark-submit --kill [submission ID] --master [spark://...]

Usage: spark-submit --status [submission ID] --master [spark://...]

Usage: spark-submit run-example [options] example-class [example args]
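Reading the trace above: the image's entrypoint runs with SPARK_K8S_CMD empty, falls into the default case, and execs spark-submit in client mode with no application, which is what produces "Missing application resource". One workaround (a sketch only; the pod name and the sleep command are assumptions, not from this thread) is to override the container command so the pod stays up and spark-shell can be started manually inside it:

```shell
# Sketch only: pod name and command override are assumptions, not from the thread.
# Overriding "command" bypasses the image entrypoint, which otherwise execs
# spark-submit with no application when SPARK_K8S_CMD is unset.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: spark-client                # assumed name
spec:
  containers:
  - name: spark
    image: yukizzz/spark:v2.4       # image name taken from a later message in this thread
    command: ["/bin/sh", "-c", "sleep infinity"]  # keep the pod alive instead of running the entrypoint
EOF

# Then start the shell interactively inside the pod:
kubectl exec -it spark-client -- /opt/spark/bin/spark-shell \
  --master k8s://https://<api-server> \
  --conf spark.kubernetes.container.image=yukizzz/spark:v2.4
```

This is a cluster-dependent fragment; adjust the image and API server address to your environment.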

 

 


Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

Zhang, Yuqi

Hi Li,

 

Thank you very much for your reply!

 

> Did you make the headless service that reflects the driver pod name?

I am not sure, but I used “app” as the selector in the headless service, which is the same app name as the StatefulSet that creates the Spark driver pod.

For your reference, I have attached the YAML file for the headless service and the StatefulSet. Could you please take a look at it when you have time?

 

I appreciate your help & have a good day!

 

Best Regards,


From: Li Gao <[hidden email]>
Date: Thursday, November 1, 2018 4:56
To: "Zhang, Yuqi" <[hidden email]>
Cc: Gourav Sengupta <[hidden email]>, "[hidden email]" <[hidden email]>, "Nogami, Masatsugu" <[hidden email]>
Subject: Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

 

Hi Yuqi,

 

Yes, we are running Jupyter Gateway and kernels on k8s and using Spark 2.4's client mode to launch pyspark. In client mode, your driver runs on the same pod where your kernel runs.

 

I am planning to write a blog post on this at some future date. Did you make the headless service that reflects the driver pod name? That's one of the critical pieces we automated in our custom code to make client mode work.

 

-Li

 

 

On Wed, Oct 31, 2018 at 8:13 AM Zhang, Yuqi <[hidden email]> wrote:

Hi Li,

 

Thank you for your reply.

Do you mean running the Jupyter client on a k8s cluster to use Spark 2.4? Actually, I am also trying to set up JupyterHub on k8s to use Spark, which is why I would like to know how to run Spark client mode on a k8s cluster. If there is any related documentation on how to set up Jupyter on k8s to use Spark, could you please share it with me?

 

Thank you for your help!

 

Best Regards,


From: Li Gao <[hidden email]>
Date: Thursday, November 1, 2018 0:07
To: "Zhang, Yuqi" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>, "Nogami, Masatsugu" <[hidden email]>
Subject: Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

 

Yuqi,

 

Your error seems unrelated to the headless service config you need to enable. For the headless service, you need to create one that matches your driver pod name exactly in order for the Spark 2.4 RC to work in client mode. We have had this running for a while now, using a Jupyter kernel as the driver client.
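As a rough illustration of the headless-service requirement described here (a sketch only; all names, labels, and ports are assumptions, not from this thread), the service name matches the driver pod name and selects that single pod rather than every pod sharing an app label:

```shell
# Sketch only: pod/service names and ports are assumed.
# clusterIP: None makes the service headless, so the service DNS name resolves
# directly to the driver pod's IP. The statefulset.kubernetes.io/pod-name label
# is added automatically by the StatefulSet controller and selects exactly one
# pod, unlike an "app" selector that matches every replica.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: spark-driver-0              # must equal the driver pod's name
spec:
  clusterIP: None                   # headless
  selector:
    statefulset.kubernetes.io/pod-name: spark-driver-0
  ports:
  - name: driver-rpc
    port: 7078
  - name: blockmanager
    port: 7079
EOF
```

The driver would then set spark.driver.host to the service's DNS name (e.g. spark-driver-0.<namespace>.svc.cluster.local) so executors can connect back to it.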

 

-Li

 

 

On Wed, Oct 31, 2018 at 7:30 AM Zhang, Yuqi <[hidden email]> wrote:

Hi Gourav,

 

Thank you for your reply.

 

I haven't tried Glue or EMR, but I guess that means running Kubernetes on AWS instances?

I can set up the k8s cluster on AWS, but my problem is that I don't know how to run spark-shell on Kubernetes.

Since Spark only supports client mode on Kubernetes starting from version 2.4, which is not officially released yet, is there more detailed documentation on how to run spark-shell on a Kubernetes cluster?

 

Thank you in advance & best regards!

 


Attachment: spark-driver-on-k8s.yaml

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

Holden Karau
If folks are interested, while it's not on Amazon, I've got a live stream of getting client mode with a Jupyter notebook working on GCP/GKE: https://www.youtube.com/watch?v=eMj0Pv1-Nfo&index=3&list=PLRLebp9QyZtZflexn4Yf9xsocrR_aSryx



--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 


Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

Zhang, Yuqi

Hi Holden,

 

Thank you very much for your reply and your tutorial video.

 

I watched your video and have a question about the Spark driver pod. In your tutorial, are you running the driver pod locally? I saw that you set “spark.driver.host” to “10.142.0.2” and “spark.driver.port” to “7778”; could you share how you found the host and port for your Spark driver?

Previously, I tried Spark client mode on my local machine using the command:

bin/spark-shell \
--master k8s://https://api-mizuho-k8s-cluster-k8-9v79dk-1267211353.ap-northeast-1.elb.amazonaws.com \
--conf spark.kubernetes.container.image=yukizzz/spark:v2.4 \
--conf spark.executor.instances=3

 

By checking the Spark UI, I saw that the value of “spark.driver.host” was “192.168.1.104” and “spark.driver.port” was 50331. I then tried again, explicitly setting “spark.driver.host” to “192.168.1.104” and “spark.driver.port” to 50331, and spark-shell started successfully on the Kubernetes cluster. The complete command is:

bin/spark-shell \
--master k8s://https://api-mizuho-k8s-cluster-k8-9v79dk-1267211353.ap-northeast-1.elb.amazonaws.com \
--conf spark.driver.host=192.168.1.104 \
--conf spark.driver.port=50331 \
--conf spark.kubernetes.container.image=yukizzz/spark:v2.4 \
--conf spark.executor.instances=3

 

I didn't set these two values myself, so I wonder where they come from. Do you have any idea?
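For context on where those values come from: in client mode, spark.driver.host defaults to the submitting machine's resolved local IP and spark.driver.port defaults to a random ephemeral port chosen at startup. A sketch (the chosen port is an assumption; the master URL is copied from the commands above) of pinning both explicitly so the executors have a stable address to connect back to:

```shell
# Sketch only: port 7078 is an arbitrary choice. In client mode the driver runs
# on the submitting machine; executors inside the cluster connect back to
# spark.driver.host:spark.driver.port, so both must be reachable from the pods.
DRIVER_HOST=$(hostname -i | awk '{print $1}')   # local routable IP (assumed reachable from pods)
bin/spark-shell \
  --master k8s://https://api-mizuho-k8s-cluster-k8-9v79dk-1267211353.ap-northeast-1.elb.amazonaws.com \
  --conf spark.driver.host="$DRIVER_HOST" \
  --conf spark.driver.port=7078 \
  --conf spark.kubernetes.container.image=yukizzz/spark:v2.4 \
  --conf spark.executor.instances=3
```

Pinning the port also makes it possible to open only that port in any firewall or security group between the cluster and the driver machine.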

 

 

Another question: after I set up spark-shell on the k8s cluster, when I try to run a Spark job I receive an error like “Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources”. I allocated sufficient resources for the executors, so I suspect it is related to a communication failure between the Spark driver and the executors, but I am not sure of the exact cause. I have attached a screenshot of the detailed error message and the log of the executor pod; could you please take a look and see if you can spot the cause?
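A few generic checks for this symptom (a sketch; the label and pod names are assumptions): “Initial job has not accepted any resources” in client mode usually means the executor pods either were never scheduled or never managed to register back with the driver:

```shell
# Sketch only: spark-role=executor is the label the Spark-built images
# typically put on executor pods; adjust names to your setup.
kubectl get pods -l spark-role=executor      # are executor pods Running or stuck Pending?
kubectl describe pod <executor-pod>          # if Pending: insufficient CPU/memory? image pull error?
kubectl logs <executor-pod>                  # if Running: connection errors back to the driver?
```

If the executor logs show connection failures to the driver's address, the cause is usually that spark.driver.host/spark.driver.port is not reachable from inside the cluster.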

 

Thank you very much for your help!! Have a good day!

 

 

Spark-shell error message: (screenshot attached)

Executor pod log: (screenshot attached)


From: Holden Karau <[hidden email]>
Date: Thursday, November 15, 2018 23:49
To: "Zhang, Yuqi" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, user <[hidden email]>, "Nogami, Masatsugu" <[hidden email]>
Subject: Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

 

If folks are interested, while it's not on Amazon, I've got a live stream of getting client mode with Jupyternotebook to work on GCP/GKE : https://www.youtube.com/watch?v=eMj0Pv1-Nfo&index=3&list=PLRLebp9QyZtZflexn4Yf9xsocrR_aSryx

 
