[SparkLauncher] -Dspark.master with missing secondary master IP


bsikander
We recently transitioned from client mode to cluster mode with a Spark
Standalone deployment. We are on Spark 2.2.1 and use SparkLauncher to
launch the driver.

The problem is that when my driver is launched, the spark.master property
(-Dspark.master) is set to only the primary master IP, something like
"-Dspark.master=spark://<primary-master-ip>:7077", even though I pass the
IPs and ports of both the primary and secondary masters to the Launcher.
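For reference, this is roughly how we build the HA master URL on our side. Host names and paths here are placeholders, and the SparkLauncher calls are shown only as a hedged sketch in comments; the testable part is just the comma-separated URL that standalone HA mode expects:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MasterUrl {
    // Join every master into one comma-separated spark:// URL, the form
    // standalone HA mode expects in --master / SparkLauncher.setMaster.
    static String haMasterUrl(List<String> hosts, int port) {
        return "spark://" + hosts.stream()
                .map(h -> h + ":" + port)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String master = haMasterUrl(Arrays.asList("primary", "secondary"), 6066);
        System.out.println(master); // spark://primary:6066,secondary:6066

        // Sketch of the launcher call (org.apache.spark.launcher.SparkLauncher);
        // resource/class names are placeholders:
        //   new SparkLauncher()
        //       .setMaster(master)
        //       .setDeployMode("cluster")
        //       .setAppResource("file:/path/to/jar/file.jar")
        //       .setMainClass("my.main.class")
        //       .startApplication();
    }
}
```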

*Check the following output of the launcher:*

18/06/27 10:06:08 INFO RestSubmissionClient: Submitting a request to launch
an application in spark://<primary>:6066,<secondary>:6066.
18/06/27 10:06:08 INFO RestSubmissionClient: Submission successfully created
as driver-20180627100608-0012. Polling submission state...
18/06/27 10:06:08 INFO RestSubmissionClient: Submitting a request for the
status of submission driver-20180627100608-0012 in
spark://<primary>:6066,<secondary>:6066.
18/06/27 10:06:08 ERROR RestSubmissionClient: Error: Server responded with
message of unexpected type SubmissionStatusResponse.
18/06/27 10:06:08 INFO RestSubmissionClient: State of driver
driver-20180627100608-0012 is now RUNNING.
18/06/27 10:06:08 INFO RestSubmissionClient: Driver is running on worker
worker-20180627090529-<slave>-44156 at <slave-IP>:44156.
18/06/27 10:06:08 INFO RestSubmissionClient: Server responded with
CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20180627100608-0012",
  "serverSparkVersion" : "2.2.1",
  "submissionId" : "driver-20180627100608-0012",
  "success" : true
}


*Here is the verbose output of my SparkLauncher.* You can clearly see that both primary and secondary master IPs are passed to the launcher.

 18/06/27 10:06:08 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
Parsed arguments:
  master                  spark://<primary>:6066,<secondary>:6066
  deployMode              cluster
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            1g
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  <basic driver options like logging>
  supervise               true
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               my.main.class
  primaryResource         file:/path/to/jar/file.jar
  name                    testName
  childArgs               [akka.tcp://MyTestSystem@<IP>:2552
jobManager-8acaf907-8696-4d8d-8127-fcb35ebae9fa random.conf]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:
  (spark.driver.memory,1g)
 
(spark.executor.extraJavaOptions,-Dlog4j.configuration=file:log4j-server.properties)
  (spark.driver.extraJavaOptions,  <basic driver options like logging>)


Running Spark using the REST application submission protocol.
Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/path/to/jar/file.jar
my.main.class
akka.tcp://MyTestSystem@<IP>:2552
jobManager-8acaf907-8696-4d8d-8127-fcb35ebae9fa
random.conf
System properties:
(spark.driver.memory,1g)
(SPARK_SUBMIT,true)
(spark.executor.extraJavaOptions,-Dlog4j.configuration=file:log4j-server.properties)
(spark.driver.supervise,true)
(spark.app.name,testName)
(spark.driver.extraJavaOptions, <random props>)
(spark.jars,file:/path/to/jar/file.jar)
(spark.submit.deployMode,cluster)
(spark.master,spark://<primary>:6066,<secondary>:6066)
Classpath elements:

*A few things that I find strange:*
1- I don't understand the following error:
18/06/27 10:06:08 ERROR RestSubmissionClient: Error: Server responded with
message of unexpected type SubmissionStatusResponse.

2- Why is Spark not using both IPs to launch the driver?

Any help or guidance would be much appreciated.




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: [ClusterMode] -Dspark.master with missing secondary master IP

bsikander
We switched the port from 7077 to 6066 because we were losing 20 seconds
each time we launched a driver: 10 seconds failing to submit the driver on
each master's <ip>:7077. After losing those 20 seconds, it would fall back
to the legacy way of driver submission.

With 6066 we don't lose any time.





Re: [ClusterMode] -Dspark.master with missing secondary master IP

bsikander
I did some further investigation.

If I launch a driver in cluster mode with a master URL like
spark://<primary>:7077,<secondary>:7077, then the driver is launched with
both IPs and the -Dspark.master property contains both IPs.

But in the logs I see the following, which causes a 20-second delay while
launching each driver:
18/06/28 08:19:34 INFO RestSubmissionClient: Submitting a request to launch
an application in spark://<primary>:7077,<secondary>:7077.
18/06/28 08:19:44 WARN RestSubmissionClient: Unable to connect to server
spark://<primary>:7077.
18/06/28 08:19:54 WARN RestSubmissionClient: Unable to connect to server
spark://<secondary>:7077.
Warning: Master endpoint spark://<primary>:7077,<secondary>:7077 was not a
REST server. Falling back to legacy submission gateway instead.







Re: [SparkLauncher] -Dspark.master with missing secondary master IP

bsikander
Can anyone please help?





Re: [SparkLauncher] -Dspark.master with missing secondary master IP

bsikander
This is what my driver launch command looks like. It contains only one
master in the -Dspark.master property, whereas from the Launcher I am
passing two, with port 6066.

Launch Command: "/path/to/java" "-cp" "" "-Xmx1024M"
"-Dspark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j-server.properties"
"-Dspark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC -verbose:gc
-XX:+PrintGCTimeStamps -XX:+CMSClassUnloadingEnabled
-XX:MaxDirectMemorySize=512M -XX:+HeapDumpOnOutOfMemoryError
-Djava.net.preferIPv4Stack=true
-Djava.io.tmpdir=/tmp/spark -Dorg.xerial.snappy.tempdir=/tmp/spark
-Dlog4j.configuration=file:log4j-server.properties  "
"-Dspark.submit.deployMode=cluster"
"-Dspark.master=spark://<secondary>:7077"
"-Dspark.driver.supervise=true"
"-Dspark.driver.memory=1g"
"-Dspark.app.name=myClass"
"-Dspark.jars=file:myJar.jar"
"-XX:+UseConcMarkSweepGC" "-verbose:gc" "-XX:+PrintGCTimeStamps"
"-XX:+CMSClassUnloadingEnabled" "-XX:MaxDirectMemorySize=512M"
"-XX:+HeapDumpOnOutOfMemoryError" "-Djava.net.preferIPv4Stack=true"
"org.apache.spark.deploy.worker.DriverWrapper"
"spark://Worker@<workerIP>:36057"
"/path/to/spark/worker_dir/driver-20180629101706-0064/myJar.jar" "myClass"
"arg1" "arg2" "arg3"


*NOTE:
I am running my standalone Spark cluster in HA mode using ZooKeeper.*

The reason I want both masters in the driver command: if the driver loses
its connection to the only master it knows about, it kills itself. You can
see this in the following logs:

2018-06-27 09:09:03,390 INFO appclient-register-master-threadpool-0
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []:
Connecting to master spark://<secondary-master>:7077...
2018-06-27 09:09:23,390 INFO appclient-register-master-threadpool-0
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []:
Connecting to master spark://<secondary-master>:7077...
2018-06-27 09:09:43,392 ERROR appclient-registration-retry-thread
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend []:
Application has been killed. Reason: All masters are unresponsive! Giving
up.
2018-06-27 09:09:43,392 WARN JobServer-akka.actor.default-dispatcher-15
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend []:
Application ID is not initialized yet.
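This matters because of how a standalone driver interprets spark.master: the value is split on commas and each endpoint is retried in turn during registration, so a -Dspark.master containing a single IP leaves nothing to fail over to. A simplified illustration of that splitting (not Spark's actual code, just the behavior I am describing):

```java
import java.util.Arrays;

public class MasterList {
    // Simplified sketch: standalone clients split spark.master on "," and
    // retry registration against every resulting master endpoint.
    static String[] masterEndpoints(String sparkMaster) {
        return sparkMaster.replaceFirst("spark://", "").split(",");
    }

    public static void main(String[] args) {
        String both = "spark://primary:7077,secondary:7077";
        System.out.println(Arrays.toString(masterEndpoints(both)));
        // [primary:7077, secondary:7077] -> failover is possible

        String single = "spark://secondary:7077";
        System.out.println(Arrays.toString(masterEndpoints(single)));
        // [secondary:7077] -> if this master goes away, the driver gives up
    }
}
```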


