how to trace sparkDriver context creation for pyspark
I have a Python Jupyter notebook set up to create a Spark context by default, and sometimes this fails with the following error:
```
18/04/30 18:03:27 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
18/04/30 18:03:27 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 100 retries!
Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port
for SparkUI) to an available port or increasing spark.port.maxRetries.
```
I have tracked it down to two possible settings that may cause this in Spark 2.0.2 (client mode, standalone cluster, running in Kubernetes):
- spark.driver.port - we don't set it, so it should be chosen at random
- spark.ui.port - we set spark.ui.enabled=false, so Spark should not try to bind this port at all
The short story is that I don't know which of these Spark gets confused about, and from reading the Spark code it is not clear how spark.ui.port could cause this, even though the error message lists it as a possible cause.
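One thing worth checking (an assumption about the cause, not something the logs above confirm): `java.net.BindException: Cannot assign requested address` usually means the address the driver resolved for itself is not actually bound inside the pod. A quick sanity check from inside the driver pod:

```shell
# Address the driver JVM will try to bind (what the pod's hostname resolves to).
# If this does not match an address actually assigned to the pod, the bind
# fails with "Cannot assign requested address" regardless of the port chosen.
hostname -i

# If they disagree, SPARK_LOCAL_IP forces the bind address for the driver
# (the value below is illustrative; in practice it would be the pod IP).
export SPARK_LOCAL_IP=127.0.0.1
```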
Question 1: have you seen this before?
Question 2: how do I trace the Spark driver process? It seems I can only call sc.setLogLevel after the Spark context is created, but I need tracing before the context is created.
I created a log4j.properties file in the spark/conf directory and set the level to TRACE, but it only gets picked up when I run a Scala Jupyter notebook, not a Python Jupyter notebook, and I haven't found a way to turn on the same level of tracing for a Spark driver process started from a Python Jupyter notebook.
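A workaround I'm considering (a sketch; the path assumes the same /usr/local/spark/conf/log4j.properties file mentioned above) is to hand the log4j config to the driver JVM explicitly through PYSPARK_SUBMIT_ARGS, which PySpark reads when it launches the driver, before any context exists:

```shell
# Point the driver JVM at the log4j config before any SparkContext exists.
# The trailing "pyspark-shell" token is required by PySpark's launcher.
export PYSPARK_SUBMIT_ARGS='--driver-java-options "-Dlog4j.configuration=file:/usr/local/spark/conf/log4j.properties" pyspark-shell'

# Start jupyter from this shell; the variable is read when the notebook's
# SparkContext is created, so TRACE logging is active from JVM startup.
```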
For reference, this is how the driver is launched from the Python notebook:

```
Spark Command: python2.7
========================================
Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --name PySparkShell pyspark-shell
```