Spark 2.3 submit on Kubernetes error

purna pradeep
I’m getting the errors below when I try to run spark-submit on a k8s cluster.
 
 
Error 1: This looks like a warning. It doesn’t interrupt the app running inside the executor pod, but I keep getting it:
 

    2018-03-09 11:15:21 WARN  WatchConnectionManager:192 - Exec Failure
    java.io.EOFException
           at okio.RealBufferedSource.require(RealBufferedSource.java:60)
           at okio.RealBufferedSource.readByte(RealBufferedSource.java:73)
           at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:113)
           at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:97)
           at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
           at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
           at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
           at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
     


Error 2: This is an intermittent error that causes the driver pod to fail:
 

    org.apache.spark.SparkException: External scheduler cannot be instantiated
     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2747)
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
     at scala.Option.getOrElse(Option.scala:121)
     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
     at com.capitalone.quantum.spark.core.QuantumSession$.initialize(QuantumSession.scala:62)
     at com.capitalone.quantum.spark.core.QuantumSession$.getSparkSession(QuantumSession.scala:80)
     at com.capitalone.quantum.workflow.WorkflowApp$.getSession(WorkflowApp.scala:116)
     at com.capitalone.quantum.workflow.WorkflowApp$.main(WorkflowApp.scala:90)
     at com.capitalone.quantum.workflow.WorkflowApp.main(WorkflowApp.scala)
    Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [myapp-ef79db3d9f4831bf85bda14145fdf113-driver-driver]  in namespace: [default]  failed.
     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
     ... 11 more
    Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
     at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
     at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
     at java.net.InetAddress.getAllByName(InetAddress.java:1192)
     at java.net.InetAddress.getAllByName(InetAddress.java:1126)
     at okhttp3.Dns$1.lookup(Dns.java:39)
     at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
     at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
     at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
     at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
     at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
     at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
     at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
     at okhttp3.RealCall.execute(RealCall.java:69)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
     ... 15 more
    2018-03-09 15:00:39 INFO  AbstractConnector:318 - Stopped Spark@5f59185e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
    2018-03-09 15:00:39 INFO  SparkUI:54 - Stopped Spark web UI at http://myapp-ef79db3d9f4831bf85bda14145fdf113-driver-svc.default.svc:4040
    2018-03-09 15:00:39 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
    2018-03-09 15:00:39 INFO  MemoryStore:54 - MemoryStore cleared
    2018-03-09 15:00:39 INFO  BlockManager:54 - BlockManager stopped
    2018-03-09 15:00:39 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
    2018-03-09 15:00:39 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
    2018-03-09 15:00:39 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
    2018-03-09 15:00:39 INFO  SparkContext:54 - Successfully stopped SparkContext
    Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2747)
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
     at scala.Option.getOrElse(Option.scala:121)
     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
     at com.capitalone.quantum.spark.core.QuantumSession$.initialize(QuantumSession.scala:62)
     at com.capitalone.quantum.spark.core.QuantumSession$.getSparkSession(QuantumSession.scala:80)
     at com.capitalone.quantum.workflow.WorkflowApp$.getSession(WorkflowApp.scala:116)
     at com.capitalone.quantum.workflow.WorkflowApp$.main(WorkflowApp.scala:90)
     at com.capitalone.quantum.workflow.WorkflowApp.main(WorkflowApp.scala)
    Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [myapp-ef79db3d9f4831bf85bda14145fdf113-driver]  in namespace: [default]  failed.
     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
     ... 11 more
    Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
     at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
     at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
     at java.net.InetAddress.getAllByName(InetAddress.java:1192)
     at java.net.InetAddress.getAllByName(InetAddress.java:1126)
     at okhttp3.Dns$1.lookup(Dns.java:39)
     at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
     at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
     at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
     at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
     at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
     at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
     at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
     at okhttp3.RealCall.execute(RealCall.java:69)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
     ... 15 more
    2018-03-09 15:00:39 INFO  ShutdownHookManager:54 - Shutdown hook called
    2018-03-09 15:00:39 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-5bd85c96-d689-4c53-a0b3-1eadd32357cb

 
Note: I’m able to run the application successfully at times, but spark-submit fails very frequently with Error 2 above.
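The innermost cause in Error 2 is `java.net.UnknownHostException: kubernetes.default.svc: Try again`. The driver could not resolve the in-cluster DNS name of the Kubernetes API server while creating its scheduler backend. `Try again` reads like a temporary resolver failure rather than a name that does not exist, which would fit the intermittent pattern.

Below is a minimal standalone Java sketch that could be run from a pod in the same namespace. It checks whether the lookup fails every time or only sometimes. It is not part of Spark, and the class and method names are hypothetical. The JVM caches failed lookups for 10 seconds by default (`networkaddress.cache.negative.ttl`), so the sketch disables that cache first; otherwise quick retries would only replay the cached failure.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.Arrays;

// Hypothetical standalone diagnostic; not part of Spark or the fabric8 client.
public class DnsRetryCheck {

    static {
        // Failed lookups are cached by the JVM (10 s by default). Turn that off
        // before the first lookup so each retry really queries DNS again.
        Security.setProperty("networkaddress.cache.negative.ttl", "0");
    }

    /**
     * Resolves host, retrying up to `attempts` times with exponential backoff.
     * Returns the addresses on the first success; rethrows the last failure.
     */
    static InetAddress[] resolveWithRetry(String host, int attempts, long backoffMillis)
            throws UnknownHostException, InterruptedException {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        UnknownHostException last = null;
        long delay = backoffMillis;
        for (int i = 1; i <= attempts; i++) {
            try {
                return InetAddress.getAllByName(host);
            } catch (UnknownHostException e) {
                last = e;
                System.err.printf("attempt %d/%d failed: %s%n", i, attempts, e.getMessage());
                if (i < attempts) {
                    Thread.sleep(delay);
                    delay *= 2; // back off: backoffMillis, 2x, 4x, ...
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Default target is the API server name from the Error 2 stack trace.
        String host = args.length > 0 ? args[0] : "kubernetes.default.svc";
        InetAddress[] addrs = resolveWithRetry(host, 5, 200);
        System.out.println(host + " -> " + Arrays.toString(addrs));
    }
}
```

After compiling, run it as `java DnsRetryCheck kubernetes.default.svc`. If later attempts succeed after earlier ones fail, that points at flaky cluster DNS rather than a missing DNS add-on. The negative-cache override only helps if it runs before the JVM's first name lookup.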
 
Re: Spark 2.3 submit on Kubernetes error

Yinan Li
Spark on Kubernetes requires the kube-dns add-on to be installed and properly configured. The executors connect to the driver through a headless Kubernetes service, using the service’s DNS name. Can you check whether the add-on is installed in your cluster? This issue might help: https://github.com/apache-spark-on-k8s/spark/issues/558
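From inside a pod, "properly configured" largely means two things. The pod’s `/etc/resolv.conf` should point at the cluster DNS service, and it should carry the cluster search domains. That search list is what lets a short name like `kubernetes.default.svc` resolve.

The sketch below parses resolv.conf text and lists the names a glibc-style resolver would query, in order. The sample values are assumptions for illustration: nameserver `10.96.0.10`, cluster domain `cluster.local`, and `ndots:5`. Compare them with `cat /etc/resolv.conf` from inside the driver pod. Resolvers other than glibc, such as musl, may apply the search list differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting a pod's resolv.conf; not part of Spark.
public class ResolvConfCheck {

    /** Only the resolv.conf settings that affect in-cluster lookups. */
    static final class ResolvConf {
        final List<String> nameservers = new ArrayList<>();
        final List<String> search = new ArrayList<>();
        int ndots = 1; // glibc default when no "options ndots:N" is given
    }

    static ResolvConf parse(String text) {
        ResolvConf conf = new ResolvConf();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // blank line or comment
            }
            String[] parts = line.split("\\s+");
            switch (parts[0]) {
                case "nameserver":
                    if (parts.length > 1) conf.nameservers.add(parts[1]);
                    break;
                case "search":
                    conf.search.clear(); // the last "search" line wins
                    for (int i = 1; i < parts.length; i++) conf.search.add(parts[i]);
                    break;
                case "options":
                    for (int i = 1; i < parts.length; i++) {
                        if (parts[i].startsWith("ndots:")) {
                            conf.ndots = Integer.parseInt(parts[i].substring("ndots:".length()));
                        }
                    }
                    break;
                default:
                    break; // "domain", "sortlist", etc. are ignored in this sketch
            }
        }
        return conf;
    }

    /** Names a glibc-style resolver would query for `name`, in order. */
    static List<String> candidates(String name, ResolvConf conf) {
        List<String> out = new ArrayList<>();
        if (name.endsWith(".")) { // absolute name: no search-list expansion
            out.add(name.substring(0, name.length() - 1));
            return out;
        }
        long dots = name.chars().filter(c -> c == '.').count();
        if (dots >= conf.ndots) out.add(name); // enough dots: tried as-is first
        for (String domain : conf.search) out.add(name + "." + domain);
        if (dots < conf.ndots) out.add(name);  // too few dots: tried as-is last
        return out;
    }

    public static void main(String[] args) {
        // Assumed example of a pod resolv.conf in the "default" namespace.
        String sample = String.join("\n",
                "nameserver 10.96.0.10",
                "search default.svc.cluster.local svc.cluster.local cluster.local",
                "options ndots:5");
        ResolvConf conf = parse(sample);
        System.out.println("nameservers: " + conf.nameservers);
        candidates("kubernetes.default.svc", conf).forEach(System.out::println);
    }
}
```

With `ndots:5`, the two-dot name `kubernetes.default.svc` is tried against each search domain before being tried as-is. A single lookup can therefore turn into several DNS queries. The third candidate, `kubernetes.default.svc.cluster.local`, is the one expected to exist when the cluster domain is `cluster.local`.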


On Sun, Mar 11, 2018 at 5:01 PM, purna pradeep <[hidden email]> wrote:

Re: Spark 2.3 submit on Kubernetes error

purna pradeep
Thanks Yinan,

I’m able to get the kube-dns endpoints when I run this command:

    kubectl get ep kube-dns --namespace=kube-system

Do I need to deploy under the kube-system namespace instead of the default namespace?

Also, please let me know if you have any insights on Error 1.

On Sun, Mar 11, 2018 at 8:26 PM Yinan Li <[hidden email]> wrote:
    2018-03-09 15:00:39 INFO  MemoryStore:54 - MemoryStore cleared
    2018-03-09 15:00:39 INFO  BlockManager:54 - BlockManager stopped
    2018-03-09 15:00:39 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
    2018-03-09 15:00:39 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
    2018-03-09 15:00:39 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
    2018-03-09 15:00:39 INFO  SparkContext:54 - Successfully stopped SparkContext
    Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2747)
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
     at scala.Option.getOrElse(Option.scala:121)
     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
     at com.capitalone.quantum.spark.core.QuantumSession$.initialize(QuantumSession.scala:62)
     at com.capitalone.quantum.spark.core.QuantumSession$.getSparkSession(QuantumSession.scala:80)
     at com.capitalone.quantum.workflow.WorkflowApp$.getSession(WorkflowApp.scala:116)
     at com.capitalone.quantum.workflow.WorkflowApp$.main(WorkflowApp.scala:90)
     at com.capitalone.quantum.workflow.WorkflowApp.main(WorkflowApp.scala)
    Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [myapp-ef79db3d9f4831bf85bda14145fdf113-driver]  in namespace: [default]  failed.
     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
     ... 11 more
    Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
     at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
     at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
     at java.net.InetAddress.getAllByName(InetAddress.java:1192)
     at java.net.InetAddress.getAllByName(InetAddress.java:1126)
     at okhttp3.Dns$1.lookup(Dns.java:39)
     at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
     at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
     at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
     at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
     at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
     at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
     at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
     at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
     at okhttp3.RealCall.execute(RealCall.java:69)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
     at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
     ... 15 more
    2018-03-09 15:00:39 INFO  ShutdownHookManager:54 - Shutdown hook called
    2018-03-09 15:00:39 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-5bd85c96-d689-4c53-a0b3-1eadd32357cb

 
Note: The application does run successfully some of the time, but spark-submit fails with the above Error2 very frequently.
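Both traces end in `java.net.UnknownHostException: kubernetes.default.svc: Try again`. The "Try again" message suggests a temporary resolver failure while the driver looked up the API server's service name, rather than a name that does not exist. The standalone sketch below is one way to check whether that lookup is intermittently failing from inside a pod built from the same image. It is a diagnostic only and does not change how Spark or the fabric8 client resolve the host. The class name `DnsRetryCheck`, the helper `resolveWithRetry`, the attempt count and the backoff are all hypothetical choices. It disables the JVM's negative DNS cache (`networkaddress.cache.negative.ttl`, 10 seconds by default); otherwise a retry inside that window could just return the cached failure.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.Arrays;

// Hypothetical diagnostic class, not part of Spark or fabric8.
public class DnsRetryCheck {

    static {
        // Disable the JVM's negative DNS cache (default 10 s) so every retry
        // does a fresh lookup. Only effective if set before the JVM's first lookup.
        Security.setProperty("networkaddress.cache.negative.ttl", "0");
    }

    /**
     * Hypothetical helper: tries to resolve host up to maxAttempts times,
     * sleeping backoffMillis between failed attempts. Returns the 1-based
     * attempt that succeeded, or -1 if every attempt failed or the thread
     * was interrupted.
     */
    public static int resolveWithRetry(String host, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                InetAddress[] addrs = InetAddress.getAllByName(host);
                System.out.println("attempt " + attempt + ": " + host + " -> " + Arrays.toString(addrs));
                return attempt;
            } catch (UnknownHostException e) {
                // Same exception type as the root cause in the traces above.
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return -1;
                    }
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Defaults to the host name from the stack trace; override with args[0].
        String host = args.length > 0 ? args[0] : "kubernetes.default.svc";
        int attempt = resolveWithRetry(host, 5, 1000L);
        System.exit(attempt > 0 ? 0 : 1);
    }
}
```

If early attempts fail and later ones succeed when this is run in a pod (for example `java DnsRetryCheck kubernetes.default.svc`), that would point at intermittent cluster DNS rather than at the spark-submit arguments.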