Spark YARN job submission error (code 13)

Spark YARN job submission error (code 13)

Aakash Basu-2
Hi,

I'm trying to run a program on a cluster using YARN.

YARN and Hadoop are both installed on the cluster.

The problem I'm running into is as follows:

Container exited with a non-zero exit code 13
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1528297574594
     final status: FAILED
     tracking URL: http://MasterNode:8088/cluster/app/application_1528296308262_0004
     user: bblite
Exception in thread "main" org.apache.spark.SparkException: Application application_1528296308262_0004 finished with failed status

I checked online, and most of the Stack Overflow answers for this say that the users had set .master('local[*]') in the code when creating the SparkSession while also passing --master yarn to spark-submit, so they hit this error because of the conflict.
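
For reference, the conflicting pattern those answers describe looks roughly like this (a sketch of the anti-pattern, not my code):

# Anti-pattern from those Stack Overflow answers: a master hard-coded in the
# script conflicts with --master yarn passed to spark-submit
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("local[*]") \
    .appName("Temp_Prog") \
    .getOrCreate()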

But in my case, I haven't set any master at all in the code; I'm just trying to run it on YARN by passing --master yarn to spark-submit. Below is the code that creates the Spark session:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Temp_Prog") \
    .getOrCreate()
Below is the spark-submit command:

spark-submit --master yarn --deploy-mode cluster --num-executors 3 --executor-cores 6 --executor-memory 4G /appdata/codebase/backend/feature_extraction/try_yarn.py

I've tried without --deploy-mode too (in which case it defaults to client mode), but still no luck.

Thanks,
Aakash.

Re: Spark YARN job submission error (code 13)

Saisai Shao
In Spark on YARN, error code 13 means the SparkContext didn't initialize in time. You can check the YARN application log to get more information.
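
For example, assuming log aggregation is enabled on the cluster, something like this should pull the aggregated container logs (application ID taken from the report above):

# assumes yarn.log-aggregation-enable is on; otherwise check the
# NodeManager's local container log directories instead
yarn logs -applicationId application_1528296308262_0004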

BTW, did you just write a plain Python script without creating a SparkContext/SparkSession?


Re: Spark YARN job submission error (code 13)

Aakash Basu-2
Hi,

I fixed that problem by packing all the Spark JARs into spark-archive.zip and putting it in HDFS (the JARs not being available there was what caused the failure).
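
Roughly, that fix looks like this (a sketch with illustrative paths; the log below shows my archive as hdfs:/spark-jars.zip, and spark.yarn.archive is the standard setting for it):

# Pack the jars shipped with the Spark distribution and upload them to HDFS
zip -j spark-jars.zip /appdata/spark-2.3.0-bin-hadoop2.7/jars/*
hdfs dfs -put spark-jars.zip /spark-jars.zip
# Then point Spark at the archive, e.g. in spark-defaults.conf:
#   spark.yarn.archive  hdfs:///spark-jars.zip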

But I'm now facing a new issue; this is the new RPC error I get (stack trace below):

2018-06-08 14:26:43 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-08 14:26:45 INFO  SparkContext:54 - Running Spark version 2.3.0
2018-06-08 14:26:45 INFO  SparkContext:54 - Submitted application: EndToEnd_FeatureEngineeringPipeline
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing view acls to: bblite
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing modify acls to: bblite
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing view acls groups to:
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-06-08 14:26:45 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bblite); groups with view permissions: Set(); users  with modify permissions: Set(bblite); groups with modify permissions: Set()
2018-06-08 14:26:45 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 41957.
2018-06-08 14:26:45 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-06-08 14:26:45 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-06-08 14:26:45 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-06-08 14:26:45 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-06-08 14:26:45 INFO  DiskBlockManager:54 - Created local directory at /appdata/spark/tmp/blockmgr-7b035871-a1f7-47ff-aad8-f7a43367836e
2018-06-08 14:26:45 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-06-08 14:26:45 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-06-08 14:26:45 INFO  log:192 - Logging initialized @3659ms
2018-06-08 14:26:45 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
2018-06-08 14:26:45 INFO  Server:414 - Started @3733ms
2018-06-08 14:26:45 INFO  AbstractConnector:278 - Started ServerConnector@3080efb7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-08 14:26:45 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2c3409b5{/jobs,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7f1ba569{/jobs/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@493631a1{/jobs/job,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6b12f33c{/jobs/job/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@490023da{/stages,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31c3a862{/stages/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4da2454f{/stages/stage,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@552f182d{/stages/stage/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@a78a7fa{/stages/pool,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@15142105{/stages/pool/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7589c977{/storage,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@584a599b{/storage/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1742621f{/storage/rdd,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@23ea75fb{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1813d280{/environment,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@129fc698{/environment/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@16c91c4e{/executors,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@667ce6c1{/executors/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60fdbf5c{/executors/threadDump,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c3a1edd{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@52cf5878{/static,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b7c7cff{/,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7691ad8{/api,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bb96483{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@24a994f7{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://:4040
2018-06-08 14:26:46 INFO  RMProxy:98 - Connecting to ResourceManager at /192.168.49.37:8032
2018-06-08 14:26:46 INFO  Client:54 - Requesting a new application from cluster with 4 NodeManagers
2018-06-08 14:26:46 INFO  Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2018-06-08 14:26:46 INFO  Client:54 - Will allocate AM container, with 896 MB memory including 384 MB overhead
2018-06-08 14:26:46 INFO  Client:54 - Setting up container launch context for our AM
2018-06-08 14:26:46 INFO  Client:54 - Setting up the launch environment for our AM container
2018-06-08 14:26:46 INFO  Client:54 - Preparing resources for our AM container
2018-06-08 14:26:48 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs:/spark-jars.zip
2018-06-08 14:26:48 INFO  Client:54 - Uploading resource file:/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/pyspark.zip
2018-06-08 14:26:48 INFO  Client:54 - Uploading resource file:/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/py4j-0.10.6-src.zip
2018-06-08 14:26:48 INFO  Client:54 - Uploading resource file:/appdata/spark/tmp/spark-35d9709e-8f20-4b57-82d3-f3ef0926d3ab/__spark_conf__4300362365336835927.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/__spark_conf__.zip
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing view acls to: bblite
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing modify acls to: bblite
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing view acls groups to:
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-06-08 14:26:48 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bblite); groups with view permissions: Set(); users  with modify permissions: Set(bblite); groups with modify permissions: Set()
2018-06-08 14:26:48 INFO  Client:54 - Submitting application application_1528296308262_0017 to ResourceManager
2018-06-08 14:26:48 INFO  YarnClientImpl:273 - Submitted application application_1528296308262_0017
2018-06-08 14:26:48 INFO  SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1528296308262_0017 and attemptId None
2018-06-08 14:26:49 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:49 INFO  Client:54 -
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1528448208475
     final status: UNDEFINED
     tracking URL: http://MasterNode:8088/proxy/application_1528296308262_0017/
     user: bblite
2018-06-08 14:26:50 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:51 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:52 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:52 WARN  TransportChannelHandler:78 - Exception in connection from /192.168.49.38:38862
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
2018-06-08 14:26:53 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:54 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:55 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:56 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:56 INFO  YarnClientSchedulerBackend:54 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> MasterNode, PROXY_URI_BASES -> http://MasterNode:8088/proxy/application_1528296308262_0017), /proxy/application_1528296308262_0017
2018-06-08 14:26:56 INFO  JettyUtils:54 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2018-06-08 14:26:57 INFO  YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
2018-06-08 14:26:57 INFO  Client:54 - Application report for application_1528296308262_0017 (state: RUNNING)
2018-06-08 14:26:57 INFO  Client:54 -
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.49.39
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1528448208475
     final status: UNDEFINED
     tracking URL: http://MasterNode:8088/proxy/application_1528296308262_0017/
     user: bblite
2018-06-08 14:26:57 INFO  YarnClientSchedulerBackend:54 - Application application_1528296308262_0017 has started running.
2018-06-08 14:26:57 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45193.
2018-06-08 14:26:57 INFO  NettyBlockTransferService:54 - Server created on MasterNode:45193
2018-06-08 14:26:57 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-06-08 14:26:57 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  BlockManagerMasterEndpoint:54 - Registering block manager MasterNode:45193 with 366.3 MB RAM, BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@261e16df{/metrics/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:59 ERROR YarnClientSchedulerBackend:70 - Yarn application has already exited with state FINISHED!
2018-06-08 14:26:59 INFO  AbstractConnector:318 - Stopped Spark@3080efb7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-08 14:26:59 INFO  SparkUI:54 - Stopped Spark web UI at http://:4040
2018-06-08 14:26:59 ERROR TransportClient:233 - Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 INFO  SchedulerExtensionServices:54 - Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
2018-06-08 14:26:59 ERROR Utils:91 - Uncaught exception in thread Yarn application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:566)
    at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:95)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:155)
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:508)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1752)
    at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1924)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1923)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
Caused by: java.io.IOException: Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-06-08 14:26:59 INFO  MemoryStore:54 - MemoryStore cleared
2018-06-08 14:26:59 INFO  BlockManager:54 - BlockManager stopped
2018-06-08 14:26:59 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:177)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:558)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
2018-06-08 14:26:59 INFO  SparkContext:54 - SparkContext already stopped.
Traceback (most recent call last):
  File "/appdata/bblite-codebase/automl/backend/feature_extraction/trigger_feature_engineering_pipeline.py", line 18, in <module>
    .appName("EndToEnd_FeatureEngineeringPipeline")\
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/session.py", line 173, in getOrCreate
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 331, in getOrCreate
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 180, in _do_init
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 270, in _initialize_context
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1428, in __call__
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:177)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:558)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

2018-06-08 14:26:59 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-06-08 14:26:59 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-06-08 14:26:59 INFO  SparkContext:54 - Successfully stopped SparkContext
2018-06-08 14:26:59 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-06-08 14:26:59 INFO  ShutdownHookManager:54 - Deleting directory /appdata/spark/tmp/spark-35d9709e-8f20-4b57-82d3-f3ef0926d3ab
2018-06-08 14:26:59 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-1b471b46-0c5a-4f75-94c1-c99d9d674228

It seems the name node and data nodes cannot talk to each other correctly, but I have no clue why. Has anyone faced this problem? Any help on this?
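
One quick way to check the HDFS side of that hypothesis (assuming HDFS connectivity really is the layer at fault, which the trace alone doesn't prove):

# Shows live/dead datanodes as the namenode sees them
hdfs dfsadmin -report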

Thanks,
Aakash.



Re: Spark YARN job submission error (code 13)

Aakash Basu-2
Fixed by adding two configurations in yarn-site.xml.
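
The two properties aren't named above; a common pair that clears this exact symptom (the AM container being killed so the application exits FINISHED while the driver is still waiting) is disabling the NodeManager memory checks — a guess here, not confirmed by the post:

<!-- Hypothetical yarn-site.xml entries; the post does not say which two were added -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>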

Thanks all!
