Spark YARN Error - triggering spark-shell


Spark YARN Error - triggering spark-shell

Aakash Basu-2
Hi,

I'm getting this error when trying to run the Spark shell on YARN:

Command: spark-shell --master yarn --deploy-mode client

2018-06-08 13:39:09 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2018-06-08 13:39:25 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

The last half of the stack trace:

2018-06-08 13:56:11 WARN  YarnSchedulerBackend$YarnSchedulerEndpoint:66 - Attempted to request executors before the AM has registered!
2018-06-08 13:56:11 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:89)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
  at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:103)
  ... 55 elided
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql

I tried putting spark-yarn_2.11-2.3.0.jar into Hadoop YARN, but it still isn't working. Is there anything else I need to do?

Thanks,
Aakash.

Re: Spark YARN Error - triggering spark-shell

zjffdu

Check the yarn AM log for details. 
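For reference, the ApplicationMaster's log can be pulled with the standard YARN CLI once the application id is known; the id below is the one that appears in the application report later in this thread. This is a cluster-bound CLI fragment, so it only works against the cluster in question:

```shell
# Fetch all container logs for the application, including the
# ApplicationMaster's stdout/stderr (run as the submitting user):
yarn logs -applicationId application_1528296308262_0017

# Or print just the application's final status and diagnostics:
yarn application -status application_1528296308262_0017
```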



On Fri, Jun 8, 2018 at 4:36 PM, Aakash Basu <[hidden email]> wrote:
Getting this error when trying to run Spark Shell using YARN ...

Re: Spark YARN Error - triggering spark-shell

Sathishkumar Manimoorthy
It seems your Spark-on-YARN application is not able to reach its application master:

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

Check the YARN logs.

Thanks,
Sathish-


On Fri, Jun 8, 2018 at 2:22 PM, Jeff Zhang <[hidden email]> wrote:
Check the yarn AM log for details.


Re: Spark YARN Error - triggering spark-shell

Aakash Basu-2
Hi,

I fixed that problem by packaging all the Spark JARs into spark-archive.zip and putting it in HDFS (the earlier failure happened because the jars weren't available there).
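For anyone hitting the same first error, the fix described above amounts to packaging the jars once and pointing spark.yarn.archive at the archive, so the client stops uploading SPARK_HOME on every submit. A sketch follows; the paths are illustrative, not necessarily the exact ones used here:

```shell
# Package the Spark jars (no compression needed) and upload the
# archive to HDFS so YARN containers can localize it directly:
cd $SPARK_HOME/jars
zip -r -0 /tmp/spark-archive.zip .
hdfs dfs -put /tmp/spark-archive.zip /spark-archive.zip

# Then point Spark at it in conf/spark-defaults.conf:
#   spark.yarn.archive  hdfs:///spark-archive.zip
```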

But now I'm facing a new issue: an RPC error (stack trace below).

2018-06-08 14:26:43 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-08 14:26:45 INFO  SparkContext:54 - Running Spark version 2.3.0
2018-06-08 14:26:45 INFO  SparkContext:54 - Submitted application: EndToEnd_FeatureEngineeringPipeline
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing view acls to: bblite
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing modify acls to: bblite
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing view acls groups to:
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-06-08 14:26:45 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bblite); groups with view permissions: Set(); users  with modify permissions: Set(bblite); groups with modify permissions: Set()
2018-06-08 14:26:45 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 41957.
2018-06-08 14:26:45 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-06-08 14:26:45 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-06-08 14:26:45 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-06-08 14:26:45 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-06-08 14:26:45 INFO  DiskBlockManager:54 - Created local directory at /appdata/spark/tmp/blockmgr-7b035871-a1f7-47ff-aad8-f7a43367836e
2018-06-08 14:26:45 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-06-08 14:26:45 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-06-08 14:26:45 INFO  log:192 - Logging initialized @3659ms
2018-06-08 14:26:45 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
2018-06-08 14:26:45 INFO  Server:414 - Started @3733ms
2018-06-08 14:26:45 INFO  AbstractConnector:278 - Started ServerConnector@3080efb7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-08 14:26:45 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2c3409b5{/jobs,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7f1ba569{/jobs/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@493631a1{/jobs/job,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6b12f33c{/jobs/job/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@490023da{/stages,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31c3a862{/stages/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4da2454f{/stages/stage,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@552f182d{/stages/stage/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@a78a7fa{/stages/pool,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@15142105{/stages/pool/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7589c977{/storage,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@584a599b{/storage/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1742621f{/storage/rdd,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@23ea75fb{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1813d280{/environment,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@129fc698{/environment/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@16c91c4e{/executors,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@667ce6c1{/executors/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60fdbf5c{/executors/threadDump,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c3a1edd{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@52cf5878{/static,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b7c7cff{/,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7691ad8{/api,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bb96483{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@24a994f7{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://:4040
2018-06-08 14:26:46 INFO  RMProxy:98 - Connecting to ResourceManager at /192.168.49.37:8032
2018-06-08 14:26:46 INFO  Client:54 - Requesting a new application from cluster with 4 NodeManagers
2018-06-08 14:26:46 INFO  Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2018-06-08 14:26:46 INFO  Client:54 - Will allocate AM container, with 896 MB memory including 384 MB overhead
2018-06-08 14:26:46 INFO  Client:54 - Setting up container launch context for our AM
2018-06-08 14:26:46 INFO  Client:54 - Setting up the launch environment for our AM container
2018-06-08 14:26:46 INFO  Client:54 - Preparing resources for our AM container
2018-06-08 14:26:48 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs:/spark-jars.zip
2018-06-08 14:26:48 INFO  Client:54 - Uploading resource file:/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/pyspark.zip
2018-06-08 14:26:48 INFO  Client:54 - Uploading resource file:/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/py4j-0.10.6-src.zip
2018-06-08 14:26:48 INFO  Client:54 - Uploading resource file:/appdata/spark/tmp/spark-35d9709e-8f20-4b57-82d3-f3ef0926d3ab/__spark_conf__4300362365336835927.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/__spark_conf__.zip
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing view acls to: bblite
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing modify acls to: bblite
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing view acls groups to:
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-06-08 14:26:48 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bblite); groups with view permissions: Set(); users  with modify permissions: Set(bblite); groups with modify permissions: Set()
2018-06-08 14:26:48 INFO  Client:54 - Submitting application application_1528296308262_0017 to ResourceManager
2018-06-08 14:26:48 INFO  YarnClientImpl:273 - Submitted application application_1528296308262_0017
2018-06-08 14:26:48 INFO  SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1528296308262_0017 and attemptId None
2018-06-08 14:26:49 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:49 INFO  Client:54 -
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1528448208475
     final status: UNDEFINED
     tracking URL: http://MasterNode:8088/proxy/application_1528296308262_0017/
     user: bblite
2018-06-08 14:26:50 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:51 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:52 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:52 WARN  TransportChannelHandler:78 - Exception in connection from /192.168.49.38:38862
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
2018-06-08 14:26:53 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:54 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:55 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:56 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:56 INFO  YarnClientSchedulerBackend:54 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> MasterNode, PROXY_URI_BASES -> http://MasterNode:8088/proxy/application_1528296308262_0017), /proxy/application_1528296308262_0017
2018-06-08 14:26:56 INFO  JettyUtils:54 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2018-06-08 14:26:57 INFO  YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
2018-06-08 14:26:57 INFO  Client:54 - Application report for application_1528296308262_0017 (state: RUNNING)
2018-06-08 14:26:57 INFO  Client:54 -
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.49.39
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1528448208475
     final status: UNDEFINED
     tracking URL: http://MasterNode:8088/proxy/application_1528296308262_0017/
     user: bblite
2018-06-08 14:26:57 INFO  YarnClientSchedulerBackend:54 - Application application_1528296308262_0017 has started running.
2018-06-08 14:26:57 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45193.
2018-06-08 14:26:57 INFO  NettyBlockTransferService:54 - Server created on MasterNode:45193
2018-06-08 14:26:57 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-06-08 14:26:57 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  BlockManagerMasterEndpoint:54 - Registering block manager MasterNode:45193 with 366.3 MB RAM, BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@261e16df{/metrics/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:59 ERROR YarnClientSchedulerBackend:70 - Yarn application has already exited with state FINISHED!
2018-06-08 14:26:59 INFO  AbstractConnector:318 - Stopped Spark@3080efb7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-08 14:26:59 INFO  SparkUI:54 - Stopped Spark web UI at http://:4040
2018-06-08 14:26:59 ERROR TransportClient:233 - Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 INFO  SchedulerExtensionServices:54 - Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
2018-06-08 14:26:59 ERROR Utils:91 - Uncaught exception in thread Yarn application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:566)
    at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:95)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:155)
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:508)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1752)
    at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1924)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1923)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
Caused by: java.io.IOException: Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-06-08 14:26:59 INFO  MemoryStore:54 - MemoryStore cleared
2018-06-08 14:26:59 INFO  BlockManager:54 - BlockManager stopped
2018-06-08 14:26:59 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:177)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:558)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
2018-06-08 14:26:59 INFO  SparkContext:54 - SparkContext already stopped.
Traceback (most recent call last):
  File "/appdata/bblite-codebase/automl/backend/feature_extraction/trigger_feature_engineering_pipeline.py", line 18, in <module>
    .appName("EndToEnd_FeatureEngineeringPipeline")\
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/session.py", line 173, in getOrCreate
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 331, in getOrCreate
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 180, in _do_init
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 270, in _initialize_context
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1428, in __call__
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:177)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:558)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

2018-06-08 14:26:59 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-06-08 14:26:59 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-06-08 14:26:59 INFO  SparkContext:54 - Successfully stopped SparkContext
2018-06-08 14:26:59 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-06-08 14:26:59 INFO  ShutdownHookManager:54 - Deleting directory /appdata/spark/tmp/spark-35d9709e-8f20-4b57-82d3-f3ef0926d3ab
2018-06-08 14:26:59 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-1b471b46-0c5a-4f75-94c1-c99d9d674228

It seems the name-node and data-nodes cannot talk to each other correctly, but I have no clue why. Has anyone faced this problem? Any help on this?

Thanks,
Aakash.
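Since the AM registers and then exits almost immediately in the log above, its exit reason should show up in the application's diagnostics. As an illustration (not part of the original thread), the ResourceManager REST API can be queried for the final status; the host, port, application id, and helper names below are assumptions taken from the log:

```python
import json
from urllib.request import urlopen

def summarize_app(report: dict) -> str:
    """Reduce a ResourceManager app report (GET /ws/v1/cluster/apps/<id>)
    to its state, final status, and diagnostics string."""
    app = report["app"]
    diag = (app.get("diagnostics") or "").strip() or "no diagnostics"
    return f"{app['state']}/{app['finalStatus']}: {diag}"

def fetch_app_report(rm_host: str, app_id: str) -> dict:
    # 8088 is the RM webapp port seen in the tracking URL above.
    url = f"http://{rm_host}:8088/ws/v1/cluster/apps/{app_id}"
    with urlopen(url) as resp:
        return json.load(resp)

# Example (requires the cluster from this thread to be reachable):
# print(summarize_app(fetch_app_report("192.168.49.37",
#                                      "application_1528296308262_0017")))
```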


On Fri, Jun 8, 2018 at 2:31 PM, Sathishkumar Manimoorthy <[hidden email]> wrote:
It seems, your spark-on-yarn application is not able to get it's application master ...


Re: Spark YARN Error - triggering spark-shell

Aakash Basu-2
Fixed by adding two configurations in yarn-site.xml.
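The post does not say which two properties were added. For this particular symptom (the AM registering and then exiting almost immediately), one commonly reported pair is the NodeManager memory-check switches below. Treat this as a hypothetical example rather than the configuration actually used here, and in production prefer raising container memory over disabling the checks:

```xml
<!-- Hypothetical example: disable the NodeManager's physical/virtual
     memory checks, which otherwise kill containers (including the AM)
     that exceed their allocation. -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```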

Thanks all!


On Fri, Jun 8, 2018 at 2:44 PM, Aakash Basu <[hidden email]> wrote:
Hi,

I fixed that problem by packing all the Spark jars into spark-archive.zip and putting it on HDFS (the jars not being available there was the cause).

But I'm now facing a new issue, an RPC error. The stack trace is below -

2018-06-08 14:26:43 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-08 14:26:45 INFO  SparkContext:54 - Running Spark version 2.3.0
2018-06-08 14:26:45 INFO  SparkContext:54 - Submitted application: EndToEnd_FeatureEngineeringPipeline
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing view acls to: bblite
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing modify acls to: bblite
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing view acls groups to:
2018-06-08 14:26:45 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-06-08 14:26:45 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bblite); groups with view permissions: Set(); users  with modify permissions: Set(bblite); groups with modify permissions: Set()
2018-06-08 14:26:45 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 41957.
2018-06-08 14:26:45 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-06-08 14:26:45 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-06-08 14:26:45 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-06-08 14:26:45 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-06-08 14:26:45 INFO  DiskBlockManager:54 - Created local directory at /appdata/spark/tmp/blockmgr-7b035871-a1f7-47ff-aad8-f7a43367836e
2018-06-08 14:26:45 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-06-08 14:26:45 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-06-08 14:26:45 INFO  log:192 - Logging initialized @3659ms
2018-06-08 14:26:45 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
2018-06-08 14:26:45 INFO  Server:414 - Started @3733ms
2018-06-08 14:26:45 INFO  AbstractConnector:278 - Started ServerConnector@3080efb7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-08 14:26:45 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2c3409b5{/jobs,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7f1ba569{/jobs/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@493631a1{/jobs/job,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6b12f33c{/jobs/job/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@490023da{/stages,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31c3a862{/stages/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4da2454f{/stages/stage,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@552f182d{/stages/stage/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@a78a7fa{/stages/pool,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@15142105{/stages/pool/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7589c977{/storage,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@584a599b{/storage/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1742621f{/storage/rdd,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@23ea75fb{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1813d280{/environment,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@129fc698{/environment/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@16c91c4e{/executors,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@667ce6c1{/executors/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60fdbf5c{/executors/threadDump,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c3a1edd{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@52cf5878{/static,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b7c7cff{/,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7691ad8{/api,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bb96483{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@24a994f7{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://:4040
2018-06-08 14:26:46 INFO  RMProxy:98 - Connecting to ResourceManager at /192.168.49.37:8032
2018-06-08 14:26:46 INFO  Client:54 - Requesting a new application from cluster with 4 NodeManagers
2018-06-08 14:26:46 INFO  Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2018-06-08 14:26:46 INFO  Client:54 - Will allocate AM container, with 896 MB memory including 384 MB overhead
2018-06-08 14:26:46 INFO  Client:54 - Setting up container launch context for our AM
2018-06-08 14:26:46 INFO  Client:54 - Setting up the launch environment for our AM container
2018-06-08 14:26:46 INFO  Client:54 - Preparing resources for our AM container
2018-06-08 14:26:48 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs:/spark-jars.zip
2018-06-08 14:26:48 INFO  Client:54 - Uploading resource file:/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/pyspark.zip
2018-06-08 14:26:48 INFO  Client:54 - Uploading resource file:/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/py4j-0.10.6-src.zip
2018-06-08 14:26:48 INFO  Client:54 - Uploading resource file:/appdata/spark/tmp/spark-35d9709e-8f20-4b57-82d3-f3ef0926d3ab/__spark_conf__4300362365336835927.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/__spark_conf__.zip
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing view acls to: bblite
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing modify acls to: bblite
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing view acls groups to:
2018-06-08 14:26:48 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-06-08 14:26:48 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bblite); groups with view permissions: Set(); users  with modify permissions: Set(bblite); groups with modify permissions: Set()
2018-06-08 14:26:48 INFO  Client:54 - Submitting application application_1528296308262_0017 to ResourceManager
2018-06-08 14:26:48 INFO  YarnClientImpl:273 - Submitted application application_1528296308262_0017
2018-06-08 14:26:48 INFO  SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1528296308262_0017 and attemptId None
2018-06-08 14:26:49 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:49 INFO  Client:54 -
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1528448208475
     final status: UNDEFINED
     tracking URL: http://MasterNode:8088/proxy/application_1528296308262_0017/
     user: bblite
2018-06-08 14:26:50 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:51 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:52 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:52 WARN  TransportChannelHandler:78 - Exception in connection from /192.168.49.38:38862
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
2018-06-08 14:26:53 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:54 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:55 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:56 INFO  Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:56 INFO  YarnClientSchedulerBackend:54 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> MasterNode, PROXY_URI_BASES -> http://MasterNode:8088/proxy/application_1528296308262_0017), /proxy/application_1528296308262_0017
2018-06-08 14:26:56 INFO  JettyUtils:54 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2018-06-08 14:26:57 INFO  YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
2018-06-08 14:26:57 INFO  Client:54 - Application report for application_1528296308262_0017 (state: RUNNING)
2018-06-08 14:26:57 INFO  Client:54 -
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.49.39
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1528448208475
     final status: UNDEFINED
     tracking URL: http://MasterNode:8088/proxy/application_1528296308262_0017/
     user: bblite
2018-06-08 14:26:57 INFO  YarnClientSchedulerBackend:54 - Application application_1528296308262_0017 has started running.
2018-06-08 14:26:57 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45193.
2018-06-08 14:26:57 INFO  NettyBlockTransferService:54 - Server created on MasterNode:45193
2018-06-08 14:26:57 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-06-08 14:26:57 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  BlockManagerMasterEndpoint:54 - Registering block manager MasterNode:45193 with 366.3 MB RAM, BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@261e16df{/metrics/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:59 ERROR YarnClientSchedulerBackend:70 - Yarn application has already exited with state FINISHED!
2018-06-08 14:26:59 INFO  AbstractConnector:318 - Stopped Spark@3080efb7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-08 14:26:59 INFO  SparkUI:54 - Stopped Spark web UI at http://:4040
2018-06-08 14:26:59 ERROR TransportClient:233 - Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 INFO  SchedulerExtensionServices:54 - Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
2018-06-08 14:26:59 ERROR Utils:91 - Uncaught exception in thread Yarn application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:566)
    at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:95)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:155)
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:508)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1752)
    at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1924)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1923)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
Caused by: java.io.IOException: Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-06-08 14:26:59 INFO  MemoryStore:54 - MemoryStore cleared
2018-06-08 14:26:59 INFO  BlockManager:54 - BlockManager stopped
2018-06-08 14:26:59 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:177)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:558)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
2018-06-08 14:26:59 INFO  SparkContext:54 - SparkContext already stopped.
Traceback (most recent call last):
  File "/appdata/bblite-codebase/automl/backend/feature_extraction/trigger_feature_engineering_pipeline.py", line 18, in <module>
    .appName("EndToEnd_FeatureEngineeringPipeline")\
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/session.py", line 173, in getOrCreate
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 331, in getOrCreate
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 180, in _do_init
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 270, in _initialize_context
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1428, in __call__
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:177)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:558)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

2018-06-08 14:26:59 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-06-08 14:26:59 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-06-08 14:26:59 INFO  SparkContext:54 - Successfully stopped SparkContext
2018-06-08 14:26:59 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-06-08 14:26:59 INFO  ShutdownHookManager:54 - Deleting directory /appdata/spark/tmp/spark-35d9709e-8f20-4b57-82d3-f3ef0926d3ab
2018-06-08 14:26:59 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-1b471b46-0c5a-4f75-94c1-c99d9d674228

It seems the name-node and data-nodes cannot talk to each other properly, though I have no clue why. Has anyone faced this problem? Any help would be appreciated.
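One hedged lead from the log above: the UI lines read "started at http://:4040" with an empty hostname, and the AM's RPC connection back to the driver (192.168.49.39 -> driver) dies right after registration. That pattern can mean the driver's advertised address isn't resolvable from the worker nodes. A sketch of pinning it explicitly — the IP is taken from the log and is illustrative only:

```shell
# Sketch: advertise an address the AM on the worker nodes can reach.
# 192.168.49.37 is the ResourceManager/master IP seen in the log;
# substitute the host actually running your driver.
spark-submit --master yarn --deploy-mode client \
  --conf spark.driver.host=192.168.49.37 \
  --conf spark.driver.bindAddress=0.0.0.0 \
  trigger_feature_engineering_pipeline.py
```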

Thanks,
Aakash.


On Fri, Jun 8, 2018 at 2:31 PM, Sathishkumar Manimoorthy <[hidden email]> wrote:
It seems your Spark-on-YARN application is not able to launch its ApplicationMaster:

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

Check the YARN logs once.
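A sketch of pulling the aggregated container logs, including the AM's stderr, for the failed attempt — the application ID is taken from the trace above, and YARN log aggregation must be enabled for this to return anything:

```shell
yarn logs -applicationId application_1528296308262_0017 | less
```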

Thanks,
Sathish-


On Fri, Jun 8, 2018 at 2:22 PM, Jeff Zhang <[hidden email]> wrote:

Check the yarn AM log for details. 



On Fri, Jun 8, 2018 at 4:36 PM, Aakash Basu <[hidden email]> wrote:
Hi,

Getting this error when trying to run Spark Shell using YARN -

Command: spark-shell --master yarn --deploy-mode client

2018-06-08 13:39:09 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2018-06-08 13:39:25 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

The last half of stack-trace -

2018-06-08 13:56:11 WARN  YarnSchedulerBackend$YarnSchedulerEndpoint:66 - Attempted to request executors before the AM has registered!
2018-06-08 13:56:11 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:89)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
  at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:103)
  ... 55 elided
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql

I tried putting spark-yarn_2.11-2.3.0.jar on the Hadoop YARN classpath, but it's still not working. Is there anything else I should do?

Thanks,
Aakash.