spark streaming executors memory increasing and executor killed by yarn


spark streaming executors memory increasing and executor killed by yarn

darin
Hi,
I got this exception after the streaming program had been running for some hours.

```
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 21 in stage 1194.0 failed 4 times, most recent failure: Lost task 21.3 in stage 1194.0 (TID 2475, 2.dev3, executor 66): ExecutorLostFailure (executor 66 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 3.5 GB of 3.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
```

I have googled some solutions, such as disabling YARN's memory check or increasing the executor memory, but I don't think that is the right way.


And this is the submit script:
```
spark-submit --master yarn-cluster --driver-cores 1 --driver-memory 1G --num-executors 6 --executor-cores 3 --executor-memory 3G --conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/javadump.hprof" --conf "spark.kryoserializer.buffer.max=512m" --class com.dtise.data.streaming.ad.DTStreamingStatistics hdfs://nameservice1/user/yanghb/spark-streaming-1.0.jar
```
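For context: with `--executor-memory 3G`, the default `spark.yarn.executor.memoryOverhead` is typically max(384 MB, 10% of the executor memory), i.e. roughly 384 MB here, which is where the ~3.5 GB container limit in the error comes from. Below is a minimal sketch of raising the overhead rather than the heap, which is what the error message itself suggests; the 1024 MB value is only illustrative, not tuned for this job:

```scala
// Sketch only: raise the off-heap overhead that YARN accounts for, instead of the executor heap.
// On Spark 1.x/2.x the property is spark.yarn.executor.memoryOverhead (value in MB);
// from Spark 2.3 onward it is called spark.executor.memoryOverhead.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "1024") // illustrative value
```

The same setting can also be passed to spark-submit with `--conf spark.yarn.executor.memoryOverhead=1024`.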

And this is the main code:

```
val originalStream = ssc.textFileStream(rawDataPath)
    originalStream.repartition(10).mapPartitions(parseAdLog).reduceByKey(_ ++ _)
      .mapWithState(StateSpec.function(countAdLogWithState _)).foreachRDD(rdd => {
        if (!rdd.isEmpty()) {
          val batchTime = Calendar.getInstance.getTimeInMillis
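          // note: this reduce, the reduce on the next line, and the foreachPartition below are
          // three separate actions on the same rdd, so its lineage is recomputed unless it is cached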
          val dimensionSumMap = rdd.map(_._1).reduce(_ ++ _)
          val nameList = rdd.map(_._2).reduce(_ ++ _).toList
          val jedis = RedisUtils.jedis()
          jedis.hmset(joinString("t_ad_dimension_sum", batchTime), dimensionSumMap)
          jedis.lpush(joinString("t_ad_name", batchTime), nameList: _*)
          jedis.set(joinString("t_ad", batchTime.toString), "OK")
          jedis.close()

          rdd.flatMap(_._3).foreachPartition(logInfoList => {
            val producter = new StringProducter
            for (logInfo <- logInfoList) {
              val logInfoArr = logInfo.split("\t", -1)
              val kafkaKey = "ad/" + logInfoArr(campaignIdIdx) + "/" + logInfoArr(logDateIdx)
              producter.send("cookedLog", kafkaKey, logInfo)
            }
            producter.close()
          })
        }
      })
```
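One thing worth checking with mapWithState: unless the StateSpec is given a timeout, the state for every key ever seen stays in executor memory indefinitely. Here is a minimal sketch with a hypothetical state function and retention window (not the original countAdLogWithState):

```scala
// Sketch: give StateSpec a timeout so idle keys are eventually evicted from executor memory.
// countWithState and the 30-minute window are illustrative, not taken from the original job.
import org.apache.spark.streaming.{Minutes, State, StateSpec}

def countWithState(key: String, value: Option[Long], state: State[Long]): (String, Long) = {
  val total = state.getOption.getOrElse(0L) + value.getOrElse(0L)
  if (!state.isTimingOut()) state.update(total) // a state that is timing out can no longer be updated
  (key, total)
}

val spec = StateSpec.function(countWithState _).timeout(Minutes(30))
// ...then .mapWithState(spec) in place of StateSpec.function(countAdLogWithState _)
```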

These are the JVM heap MAT (Eclipse Memory Analyzer) results:

[images: histogram, dominator tree, leak suspects]

Does anybody have any advice about this?
Thanks


Re: spark streaming executors memory increasing and executor killed by yarn

blyncsy.david.lewis
I am having a similar issue. Mine manifests over the course of 24 hours: eventually my long-running job slows to a halt because the memory store has filled the heap with its entries. I have disabled all instances of RDD caching and made sure that every broadcast is subsequently unpersisted and destroyed.
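A minimal sketch of the unpersist-and-destroy pattern described above; the broadcast variable and its contents are purely illustrative:

```scala
// Sketch: release a broadcast once it is no longer needed so it does not linger in the memory store.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-cleanup-sketch").setMaster("local[2]"))
val lookup = sc.broadcast(Map("a" -> 1)) // illustrative lookup table
// ... use lookup.value inside jobs ...
lookup.unpersist(blocking = false) // drop the copies cached on the executors
lookup.destroy()                   // also release the driver-side copy; lookup is unusable afterwards
```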

If you find any solution please post it here! I will do the same.

Re: spark streaming executors memory increasing and executor killed by yarn

darin
I added this code in the foreachRDD block:
```
rdd.persist(StorageLevel.MEMORY_AND_DISK)
```


This exception no longer occurs, but the Spark Streaming UI shows many dead executors:
```
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 21 in stage 1194.0 failed 4 times, most recent failure: Lost task 21.3 in stage 1194.0 (TID 2475, 2.dev3, executor 66): ExecutorLostFailure (executor 66 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 3.5 GB of 3.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
```
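If the batch RDD is persisted inside foreachRDD, it may also help to unpersist it explicitly once the batch's actions are done, rather than relying on Spark Streaming's own cleanup, so old batches don't accumulate in the block manager. A minimal sketch, with the stream name and the body standing in for the original job:

```scala
// Sketch: persist once, run the actions, then release the RDD at the end of the block.
// parsedStream is a stand-in for the mapWithState stream from the original code.
import org.apache.spark.storage.StorageLevel

parsedStream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      // ... the reduce / foreachPartition work from the original job ...
    } finally {
      rdd.unpersist(blocking = false) // free the cached blocks once the batch is finished
    }
  }
}
```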

Re: spark streaming executors memory increasing and executor killed by yarn

darin
In reply to this post by blyncsy.david.lewis

Re: spark streaming executors memory increasing and executor killed by yarn

blyncsy.david.lewis
Thanks for the info. I'm not using Spark Streaming, so I don't think mine is a mapWithState problem. Also, my job usually succumbs to timeouts before OOM.