Information on Spark UI


Information on Spark UI

coderxiang
Hi,
  I came across some confusing information on the Spark UI. The
following was gathered while factorizing a large matrix using ALS:
  1. Some stages show more succeeded tasks than total tasks, which are
displayed in the 5th column.
  2. Duplicate stages with exactly the same stage ID (stages 1/3/7).
  3. Clicking into some stages, some executors cannot be addressed. Does
that mean an executor was lost, or does it not matter?

I'm using a YARN cluster.

  Any explanations are appreciated!


Screen Shot 2014-06-10 at 5.19.04 PM.png (206K) <http://apache-spark-user-list.1001560.n3.nabble.com/attachment/7354/0/Screen%20Shot%202014-06-10%20at%205.19.04%20PM.png>
Screen Shot 2014-06-10 at 5.20.00 PM.png (200K) <http://apache-spark-user-list.1001560.n3.nabble.com/attachment/7354/1/Screen%20Shot%202014-06-10%20at%205.20.00%20PM.png>

Re: Information on Spark UI

coderxiang
The executors shown as "CANNOT FIND ADDRESS" are not listed in the Executors tab at the top of the Spark UI.

Re: Information on Spark UI

neville.lyh
We are seeing this issue as well.
We run on YARN and see logs about lost executors. It looks like some stages had to be re-run to recompute RDD partitions that were lost with those executors.

We were able to complete 20 iterations with 20% of the full matrix, but not beyond that (total > 100 GB).




Re: Information on Spark UI

martinxu
What is the solution?

Re: Information on Spark UI

Daniel Darabos
In reply to this post by coderxiang
About more succeeded tasks than total tasks:
 - This can happen if you have enabled speculative execution: some partitions can get processed multiple times.
 - More commonly, the result of the stage is used in a later calculation and has to be recalculated. This happens when some of the results were evicted from the cache.
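For illustration, a minimal sketch of the two knobs involved, the speculation flag and explicit persistence, assuming a plain Scala job; the input path and RDD names are made up:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Speculative execution re-launches slow tasks, so a stage can report more
// succeeded tasks than its total number of partitions.
val conf = new SparkConf()
  .setAppName("als-job")
  .set("spark.speculation", "true")
val sc = new SparkContext(conf)

// Persisting an RDD that later stages reuse avoids recomputing its
// partitions when blocks would otherwise be evicted from memory.
val input = sc.textFile("hdfs:///path/to/ratings")  // illustrative path
val parsed = input.map(_.split(",")).persist(StorageLevel.MEMORY_AND_DISK)
parsed.count()  // the first action materializes and caches the partitions

A re-run stage shows up in the UI under the same stage ID, which would also explain the duplicate stage IDs in point 2.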



Re: Information on Spark UI

coderxiang
Daniel, 
  Thanks for the explanation.




Re: Information on Spark UI

neville.lyh
Does cache eviction affect disk storage levels too? I tried cranking up replication but am still seeing this.
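For reference, "cranking up replication" here means a replicated storage level; a minimal sketch, with an illustrative RDD name:

import org.apache.spark.storage.StorageLevel

// The "_2" levels keep two replicas of every cached partition, so losing a
// single executor does not by itself force that partition to be recomputed.
ratings.persist(StorageLevel.MEMORY_AND_DISK_SER_2)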




Re: Information on Spark UI

coderxiang
Using MEMORY_AND_DISK_SER to persist the input RDD[Rating] seems to work for me now. I'm testing on a larger dataset and will see how it goes.
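Roughly what that looks like, as a minimal sketch; the input path and the ALS parameters (rank, iterations, lambda) are placeholders:

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.storage.StorageLevel

// sc is an existing SparkContext.
// Keep serialized partitions in memory, spilling to disk instead of
// dropping them, so evicted blocks are reloaded rather than recomputed.
val ratings = sc.textFile("hdfs:///path/to/ratings")  // illustrative path
  .map(_.split(','))
  .map { case Array(user, item, score) =>
    Rating(user.toInt, item.toInt, score.toDouble)
  }
  .persist(StorageLevel.MEMORY_AND_DISK_SER)

val model = ALS.train(ratings, 50, 20, 0.01)  // rank, iterations, lambda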





Re: Information on Spark UI

martinxu
Almost 1/3 of the tasks show CANNOT FIND ADDRESS.

I run Shark 0.9.1 and Spark 0.9.1 on YARN, CDH 5.0.1.