Explain About Logs NetworkWordcount.scala

eduardocalfaia
Hi Guys,

Could anyone help me understand the logs below? Why is the result in the second log 0?

Thanks Guys

14/02/20 19:06:00 INFO JobScheduler: Finished job streaming job 1392919557000 ms.0 from job set of time 1392919557000 ms
14/02/20 19:06:00 INFO JobScheduler: Total delay: 3.185 s for time 1392919557000 ms (execution: 3.167 s)
14/02/20 19:06:00 INFO JobGenerator: Checkpointing graph for time 1392919557000 ms
14/02/20 19:06:00 INFO DStreamGraph: Updating checkpoint data for time 1392919557000 ms
14/02/20 19:06:00 INFO DStreamGraph: Updated checkpoint data for time 1392919557000 ms
14/02/20 19:06:00 INFO SparkContext: Starting job: first at NetworkWordCount.scala:87
14/02/20 19:06:00 INFO JobScheduler: Starting job streaming job 1392919558000 ms.0 from job set of time 1392919558000 ms
14/02/20 19:06:00 INFO CheckpointWriter: Saving checkpoint for time 1392919557000 ms to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919557000'
14/02/20 19:06:00 INFO DAGScheduler: Registering RDD 812 (combineByKey at ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO DAGScheduler: Got job 91 (first at NetworkWordCount.scala:87) with 1 output partitions (allowLocal=true)
14/02/20 19:06:00 INFO DAGScheduler: Final stage: Stage 181 (first at NetworkWordCount.scala:87)
14/02/20 19:06:00 INFO DAGScheduler: Parents of final stage: List(Stage 182)
14/02/20 19:06:00 INFO DAGScheduler: Missing parents: List(Stage 182)
14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 182 (MapPartitionsRDD[812] at combineByKey at ShuffledDStream.scala:42), which has no missing parents
14/02/20 19:06:00 INFO DAGScheduler: Submitting 2 missing tasks from Stage 182 (MapPartitionsRDD[812] at combineByKey at ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 182.0 with 2 tasks
14/02/20 19:06:00 INFO TaskSetManager: Starting task 182.0:1 as TID 609 on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 182.0:1 as 3023 bytes in 0 ms
14/02/20 19:06:00 INFO TaskSetManager: Starting task 182.0:0 as TID 610 on executor 0: computer1.ant-net (NODE_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 182.0:0 as 3485 bytes in 0 ms
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 609 in 17 ms on computer1.ant-net (progress: 0/2)
14/02/20 19:06:00 INFO DAGScheduler: Completed ShuffleMapTask(182, 1)
14/02/20 19:06:00 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1392919527400 in memory on computer1.ant-net:41142 (size: 2018.6 KB, free: 387.3 MB)
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 610 in 67 ms on computer1.ant-net (progress: 1/2)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 182.0 from pool
14/02/20 19:06:00 INFO DAGScheduler: Completed ShuffleMapTask(182, 0)
14/02/20 19:06:00 INFO DAGScheduler: Stage 182 (combineByKey at ShuffledDStream.scala:42) finished in 0.080 s
14/02/20 19:06:00 INFO DAGScheduler: looking for newly runnable stages
14/02/20 19:06:00 INFO DAGScheduler: running: Set(Stage 4)
14/02/20 19:06:00 INFO DAGScheduler: waiting: Set(Stage 181)
14/02/20 19:06:00 INFO DAGScheduler: failed: Set()
14/02/20 19:06:00 INFO CheckpointWriter: Deleting hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919554000.bk
14/02/20 19:06:00 INFO DAGScheduler: Missing parents for Stage 181: List()
14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 181 (MappedRDD[815] at map at MappedDStream.scala:35), which is now runnable
14/02/20 19:06:00 INFO CheckpointWriter: Checkpoint for time 1392919557000 ms saved to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919557000', took 3270 bytes and 102 ms
14/02/20 19:06:00 INFO DStreamGraph: Clearing checkpoint data for time 1392919557000 ms
14/02/20 19:06:00 INFO DStreamGraph: Cleared checkpoint data for time 1392919557000 ms
14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from Stage 181 (MappedRDD[815] at map at MappedDStream.scala:35)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 181.0 with 1 tasks
14/02/20 19:06:00 INFO TaskSetManager: Starting task 181.0:0 as TID 611 on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 181.0:0 as 2057 bytes in 1 ms
14/02/20 19:06:00 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 90 to [hidden email]
14/02/20 19:06:00 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 90 is 146 bytes
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 611 in 25 ms on computer1.ant-net (progress: 0/1)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 181.0 from pool
14/02/20 19:06:00 INFO DAGScheduler: Completed ResultTask(181, 0)
14/02/20 19:06:00 INFO DAGScheduler: Stage 181 (first at NetworkWordCount.scala:87) finished in 0.027 s
14/02/20 19:06:00 INFO SparkContext: Job finished: first at NetworkWordCount.scala:87, took 0.133625862 s
118967 (Total of words in a RDD)
#######################################################################################

14/02/20 19:06:00 INFO JobScheduler: Finished job streaming job 1392919558000 ms.0 from job set of time 1392919558000 ms
14/02/20 19:06:00 INFO JobGenerator: Checkpointing graph for time 1392919558000 ms
14/02/20 19:06:00 INFO DStreamGraph: Updating checkpoint data for time 1392919558000 ms
14/02/20 19:06:00 INFO DStreamGraph: Updated checkpoint data for time 1392919558000 ms
14/02/20 19:06:00 INFO SparkContext: Starting job: first at NetworkWordCount.scala:87
14/02/20 19:06:00 INFO CheckpointWriter: Saving checkpoint for time 1392919558000 ms to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919558000'
14/02/20 19:06:00 INFO DAGScheduler: Registering RDD 821 (combineByKey at ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO JobScheduler: Total delay: 2.322 s for time 1392919558000 ms (execution: 0.134 s)
14/02/20 19:06:00 INFO JobScheduler: Starting job streaming job 1392919559000 ms.0 from job set of time 1392919559000 ms
14/02/20 19:06:00 INFO DAGScheduler: Got job 92 (first at NetworkWordCount.scala:87) with 1 output partitions (allowLocal=true)
14/02/20 19:06:00 INFO DAGScheduler: Final stage: Stage 183 (first at NetworkWordCount.scala:87)
14/02/20 19:06:00 INFO DAGScheduler: Parents of final stage: List(Stage 184)
14/02/20 19:06:00 INFO DAGScheduler: Missing parents: List(Stage 184)
14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 184 (MapPartitionsRDD[821] at combineByKey at ShuffledDStream.scala:42), which has no missing parents
14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from Stage 184 (MapPartitionsRDD[821] at combineByKey at ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 184.0 with 1 tasks
14/02/20 19:06:00 INFO TaskSetManager: Starting task 184.0:0 as TID 612 on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 184.0:0 as 3024 bytes in 1 ms
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 612 in 17 ms on computer1.ant-net (progress: 0/1)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 184.0 from pool
14/02/20 19:06:00 INFO DAGScheduler: Completed ShuffleMapTask(184, 0)
14/02/20 19:06:00 INFO DAGScheduler: Stage 184 (combineByKey at ShuffledDStream.scala:42) finished in 0.018 s
14/02/20 19:06:00 INFO DAGScheduler: looking for newly runnable stages
14/02/20 19:06:00 INFO DAGScheduler: running: Set(Stage 4)
14/02/20 19:06:00 INFO DAGScheduler: waiting: Set(Stage 183)
14/02/20 19:06:00 INFO DAGScheduler: failed: Set()
14/02/20 19:06:00 INFO DAGScheduler: Missing parents for Stage 183: List()
14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 183 (MappedRDD[824] at map at MappedDStream.scala:35), which is now runnable
14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from Stage 183 (MappedRDD[824] at map at MappedDStream.scala:35)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 183.0 with 1 tasks
14/02/20 19:06:00 INFO TaskSetManager: Starting task 183.0:0 as TID 613 on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 183.0:0 as 2057 bytes in 1 ms
14/02/20 19:06:00 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 91 to [hidden email]
14/02/20 19:06:00 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 91 is 137 bytes
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 613 in 23 ms on computer1.ant-net (progress: 0/1)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 183.0 from pool
14/02/20 19:06:00 INFO DAGScheduler: Completed ResultTask(183, 0)
14/02/20 19:06:00 INFO DAGScheduler: Stage 183 (first at NetworkWordCount.scala:87) finished in 0.026 s
14/02/20 19:06:00 INFO SparkContext: Job finished: first at NetworkWordCount.scala:87, took 0.072442522 s
0 (Total of words in a RDD)





Informativa sulla Privacy: http://www.unibs.it/node/8155
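
One visible difference between the two jobs in the logs above is the number of tasks submitted for the combineByKey stage: 2 for Stage 182 (result 118967) but only 1 for Stage 184 (result 0). The Python sketch below pairs each printed total with that task count, so the pattern can be checked over a longer log. The helper name `summarize_jobs` and the regular expressions are illustrative assumptions that fit the quoted lines; they are not part of Spark, and the logs alone do not say why the second job had fewer tasks.

```python
import re

# Hypothetical patterns, written to fit the log lines quoted in this post.
TASKS_RE = re.compile(r"Submitting (\d+) missing tasks from Stage (\d+) \(.*combineByKey")
RESULT_RE = re.compile(r"^(\d+) \(Total of words in a RDD\)")


def summarize_jobs(lines):
    """Pair each printed word total with the task count of the preceding combineByKey stage."""
    summary = []
    shuffle_tasks = None
    for line in lines:
        m = TASKS_RE.search(line)
        if m:
            shuffle_tasks = int(m.group(1))
            continue
        m = RESULT_RE.match(line.strip())
        if m:
            summary.append((shuffle_tasks, int(m.group(1))))
            shuffle_tasks = None
    return summary


# Excerpt copied from the logs in this post.
excerpt = """\
14/02/20 19:06:00 INFO DAGScheduler: Submitting 2 missing tasks from Stage 182 (MapPartitionsRDD[812] at combineByKey at ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from Stage 181 (MappedRDD[815] at map at MappedDStream.scala:35)
118967 (Total of words in a RDD)
14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from Stage 184 (MapPartitionsRDD[821] at combineByKey at ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from Stage 183 (MappedRDD[824] at map at MappedDStream.scala:35)
0 (Total of words in a RDD)
""".splitlines()

print(summarize_jobs(excerpt))  # [(2, 118967), (1, 0)]
```

Running the same function over the full driver log would show whether every zero result coincides with a single-task combineByKey stage.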

Re: Explain About Logs NetworkWordcount.scala

Mayur Rustagi
Is fresh data being put into the source folder?

Mayur Rustagi
Ph: +919632149971


Re: Explain About Logs NetworkWordcount.scala

Tathagata Das
It could be that the worker receiving the data was undergoing GC and so could not actually "receive" any data. Can you check the web UI for the application to see the GC times of the corresponding stages?

TD





Re: Explain About Logs NetworkWordcount.scala

eduardocalfaia
Hi TD,
In the web UI I found the stage whose result was zero, and its GC Time field is empty.
Screenshot: http://apache-spark-user-list.1001560.n3.nabble.com/file/n2306/CaptureStage.png

Re: Explain About Logs NetworkWordcount.scala

Tathagata Das
I am not sure how to debug this without any more information about the source. Can you monitor on the receiver side that data is being accepted by the receiver but not reported?

TD
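
One way to check the source independently of Spark is to connect a plain TCP client to the sender and count the bytes that arrive in each second. The Python sketch below does that. It assumes the sender is a TCP server that the receiver connects to, as in the stock NetworkWordCount example; the thread does not confirm how cachesender.c works. The function names are hypothetical, and the demo uses a local socket pair as a stand-in for the real connection.

```python
import socket
import time
from collections import Counter


def bytes_per_interval(sock, duration_s, interval_s=1.0):
    """Read from sock for up to duration_s seconds; return {interval index: bytes received}."""
    counts = Counter()
    start = time.monotonic()
    deadline = start + duration_s
    sock.settimeout(0.1)  # wake up regularly so the deadline is honoured
    while time.monotonic() < deadline:
        try:
            chunk = sock.recv(65536)
        except socket.timeout:
            continue
        if not chunk:  # the sender closed the connection
            break
        index = int((time.monotonic() - start) // interval_s)
        counts[index] += len(chunk)
    return dict(counts)


def silent_intervals(counts, n_intervals):
    """Interval indices in [0, n_intervals) during which no bytes arrived."""
    return [i for i in range(n_intervals) if counts.get(i, 0) == 0]


# Stand-in for the real connection so the sketch runs anywhere. Against the real
# sender, use socket.create_connection((host, port)) with its actual address.
sender, probe = socket.socketpair()
sender.sendall(b"hello world\n" * 100)
sender.close()
counts = bytes_per_interval(probe, duration_s=2.0)
probe.close()
print(counts, silent_intervals(counts, 2))
```

If the probe shows seconds with no data while the job is running, the zero counts may originate at the source rather than inside Spark; if data arrives in every second, the receiver side is the more likely place to look.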



Re: Explain About Logs NetworkWordcount.scala

eduardocalfaia
Yes TD,
I can use tcpdump to check whether the data is being accepted by the receiver and, if not, whether it is at least arriving in the IP packets.

Thanks

Re: Explain About Logs NetworkWordcount.scala

eduardocalfaia
In reply to this post by Tathagata Das
Hi TD,
I have attached the source code of the application that I use to send the words to the workers.

BR

Attachment: cachesender.c (10K)

Re: Explain About Logs NetworkWordcount.scala

eduardocalfaia
In reply to this post by Tathagata Das
Hi TD,
Today I have seen these differences between the logs:

Result different from zero:
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-0-1394445287000 on computer1.ant-net:51441 in memory (size: 10.1 MB, free: 1447.3 MB)
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1394445315400 in memory on computer1.ant-net:51441 (size: 4.8 MB, free: 1442.5 MB)
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1394445315600 in memory on computer1.ant-net:51441 (size: 3.8 MB, free: 1438.8 MB)
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-0-1394445287200 on computer1.ant-net:51441 in memory (size: 4.7 MB, free: 1448.2 MB)
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1394445315800 in memory on computer1.ant-net:51441 (size: 9.8 MB, free: 1438.4 MB)
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-0-1394445287400 on computer1.ant-net:51441 in memory (size: 4.7 MB, free: 1447.7 MB)
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1394445316000 in memory on computer1.ant-net:51441 (size: 2.5 MB, free: 1445.2 MB)

When the result is zero, the log lines above do not appear.

Do you have any idea about this?

BR
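
The observation above can be checked over a whole driver log by grouping the "Added input-..." reports by the second in which they were logged. The Python sketch below does that for an excerpt of the lines quoted in this post. The function name, the regular expression, and the unit table are illustrative assumptions. It also assumes that a log second with no such report corresponds to a batch with a zero result, which matches the pattern described here but is not confirmed in the thread.

```python
import re
from collections import defaultdict

# Hypothetical pattern fitted to the BlockManagerInfo lines quoted above:
# log timestamp (one-second resolution), block id, size, and size unit.
ADDED_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO BlockManagerMasterActor\$BlockManagerInfo: "
    r"Added (input-\d+-\d+) in memory on \S+ \(size: ([\d.]+) (B|KB|MB|GB)"
)
UNIT_TO_MB = {"B": 1 / (1024 * 1024), "KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def blocks_per_second(lines):
    """Group 'Added input-...' reports by the log second in which they appear."""
    per_second = defaultdict(list)
    for line in lines:
        m = ADDED_RE.match(line)
        if m:
            stamp, block_id, size, unit = m.groups()
            per_second[stamp].append((block_id, float(size) * UNIT_TO_MB[unit]))
    return dict(per_second)


# Excerpt copied from the log lines in this post.
excerpt = """\
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-0-1394445287000 on computer1.ant-net:51441 in memory (size: 10.1 MB, free: 1447.3 MB)
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1394445315400 in memory on computer1.ant-net:51441 (size: 4.8 MB, free: 1442.5 MB)
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1394445315600 in memory on computer1.ant-net:51441 (size: 3.8 MB, free: 1438.8 MB)
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-0-1394445287200 on computer1.ant-net:51441 in memory (size: 4.7 MB, free: 1448.2 MB)
14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1394445315800 in memory on computer1.ant-net:51441 (size: 9.8 MB, free: 1438.4 MB)
""".splitlines()

for stamp, blocks in blocks_per_second(excerpt).items():
    total_mb = round(sum(size for _, size in blocks), 1)
    print(stamp, len(blocks), "blocks,", total_mb, "MB")
```

Any second of the run that is missing from the result had no block reported in the log, and those seconds can then be compared with the batches that printed 0.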

