interpretation of Spark logs about remote fetch latency

ricky l
Hello Spark users,

I have a question about a sample Spark task I am analyzing. The task performs remote (shuffle) fetches for a few blocks, but the reported remote fetch time does not make sense to me. Could someone please help me interpret it?

The logs come from the Spark REST API. Task ID 33 needs four blocks, three of which it has to fetch from remote machines. In its "shuffleReadMetrics" section, however, "fetchWaitTime" is reported as 0 even though the task fetches about 2.4 GB from remote machines.
Task ID 34 below, by contrast, fetches four remote blocks with a total size of about 3.2 GB and reports a fetchWaitTime of about 2.4 seconds; only this value makes sense to me.

Is this intended behavior?

    "33" : {
      "taskId" : 33,
      ....
      "taskMetrics" : {
        ....
        "shuffleReadMetrics" : {
          "remoteBlocksFetched" : 3,
          "localBlocksFetched" : 1,
          "fetchWaitTime" : 0,
          "remoteBytesRead" : 2401539138,
          "localBytesRead" : 800513041,
          "recordsRead" : 4
        },
      }
    },
    "34" : {
      "taskId" : 34,
      ....
      "taskMetrics" : {
        ....
        "shuffleReadMetrics" : {
          "remoteBlocksFetched" : 4,
          "localBlocksFetched" : 0,
          "fetchWaitTime" : 2416,
          "remoteBytesRead" : 3202052194,
          "localBytesRead" : 0,
          "recordsRead" : 4
        },
      }
    },
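For anyone who wants to reproduce the comparison, here is a minimal Python sketch that puts the two tasks side by side. It assumes fetchWaitTime is in milliseconds (the reading used above, where 2416 is about 2.4 seconds), it includes only the fields shown in the excerpt, and the `summarize` helper and its output keys are hypothetical names made up for illustration, not part of Spark.

```python
import json

# Values copied from the two task entries above; the elided ("....")
# fields are simply left out rather than guessed.
TASKS_JSON = """
{
  "33": {"taskId": 33, "taskMetrics": {"shuffleReadMetrics": {
    "remoteBlocksFetched": 3, "localBlocksFetched": 1, "fetchWaitTime": 0,
    "remoteBytesRead": 2401539138, "localBytesRead": 800513041,
    "recordsRead": 4}}},
  "34": {"taskId": 34, "taskMetrics": {"shuffleReadMetrics": {
    "remoteBlocksFetched": 4, "localBlocksFetched": 0, "fetchWaitTime": 2416,
    "remoteBytesRead": 3202052194, "localBytesRead": 0,
    "recordsRead": 4}}}
}
"""


def summarize(tasks):
    """Hypothetical helper: one summary row per task, sorted by task ID."""
    rows = []
    for task in sorted(tasks.values(), key=lambda t: t["taskId"]):
        m = task["taskMetrics"]["shuffleReadMetrics"]
        wait_ms = m["fetchWaitTime"]  # assumed to be milliseconds
        remote_bytes = m["remoteBytesRead"]
        rows.append({
            "taskId": task["taskId"],
            "remote_gb": remote_bytes / 1e9,
            "wait_ms": wait_ms,
            # Remote data was read, yet no wait time was recorded.
            "zero_wait_with_remote_read": remote_bytes > 0 and wait_ms == 0,
            # Bytes divided by wait time; undefined when the wait is 0.
            "implied_mb_per_s": (
                None if wait_ms == 0 else (remote_bytes / 1e6) / (wait_ms / 1000)
            ),
        })
    return rows


rows = summarize(json.loads(TASKS_JSON))
for row in rows:
    print(row)
```

Dividing remoteBytesRead by fetchWaitTime for task 34 gives an implied rate of roughly 1.3 GB/s. Whether that figure is meaningful depends on what fetchWaitTime actually measures, which is the open question here.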