How does Spark work internally in this scenario?

      JavaRDD<String> textFile = sc.textFile("C://test.txt");

Say I have a 1000-line test.txt file and a single machine with a quad-core processor. Here is my understanding of how Spark will achieve parallelism in this case:

1. Spark will read a chunk of characters from the file in a single thread. I am not sure whether there is a default chunk size or whether it depends on the file size.
2. Spark will decide how many partitions to make based on the two parameters below:
                a) the size of the data read in step 1, and
                b) the number of cores in the CPU.
3. Based on the number of partitions from step 2, it will spawn threads. If there are 3 partitions, it will spawn three threads.
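
To make the three steps above concrete, here is a plain-Java sketch of the model I have in mind. It is not Spark code and makes no claim about how Spark actually works; the class name PartitionModel, the methods partition and totalLength, and the per-line work (summing line lengths) are all hypothetical, chosen only to illustrate "N partitions, one thread per partition".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of the mental model in the question, not Spark internals.
public class PartitionModel {

    // Step 2 (as I picture it): cut the lines into numPartitions contiguous slices
    // of near-equal size.
    static List<List<String>> partition(List<String> lines, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        int n = lines.size();
        for (int i = 0; i < numPartitions; i++) {
            int from = (int) ((long) i * n / numPartitions);
            int to = (int) ((long) (i + 1) * n / numPartitions);
            parts.add(lines.subList(from, to));
        }
        return parts;
    }

    // Step 3 (as I picture it): one thread per partition, each summing the
    // lengths of its own lines; the partial sums are then added together.
    static long totalLength(List<String> lines, int numPartitions) {
        ExecutorService pool = Executors.newFixedThreadPool(numPartitions);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (List<String> part : partition(lines, numPartitions)) {
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (String s : part) {
                        sum += s.length();
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the 1000-line test.txt from the question.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            lines.add("line " + i);
        }
        System.out.println("partitions: " + partition(lines, 3).size());
        System.out.println("total characters: " + totalLength(lines, 3));
    }
}
```

In this sketch the partition count is passed in by hand; whether Spark derives it from the data size, the core count, or something else is exactly what I am unsure about.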

Is my understanding correct?

Posted on too but did not get any answers as of now