Say I have a 1000-line test.txt file and a single machine with a quad-core processor. Here is my understanding of how Spark will achieve parallelism here:
1. Spark will read a chunk of characters from the file in a single thread. I am not sure whether there is a default chunk size or whether it depends on the file size.
2. Spark will decide how many partitions to create based on the two parameters below:
a) The size of the data read in step 1, and
b) The number of cores in the CPU
3. Based on the number of partitions from step 2, it will spawn threads. If there are 3 partitions, it will spawn three threads.
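
The three steps above can be sketched in plain Python. This is only a toy illustration of the model as described, not Spark code and not a claim about how Spark actually works internally. The function names, the `lines_per_partition=250` threshold, and the "cap partitions at the core count" rule are all assumptions made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_partitions(num_lines, num_cores, lines_per_partition=250):
    """Hypothetical rule for step 2: enough partitions to cover the data,
    capped at the number of cores. The threshold is an assumption."""
    needed = -(-num_lines // lines_per_partition)  # ceiling division
    return max(1, min(needed, num_cores))


def split_into_partitions(lines, num_partitions):
    """Cut the list of lines into contiguous, nearly equal slices."""
    size, extra = divmod(len(lines), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(lines[start:end])
        start = end
    return partitions


def count_chars(partition):
    """Stand-in for whatever work is done on each partition."""
    return sum(len(line) for line in partition)


def run(lines, num_cores):
    # Step 1: the data has already been read into `lines` by one thread.
    # Step 2: decide the partition count from data size and core count.
    num_partitions = plan_partitions(len(lines), num_cores)
    partitions = split_into_partitions(lines, num_partitions)
    # Step 3: one worker thread per partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(count_chars, partitions))
    return num_partitions, results


if __name__ == "__main__":
    lines = [f"line {i}" for i in range(1000)]
    num_partitions, results = run(lines, num_cores=4)
    print(num_partitions, results)
```

Whether Spark really chooses the partition count this way is exactly the open question. On a real installation, the partition count Spark picked can be inspected on the RDD itself, for example with `getNumPartitions()` in PySpark.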