Spark 1.6: change the number of partitions without repartition and without shuffling

spicoflorin
Hello!

I have a Parquet file of about 60MB that contains 10 million records.
When I read this file with Spark 2.3.0 and the configuration spark.sql.files.maxPartitionBytes=1024*1024*2 (=2MB), I get 29 partitions, as expected.
Code:
// cap each input split at 2MB so the 60MB file is read as ~29 partitions
sqlContext.setConf("spark.sql.files.maxPartitionBytes", Long.toString(2097152));
Dataset<Row> inputDataDf = sqlContext.read().parquet("10Mrecords.parquet");
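To verify this, I print the partition count right after the read (continuing the snippet above; 60MB divided into 2MB splits gives the 29 partitions):

// prints 29 on Spark 2.3.0 with the configuration above
System.out.println(inputDataDf.rdd().getNumPartitions());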


But when I read the same file with Spark 1.6.0, the above configuration has no effect and I get a single partition, hence a single task that does all the processing and no parallelism.
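For reference, the Spark 1.6 read is essentially the same code (a rough sketch; the SQLContext setup is omitted), yet it always ends up with one partition:

// Spark 1.6.0: the maxPartitionBytes setting is simply ignored here
sqlContext.setConf("spark.sql.files.maxPartitionBytes", Long.toString(2097152));
DataFrame inputDataDf = sqlContext.read().parquet("10Mrecords.parquet");
System.out.println(inputDataDf.rdd().getNumPartitions()); // prints 1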

I have also tried the following configurations, without any effect (a sketch of the full sequence follows below):

Writing the Parquet file with a different row-group size in order to increase the number of row groups:
 sparkContext.hadoopConfiguration.setLong("parquet.block.size", 1024*50)

Limiting the maximum split size:
 sparkContext.hadoopConfiguration.setLong("mapred.max.split.size", 1024*50)


My question is:
How can I achieve the same behavior (i.e. get the desired number of partitions) in Spark 1.6, without calling repartition or any other method that incurs a shuffle?

I look forward to your answers.
 Regards,
  Florin