Number Of Partitions in RDD

Vikash Pareek
Hi,

I am creating an RDD from a text file and specifying the number of partitions, but Spark gives me a different number of partitions than the one I specified.

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 0)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[72] at textFile at <console>:27

scala> people.getNumPartitions
res47: Int = 1

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 1)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[50] at textFile at <console>:27

scala> people.getNumPartitions
res36: Int = 1

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 2)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[52] at textFile at <console>:27

scala> people.getNumPartitions
res37: Int = 2

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 3)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[54] at textFile at <console>:27

scala> people.getNumPartitions
res38: Int = 3

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 4)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[56] at textFile at <console>:27

scala> people.getNumPartitions
res39: Int = 4

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 5)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[58] at textFile at <console>:27

scala> people.getNumPartitions
res40: Int = 6

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 6)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[60] at textFile at <console>:27

scala> people.getNumPartitions
res41: Int = 7

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 7)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[62] at textFile at <console>:27

scala> people.getNumPartitions
res42: Int = 8

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 8)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[64] at textFile at <console>:27

scala> people.getNumPartitions
res43: Int = 9

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 9)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[66] at textFile at <console>:27

scala> people.getNumPartitions
res44: Int = 11

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 10)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[68] at textFile at <console>:27

scala> people.getNumPartitions
res45: Int = 11

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 11)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[70] at textFile at <console>:27

scala> people.getNumPartitions
res46: Int = 13


The contents of the file /home/pvikash/data/test.txt are:
"
This is a test file.
Will be used for rdd partition
"

I am trying to understand why the number of partitions differs from the requested value here, and why, when the data is small enough to fit into one partition, Spark creates empty partitions.

Any explanation would be appreciated.

--Vikash


Re: Number Of Partitions in RDD

neil90
What version of Spark are you using?

Re: Number Of Partitions in RDD

Vikash Pareek
Spark 1.6.1


Re: Number Of Partitions in RDD

neil90
Cluster mode with HDFS, or local mode?

Re: Number Of Partitions in RDD

Vikash Pareek
Local mode
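
For what it's worth, the numbers in the first post are consistent with how Hadoop's old-API FileInputFormat.getSplits computes input splits, which is what sc.textFile relies on. The second argument to textFile is passed along as a minimum-partitions hint, not an exact count. The split size is derived by integer division of the file size by that hint, and a new split is cut for as long as the remaining bytes exceed 1.1 times the split size, so rounding can leave one or two extra splits. Below is a minimal sketch of that arithmetic in plain Scala (no Spark needed). The object and method names are made up for illustration, the 52-byte file size is an assumption (the two lines shown, each ending in a newline), and the 32 MB block size is an assumed local-filesystem default that does not matter for a file this small.

```scala
// Sketch of the split-count arithmetic used by Hadoop's old-API
// FileInputFormat.getSplits for a single, non-empty, splittable file.
// SplitCount and numSplits are hypothetical names used for illustration.
object SplitCount {
  // Hadoop lets the last split be up to 10% larger than splitSize.
  val SplitSlop = 1.1

  def numSplits(fileSize: Long,
                requested: Int,
                minSize: Long = 1L,
                blockSize: Long = 32L * 1024 * 1024): Int = {
    require(fileSize > 0, "sketch covers non-empty files only")
    // Integer division rounds down: 52 / 5 = 10, not 10.4.
    val goalSize = fileSize / (if (requested == 0) 1 else requested)
    val splitSize = math.max(minSize, math.min(goalSize, blockSize))

    var remaining = fileSize
    var count = 0
    // Cut full-size splits while clearly more than one split remains.
    while (remaining.toDouble / splitSize > SplitSlop) {
      count += 1
      remaining -= splitSize
    }
    // Whatever is left over becomes one final split.
    if (remaining != 0) count += 1
    count
  }

  def main(args: Array[String]): Unit = {
    val fileSize = 52L // ASSUMPTION: size of test.txt in bytes
    (0 to 11).foreach { n =>
      println(s"requested=$n -> partitions=${numSplits(fileSize, n)}")
    }
  }
}
```

Under the 52-byte assumption this yields 1, 1, 2, 3, 4, 6, 7, 8, 9, 11, 11, 13 for requested values 0 through 11, which matches the getNumPartitions results above. As for the empty partitions: splits are byte ranges, and a line is read by the split in which it starts, so with only two lines in the file at most two partitions can hold records no matter how many splits are cut.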
