A question about RDD bytes size


zhangliyun
Hi:

I want to get the total size in bytes of a DataFrame using the function below. However, after inserting the DataFrame into a Hive table, I found that the function's result differs from the table's spark.sql.statistics.totalSize: totalSize is smaller than the value returned by getRDDBytes.

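    import org.apache.spark.sql.DataFrame

    // Sums the UTF-8 byte length of each row's string representation.
    def getRDDBytes(df: DataFrame): Long = {
      df.rdd.getNumPartitions match {
        case 0 =>
          0L // no partitions, so there is nothing to measure
        case _ =>
          // Per-row size: length in bytes of Row.toString encoded as UTF-8
          val rddOfDataframe = df.rdd.map(_.toString().getBytes("UTF-8").length.toLong)
          // reduce throws on an empty RDD, so guard against that case
          if (rddOfDataframe.isEmpty()) 0L else rddOfDataframe.reduce(_ + _)
      }
    }
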
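
To make the comparison concrete, here is a minimal sketch, without Spark, of what the function counts for one row next to a simple fixed-width binary layout of the same values. It assumes that a row's string form looks like "[v1,v2,v3]". The sample values and the object name RowSizeSketch are hypothetical, and the binary layout is only an illustration, not how Hive or Spark actually store or compress a table.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object RowSizeSketch {
  // Hypothetical sample row: (id: Long, score: Double, name: String)
  val id: Long = 1234567890123L
  val score: Double = 0.5
  val name: String = "kelly"

  // What getRDDBytes counts for this row: the UTF-8 length of its string
  // form, assumed here to be bracketed and comma-separated.
  val asText: String = s"[$id,$score,$name]"
  val textBytes: Int = asText.getBytes(StandardCharsets.UTF_8).length

  // Illustrative binary layout: 8 bytes for the Long, 8 bytes for the
  // Double, plus the raw UTF-8 bytes of the string (no delimiters).
  private val nameBytes: Array[Byte] = name.getBytes(StandardCharsets.UTF_8)
  val binaryBytes: Int = ByteBuffer
    .allocate(8 + 8 + nameBytes.length)
    .putLong(id)
    .putDouble(score)
    .put(nameBytes)
    .position()

  def main(args: Array[String]): Unit =
    println(s"text form: $textBytes bytes, binary form: $binaryBytes bytes")
}
```

In this sketch the two sizes already differ for a single row, so I am not sure the string-based size is comparable to a size measured on the stored files.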
I would appreciate any suggestions.

Best Regards
Kelly Zhang