We are upgrading Spark from 2.4 to 3.0 and ran into a performance problem during testing. In a side-by-side comparison, Spark 3.0 reads Kudu data much more slowly than Spark 2.4: for the same amount of data, Spark 2.4 typically takes 0.1 to 1 second, while Spark 3.0 takes 1 to 2 minutes.
Both versions use the same spark-submit parameters and run in local mode, and they read from the same Kudu cluster and tables with the same query conditions.
The only difference is the kudu-spark package: Spark 2.4 uses kudu-spark2_2.11 (Scala 2.11), while Spark 3.0 uses kudu-spark3_2.12 (Scala 2.12). The latter package was built from the Kudu 1.13 Java source, using a pom.xml file set to Spark 3.0.0 and Scala 2.12.
Our cluster runs CDH 6.3.1 with Kudu 1.10. Given this setup, what can we optimize, or do you have any suggestions for improving Kudu read performance?