Dynamic partitioning weird behavior

Nikolay Skovpin
Hi guys.
I was investigating the Spark property
spark.sql.sources.partitionOverwriteMode, set with
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic"). It
works perfectly on a local filesystem, but on S3 I stumbled into strange
behavior: if the Hive table doesn't exist yet, or exists but is empty, Spark
doesn't save any data into it with SaveMode.Overwrite.
What I did:
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("Test for dynamic partitioning")
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .getOrCreate()

// Required for .toDF on a local Seq
import spark.implicits._

val users = Seq(
  ("11", "Nikolay", "1900", "1"),
  ("12", "Nikolay", "1900", "1"),
  ("13", "Sergey", "1901", "1"),
  ("14", "Jone", "1900", "2")
).toDF("user_id", "name", "year", "month")

// Overwrite the table, partitioned by year and month, at an external S3 path
users.write
  .partitionBy("year", "month")
  .mode(SaveMode.Overwrite)
  .option("path", "s3://dynamicPartitioning/users")
  .saveAsTable("test.users")

I can see from the logs that Spark populates the .spark-staging directory
with the data and then executes a rename command.
But AlterTableRecoverPartitionsCommand logs the message "Found 0
partitions, Finished to gather the fast stats for all 0 partitions". After
that, the directory on S3 is empty (except for the _SUCCESS flag).
Is this expected behavior or a bug?
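
For anyone trying to reproduce this, the outcome described above could be checked from the same session roughly as follows. This is only a sketch: it assumes the `spark` session and the write from the snippet above, and a Hive-backed catalog so that SHOW PARTITIONS works on test.users. The names `tablePath`, `partitions`, `rowCount` and `fs` are just illustrative.

```scala
import org.apache.hadoop.fs.Path

// Assumption: `spark` is the session created above and the write has already run.
val tablePath = new Path("s3://dynamicPartitioning/users")

// Partitions registered in the metastore for the table.
// The four sample rows span three distinct (year, month) combinations.
val partitions = spark.sql("SHOW PARTITIONS test.users")
partitions.show(false)

// Row count as seen through the catalog.
val rowCount = spark.table("test.users").count()
println(s"rows visible through the catalog: $rowCount")

// What is physically present directly under the table location.
val fs = tablePath.getFileSystem(spark.sparkContext.hadoopConfiguration)
fs.listStatus(tablePath).foreach(status => println(status.getPath))
```

If the write had worked, the listing would be expected to contain year=... directories next to the _SUCCESS flag, and the partition list would not be empty.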



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
