Write Partitioned Parquet Using UDF On Partition Column



Richard Primera

In Spark 1.6.0, is it possible to write a partitioned dataframe in
parquet format by applying a UDF to the partition column? I'm using

Let's say I have a dataframe with a column `date`, of type string or int, which
contains values such as `20170825`. Is it possible to define a UDF called
`by_month` or `by_year`, which could then be used to write the table as
parquet, ideally in this way:


I haven't tried this yet, so I don't know if it's possible. If it is, what are
the ways to do it? Ideally, without having to resort to adding an extra column
like `part_id` to the dataframe, holding the result of `by_month(date)`, and
partitioning by that column instead.
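
To make the question concrete, here is a minimal sketch of what `by_month` and
`by_year` might look like, together with the `part_id` fallback described
above. It assumes PySpark and that `date` always holds `yyyymmdd` values. The
function `write_partitioned_by_month` and its `path` argument are hypothetical
names used only for illustration. This is the fallback, not the direct
approach being asked about.

```python
def by_month(date):
    """Map a yyyymmdd value (str or int), e.g. 20170825, to its yyyymm prefix."""
    return str(date)[:6]


def by_year(date):
    """Map a yyyymmdd value (str or int), e.g. 20170825, to its yyyy prefix."""
    return str(date)[:4]


def write_partitioned_by_month(df, path):
    """Fallback: store by_month(date) in a `part_id` column and partition on it.

    Assumes `df` is a Spark DataFrame with a `date` column; `path` is a
    hypothetical output location.
    """
    # Imported lazily so the plain helpers above can be used without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    by_month_udf = udf(by_month, StringType())
    (df.withColumn("part_id", by_month_udf(df["date"]))
       .write
       .partitionBy("part_id")
       .parquet(path))
```

With this, a row whose `date` is `20170825` would end up under a directory
like `part_id=201708` below `path`.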

Thanks in advance.
