Write Partitioned Parquet Using UDF On Partition Column
In version 1.6.0, is it possible to write a partitioned DataFrame in
Parquet format using a UDF on the partition column? I'm using
Let's say I have a dataframe with a column `date`, of type string or int, which
contains values such as `20170825`. Is it possible to define a UDF called
`by_month` or `by_year`, which could then be used to write the table as
parquet, ideally in this way:
I haven't tried this yet, so I don't know whether it's possible. If it is, how
can it be done? Ideally, I'd like to avoid adding an extra column like
`part_id` to the dataframe, holding the result of `by_month(date)`, and
partitioning by that column instead.
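
For reference, here is a minimal sketch of the derived-column fallback described above, assuming the question is about PySpark's DataFrame API. As far as I know, `DataFrameWriter.partitionBy` takes column names rather than expressions, so the UDF result has to exist as a column at write time. The names `write_partitioned`, `part_col` and `key_fn`, and the default column name `month`, are hypothetical and only for illustration; `by_month` and `by_year` are assumed to simply truncate a `yyyymmdd` value.

```python
def by_month(date):
    """Map a yyyymmdd value (str or int) such as 20170825 to '201708'."""
    return None if date is None else str(date)[:6]


def by_year(date):
    """Map a yyyymmdd value (str or int) such as 20170825 to '2017'."""
    return None if date is None else str(date)[:4]


def write_partitioned(df, path, part_col="month", key_fn=by_month):
    """Write df as Parquet, partitioned by key_fn applied to `date`.

    Assumption: df is a PySpark DataFrame that has a `date` column.
    """
    # Imported here so the helpers above stay usable without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    key_udf = udf(key_fn, StringType())

    # The derived column only exists in this chained expression;
    # the DataFrame bound to `df` by the caller is left unchanged.
    (df.withColumn(part_col, key_udf(df["date"]))
       .write
       .partitionBy(part_col)
       .parquet(path))
```

Calling `write_partitioned(df, "/some/output/path")` (the path is a placeholder) should produce one directory per distinct key, such as `month=201708/`. The partition value is encoded in the directory name rather than stored in the data files, and the original `date` column is still written to the files.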