Spark 2.3.0 DataFrame.write.parquet() behavior change from 2.2.0

silvermast
It appears that, between 2.2.0 and 2.3.0, DataFrame.write.parquet() changed so that it skips writing empty Parquet files for empty partitions. Is this configurable? Is there a Jira that tracks this change?

Thanks,
Victor
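One way to check which behavior a given Spark version shows is to write a DataFrame that is known to contain empty partitions and count the part files in the output directory. Below is a minimal sketch using only the Python standard library. It assumes Spark's usual `part-*.parquet` naming for data files, and both helper names (`count_parquet_part_files`, `empty_partitions_skipped`) are hypothetical. The comparison is only a heuristic: it assumes a plain write (no `partitionBy`, no per-file record limit), where the 2.2.0 behavior described above yields one part file per partition.

```python
from pathlib import Path


def count_parquet_part_files(output_dir: str) -> int:
    """Count the Parquet data files a Spark job wrote into output_dir.

    Assumes Spark's usual output naming (e.g. part-00000-<uuid>-c000.snappy.parquet).
    Job markers such as _SUCCESS and hidden checksum files (.part-....crc)
    do not start with "part-", so they are not counted.
    """
    return sum(
        1
        for path in Path(output_dir).iterdir()
        if path.is_file()
        and path.name.startswith("part-")
        and path.name.endswith(".parquet")
    )


def empty_partitions_skipped(output_dir: str, num_partitions: int) -> bool:
    """Heuristic: fewer part files than partitions suggests empty partitions wrote nothing.

    Only meaningful for a plain (non-partitionBy) write with no per-file record limit.
    """
    return count_parquet_part_files(output_dir) < num_partitions
```

For example, in PySpark one could build `df = spark.range(2).repartition(8)`, which has at most two non-empty partitions, write it with `df.write.parquet(path)`, and then compare `count_parquet_part_files(path)` against `df.rdd.getNumPartitions()`. A count equal to the partition count matches the 2.2.0 behavior. A smaller count matches the 2.3.0 behavior of skipping empty partitions.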

Re: Spark 2.3.0 DataFrame.write.parquet() behavior change from 2.2.0

silvermast
Found it: SPARK-21435

On Mon, May 7, 2018 at 2:18 PM Victor Tso-Guillen <[hidden email]> wrote:
It appears that between 2.2.0 and 2.3.0 DataFrame.write.parquet() skips writing empty parquet files for empty partitions. Is this configurable? Is there a Jira that tracks this change?

Thanks,
Victor

Re: Spark 2.3.0 DataFrame.write.parquet() behavior change from 2.2.0

liyuanjian
Yes.

In what scenario do you need the handling of empty partitions to be configurable? Do you still need the empty files?

On May 8, 2018, at 03:35, Victor Tso-Guillen <[hidden email]> wrote:

Found it: SPARK-21435