DirectFileOutputCommitter in Spark 2.3.1

Chitturi Padma
Hello Team,
 
I am trying to write a Dataset as Parquet files in Append mode, partitioned by a few columns. Because the job is time-consuming, I would like to enable DirectFileOutputCommitter (i.e., bypass writing to a temporary folder).

I am using Spark 2.3.1.

Can someone please help me with the configuration that allows writing directly to S3 when appending, when writing new files, and when overwriting existing files?

Thanks, 
Padma CH

Re: DirectFileOutputCommitter in Spark 2.3.1

Dillon Dukek
I believe you need to set mapreduce.fileoutputcommitter.algorithm.version to 2.
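A minimal PySpark sketch of that suggestion, assuming the job builds its own SparkSession. The `spark.hadoop.` prefix makes Spark copy the property into the Hadoop Configuration used by its file writers. With algorithm version 2, each task's output is moved into the destination directory when the task commits, instead of in one rename at job commit. Output is still written under a `_temporary` directory first, so this is faster but not a truly direct write, and a failed job can leave partial output behind. The helper names `build_session` and `write_partitioned` are hypothetical.

```python
# Hadoop property from the reply above; the "spark.hadoop." prefix tells Spark
# to forward it into the Hadoop Configuration used when writing files.
COMMITTER_CONF = {
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2",
}


def build_session(app_name="parquet-append"):
    """Hypothetical helper: create a SparkSession with the committer setting applied."""
    # Imported lazily so the settings above can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in COMMITTER_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def write_partitioned(df, path, partition_cols):
    """Hypothetical helper: append a DataFrame as Parquet, partitioned by the given columns."""
    df.write.mode("append").partitionBy(*partition_cols).parquet(path)
```

Usage, with a made-up bucket and column names: `spark = build_session()`, then `write_partitioned(df, "s3a://my-bucket/events", ["year", "month"])`. The same property can also be passed on the command line as `--conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2`.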

(Replying to Priya Ch's message of Wed, Sep 19, 2018 at 10:45 AM.)