Read parquet files as buckets

Read parquet files as buckets

אורן שמון
Hi all,
I have Parquet files produced by another job, which saved them bucketed by userId. How can I read the files with the bucketing preserved in a second job? I tried to read them, but the data did not stay bucketed (the same user did not end up in the same partition).

Re: Read parquet files as buckets

MidwestMike
Hi,
   What about the DAG? Can you send that as well, from the resulting "write" call?

On Wed, Nov 1, 2017 at 5:44 AM, אורן שמון <[hidden email]> wrote:
The version is 2.2.0.
The code for the write is:
sortedApiRequestLogsDataSet.write
      .bucketBy(numberOfBuckets, "userId")
      .mode(SaveMode.Overwrite)
      .format("parquet")
      .option("path", outputPath + "/")
      .option("compression", "snappy")
      .saveAsTable("sorted_api_logs")
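
A side note: bucketBy only works together with saveAsTable, because the bucket spec (column and count) is recorded in the session catalog, not in the Parquet files themselves. One way to confirm what was recorded, assuming the current session can see the table (the show() arguments are just for readable output):

// DESCRIBE FORMATTED prints the stored table metadata, including
// "Num Buckets" and "Bucket Columns" for a bucketed table.
sparkSession.sql("DESCRIBE FORMATTED sorted_api_logs").show(100, truncate = false)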

And the code for the read:
val df = sparkSession.read.parquet(path).toDF()

The read code runs on a different cluster than the write.
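
That read is the likely culprit: sparkSession.read.parquet(path) goes straight to the files and bypasses the catalog, so the bucketing recorded by saveAsTable is never seen and Spark treats the data as unbucketed. Below is a minimal sketch of a catalog-based read instead, assuming Spark 2.2 with Hive support and that the files are reachable from the second cluster; the path and the column list besides userId are placeholders, and the bucket count of 8 stands in for numberOfBuckets from the write job:

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .appName("read-bucketed-parquet")
  .enableHiveSupport() // the bucket spec lives in the catalog, not in the files
  .getOrCreate()

// On a cluster that did not run the write job, recreate the table definition
// over the same files. The schema must match the files, and the CLUSTERED BY
// spec (column and count) must match what the write job used.
sparkSession.sql(
  """CREATE TABLE IF NOT EXISTS sorted_api_logs (
    |  userId STRING,
    |  requestUrl STRING,
    |  requestTime TIMESTAMP
    |)
    |USING parquet
    |OPTIONS (path 'hdfs://namenode/path/to/sorted_api_logs')
    |CLUSTERED BY (userId) INTO 8 BUCKETS
    |""".stripMargin)

// Read through the catalog rather than spark.read.parquet, so the
// bucketing spec is attached to the scan.
val df = sparkSession.table("sorted_api_logs")

On the cluster that ran the write job itself, sparkSession.table("sorted_api_logs") alone should be enough, since saveAsTable already registered the definition there.
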
On Tue, Oct 31, 2017 at 7:02 PM Michael Artz <[hidden email]> wrote:
What version of Spark? Do you have a code sample? A screenshot of the DAG, or the printout from .explain?
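
The .explain printout is the quickest check here: once the bucketing is picked up, an aggregation or join on the bucket column should plan without an Exchange (shuffle) on that side. A small probe, assuming df came from a catalog read of the bucketed table:

// With a bucketed scan there should be no
// "Exchange hashpartitioning(userId, ...)" node above the file scan.
df.groupBy("userId").count().explain()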
