Question on writing a dataframe without metadata column names


Parsian, Mahmoud

Let’s say that I have a Spark DataFrame with 3 columns:

id, name, age.

 

When I save it to HDFS/S3 using partitionBy("id", "name"), it is written as:

 

<root-dir>/id=1/name=Alex/<filename-1>.parquet

<root-dir>/id=2/name=Bob/<filename-2>.parquet

 

If I do not want to include “id=” and “name=” in the directory structure, what should I do?

 

That is, I want my final output to be:

 

<root-dir>/1/Alex/<filename-1>.parquet

<root-dir>/2/Bob/<filename-2>.parquet
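
To make the mapping between the two layouts concrete, here is a minimal sketch in plain Python of the path rewrite I have in mind. It is not a Spark API: the function name strip_partition_keys and the sample file name are hypothetical, and it assumes paths are given relative to <root-dir>. It could only serve as a rename step after the write. Note that once the “key=” prefixes are gone, Spark can no longer infer id and name from the directory names when reading the data back.

```python
from pathlib import PurePosixPath


def strip_partition_keys(relative_path, partition_cols):
    """Drop the 'key=' prefix from partition directories in a relative path.

    Hypothetical helper for illustration only; it rewrites path strings and
    does not touch any file system.
    """
    *dirs, filename = PurePosixPath(relative_path).parts
    cleaned = []
    for directory in dirs:
        key, sep, value = directory.partition("=")
        # Only rewrite directories that really are 'col=value' for a known
        # partition column; leave everything else untouched.
        if sep and key in partition_cols:
            cleaned.append(value)
        else:
            cleaned.append(directory)
    # The file name itself is never modified.
    return str(PurePosixPath(*cleaned, filename))


# Sample path is made up for illustration.
new_path = strip_partition_keys(
    "id=1/name=Alex/part-0000.parquet", ["id", "name"]
)
print(new_path)
```
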

 

Thanks,

M. Parsian