How to avoid duplicate column names after join with multiple conditions

Nirav Patel
The join expression is `df1("a") === df2("a") and df1("b") === df2("c")`.

How can I avoid the duplicate column 'a' in the result? I don't see any API that combines both. Do I have to rename manually?



Re: How to avoid duplicate column names after join with multiple conditions

Gokul
Nirav, 

The withColumnRenamed() API might help, but it does not differentiate between the columns and renames every occurrence of the given column name. Alternatively, use the select() API and rename the columns as you want.
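A minimal Scala sketch of the select() approach, assuming a local SparkSession and made-up sample data. The object name, the extra columns `v1` and `v2`, and the alias `a_right` are hypothetical and only for illustration. Each column is referenced through its parent DataFrame, so the two `a` columns stay distinguishable.

```scala
import org.apache.spark.sql.SparkSession

object SelectAfterJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-after-join")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: "a" exists on both sides, "b" pairs with "c".
    val df1 = Seq((1, "x", 10)).toDF("a", "b", "v1")
    val df2 = Seq((1, "x", 20)).toDF("a", "c", "v2")

    // Join with multiple conditions, as in the original question.
    val joined = df1.join(df2, df1("a") === df2("a") && df1("b") === df2("c"))

    // Reference each column through its source DataFrame and alias the
    // right-hand "a" so the result has no duplicate column names.
    val result = joined.select(
      df1("a"),
      df1("b"),
      df1("v1"),
      df2("a").as("a_right"),
      df2("v2")
    )
    result.printSchema()

    spark.stop()
  }
}
```

If the right-hand `a` is not needed at all, it can simply be left out of the select list.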



Thanks & Regards, 
Gokula Krishnan (Gokul)

On Mon, Jul 2, 2018 at 5:52 PM, Nirav Patel <[hidden email]> wrote:
The join expression is `df1("a") === df2("a") and df1("b") === df2("c")`.

How can I avoid the duplicate column 'a' in the result? I don't see any API that combines both. Do I have to rename manually?



Re: How to avoid duplicate column names after join with multiple conditions

Vamshi Talla
Nirav,

Spark does not create a duplicate column when you pass the join columns as an array of column names, as in the example below, but that requires the column names to be the same in both DataFrames.

Example: df1.join(df2, ['a'])
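The example above is written in PySpark list syntax. In Scala, the equivalent overload takes a `Seq[String]` of column names. The following is a sketch under the assumption of a local SparkSession and hypothetical sample data; the object name is invented for illustration.

```scala
import org.apache.spark.sql.SparkSession

object JoinOnColumnNames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-on-column-names")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: both sides share the column name "a".
    val df1 = Seq((1, "x")).toDF("a", "b")
    val df2 = Seq((1, "y")).toDF("a", "c")

    // Joining on a sequence of column names performs an equi-join and
    // keeps a single "a" column in the output.
    val joined = df1.join(df2, Seq("a"))
    joined.printSchema()

    spark.stop()
  }
}
```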

Thanks.
Vamshi Talla

On Jul 6, 2018, at 4:47 PM, Gokula Krishnan D <[hidden email]> wrote:

Nirav, 

The withColumnRenamed() API might help, but it does not differentiate between the columns and renames every occurrence of the given column name. Alternatively, use the select() API and rename the columns as you want.



Thanks & Regards, 
Gokula Krishnan (Gokul)

Re: How to avoid duplicate column names after join with multiple conditions

Nirav Patel
Hi Vamshi,

That API is very restrictive and not generic enough. It requires every join condition to use the same column name on both sides, and it only supports equi-joins. It doesn't serve my use case, where some join predicates don't have the same column names.

Thanks

On Sun, Jul 8, 2018 at 7:39 PM, Vamshi Talla <[hidden email]> wrote:
Nirav,

Spark does not create a duplicate column when you pass the join columns as an array of column names, as in the example below, but that requires the column names to be the same in both DataFrames.

Example: df1.join(df2, ['a'])

Thanks.
Vamshi Talla

Re: How to avoid duplicate column names after join with multiple conditions

Prem Sure
Hi Nirav, did you try
.drop(df1("a")) after the join?
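A short Scala sketch of the drop suggestion, assuming a local SparkSession and hypothetical sample data (the object name is invented for illustration). It drops the right-hand copy of `a` by column reference rather than by name, since dropping by name would not distinguish the two. This assumes `df1` and `df2` are distinct DataFrames; a self-join may need aliases instead.

```scala
import org.apache.spark.sql.SparkSession

object DropAfterJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-after-join")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: "a" exists on both sides, "b" pairs with "c".
    val df1 = Seq((1, "x")).toDF("a", "b")
    val df2 = Seq((1, "x")).toDF("a", "c")

    // Join with multiple conditions, then drop one copy of "a" by passing
    // the Column that belongs to df2.
    val joined = df1
      .join(df2, df1("a") === df2("a") && df1("b") === df2("c"))
      .drop(df2("a"))
    joined.printSchema()

    spark.stop()
  }
}
```

Whether to drop the left or the right copy matters for outer joins, where the copy from the non-matching side can be null.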

Thanks,
Prem

On Thu, Jul 12, 2018 at 9:50 PM Nirav Patel <[hidden email]> wrote:
Hi Vamshi,

That API is very restrictive and not generic enough. It requires every join condition to use the same column name on both sides, and it only supports equi-joins. It doesn't serve my use case, where some join predicates don't have the same column names.

Thanks


Re: How to avoid duplicate column names after join with multiple conditions

Nirav Patel
Hi Prem, dropping or renaming the column works for me as a workaround. I just thought it would be nice to have a generic API that handles this for me, or some intelligence so that, since both columns are the same, a subsequent select doesn't complain that it can't tell whether I mean a#12 or a#81. They are the same, so it could just pick one.

On Thu, Jul 12, 2018 at 9:38 AM, Prem Sure <[hidden email]> wrote:
Hi Nirav, did you try
.drop(df1("a")) after the join?

Thanks,
Prem


Re: How to avoid duplicate column names after join with multiple conditions

Prem Sure
Yes Nirav, we can probably ask the devs for a config parameter that takes care of this automatically (internally). Additional care is required from users when specifying column names and joining.

Thanks,
Prem

On Thu, Jul 12, 2018 at 10:53 PM Nirav Patel <[hidden email]> wrote:
Hi Prem, dropping or renaming the column works for me as a workaround. I just thought it would be nice to have a generic API that handles this for me, or some intelligence so that, since both columns are the same, a subsequent select doesn't complain that it can't tell whether I mean a#12 or a#81. They are the same, so it could just pick one.
