Spark Dataset withColumn issue

Spark Dataset withColumn issue

Vikas Garg
In a Spark Dataset, if we add a column using withColumn, the column is added at the end.

For example:
import org.apache.spark.sql.functions.lit
val ds1 = ds.select("Col1", "Col3").withColumn("Col2", lit("sample"))

Then the order of the columns is: Col1 | Col3 | Col2

I want the order to be: Col1 | Col2 | Col3

How can I achieve this?

Re: Spark Dataset withColumn issue

German Schiavon Matteo
ds.select("Col1", "Col2", "Col3")

In reply to Vikas Garg (Thu, 12 Nov 2020 at 15:28)

Re: Spark Dataset withColumn issue

Vikas Garg
I am deriving Col2 using withColumn, which is why I can't use it the way you suggested.
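When Col2 is derived rather than already present, the derivation can also be written directly inside select, because select accepts arbitrary Column expressions; that computes and positions the column in one step. A minimal sketch, assuming spark-sql is on the classpath (for example, pasted into spark-shell); the sample rows are hypothetical and lit("sample") stands in for the real derivation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Reuses an existing session (e.g. in spark-shell) or starts a local one.
val spark = SparkSession.builder().master("local[1]").appName("inline-select").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the original Dataset that has Col1 and Col3.
val ds = Seq(("a", "c"), ("x", "z")).toDF("Col1", "Col3")

// Derive Col2 inside select, listing the columns in the desired order.
val ds1 = ds.select(
  col("Col1"),
  lit("sample").as("Col2"), // replace with the real derivation
  col("Col3")
)

ds1.show()
```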

In reply to German Schiavon (Thu, Nov 12, 2020, 20:11)

Re: Spark Dataset withColumn issue

srowen
You can still simply select the columns by name, in the order you want, after .withColumn().
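A sketch of selecting by name after withColumn, generalized into a small helper that places the new column right after a chosen existing column. The helper name withColumnAfter is hypothetical (it is not a Spark API), the sample data is made up, and spark-sql is assumed to be on the classpath.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Reuses an existing session (e.g. in spark-shell) or starts a local one.
val spark = SparkSession.builder().master("local[1]").appName("reorder").getOrCreate()
import spark.implicits._

// Hypothetical helper: add `name` with withColumn, then re-select every column
// by name so that `name` appears immediately after the column `after`.
def withColumnAfter(df: DataFrame, name: String, value: Column, after: String): DataFrame = {
  val base: Seq[String] = df.columns.toSeq.filterNot(_ == name) // existing columns, minus any old `name`
  val idx = base.indexOf(after)
  require(idx >= 0, s"column $after not found")
  val order = (base.take(idx + 1) :+ name) ++ base.drop(idx + 1)
  df.withColumn(name, value).select(order.map(col): _*)
}

val ds = Seq(("a", "c")).toDF("Col1", "Col3")
val ds1 = withColumnAfter(ds, "Col2", lit("sample"), after = "Col1")
ds1.printSchema()
```

Because the final select names every column explicitly, the resulting order comes from that list rather than from where withColumn happened to put the new column.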

In reply to Vikas Garg (Thu, Nov 12, 2020 at 9:49 AM)

Re: Spark Dataset withColumn issue

Subash Prabakar
In reply to this post by Vikas Garg
Hi Vikas,

He suggested using select() after your withColumn call:

val ds1 = ds.select("Col1", "Col3").withColumn("Col2", lit("sample")).select("Col1", "Col2", "Col3")


Thanks,
Subash

Re: Spark Dataset withColumn issue

Vikas Garg
Ohhkkkk

Thanks a lot

In reply to Subash Prabakar (Thu, Nov 12, 2020, 21:23)

Re: Spark Dataset withColumn issue

Lalwani, Jayesh

Note that Spark never guarantees the ordering of columns. There's nothing in the Spark documentation that says the columns will be ordered a certain way. The proposed solution relies on an implementation detail that might change in a future version of Spark.
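One way to keep downstream code independent of column positions is to read fields by name instead of by index. A minimal sketch, assuming spark-sql is on the classpath and using hypothetical sample data; Row.getAs[T](fieldName) looks a field up by name, so it keeps working wherever Col2 ends up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Reuses an existing session (e.g. in spark-shell) or starts a local one.
val spark = SparkSession.builder().master("local[1]").appName("by-name").getOrCreate()
import spark.implicits._

// Hypothetical data: withColumn adds Col2 to this DataFrame.
val ds1 = Seq(("a", "c")).toDF("Col1", "Col3").withColumn("Col2", lit("sample"))

// Look fields up by name, so this code does not depend on column positions.
val row = ds1.first()
val col1 = row.getAs[String]("Col1")
val col2 = row.getAs[String]("Col2")
println(s"Col1=$col1, Col2=$col2")
```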

Ideally, you shouldn't rely on a DataFrame to maintain the order of its columns. The question is: why do you care about the order of the columns? If the order of the data is important, you should put it in an array.
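A minimal sketch of the array suggestion, assuming spark-sql is on the classpath and hypothetical string data: functions.array packs several columns into a single array column whose element order is the order of its arguments. The inputs need a common type (all strings here).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, lit}

// Reuses an existing session (e.g. in spark-shell) or starts a local one.
val spark = SparkSession.builder().master("local[1]").appName("array-order").getOrCreate()
import spark.implicits._

// Hypothetical data; withColumn adds Col2 to this DataFrame.
val ds = Seq(("a", "c")).toDF("Col1", "Col3").withColumn("Col2", lit("sample"))

// Pack the values into one array column; element order follows the argument
// order, independent of the DataFrame's column order.
val packed = ds.select(array(col("Col1"), col("Col2"), col("Col3")).as("values"))

val values = packed.first().getSeq[String](0)
println(values.mkString(" | "))
```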

In reply to Vikas Garg (Thursday, November 12, 2020 at 12:40 PM)