[Spark Java] Add new column in DataSet based on existed column

JF Chen
I am working on adding a date-transformed field to an existing Dataset.

The current Dataset contains a column named timestamp in ISO format. I want to parse this field into a Joda-Time type and then extract the year, month, day, and hour as new columns attached to the original Dataset.
I have tried the df.withColumn function, but it seems to support only simple expressions rather than a custom function such as a MapFunction.
How can I solve this?

Thanks!



Regard,
Junfeng Chen
Re: [Spark Java] Add new column in DataSet based on existed column

Divya Gehlot
Hi,

Here is an example snippet in Scala:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, to_date}

// Convert the timestamp column to a Date type
val timestamp2datetype: Column => Column = (x: Column) => to_date(x)
val dfWithDate = df.withColumn("date", timestamp2datetype(col("end_date")))
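For the Java API side of the question: extracting date parts does not need a custom MapFunction, because Spark ships built-in year, month, dayofmonth, and hour functions in org.apache.spark.sql.functions that can be passed straight to withColumn. The per-value extraction they perform is the same as this plain java.time sketch (the sample timestamp value is made up for illustration):

```java
import java.time.OffsetDateTime;

public class ExtractDateParts {
    public static void main(String[] args) {
        // An ISO-8601 timestamp, as described in the question (sample value is hypothetical)
        OffsetDateTime ts = OffsetDateTime.parse("2018-03-28T15:16:09+08:00");

        // The same per-value extraction that Spark's built-in
        // year()/month()/dayofmonth()/hour() column functions perform
        System.out.println("year=" + ts.getYear());
        System.out.println("month=" + ts.getMonthValue());
        System.out.println("day=" + ts.getDayOfMonth());
        System.out.println("hour=" + ts.getHour());
    }
}
```

In Spark Java this would look like df.withColumn("year", functions.year(functions.col("timestamp"))), and likewise for the other parts. If a genuinely custom transformation is needed, registering a UDF via functions.udf is the usual route rather than withColumn with a MapFunction.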
Hope this helps!
Thanks,
Divya 


On 28 March 2018 at 15:16, Junfeng Chen <[hidden email]> wrote: