Spark SQL - Truncate Day / Hour

Spark SQL - Truncate Day / Hour

David Hodefi
I would like to truncate a date to its day or hour. Currently it is only possible to truncate to MONTH or YEAR.
1. How can I achieve that?
2. Is there a pull request for this issue?
3. If there is no open pull request, what should I be aware of when implementing this and contributing it as a pull request?

One last question: looking at the DateTimeUtils class, the implementation does not seem to use any open-source library for handling dates, e.g. Apache Commons. Why implement this from scratch instead of reusing an open-source library?

Thanks, David

Re: Spark SQL - Truncate Day / Hour

Gaspar Muñoz
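There are functions for extracting the day (called dayofmonth and dayofyear) and the hour (called hour). You can view them here: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions

Example:

import org.apache.spark.sql.functions._
import spark.implicits._  // enables the $"..." column syntax; assumes a SparkSession named spark
// Extract the hour, day of month, and day of year into a new DataFrame
val result = df.select(hour($"myDateColumn"), dayofmonth($"myDateColumn"), dayofyear($"myDateColumn"))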

(In reply to David Hodefi, 2017-11-09 12:05 GMT+01:00)

--
Gaspar Muñoz Soria

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473

Re: Spark SQL - Truncate Day / Hour

David Hodefi
I am familiar with those functions, but none of them actually truncates a date. We could use them to help implement a truncate method. I think truncating to a day or hour should be as simple as truncate(..., "DD") or truncate(..., "HH").
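
The difference between extracting a field and truncating can be sketched in plain Python, outside Spark. The truncate helper below is hypothetical: it only illustrates the semantics that a truncate(..., "DD") or truncate(..., "HH") call would presumably have, and it is not a Spark API.

```python
from datetime import datetime


def truncate(ts: datetime, unit: str) -> datetime:
    """Hypothetical helper: zero out every field below `unit` ("DD" or "HH")."""
    if unit == "DD":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "HH":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit!r}")


ts = datetime(2017, 11, 13, 12, 27, 45)

# Extraction returns a single integer field and discards everything else.
extracted_hour = ts.hour
extracted_day = ts.day

# Truncation returns a full timestamp with the smaller fields set to zero.
truncated_to_hour = truncate(ts, "HH")
truncated_to_day = truncate(ts, "DD")
```

The extracted hour alone cannot be used to group rows by "hour of a specific day", whereas the truncated timestamp can, which is why the extraction functions do not replace a truncate function.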

(In reply to Gaspar Muñoz, Thu, Nov 9, 2017 at 8:23 PM)

Re: Spark SQL - Truncate Day / Hour

Eike von Seggern
Hi,

You can truncate timestamps like this (in PySpark), e.g. to 5 minutes:

import pyspark.sql.functions as F
# Cast to epoch seconds, floor to a multiple of 300 s (5 minutes), then cast back to a timestamp
df.select((F.floor(F.col('myDateColumn').cast('long') / 300) * 300).cast('timestamp'))
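
The arithmetic behind this expression can be checked in plain Python without a Spark session. This is only a sketch: floor_epoch is a hypothetical helper name, and the sample timestamp is made up. Changing the bucket size to 3600 or 86400 seconds gives hour and day truncation, under the assumption that buckets aligned to UTC are acceptable.

```python
from datetime import datetime, timezone


def floor_epoch(epoch_seconds: int, bucket_seconds: int) -> int:
    """Round epoch seconds down to the nearest multiple of bucket_seconds."""
    return (epoch_seconds // bucket_seconds) * bucket_seconds


# Sample timestamp (an assumption for illustration), expressed in UTC.
ts = datetime(2017, 11, 13, 12, 27, 45, tzinfo=timezone.utc)
epoch = int(ts.timestamp())

# 300 s = 5 minutes, 3600 s = 1 hour, 86400 s = 1 day
five_minutes = datetime.fromtimestamp(floor_epoch(epoch, 300), tz=timezone.utc)
hour = datetime.fromtimestamp(floor_epoch(epoch, 3600), tz=timezone.utc)
day = datetime.fromtimestamp(floor_epoch(epoch, 86400), tz=timezone.utc)

print(five_minutes, hour, day)
```

Because the buckets are counted from the Unix epoch, the day boundary falls at midnight UTC. In a time zone with a non-zero offset, the day bucket will not line up with local midnight, so the offset would have to be added before flooring and subtracted afterwards.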

Best,
Eike

(In reply to David Hodefi, Mon, 13 Nov 2017, 12:27)