Confused about Spark's to_date function


杨仲鲍

Code

```scala
import org.apache.spark.sql.SparkSession

object Suit {
  case class Data(node: String, root: String)

  def apply[A](xs: A*): List[A] = xs.toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("MoneyBackTest")
      .getOrCreate()
    import spark.implicits._

    // parse the string with the given pattern and convert it to a date
    spark.sql("select to_date('2020-01-01 20:00:00','yyyy-MM-dd HH:mm:ss')").show(false)
  }
}
```

Result

```output
+-----------------------------------------------------+
|to_date('2020-01-01 20:00:00', 'yyyy-MM-dd HH:mm:ss')|
+-----------------------------------------------------+
|2020-01-01                                           |
+-----------------------------------------------------+
```

Why doesn't it show 2020-01-01 20:00:00?

Spark version: 2.4.4
Device: MacBook

Re: Confused about Spark's to_date function

Daniel Stojanov


You want to_timestamp instead of to_date. to_date returns a DateType, which carries no time-of-day at all; the pattern argument only tells Spark how to parse the input string, so the time component is parsed and then dropped.
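Applied to the original Scala snippet, the fix is a one-line change (a minimal sketch, reusing the same SparkSession as above):

```scala
// to_timestamp returns a TimestampType, so the time-of-day survives
spark.sql("select to_timestamp('2020-01-01 20:00:00','yyyy-MM-dd HH:mm:ss')").show(false)
```

which displays 2020-01-01 20:00:00.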

The following is in Python, but I think you should be able to follow.

```python
>>> import pyspark.sql as psq  # needed for psq.Row; `spark` is already defined in the pyspark shell
>>> row = psq.Row(as_string="2020-01-01 12:01:02")
>>> df = spark.sparkContext.parallelize([row]).toDF()
>>> import pyspark.sql.functions as F
>>> df.withColumn("date_converted", F.to_date(F.column("as_string"), "yyyy-MM-dd HH:mm:ss")).show()
+-------------------+--------------+
|          as_string|date_converted|
+-------------------+--------------+
|2020-01-01 12:01:02|    2020-01-01|
+-------------------+--------------+

>>> df.withColumn("date_converted", F.to_timestamp(F.column("as_string"), "yyyy-MM-dd HH:mm:ss")).show()
+-------------------+-------------------+
|          as_string|     date_converted|
+-------------------+-------------------+
|2020-01-01 12:01:02|2020-01-01 12:01:02|
+-------------------+-------------------+
```
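For reference, the same comparison in Scala (a minimal sketch, assuming the SparkSession and `import spark.implicits._` from the original snippet):

```scala
import org.apache.spark.sql.functions.{col, to_date, to_timestamp}

val df = Seq("2020-01-01 12:01:02").toDF("as_string")

// DateType result: the time-of-day is dropped -> 2020-01-01
df.withColumn("date_converted", to_date(col("as_string"), "yyyy-MM-dd HH:mm:ss")).show()

// TimestampType result: the time-of-day is kept -> 2020-01-01 12:01:02
df.withColumn("date_converted", to_timestamp(col("as_string"), "yyyy-MM-dd HH:mm:ss")).show()
```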