[Pyspark 3 Debug] Date values reset to Unix epoch


[Pyspark 3 Debug] Date values reset to Unix epoch

Andrew Mullins
I have a unit test that passes on Pyspark 2.4.4 but fails on Pyspark 3.0,
and I've reduced the failure to a minimal reproducible example.

The following code:


Returns the following on Pyspark 3:


On Pyspark 2.4.4, the final table has the correct date value.

Does anyone have any ideas what might be causing this?

Best,
Andrew Mullins



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: [Pyspark 3 Debug] Date values reset to Unix epoch

EveLiao
I can't see your code and return values. Can you post them again?




Re: [Pyspark 3 Debug] Date values reset to Unix epoch

Andrew Mullins
In reply to this post by Andrew Mullins
My apologies, my code sections were eaten.

Code:
import datetime as dt
import pyspark

def get_spark():
    return pyspark.sql.SparkSession.builder.enableHiveSupport().getOrCreate()

if __name__ == '__main__':
    spark = get_spark()
    table = spark.createDataFrame(
        [("1234", dt.date(2020, 5, 25))],
        ["id", "date"]
    )
    table.coalesce(1).createOrReplaceTempView("test_date")
    spark.sql("CREATE DATABASE IF NOT EXISTS test_db")
    spark.sql(
        "CREATE TABLE IF NOT EXISTS "
        "test_db.test_date "
        "SELECT * FROM test_date"
    )

    print("Temp Table:")
    print(spark.table("test_date").collect())
    print("Final Table:")
    print(spark.table("test_db.test_date").collect())


Output:
Temp Table:
[Row(id='1234', date=datetime.date(2020, 5, 25))]
Final Table:
[Row(id='1234', date=datetime.date(1970, 1, 1))]

Best,
Andrew
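
A note for reading the output above: Spark's DateType is held internally as a count of days since 1970-01-01, so a date of 1970-01-01 in the final table corresponds to a stored day count of 0. The plain-Python sketch below only illustrates that mapping. The helper names are hypothetical, and it does not reproduce or explain the Pyspark 3 behaviour itself.

```python
import datetime as dt

# The Unix epoch as a calendar date; day 0 in a days-since-epoch encoding.
EPOCH = dt.date(1970, 1, 1)


def days_since_epoch(d):
    """Return the number of days from 1970-01-01 to d (hypothetical helper)."""
    return (d - EPOCH).days


def date_from_days(n):
    """Return the calendar date that lies n days after 1970-01-01 (hypothetical helper)."""
    return EPOCH + dt.timedelta(days=n)


if __name__ == "__main__":
    # The date written in the example corresponds to a large day count.
    print(days_since_epoch(dt.date(2020, 5, 25)))  # 18407
    # A day count of 0 decodes to the date seen in the final table.
    print(date_from_days(0))  # 1970-01-01
```

If the final table really does hold day 0, the value was more likely lost or zeroed somewhere between the temp view and the stored table than shifted by a time zone, since a zone offset would be expected to move the date by about a day rather than by decades.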


