Pyspark error when converting string to timestamp in map function

Keith Chapman
Hi all,

I'm trying to create a DataFrame with an enforced schema so that I can write it to a Parquet file. The schema contains timestamps, and PySpark raises an error when I build the DataFrame. The following snippet reproduces the problem:

from pyspark.sql.types import StructType, StructField, TimestampType

df = sqlctx.range(1000)
schema = StructType([StructField('a', TimestampType(), True)])
df1 = sqlctx.createDataFrame(df.rdd.map(row_gen_func), schema)

row_gen_func is a function that returns timestamp strings of the form "2018-03-21 11:09:44".

When I run this with Spark 2.2, I get the following error:

raise TypeError("%s can not accept object %r in type %s" % (dataType, obj, type(obj)))
TypeError: TimestampType can not accept object '2018-03-21 08:06:17' in type <type 'str'>
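
The error message says that TimestampType rejects a plain str; in PySpark the field expects a datetime.datetime object. One possible workaround, sketched below, is to parse the string inside the map function before the schema is applied. The names TS_FORMAT, parse_ts and to_timestamp_row are hypothetical helpers introduced only for illustration, and the sketch assumes every string follows the "2018-03-21 11:09:44" layout shown above.

```python
from datetime import datetime

# Assumed layout of the strings produced by row_gen_func,
# e.g. "2018-03-21 11:09:44".
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_ts(ts_string):
    """Parse a timestamp string into a datetime.datetime object."""
    return datetime.strptime(ts_string, TS_FORMAT)


def to_timestamp_row(ts_string):
    """Hypothetical helper: wrap the parsed value in a one-field tuple
    so it lines up with the single-column schema ('a', TimestampType)."""
    return (parse_ts(ts_string),)


# Possible use with the snippet above (needs a running Spark context,
# and assumes row_gen_func returns the bare string):
# df1 = sqlctx.createDataFrame(
#     df.rdd.map(lambda r: to_timestamp_row(row_gen_func(r))), schema)
```

An alternative that may also work is to build the DataFrame with a StringType column first and then convert it with df.withColumn('a', df['a'].cast('timestamp')), which leaves the parsing to Spark rather than to Python code in the map function.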