[Spark SQL] Couldn't save dataframe with null columns to S3.


ehbhaskar
I have a Spark job that writes data to S3 as shown below.

source_data_df_to_write.select(target_columns_list) \
    .write.partitionBy(target_partition_cols_list) \
    .format("ORC") \
    .save(self.table_location_prefix + self.target_table, mode="append")

My dataframe can sometimes have NULL values for some of its columns. Writing a dataframe with such NULL columns fails the job with the IllegalArgumentException below.

Caused by: java.lang.IllegalArgumentException: Error: type expected at the
position 14 of 'double:string:null:string:string:string:double:bigint:null:null:null:null:string:null:string:null:null:null:null:string:string:string:null:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:null:null:null:null:null:null:null:null:null:null:null:null:null:null:null:null:null:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string'
but 'null' is found.



The dataframe is built roughly like this:

columns_with_default = ("col1, NULL as col2, col3, col4, NULL as col5, "
                        "partition_col1, partition_col2")
source_data_df_to_write = self.session.sql(
    "SELECT %s FROM TEMP_VIEW" % columns_with_default)


So, is there a way to make the Spark job write a dataframe with NULL columns to S3?
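
One workaround I am considering (not sure if it is the recommended approach) is to cast every NULL placeholder to an explicit type so the schema never contains a NullType column, something like:

# Same query as before, but with explicit casts on the NULL defaults.
# Column names and types are only examples.
columns_with_default = ("col1, CAST(NULL AS STRING) AS col2, col3, col4, "
                        "CAST(NULL AS DOUBLE) AS col5, "
                        "partition_col1, partition_col2")
source_data_df_to_write = self.session.sql(
    "SELECT %s FROM TEMP_VIEW" % columns_with_default)

Is that the right direction, or is there a better way to handle this?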