JdbcRDD - schema always resolved as nullable=true

Subhash Sriram
Hi Spark Users,

We do a lot of processing in Spark on data that lives in MS SQL Server. Today, I created a DataFrame against a table in SQL Server using the following:

val dfSql = spark.read.jdbc(connectionString, table, props)

I noticed that every column in the DataFrame showed as nullable=true, even though many of them are required.

I went hunting in the code and found that when JDBCRDD resolves a table's schema, it passes alwaysNullable = true to JdbcUtils, which forces every column to resolve as nullable.

I don't see a way to change that functionality. Is this by design, or could it be a bug?
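In the meantime, one workaround I've been considering is to rebuild the DataFrame with a corrected schema after reading. This is only a sketch under my own assumptions (the helper name and the set of NOT NULL columns are mine, not from Spark): StructField is a case class, so its nullable flag can be rewritten with copy, and spark.createDataFrame accepts an explicit schema that overrides the inferred one.

```scala
import org.apache.spark.sql.types.{StructField, StructType}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: re-declare the given columns as NOT NULL on an
// already-loaded DataFrame, since spark.read.jdbc reports every column
// as nullable = true regardless of the table definition.
def withNotNullColumns(spark: SparkSession, df: DataFrame,
                       notNullCols: Set[String]): DataFrame = {
  // df.schema is a StructType (a Seq[StructField]); flip nullable
  // to false for the columns we know are required in SQL Server.
  val fixedSchema = StructType(df.schema.map { field =>
    if (notNullCols.contains(field.name)) field.copy(nullable = false)
    else field
  })
  // Rebuild the DataFrame from the same rows with the corrected schema.
  spark.createDataFrame(df.rdd, fixedSchema)
}
```

Note this only changes the declared schema; Spark does not re-validate the data, so a null sneaking into a "not null" column would surface later as a runtime error.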