[Spark 3.0.0] Job fails with NPE - worked in Spark 2.4.4

Neelesh Salian
Hi folks,


Context:
The application (PySpark):
1. Reads a Hive table from the metastore (running Hive 1.2.2).
2. Prints the schema of the DataFrame that was read.
3. Does a show() on the captured df. The NPE stack trace comes from this show() job.

***********************************
To Reproduce:

from pyspark.sql import SparkSession

# The session setup in the original job may differ; Hive support is
# required for the session to read tables from the metastore.
spark_ses = SparkSession.builder.enableHiveSupport().getOrCreate()

df = spark_ses.sql('select * from <db>.<table> limit 100000')

print("Printing Schema \n")
df.printSchema()

print("Running Show \n")
df.show(100)

spark_ses.stop()

***********************************
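
Since the stack trace is the key artifact here, one way to capture the complete JVM trace when show() fails (a hypothetical debugging snippet, not part of the original job; Py4J embeds the Java stack trace in the Python exception it raises):

import traceback

try:
    df.show(100)
except Exception:
    # Py4J surfaces the JVM stack trace inside the Python exception text,
    # so printing the full traceback shows where the NPE originates.
    traceback.print_exc()
    raise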

Build Profile and additional application info:
1. Spark 3.0.0 binary built using
./dev/make-distribution.sh -Pyarn -Phive -Phive-thriftserver -Dhadoop.version=2.8.5 -Pspark-ganglia-lgpl -Pscala-2.12
This build is from the Spark GitHub repo at this commit.
2. Hive 1.2.2 is the metastore, and the Spark application can connect to it (see the config sketch after this list).
3. printSchema() on the df prints the schema correctly, but a show() or an attempt to write to an S3 data store fails with the above error. printSchema() only reads schema metadata from the metastore, while show() and writes actually scan the table data, which is where the NPE surfaces.
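
Spark 3.0 ships with a built-in Hive 2.3 client by default, while the metastore here runs Hive 1.2.2, so a client/metastore version mismatch is one variable worth ruling out. A minimal sketch that pins the metastore client version, assuming the standard spark.sql.hive.metastore.* configs (spark_ses matches the repro above):

from pyspark.sql import SparkSession

# Sketch: talk to the metastore with a Hive 1.2.2 client instead of the
# built-in 2.3.x one. "maven" makes Spark download the matching client
# jars at startup; a classpath of pre-staged jars also works.
spark_ses = (
    SparkSession.builder
    .config("spark.sql.hive.metastore.version", "1.2.2")
    .config("spark.sql.hive.metastore.jars", "maven")
    .enableHiveSupport()
    .getOrCreate()
)

Note that with "maven" the job needs network access at startup; pointing spark.sql.hive.metastore.jars at a local path of Hive 1.2.2 jars avoids that.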

Any advice on how I can go about debugging/solving this?


--
Regards,
Neelesh S. Salian