Hi All -
I have set up a Spark 3.0.1 cluster with Delta Lake 0.7.0 that is connected to an external Hive metastore.
I run the following commands:
scala> val tableName = "tblname_2"
scala> spark.sql(s"CREATE TABLE $tableName (col1 INTEGER) USING delta OPTIONS (path = 'GCS_PATH')")
20/12/19 17:30:52 WARN org.apache.spark.sql.hive.HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table `default`.`tblname_2` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
scala> spark.sql(s"INSERT OVERWRITE $tableName VALUES 5, 6, 7, 8, 9")
res51: org.apache.spark.sql.DataFrame = []
scala> spark.sql(s"SELECT * FROM $tableName").show()
org.apache.spark.sql.AnalysisException: Table does not support reads: default.tblname_2;
I see a warning about the Hive metastore integration, which essentially says that this table cannot be queried via Hive or Presto. That is fine, but when I try to read the data from the same Spark session I get an error. Can someone suggest what the problem might be?
Hi Jay,
Some things to check:
Do you have the following set in your Spark SQL config:
"spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
Is the JAR for the package delta-core_2.12:0.7.0 available on both your driver and executor classpaths?
Since you are using a non-default metastore version, have you set spark.sql.hive.metastore.version?
Finally, are you able to read/write Delta tables outside of Hive?
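If it helps, here is a quick sketch for reading the relevant settings back from the live session to confirm what it actually picked up (getOption returns None for anything that was never set):

// Sketch: read the relevant settings back from the running session.
// Static configs can be read this way even though they can't be changed at runtime.
spark.conf.getOption("spark.sql.extensions")
spark.conf.getOption("spark.sql.catalog.spark_catalog")
spark.conf.getOption("spark.sql.hive.metastore.version")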
-Matt
Thanks Matt.
I have set the two configs when building my Spark session, as below:
val spark = SparkSession.builder()
  .appName("QuickstartSQL")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()
I am using a managed Spark service with Delta on Google Cloud, so all nodes have delta-core_2.12:0.7.0 in /usr/lib/delta/jars/.
I am using a managed Hive metastore (version 2.3.6) which is connected to both my Delta cluster and my Presto cluster. When I use the plain Scala API it works without any issues:
spark.read.format("delta").load("pathtoTable").show()
import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.col

val deltaTable = DeltaTable.forPath("gs://jayadeep-etl-platform/first-delta-table")
deltaTable.as("oldData")
  .merge(merge_df.as("newData"), "oldData.x = newData.x")
  .whenMatched.update(Map("y" -> col("newData.y")))
  .whenNotMatched.insert(Map("x" -> col("newData.x")))
  .execute()
When I issue the following command it works fine:
scala> spark.sql(s"SELECT * FROM $tableName")
res2: org.apache.spark.sql.DataFrame = [col1: int]
But when I call .show() on it, it returns an error:
scala> spark.sql(s"SELECT * FROM $tableName").show()
org.apache.spark.sql.AnalysisException: Table does not support reads: default.tblname_3;
-Jay
I think I found the issue: Hive metastore 2.3.6 doesn't have the necessary support. After upgrading to Hive 3.1.2, I was able to run the SELECT query.
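If you manage the metastore version on the Spark side (as Matt asked about above), it needs to track the upgrade as well. A minimal sketch of such a session; the 3.1.2 value comes from the upgrade above, while the app name and the jars setting are assumptions about the environment:

import org.apache.spark.sql.SparkSession

// Sketch only: a session whose Hive client settings track the upgraded metastore.
// "maven" tells Spark to download matching client jars; an explicit classpath
// with Hive 3.1.2 client jars works as well.
val spark = SparkSession.builder()
  .appName("QuickstartSQL")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .config("spark.sql.hive.metastore.version", "3.1.2")
  .config("spark.sql.hive.metastore.jars", "maven")
  .enableHiveSupport()
  .getOrCreate()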