Spark 3.0.1 not connecting with Hive 2.1.1


Spark 3.0.1 not connecting with Hive 2.1.1

Pradyumn Agrawal
Hi All,
I am facing an issue connecting Spark 3.0.1 with Hive 2.1.1 (to be precise, Hive 2.1.1-cdh6.3.1 from the CDH distribution).
Invoking SQL queries through spark-sql and spark-shell does not log any error to stderr or stdout, and the application just hangs.

Is there a minimum Hive version required to run with the Spark 3.x distribution?

Please advise.

Thanks
Pradyumn Agrawal

Re: Spark 3.0.1 not connecting with Hive 2.1.1

michael.yang
Hi Pradyumn,

We integrated Spark 3.0.1 with Hive 2.1.1-cdh6.1.0 and it works fine to use
spark-sql to query Hive tables.

Make sure you configure spark-defaults.conf and spark-env.sh correctly and copy
the Hive/Hadoop-related config files to the Spark conf folder.
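
For example, a minimal sketch of what that usually means on a CDH node (the exact
paths are an assumption about a typical parcel/package layout, adjust to your install):

  # copy the Hive and Hadoop client configs into Spark's conf dir
  cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/
  cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml $SPARK_HOME/conf/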

You can refer to the references below for details.

https://spark.apache.org/docs/latest/building-spark.html
https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
https://blog.csdn.net/Young2018/article/details/108871542

Thanks
Michael Yang





Re: Spark 3.0.1 not connecting with Hive 2.1.1

Pradyumn Agrawal
Hi Michael,
Thanks for the references. I had already gone through the 1st and 2nd earlier, and I had a hard time with the 3rd one because Google Translate didn't handle the CSDN blog correctly.
But the CDH distribution is different in my case: it is CDH 6.3.1.

[screenshot: error "Invalid method name: 'get_table_req'"]

As you can see in the screenshot, it says "Invalid method name: 'get_table_req'".
I am guessing that the CDH distribution has some changes in its Hive Metastore client that conflict with Spark's shim implementations. I couldn't debug it much, though, so this is just guesswork.
I would certainly like to know your and others' views on this.

Thanks and Regards
Pradyumn Agrawal
Media.net (India)



Re: Spark 3.0.1 not connecting with Hive 2.1.1

DB Tsai-5
Hi Pradyumn,

I think it's because of an HMS client backward-compatibility issue described here: https://issues.apache.org/jira/browse/HIVE-24608
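
For context, my understanding of the issue (a summary, not quoted from the JIRA): Spark 3.0.1's built-in metastore client is built against Hive 2.3.x, which calls the newer get_table_req Thrift method, and a Hive 2.1.1 metastore does not implement that method, hence the "Invalid method name" error. A common Spark-side workaround is to talk to the metastore with the Hive 2.1.1 client jars instead, roughly like this (the jar path is an assumption for a CDH parcel layout):

  spark.sql.hive.metastore.version=2.1.1
  spark.sql.hive.metastore.jars=/opt/cloudera/parcels/CDH/lib/hive/lib/*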

Thanks,

DB Tsai | ACI Spark Core |  Apple, Inc




Re: Spark 3.0.1 not connecting with Hive 2.1.1

Pradyumn Agrawal
Hi DB Tsai,

Thanks for the JIRA link. I think this puts the blocker on the Hive side rather than on Spark.

Regards
Pradyumn Agrawal
Media.net (India)




Re: Spark 3.0.1 not connecting with Hive 2.1.1

michael.yang
In reply to this post by Pradyumn Agrawal
Hi Pradyumn,

It seems you did not configure the spark-defaults.conf file correctly.
The configurations below are needed to use Hive 2.1.1 as the metastore and
execution engine.

spark.sql.hive.metastore.version=2.1.1
spark.sql.hive.metastore.jars=/opt/cloudera/parcels/CDH/lib/hive/lib/*
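
As a quick way to check that the settings are being picked up, the same values can also be passed on the command line when testing (the query is just an example):

  spark-sql \
    --conf spark.sql.hive.metastore.version=2.1.1 \
    --conf 'spark.sql.hive.metastore.jars=/opt/cloudera/parcels/CDH/lib/hive/lib/*' \
    -e "show databases;"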

Thanks.
Michael Yang





Re: Spark 3.0.1 not connecting with Hive 2.1.1

Pradyumn Agrawal
Hi Michael,

Sure, I will give it a try once more.

Regards
Pradyumn Agrawal
Media.net (India)



Re: Spark 3.0.1 not connecting with Hive 2.1.1

djdillon
In reply to this post by michael.yang
I am having a similar problem to the original post. I am trying to run
Spark 3 and connect to CDH Hive 2.1.1. I have run with the same
spark.sql.hive.metastore options. The main difference in my environment is
that I am running Spark on K8s using spark-operator, which makes it a little
more difficult to control and debug. Depending on the configuration I get
either the 'get_table_req' problem or
 


Currently my base image was created using:

I do see the Hive jars despite the hadoop-provided profile, so I have been
trying different configurations of my jars, including or omitting them, and
compiling against 2.3.7 to match the Spark default.
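
For reference, the kind of build I mean looks roughly like this (a sketch of a typical Spark 3 build with the Kubernetes and Hive profiles, not the exact command used for my image):

  ./build/mvn -Pkubernetes -Phive -Phive-thriftserver -Phadoop-provided \
    -DskipTests clean package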


