SparkR integration with Hive 3 spark-r

Alfredo Marquez
Hello,

Our company is moving to Hive 3, and they are saying that no SparkR implementation in Spark 2.3.x or later can connect to Hive 3. Is this true?

If it is true, will this be addressed in the Spark 3 release?

I don't use Python, so losing SparkR as a way to get work done on Hadoop would be a huge loss.

P.S. This is my first email to this community; if there is something I should do differently, please let me know.

Thank you

Alfredo

Re: SparkR integration with Hive 3 spark-r

Nicolas Paris-2
Hi Alfredo

My 2 cents:
To my knowledge, and from reading the Spark 3 pre-release notes, Spark 3 will
handle Hive metastore 2.3.5; there is no mention of the Hive 3 metastore.
That said, I ran several tests on this in the past [1], and Spark seems to
handle any Hive metastore version.

However, Spark cannot read Hive managed tables, a.k.a. transactional tables.
So I would say you should be able to read any regular Hive 3 table with
any of Spark, PySpark, or SparkR.


[1] https://parisni.frama.io/posts/playing-with-hive-spark-metastore-versions/
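The distinction between transactional and regular tables can be sketched in a few lines. This is only an illustration, assuming the table properties have already been collected (for example from `SHOW TBLPROPERTIES`) and that transactional tables carry the property `transactional=true`. The helper names and the sample catalog are hypothetical and are not part of Spark or Hive.

```python
from typing import Dict, List, Mapping


def is_transactional(tbl_properties: Mapping[str, str]) -> bool:
    """Return True when the table property `transactional` is set to 'true'."""
    value = tbl_properties.get("transactional", "false")
    return value.strip().lower() == "true"


def readable_tables(tables: Mapping[str, Mapping[str, str]]) -> List[str]:
    """Names of tables that are not flagged as transactional, in input order."""
    return [name for name, props in tables.items() if not is_transactional(props)]


# Hypothetical catalog: table name -> table properties.
catalog: Dict[str, Dict[str, str]] = {
    "sales_managed": {"transactional": "true"},
    "sales_external": {"EXTERNAL": "TRUE"},
    "logs_plain": {},
}

print(readable_tables(catalog))
```

A check like this only applies the rule of thumb stated above; whether a given table is actually readable still depends on the Spark and Hive versions in use.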

On Mon, Nov 18, 2019 at 11:23:50AM -0600, Alfredo Marquez wrote:

> Hello,
>
> Our company is moving to Hive 3, and they are saying that there is no SparkR
> implementation in Spark 2.3.x + that will connect to Hive 3.  Is this true?

--
nicolas

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: SparkR integration with Hive 3 spark-r

Alfredo Marquez
Hello Nicolas,

Well, the issue is that with Hive 3, Spark gets its own metastore, separate from the Hive 3 metastore. So how do you reconcile this separation of metastores?

Can you continue to use "enableHiveSupport" and still connect to Hive 3? Does this connection take advantage of Hive's LLAP?

Our team doesn't believe that it's possible to make the connection as you would in the past.  But if it is that simple, I would be ecstatic 😁.

Thanks,

Alfredo

On Mon, Nov 18, 2019, 12:53 PM Nicolas Paris <[hidden email]> wrote:
> So I would say you should be able to read any hive 3 regular table with
> any of spark, pyspark or sparkR.

Re: SparkR integration with Hive 3 spark-r

Alfredo Marquez
Does anyone else have some insight into this question?

Thanks,

Alfredo

On Mon, Nov 18, 2019, 3:00 PM Alfredo Marquez <[hidden email]> wrote:
> Well the issue is that with Hive 3, Spark gets its own metastore, separate from the Hive 3 metastore. So how do you reconcile this separation of metastores?

Re: SparkR integration with Hive 3 spark-r

Felix Cheung
I think you will get more answers if you ask without mentioning SparkR.

Your question is independent of SparkR.

Spark support for Hive 3.x (3.1.2) was added here:

https://github.com/apache/spark/commit/1b404b9b9928144e9f527ac7b1caa15f932c2649

You should be able to connect Spark to the Hive metastore.
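As a rough sketch of what such a connection might involve, the snippet below assembles the Spark settings usually associated with an external Hive metastore and renders them as `spark-submit` flags. It assumes a Spark build that includes the commit linked above. The metastore host name and the helper functions are hypothetical, the port is only the conventional metastore default, and `maven` is just one of the values accepted by `spark.sql.hive.metastore.jars`.

```python
from typing import Dict, List


def hive3_metastore_conf(metastore_uri: str, version: str = "3.1.2") -> Dict[str, str]:
    """Spark settings for talking to an external Hive metastore (illustrative)."""
    return {
        # What enableHiveSupport switches on behind the scenes.
        "spark.sql.catalogImplementation": "hive",
        # Metastore client version Spark should use.
        "spark.sql.hive.metastore.version": version,
        # Where to get the matching Hive client jars; "maven" downloads them.
        "spark.sql.hive.metastore.jars": "maven",
        # Thrift endpoint of the metastore service (hypothetical host).
        "spark.hadoop.hive.metastore.uris": metastore_uri,
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render the settings as `--conf key=value` pairs, sorted by key."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


conf = hive3_metastore_conf("thrift://metastore.example.com:9083")
print(" ".join(to_submit_args(conf)))
```

The same key/value pairs could presumably be passed from SparkR through the `sparkConfig` argument of `sparkR.session()`, together with `enableHiveSupport = TRUE`. This does not address LLAP or transactional tables.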




From: Alfredo Marquez <[hidden email]>
Sent: Friday, November 22, 2019 4:26:49 PM
To: [hidden email] <[hidden email]>
Subject: Re: SparkR integration with Hive 3 spark-r
 
Does anyone else have some insight to this question?
