SQL on Spark - Shark or SparkSQL


SQL on Spark - Shark or SparkSQL

Manoj Samel
Hi,


If there is no existing investment in Hive/Shark, would it be worth starting new SQL work using SparkSQL rather than Shark?

* It seems the Shark SQL core will use more and more of SparkSQL.
* From the blog, it seems Shark carries baggage from Hive that is not needed in this case.

On the other hand, there seem to be two shortcomings of SparkSQL (from a quick scan of the blog and docs):

* SparkSQL will have fewer features than Shark/Hive QL, at least for now.
* The standalone SharkServer feature will not be available in SparkSQL.

Can someone from Databricks shed light on the long-term roadmap? It would help us avoid investing in an older technology, or in two technologies, for work with no Hive needs.

Thanks,

PS: Great work on SparkSQL


Re: SQL on Spark - Shark or SparkSQL

Nick Chammas
This is a great question. We are in the same position, having not invested in Hive yet and looking at various options for SQL-on-Hadoop.





Re: SQL on Spark - Shark or SparkSQL

Mayur Rustagi
+1. Have done a few installations of Shark with customers using Hive, and they love it. It would be good to maintain compatibility with the Metastore and QL until we have a substantial reason to break off (like BlinkDB).






Re: [shark-users] SQL on Spark - Shark or SparkSQL

Matei Zaharia
Administrator
Hi Manoj,

At the current time, for a drop-in replacement of Hive, it will be best to stick with Shark. Over time, Shark will use the Spark SQL backend, but it should remain deployable the way it is today (including launching the SharkServer, using the Hive CLI, etc.). Spark SQL is better for accessing Hive data within a Spark program, though, where its APIs are richer and easier to link to than the SharkContext.sql2rdd we previously provided in Shark.

So in a nutshell, if you have a Shark deployment today, or need the HiveServer, then going with Shark will be fine, and we will switch out the backend in a future release (we’ll probably create a preview of this even before we’re ready to fully switch). If you just want to run SQL queries or load SQL data within a Spark program, try out Spark SQL.

Matei


--
You received this message because you are subscribed to the Google Groups "shark-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at http://groups.google.com/group/shark-users.
For more options, visit https://groups.google.com/d/optout.


Re: [shark-users] SQL on Spark - Shark or SparkSQL

Manoj Samel
Thanks Matei,

Any thoughts on providing a standalone SharkServer equivalent on SparkSQL?

Manoj





Re: [shark-users] SQL on Spark - Shark or SparkSQL

MLnick
It shouldn't be too tricky to use the Spark job server to create a job that takes the SQL statement as an input argument, executes it, and returns the result. This gives remote server execution but no metastore layer.
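
A minimal sketch of that pattern, for illustration only; it is not Spark job server code. The names `run_sql_job`, `make_demo_connection`, and the `events` table are hypothetical, and Python's built-in `sqlite3` stands in for a Spark SQL context so the example runs on its own. The point is just the shape of the job: the SQL text arrives as an argument, and the result goes back as a serialized payload.

```python
import json
import sqlite3


def make_demo_connection():
    """Stand-in for a long-lived SQL context held by the server process.

    Tables are registered in-process here, which mirrors the
    "no metastore layer" caveat: nothing persists table definitions.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_name TEXT, clicks INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("ann", 3), ("bob", 5), ("ann", 4)],
    )
    return conn


def run_sql_job(conn, job_args):
    """Hypothetical job entry point: the SQL statement is an input argument."""
    cursor = conn.execute(job_args["sql"])
    # description is None for statements that return no rows
    columns = [d[0] for d in cursor.description or []]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    # Serialize so the result could be returned over HTTP or similar
    return json.dumps({"columns": columns, "rows": rows})


if __name__ == "__main__":
    conn = make_demo_connection()
    query = (
        "SELECT user_name, SUM(clicks) AS total "
        "FROM events GROUP BY user_name ORDER BY user_name"
    )
    print(run_sql_job(conn, {"sql": query}))
```

In a real deployment, the connection would be replaced by whatever SQL context the job server keeps alive between requests, and any tables would have to be registered by the job itself, since there is no metastore to look them up in.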
