Having access to spark results


Having access to spark results

Affan Syed
Spark users, 
We would really like some input on how the results of a Spark query can be made accessible to a web application. Given how widely Spark is used in industry, I would have expected plenty of answers and tutorials on this, but I didn't find anything.

Here are a few options that come to mind:

1) Spark results are saved to another DB (perhaps a traditional one), and the query request returns the name of the new table, which is then accessed through a paginated query. That seems doable, although a bit convoluted, since we need to handle detecting when the query completes.
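Option 1 boils down to writing the result set to a table and letting the web tier page over it. Below is a minimal sketch of the pagination side only, using sqlite3 purely as a stand-in for the "traditional DB"; the table name, columns, and page size are made up for illustration:

```python
import sqlite3

# Stand-in for a table a Spark job has just written its full results to.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_result_123 (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO query_result_123 VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(10)])

def fetch_page(conn, table, page, page_size):
    """Return one page of results; ORDER BY keeps pagination stable."""
    # `table` is assumed to come from our own job registry, not user input.
    cur = conn.execute(
        f"SELECT id, value FROM {table} ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size))
    return cur.fetchall()

first_page = fetch_page(conn, "query_result_123", 0, 3)
last_page = fetch_page(conn, "query_result_123", 3, 3)
print(first_page)  # three rows
print(last_page)   # the single leftover row
```

The REST endpoint would then just map `?page=N` onto `fetch_page`, with the table name returned by the original query request.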

2) Spark results are pumped into a messaging queue, from which a socket-server-like connection serves them to the client.
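The shape of option 2 can be sketched in-process with a plain queue and a sentinel marking end-of-results; in a real deployment the queue would be Kafka or similar and the consumer a WebSocket handler, so everything below is illustrative:

```python
import queue
import threading

results = queue.Queue()
SENTINEL = object()  # marks the end of the result stream

def producer():
    # Stand-in for the Spark job publishing result rows to the queue.
    for i in range(5):
        results.put({"id": i, "value": f"row-{i}"})
    results.put(SENTINEL)

def consume():
    # Stand-in for the socket-server side draining rows for the web client.
    drained = []
    while True:
        item = results.get()
        if item is SENTINEL:
            return drained
        drained.append(item)

threading.Thread(target=producer).start()
rows = consume()
print(len(rows))
```

The nice property here is that the web side never needs to know when the Spark job "finished"; the sentinel (or, in Kafka terms, an end-of-results marker message) carries that signal in-band.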

What confuses me is that other connectors to Spark, like those for Tableau, which use something like JDBC, appear to get all the data (not just the top 500 rows we typically get via Livy or other REST interfaces to Spark). How do those connectors pull the full result set through a single connection?
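For context on the JDBC question: BI connectors generally do not receive the whole result in one response. They hold a cursor open against the Thrift server and pull rows in batches (the JDBC fetch size), so the full result streams over a single connection. A rough illustration of the same cursor-batching pattern with a DB-API cursor (sqlite3 here, purely as a stand-in; the batch size is arbitrary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

cur = conn.execute("SELECT n FROM t")
total = 0
while True:
    batch = cur.fetchmany(100)  # analogous to the JDBC fetch size
    if not batch:
        break
    total += len(batch)  # each batch arrives over the same open connection
print(total)
```

The 500-row ceiling seen through Livy is a property of that REST interface's result buffering, not of Spark itself; a cursor-based protocol has no such cap because the client keeps asking for the next batch.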


Can someone with expertise help bring some clarity?

Thank you. 

Affan

Re: [External Sender] Having access to spark results

Anthony, Olufemi
What sort of environment are you running Spark in: the cloud, or on premises? Is it a real-time or a batch-oriented application?
Please provide more details.
Femi

On Thu, Oct 25, 2018 at 3:29 AM Affan Syed <[hidden email]> wrote:


The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.


Fwd: Having access to spark results

onmstester onmstester-2
In reply to this post by Affan Syed
What about using cache() or saving as a global temp table for subsequent access?

Sent using Zoho Mail



============ Forwarded message ============
From : Affan Syed <[hidden email]>
To : "spark users"<[hidden email]>
Date : Thu, 25 Oct 2018 10:58:43 +0330
Subject : Having access to spark results
============ Forwarded message ============


Re: [External Sender] Having access to spark results

Affan Syed
In reply to this post by Anthony, Olufemi
Femi,
We have a solution that needs to run both on-prem and in the cloud.

I'm not sure how that changes anything. What we want is to run an analytical query on a large dataset (ours sits in Cassandra), so batch in that sense, but on-demand, and then have the entire result (not just the first N rows) available for a web application to access.

Web applications work over a REST API, so while the query can be submitted through something like Livy or the Thrift server, the concern is how we get the final result back in a useful form.

Those were the two ways I could think of doing that.

A global temp table would work, but that is essentially my first option, and it seems a bit involved. My question was whether someone has already solved this problem and run through all the steps.
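On the "a bit involved" point: the main moving part is tracking completion, so the web tier knows when the result table is ready to read. A minimal sketch of that bookkeeping with an in-process job registry; in practice the Future would be a Livy statement id or a row in a job table, and all names here are illustrative:

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job_id -> Future for the running query

def submit_query(sql):
    """Submit the (simulated) long-running query; return a job id at once."""
    job_id = uuid.uuid4().hex[:8]
    def run():
        time.sleep(0.1)                  # stand-in for Spark doing the work
        return f"result_table_{job_id}"  # table the web app should read from
    jobs[job_id] = executor.submit(run)
    return job_id

def poll(job_id):
    """REST-style status check: still running, or done plus the table name."""
    fut = jobs[job_id]
    if not fut.done():
        return {"state": "running"}
    return {"state": "done", "table": fut.result()}

jid = submit_query("SELECT ...")
while poll(jid)["state"] != "done":
    time.sleep(0.02)
info = poll(jid)
print(info["table"])
```

The web application then makes two kinds of calls: one to submit and get a job id, and a polling (or webhook) call that eventually hands back the table name for paginated reads.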


- Affan


On Thu, Oct 25, 2018 at 12:39 PM Femi Anthony <[hidden email]> wrote:

On Thu, Oct 25, 2018 at 3:29 AM Affan Syed <[hidden email]> wrote:

