Do GraphFrames support streaming?

Do GraphFrames support streaming?

kant kodali
Do GraphFrames support streaming?

Re: Do GraphFrames support streaming?

Jörn Franke
What is the use case you are trying to solve?
You want to load graph data from a streaming window into separate graphs - possible, but it probably requires a lot of memory.
You want to update an existing graph with new streaming data and then fully rerun an algorithm -> look at JanusGraph.
You want to incrementally update an existing graph and incrementally run a graph algorithm suited to this - you have to implement that yourself, as far as I am aware.

> On 29. Apr 2018, at 11:43, kant kodali <[hidden email]> wrote:
>
> Do GraphFrames support streaming?


Re: Do GraphFrames support streaming?

kant kodali
"You want to update incrementally an existing graph and run incrementally a graph algorithm suitable for this - you have to implement yourself as far as I am aware"

I want to update the graph incrementally and run graph queries similar to Cypher, such as "give me all the vertices that are connected by a specific set of edges". I don't intend to run graph algorithms like ConnectedComponents at this point, but of course it would be great to have them.

If we were to do this ourselves, should I extend GraphFrame? Any suggestions?
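
To make the kind of query described above concrete, here is a minimal sketch in plain Python, not the GraphFrames API. The `Edge` record, its `relationship` label, and `vertices_connected_by` are hypothetical names chosen for illustration. On a static graph, GraphFrames motif finding (`g.find("(a)-[e]->(b)")` followed by a filter on an edge column) should cover the same ground.

```python
from collections import namedtuple

# Hypothetical edge record: source vertex, destination vertex, relationship label.
Edge = namedtuple("Edge", ["src", "dst", "relationship"])


def vertices_connected_by(edges, relationships):
    """Return every vertex that is an endpoint of an edge whose label is in `relationships`."""
    wanted = set(relationships)
    found = set()
    for edge in edges:
        if edge.relationship in wanted:
            found.add(edge.src)
            found.add(edge.dst)
    return found


edges = [
    Edge("a", "b", "follows"),
    Edge("b", "c", "likes"),
    Edge("c", "d", "blocks"),
]

# Vertices touched by "follows" or "likes" edges; "d" is only reachable via "blocks".
result = vertices_connected_by(edges, ["follows", "likes"])
print(sorted(result))
```

The filter only reads the current edge collection, so it works the same whether that collection was built once or grown by later updates.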


On Sun, Apr 29, 2018 at 3:24 AM, Jörn Franke <[hidden email]> wrote:
> What is the use case you are trying to solve?
> You want to load graph data from a streaming window into separate graphs - possible, but it probably requires a lot of memory.
> You want to update an existing graph with new streaming data and then fully rerun an algorithm -> look at JanusGraph.
> You want to incrementally update an existing graph and incrementally run a graph algorithm suited to this - you have to implement that yourself, as far as I am aware.


Re: Do GraphFrames support streaming?

Jörn Franke
For your use case, simple incremental graph updates might indeed be enough. However, they are not straightforward in Spark. You can union the new data with the existing DataFrames that represent your graph and create a new GraphFrame from the result.

However, I am not sure whether this will fully meet your requirement for incremental graph updates.
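
The union-and-rebuild idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not GraphFrames code: vertices and edges are modelled as sets, and `build_graph` and `apply_update` are hypothetical helpers. With static DataFrames, the analogous step would be along the lines of `GraphFrame(vertices.union(new_vertices), edges.union(new_edges))`.

```python
def build_graph(vertices, edges):
    """Build an adjacency map from vertex ids and (src, dst) pairs."""
    adjacency = {v: set() for v in vertices}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set())
    return adjacency


def apply_update(vertices, edges, new_vertices, new_edges):
    """Union the new data with the existing sets and rebuild the graph from scratch."""
    all_vertices = vertices | new_vertices
    all_edges = edges | new_edges
    return all_vertices, all_edges, build_graph(all_vertices, all_edges)


vertices = {"a", "b"}
edges = {("a", "b")}

# One update arrives: a new vertex "c" and a new edge b -> c.
vertices, edges, graph = apply_update(vertices, edges, {"c"}, {("b", "c")})
print(graph)
```

Each update rebuilds the whole graph, so the cost grows with the total data rather than with the size of the update. That is one reason this may fall short of a truly incremental update.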

On 14. Jul 2018, at 15:59, kant kodali <[hidden email]> wrote:

> "You want to update incrementally an existing graph and run incrementally a graph algorithm suitable for this - you have to implement yourself as far as I am aware"
>
> I want to update the graph incrementally and run graph queries similar to Cypher, such as "give me all the vertices that are connected by a specific set of edges". I don't intend to run graph algorithms like ConnectedComponents at this point, but of course it would be great to have them.
>
> If we were to do this ourselves, should I extend GraphFrame? Any suggestions?



Re: Do GraphFrames support streaming?

kant kodali
The question now is whether it can be done in a streaming fashion. Are you talking about taking the union of two streaming DataFrames and then constructing a GraphFrame (also during streaming)?

On Sat, Jul 14, 2018 at 8:07 AM, Jörn Franke <[hidden email]> wrote:
> For your use case, simple incremental graph updates might indeed be enough. However, they are not straightforward in Spark. You can union the new data with the existing DataFrames that represent your graph and create a new GraphFrame from the result.
>
> However, I am not sure whether this will fully meet your requirement for incremental graph updates.


Re: Do GraphFrames support streaming?

Jörn Franke
No, the streaming DataFrame needs to be written to disk or similar (or to an in-memory backend). Then, when the next stream arrives, join the two, create the graph, and store the next stream together with the existing stream on disk, and so on.
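
A rough sketch of that persist-then-combine loop, in plain Python with a local JSON-lines file standing in for disk storage. The file layout and the helpers `load_edges`, `append_edges`, and `process_batch` are assumptions made for illustration. In Spark 2.4 and later, `foreachBatch` on a streaming write hands each micro-batch over as a static DataFrame and might serve as the hook for this step.

```python
import json
import os
import tempfile


def load_edges(path):
    """Read all previously stored edges; a missing file means no batch has arrived yet."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [tuple(json.loads(line)) for line in f]


def append_edges(path, batch):
    """Append one batch of edges to the store, one JSON array per line."""
    with open(path, "a") as f:
        for edge in batch:
            f.write(json.dumps(list(edge)) + "\n")


def process_batch(path, batch):
    """Combine the stored edges with the new batch, persist the batch, return the full edge list."""
    combined = load_edges(path) + list(batch)
    append_edges(path, batch)
    return combined


store = os.path.join(tempfile.mkdtemp(), "edges.jsonl")

# Two batches arrive in order; each call sees everything stored so far plus the new batch.
first = process_batch(store, [("a", "b")])
second = process_batch(store, [("b", "c"), ("c", "d")])
print(len(first), len(second))
```

The combined edge list returned for each batch is what the graph would be rebuilt from.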

On 14. Jul 2018, at 17:19, kant kodali <[hidden email]> wrote:

> The question now is whether it can be done in a streaming fashion. Are you talking about taking the union of two streaming DataFrames and then constructing a GraphFrame (also during streaming)?


Re: Do GraphFrames support streaming?

kant kodali
I have tried this sort of approach in other streaming cases I ran into, and I believe the problems with it are:

1) We have one stream (say stream1) going to disk, such as HDFS or a database, and another stream (say stream2) where, for every row in stream2, we make an I/O call to see whether it joins with a row or rows in stream1. That is too many I/O calls if we make one for every row.
2) We could instead make one I/O call per RDD partition in stream2, but then we may run into full table scan issues as the data from stream1 gets big.

So I wonder whether anyone has been able to implement this approach successfully in production (by which I mean making sure it is not resource intensive)?

Thanks!
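
The difference between the two options above can be sketched in plain Python. `CountingStore` is a hypothetical stand-in for HDFS or a database that simply counts round trips; the batched `get_many` lookup is an assumption about what the real store would offer.

```python
class CountingStore:
    """Hypothetical stand-in for the stream1 store; counts I/O round trips."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.calls = 0

    def get(self, key):
        """Look up a single key: one round trip."""
        self.calls += 1
        return self._rows.get(key)

    def get_many(self, keys):
        """Look up a set of keys in one round trip."""
        self.calls += 1
        return {k: self._rows[k] for k in keys if k in self._rows}


def join_per_row(store, partition):
    """Option 1: one I/O call for every row of stream2."""
    joined = []
    for key, value in partition:
        match = store.get(key)
        if match is not None:
            joined.append((key, value, match))
    return joined


def join_per_partition(store, partition):
    """Option 2: one I/O call for the whole partition, then join in memory."""
    matches = store.get_many({key for key, _ in partition})
    return [(key, value, matches[key]) for key, value in partition if key in matches]


stream1_rows = {"a": 1, "b": 2}
partition = [("a", "x"), ("b", "y"), ("z", "w")]

row_store = CountingStore(stream1_rows)
batch_store = CountingStore(stream1_rows)
row_result = join_per_row(row_store, partition)
batch_result = join_per_partition(batch_store, partition)
print(row_store.calls, batch_store.calls)
```

Both functions return the same joined rows. The batched form only avoids the full-scan concern if the real store can answer a multi-key lookup through an index on the join key, which this sketch takes for granted.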

On Sat, Jul 14, 2018 at 9:18 AM, Jörn Franke <[hidden email]> wrote:
> No, the streaming DataFrame needs to be written to disk or similar (or to an in-memory backend). Then, when the next stream arrives, join the two, create the graph, and store the next stream together with the existing stream on disk, and so on.
