Plans for Session Windows?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Plans for Session Windows?

msukmanowsky
Hi all,

Just wondering if Spark has any plans to support session-based windows in structured streaming as documented by Apache Beam here or perhaps better in this blog post?

Figure-16---Session-Merging-a526d52186fcc33f3b5b5c59b176ac4e.jpg

--
Mike Sukmanowsky
Aspiring Digital Carpenter



Reply | Threaded
Open this post in threaded view
|

Re: Plans for Session Windows?

Arun Mahadevan

There is no stock API to do this directly, but it can be implemented on top of mapGroupWithState like here - 


https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredSessionization.scala

 

It would be worth bundling this into a builtin api, but AFAIK there are no plans yet.

 

Thanks,

Arun



On 9 August 2018 at 08:02, Mike Sukmanowsky <[hidden email]> wrote:
Hi all,

Just wondering if Spark has any plans to support session-based windows in structured streaming as documented by Apache Beam here or perhaps better in this blog post?

Figure-16---Session-Merging-a526d52186fcc33f3b5b5c59b176ac4e.jpg

--
Mike Sukmanowsky
Aspiring Digital Carpenter




Reply | Threaded
Open this post in threaded view
|

Re: Plans for Session Windows?

msukmanowsky
Thanks Arun. It's curious as Apache Beam says it fully supports session windows running on Spark but I'd imagine that, under the hood, it's leveraging mapGroupsWithState.

Are there any plans to support the mapGroupsWithState API in PySpark?

On Thu, 9 Aug 2018 at 13:13, Arun Mahadevan <[hidden email]> wrote:

There is no stock API to do this directly, but it can be implemented on top of mapGroupWithState like here - 


https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredSessionization.scala

 

It would be worth bundling this into a builtin api, but AFAIK there are no plans yet.

 

Thanks,

Arun



On 9 August 2018 at 08:02, Mike Sukmanowsky <[hidden email]> wrote:
Hi all,

Just wondering if Spark has any plans to support session-based windows in structured streaming as documented by Apache Beam here or perhaps better in this blog post?

Figure-16---Session-Merging-a526d52186fcc33f3b5b5c59b176ac4e.jpg

--
Mike Sukmanowsky
Aspiring Digital Carpenter






--
Mike Sukmanowsky
Aspiring Digital Carpenter



Reply | Threaded
Open this post in threaded view
|

Re: Plans for Session Windows?

Arun Mahadevan
Not sure if it changed recently, but the beam spark runner still uses the DStreams API. It need not necessarily use the higher level apis (like mapGroupsWithState or equivalent) provided by spark, it could as well be based on groupByKey or reduce primitives and the window functions provided by beam.

On 9 August 2018 at 12:23, Mike Sukmanowsky <[hidden email]> wrote:
Thanks Arun. It's curious as Apache Beam says it fully supports session windows running on Spark but I'd imagine that, under the hood, it's leveraging mapGroupsWithState.

Are there any plans to support the mapGroupsWithState API in PySpark?

On Thu, 9 Aug 2018 at 13:13, Arun Mahadevan <[hidden email]> wrote:

There is no stock API to do this directly, but it can be implemented on top of mapGroupWithState like here - 


https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredSessionization.scala

 

It would be worth bundling this into a builtin api, but AFAIK there are no plans yet.

 

Thanks,

Arun



On 9 August 2018 at 08:02, Mike Sukmanowsky <[hidden email]> wrote:
Hi all,

Just wondering if Spark has any plans to support session-based windows in structured streaming as documented by Apache Beam here or perhaps better in this blog post?

Figure-16---Session-Merging-a526d52186fcc33f3b5b5c59b176ac4e.jpg

--
Mike Sukmanowsky
Aspiring Digital Carpenter






--
Mike Sukmanowsky
Aspiring Digital Carpenter