Is Union or Join Supported for Spark Structured Streaming Queries in 2.2.0?

kant kodali
Hi All,

I have messages in a queue that may arrive with a few different schemas, for example:
msg1 schema1, msg2 schema2, msg3 schema3, msg4 schema1, ...

I want to put all of them into one DataFrame. Is this possible with Structured Streaming?

I am using Spark 2.2.0

Thanks!

Re: Is Union or Join Supported for Spark Structured Streaming Queries in 2.2.0?

Jacek Laskowski
Hi,

A join between a streaming Dataset and a batch/static Dataset is definitely supported; see http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#join-operations

I'm not sure about union, but that's easy to check (I'm leaving it to you as a homework exercise).

You cannot have Datasets with different schemas in a single query. You'd have to use the widest schema, i.e. one that covers the fields of all the schemas.
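
To make the widest-schema idea concrete, here is a minimal PySpark sketch. It assumes the messages are JSON strings sitting in a column named "value" (as with the Kafka source); the thread does not say what format the queue uses, so treat that as a guess. The field names, SCHEMA_1..SCHEMA_3, and both helper functions are hypothetical, made up for illustration only.

```python
# Hypothetical field lists for three message schemas (names are illustrative only).
SCHEMA_1 = [("id", "string"), ("price", "double")]
SCHEMA_2 = [("id", "string"), ("user", "string")]
SCHEMA_3 = [("id", "string"), ("price", "double"), ("ts", "timestamp")]


def widest_fields(*schemas):
    """Merge field lists into one list covering every field, in first-seen order."""
    merged = {}
    for schema in schemas:
        for name, type_name in schema:
            # A field that shows up in several schemas must agree on its type.
            if merged.setdefault(name, type_name) != type_name:
                raise ValueError("conflicting types for field %r" % name)
    return list(merged.items())


def parse_messages(raw_df, fields, column="value"):
    """Parse a JSON string column with the wide schema; absent fields become null.

    Assumption: each message is a JSON object stored in `column`.
    """
    # Imported lazily so the merge helper above works without Spark installed.
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType

    wide = StructType()
    for name, type_name in fields:
        # Every field is nullable because not every message carries it.
        wide = wide.add(name, type_name, nullable=True)
    return (raw_df
            .select(from_json(col(column).cast("string"), wide).alias("msg"))
            .select("msg.*"))
```

With this, something like parse_messages(raw_stream, widest_fields(SCHEMA_1, SCHEMA_2, SCHEMA_3)) would yield a single streaming DataFrame in which a schema-2 message simply has null for price and ts. Whether nulls are acceptable downstream depends on the query.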

P.S. Have you tried anything yet? spark-shell's your friend, my friend :)

Regards,
Jacek Laskowski
----
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark

In reply to kant kodali's message of Wed, Dec 13, 2017 at 11:16 PM.