Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

ThomasThomas
Hi There,

Our use case is like this.

We have nested (multi-level) JSON messages flowing through a Kafka queue. We
read the messages from Kafka using Spark Structured Streaming (SSS), explode
the nested data, flatten everything into a single record using DataFrame
joins, and land it in a relational database table (DB2).

However, we get the following error when we write to the database using JDBC:

“org.apache.spark.sql.AnalysisException: Inner join between two streaming
DataFrames/Datasets is not supported;”

Any help would be greatly appreciated.


Thanks,
Thomas Thomas
Mastermind Solutions LLC.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

☼ R Nair (रविशंकर नायर)

On Sat, May 12, 2018, 10:57 AM ThomasThomas <[hidden email]> wrote:

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

ThomasThomas
Thanks for the quick response. I'm able to inner join the DataFrames with a
regular Spark session; the issue occurs only with the streaming session.
By the way, I'm using Spark 2.2.0.

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

Jacek Laskowski
In reply to this post by ThomasThomas
Hi,

The exception message should be self-explanatory: it says that you cannot join two streaming Datasets. Stream-stream joins were added in Spark 2.3, if I'm not mistaken.
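
For readers on Spark 2.3 or later, a stream-stream inner join might look roughly like the PySpark sketch below. It is an illustration, not the original poster's code: the column names (`order_id`, `order_time`, `item_time`, `sku`), the view names, the 10-minute watermark delay, and the one-hour window are all assumptions. The function only calls methods on the objects passed in, so it needs no imports of its own.

```python
def join_streams(spark, orders, items):
    """Inner-join two streaming DataFrames (needs Spark 2.3 or later).

    `spark` is a SparkSession; `orders` and `items` are streaming
    DataFrames. All column names and durations are hypothetical, and
    the two time columns are assumed to be of timestamp type.
    """
    # Watermarks declare how late events may arrive on each side.
    # Note: registering temp views is a side effect on the session.
    orders.withWatermark("order_time", "10 minutes").createOrReplaceTempView("orders")
    items.withWatermark("item_time", "10 minutes").createOrReplaceTempView("items")

    # Together with the watermarks, the time-range condition lets Spark
    # discard old join state instead of buffering both streams forever.
    return spark.sql("""
        SELECT o.order_id, o.order_time, i.sku, i.item_time
        FROM orders o
        JOIN items i
          ON o.order_id = i.order_id
         AND i.item_time BETWEEN o.order_time AND o.order_time + INTERVAL 1 HOUR
    """)
```

On Spark 2.2.x the same query is expected to fail with the exception quoted in this thread once the streaming query is started.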

Just to confirm that you are working with two streaming Datasets, can you show the query plan of the join query?
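
One way to gather that information, as a minimal PySpark sketch: it assumes `left` and `right` are the two DataFrames being joined and that `order_id` is the join key (a hypothetical name). `isStreaming` reports whether a DataFrame has a streaming source, and `explain(True)` prints the logical and physical plans.

```python
def show_join_plan(left, right, key="order_id"):
    """Print whether each input is streaming, then the plans of their inner join.

    `key` is a hypothetical join column; replace it with the real one.
    Returns True only if both inputs are streaming DataFrames.
    """
    # isStreaming is True when the DataFrame has at least one streaming source.
    print("left is streaming: ", left.isStreaming)
    print("right is streaming:", right.isStreaming)

    # Building the join is lazy; explain(True) prints the plans
    # without starting a streaming query.
    joined = left.join(right, on=key, how="inner")
    joined.explain(True)
    return left.isStreaming and right.isStreaming
```

If both flags print `True`, the join is a stream-stream join, which is the case the exception is complaining about.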

Jacek

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

☼ R Nair (रविशंकर नायर)
Hi Jacek,

If we use RDDs instead of DataFrames, can we accomplish the same thing? That is, are joins between RDDs allowed in Spark Streaming?

Best,
Ravi

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

Jacek Laskowski
Hi,

Once you leave Spark Structured Streaming and drop down to RDDs (for your streaming queries), you can do any kind of join. You're back in the good old days of RDD programming (with all the bells and whistles).
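
As a rough sketch of what such a join looks like in the older Spark Streaming (DStream) API in PySpark: it assumes two DStreams of dict-like records that share a field, here given the hypothetical name `order_id`, and that both streams belong to the same StreamingContext.

```python
def join_by_key(left, right, key="order_id"):
    """Inner-join two DStreams of dict-like records on a shared field.

    `key` is a hypothetical field name; both inputs are assumed to be
    DStreams created from the same StreamingContext.
    """
    # DStream joins operate on (key, value) pairs, so key both sides first.
    left_pairs = left.map(lambda rec: (rec[key], rec))
    right_pairs = right.map(lambda rec: (rec[key], rec))

    # For each batch interval this yields (key, (left_rec, right_rec)).
    return left_pairs.join(right_pairs)
```

The join is applied batch by batch, so records only match when they arrive in the same batch interval (or the same window, if the streams are windowed first).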

Please note that Spark Structured Streaming != Spark Streaming: the former uses the Dataset API, while the latter uses the RDD API.

Don't touch the RDD API and Spark Streaming unless you know what you're doing :)