Does partition by and order by works only in stateful case?

kant kodali
Hi All,

Do `partition by` and `order by` work only in the stateful case?

For example:

select row_number() over (partition by id order by timestamp) from table

gives me

SEVERE: Exception occured while submitting the query: java.lang.RuntimeException: org.apache.spark.sql.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets;;

What does "time-based window" mean? Is it not the window from the over() clause, or does it mean something like group by window(timestamp, '10 minutes'), as in the stateful case?

Thanks
Re: Does partition by and order by works only in stateful case?

Tathagata Das
Traditional SQL windows with `over` are not supported in streaming. Only time-based windows, that is, `window("timestamp", "10 minutes")`, are supported in streaming.
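
To illustrate what a time-based window means here, the following is a minimal plain-Python sketch, not Spark code. It assumes 10-minute tumbling windows aligned to the Unix epoch, and the names `window_start`, `count_per_window`, and the sample events are hypothetical. The point it tries to show is that each row can be assigned to its window from its own timestamp alone, so per-window aggregates can be updated as rows arrive, whereas `row_number() over (partition by id order by timestamp)` would need to see every row of a partition before it could number them.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: 10-minute tumbling windows aligned to the Unix epoch.
WINDOW = timedelta(minutes=10)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, width=WINDOW):
    """Return the start of the tumbling window that contains ts."""
    offset = (ts - EPOCH) % width  # how far ts lies into its window
    return ts - offset


def count_per_window(events):
    """Count events per (id, window start), one event at a time."""
    counts = Counter()
    for event_id, ts in events:
        # Only the event's own timestamp is needed to find its window.
        counts[(event_id, window_start(ts))] += 1
    return dict(counts)


# Hypothetical sample data: (id, event time) pairs.
events = [
    ("a", datetime(2018, 4, 12, 19, 31, tzinfo=timezone.utc)),
    ("a", datetime(2018, 4, 12, 19, 38, tzinfo=timezone.utc)),
    ("a", datetime(2018, 4, 12, 19, 41, tzinfo=timezone.utc)),
    ("b", datetime(2018, 4, 12, 19, 45, tzinfo=timezone.utc)),
]

result = count_per_window(events)
for (event_id, start), n in sorted(result.items()):
    print(event_id, start.isoformat(), n)
```

In this sketch the two events for id "a" at 19:31 and 19:38 fall into the window starting at 19:30, while the events at 19:41 and 19:45 fall into the window starting at 19:40.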

In reply to kant kodali's post of Thu, Apr 12, 2018 at 7:34 PM.

Re: Does partition by and order by works only in stateful case?

kant kodali
Got it! Thanks.

In reply to Tathagata Das's post of Thu, Apr 12, 2018 at 7:53 PM.


Re: Does partition by and order by works only in stateful case?

Gourav Sengupta
In reply to this post by Tathagata Das
Hi,

My sincere apologies for adding my question to this thread. For some reason, the messages I send to the group never appear in it. I also think my question is related, since it shows a difference between traditional operations and Spark Streaming operations.

Can I please ask why `lines.count()` throws the following exception?

org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;

Whereas if I call `lines.createOrReplaceTempView("test")` and then run the SQL "select count(*) ccount from test", it runs absolutely fine.

I can tell from the exception that a check is executed to determine whether isStreaming is true for `lines`, but a bit of explanation would help.



Regards,
Gourav Sengupta
