Continuous Processing mode behaves differently from Batch mode


Yuta Morisawa
Hi all,

I am using Structured Streaming in Continuous Processing mode and have
run into an odd problem.

My code is very simple and close to the sample code in the
documentation.
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#continuous-processing


When I send the same text data ten times (for example, 10 lines of
text), the result in Batch mode has 100 lines.

But in Continuous Processing mode the result has only 10 lines.
It appears that duplicated lines are removed.

The only difference between the two versions of the code is whether the
trigger method is called.

Why do the two versions behave differently?


--
Regards,
Yuta
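
For readers following along, here is a minimal PySpark sketch of the two variants described above, where the only difference is whether a continuous trigger is set. The original code was not posted, so the Kafka source, the broker address, the topic name, the console sink, and the helper names trigger_kwargs and start_console_query are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def trigger_kwargs(continuous_interval: Optional[str]) -> Dict[str, str]:
    """Keyword arguments to pass to DataStreamWriter.trigger().

    None means "do not call trigger() at all", which leaves the query in
    the default micro-batch mode.
    """
    if continuous_interval is None:
        return {}
    return {"continuous": continuous_interval}


def start_console_query(spark: Any, continuous_interval: Optional[str] = None) -> Any:
    """Start a Kafka-to-console query in micro-batch or continuous mode.

    `spark` is expected to be a SparkSession with the Kafka connector on
    its classpath (assumption; the thread does not name the source).
    """
    lines = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
        .option("subscribe", "lines")  # hypothetical topic
        .load()
        .selectExpr("CAST(value AS STRING) AS line")  # projection only
    )
    writer = lines.writeStream.format("console").outputMode("append")
    kwargs = trigger_kwargs(continuous_interval)
    if kwargs:
        # The single difference between the two runs: a continuous trigger.
        writer = writer.trigger(**kwargs)
    return writer.start()
```

Calling start_console_query(spark) would run the default micro-batch query, and start_console_query(spark, "1 second") would run the same query with a continuous trigger and a one-second checkpoint interval.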


---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Continuous Processing mode behaves differently from Batch mode

Shixiong(Ryan) Zhu
One possible case is you don't have enough resources to launch all tasks for your continuous processing query. Could you check the Spark UI and see if all tasks are running rather than waiting for resources?

Best Regards,

Shixiong Zhu
Databricks Inc.


On Tue, May 15, 2018 at 5:38 PM, Yuta Morisawa <[hidden email]> wrote:
But in Continuous Processing mode the result has only 10 lines.
It appears duplicated lines are removed.


Re: Continuous Processing mode behaves differently from Batch mode

Yuta Morisawa
Thank you for the reply.

I checked the web UI and found that the total number of tasks is 10.
So I changed the number of cores from 1 to 10, and now it works well.

But I haven't figured out what is happening.

My assumption is that each job consists of 10 tasks by default and each
task occupies 1 core.
So, in my case, assigning only 1 core caused the issue.
In other words, Continuous mode needs at least 10 cores.

Is that right?


Regards,
Yuta
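
As a rough aid for the question above: the caveats in the continuous processing section of the Structured Streaming guide say that the engine launches long-running tasks, one per source partition that can be read in parallel, and that the cluster needs enough cores to run all of them at once (the guide's example is a Kafka topic with 10 partitions needing at least 10 cores). So the requirement seems to follow the source's partition count rather than a fixed default of 10. The sketch below only encodes that arithmetic; the helper names are hypothetical and it assumes one core per task (the default spark.task.cpus of 1).

```python
def min_cores_for_continuous(num_source_partitions: int) -> int:
    """Cores needed so that every long-running continuous task runs at once.

    Assumes one task per source partition and one core per task.
    """
    if num_source_partitions < 1:
        raise ValueError("a streaming source has at least one partition")
    return num_source_partitions


def can_make_progress(available_cores: int, num_source_partitions: int) -> bool:
    """True if all continuous tasks can be scheduled simultaneously."""
    return available_cores >= min_cores_for_continuous(num_source_partitions)


def local_master(num_source_partitions: int) -> str:
    """Master URL for a local run with one core per continuous task."""
    return f"local[{min_cores_for_continuous(num_source_partitions)}]"
```

With the numbers from this thread, can_make_progress(1, 10) is False and can_make_progress(10, 10) is True, and local_master(10) gives "local[10]", which could be passed to SparkSession.builder.master for a local test.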

On 2018/05/16 15:24, Shixiong(Ryan) Zhu wrote:

> One possible case is you don't have enough resources to launch all tasks
> for your continuous processing query. Could you check the Spark UI and
> see if all tasks are running rather than waiting for resources?