Spark Streaming: logical plan leaf nodes do not match physical plan leaf nodes, so streaming metrics cannot be reported


Reminia Scarlet
Hi all,
 I use StreamingQueryListener to report each batch's input record count as a metric,
 but numInputRows is always 0. The log from MicroBatchExecution.scala says:
 2019-10-23 06:56:05 WARN  MicroBatchExecution:66 - Could not report metrics as number leaves in trigger logical plan did not match that of the execution plan:
 
When the number of leaves in the logical plan does not match the number in the execution plan, the code below in ProgressReporter.scala leaves the number of input rows per source at 0.
image.png
I have attached the leaves of the logical plan and the physical plan. I think there might be a bug: LogicalRDD seems to be duplicated as Relation in the logical plan
and is counted twice as a leaf. If we remove the LogicalRDD, the leaf counts should be the same.

image.png
image.png

Can anyone help? Thanks very much.
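
For readers without the screenshots, the check described above can be modelled roughly as follows. This is only a toy sketch in plain Python, not Spark's ProgressReporter code: the `Node` class, the `rows_per_source` function, and the row counts are all made up for illustration. It assumes that metrics are attributed by pairing logical leaves with physical leaves by position, and that nothing is attributed when the two leaf counts differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Toy plan node (hypothetical, not a Spark class)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    source: Optional[str] = None  # set on logical leaves backed by a streaming source
    num_output_rows: int = 0      # set on physical leaves

    def leaves(self) -> List["Node"]:
        # A node without children is a leaf; otherwise collect leaves left to right.
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def rows_per_source(logical: Node, physical: Node) -> Dict[str, int]:
    """Pair leaves by position; give up if the leaf counts differ."""
    logical_leaves = logical.leaves()
    physical_leaves = physical.leaves()
    if len(logical_leaves) != len(physical_leaves):
        # Mirrors the warning in the log: no rows can be attributed to any source.
        return {}
    counts: Dict[str, int] = {}
    for l_leaf, p_leaf in zip(logical_leaves, physical_leaves):
        if l_leaf.source is not None:
            counts[l_leaf.source] = counts.get(l_leaf.source, 0) + p_leaf.num_output_rows
    return counts


# Physical plan with two leaves: the stream scan and the static file scan.
physical_plan = Node("Join", [
    Node("StreamScan", num_output_rows=100),
    Node("FileScan", num_output_rows=5),
])

# Case 1: logical plan with the same two leaves.
logical_ok = Node("Join", [
    Node("StreamingRelation", source="eventhub"),
    Node("Relation"),
])

# Case 2: logical plan with an extra leaf, as when LogicalRDD and Relation
# both appear for what is one physical scan.
logical_extra = Node("Join", [
    Node("StreamingRelation", source="eventhub"),
    Node("Union", [Node("LogicalRDD"), Node("Relation")]),
])

matched = rows_per_source(logical_ok, physical_plan)
mismatched = rows_per_source(logical_extra, physical_plan)
```

In the second case the result is empty, so a reporter that defaults missing sources to 0 would show numInputRows as 0 for every batch, which matches the symptom above.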

Jungtaek Lim-2
Which version of Spark are you using?
I think the relevant issue is SPARK-24050 [1], which was fixed in Spark 2.4.0, so if you are on a lower version you may want to try the latest version.


Reminia Scarlet
@Jungtaek
I'm using Spark 2.4 (HDI 4.0) on Azure.
Maybe there are other corner cases that were not taken into consideration.
I will also decompile the Spark jar from Azure to check the source code.

Jungtaek Lim-2
Sorry, I hadn't checked the details of SPARK-24050. It looks like it was only resolved for DSv2 sources, and some streaming sources still use DSv1.
The file stream source is one such case, so SPARK-24050 may not help here. I guess there was a technical reason for dealing only with DSv2, so I'm not sure there's a good way to handle this.

The file stream source seems to be migrating to DSv2 in Spark 3.0, so hopefully Spark 3.0 will help solve the problem.

Reminia Scarlet
We join a stream from Event Hubs with static DataFrames read from CSV and Parquet using the plain spark.read.csv / spark.read.parquet methods.
Are you sure this is a bug? I am not that familiar with the Spark code.
I have also forwarded this to the dev mailing list for help.


Jungtaek Lim-2
What you've seen is the code path taken when at least one DSv1 source is used in the query; the leaves then fail to match because of that limitation.

SPARK-24050 describes the "technical limitation" that prevents resolving this when a DSv1 source is used, so please refer to the issue description if you're interested.
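
One possible way to tell this situation apart from a batch that really was empty is to look at the offsets in the progress event. The snippet below is a rough heuristic sketch, not something taken from Spark or from SPARK-24050. It assumes the progress JSON has a `sources` array whose entries carry `startOffset`, `endOffset`, and `numInputRows`. The helper name `looks_like_unreported_metrics` and the sample payload are hypothetical.

```python
import json


def looks_like_unreported_metrics(progress_json: str) -> bool:
    """Heuristic: every source reports 0 rows, yet some source's offsets moved."""
    progress = json.loads(progress_json)
    sources = progress.get("sources", [])
    if not sources:
        return False
    if any(s.get("numInputRows", 0) != 0 for s in sources):
        # Some source was attributed rows, so metrics are being reported.
        return False
    # Offsets advanced although no rows were counted: suspicious.
    return any(s.get("startOffset") != s.get("endOffset") for s in sources)


# Made-up progress payload for illustration; real offsets are source-specific.
sample_progress = json.dumps({
    "batchId": 42,
    "numInputRows": 0,
    "sources": [
        {
            "description": "hypothetical-eventhub-source",
            "startOffset": {"partition-0": 1000},
            "endOffset": {"partition-0": 1100},
            "numInputRows": 0,
        }
    ],
})

suspicious = looks_like_unreported_metrics(sample_progress)
```

This cannot recover the real row count; it only flags batches where the reported 0 is probably not trustworthy. The first batch of a query, where the start offset may be absent, can also trigger it.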

