[Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

Patrick Brown
I recently upgraded to Spark 2.3.1. I have had these same settings in my spark-submit script, which worked on 2.0.2 and, according to the documentation, do not appear to have changed:

spark.ui.retainedTasks=1
spark.ui.retainedStages=1
spark.ui.retainedJobs=1

However, in 2.3.1 the UI doesn't seem to respect this; it still retains a huge number of jobs:

Screen Shot 2018-10-16 at 10.31.50 AM.png


Is this a known issue? Any ideas?

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

Shing Hing Man-2
I hit the same problem after upgrading my application from Spark 2.2.1 to Spark 2.3.2 and running it in YARN client mode.
I also noticed that, in my Spark driver, org.apache.spark.status.TaskDataWrapper
objects can take up more than 2 GB of memory.


Shing




---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

Marcelo Vanzin-2
On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
<[hidden email]> wrote:
> I recently upgraded to spark 2.3.1 I have had these same settings in my spark submit script, which worked on 2.0.2, and according to the documentation appear to not have changed:
>
> spark.ui.retainedTasks=1
> spark.ui.retainedStages=1
> spark.ui.retainedJobs=1

I tried that locally on the current master and it seems to be working.
I don't have 2.3 easily in front of me right now, but will take a look
Monday.

--
Marcelo


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

Marcelo Vanzin-2
I just tried this on 2.3.2 and it worked fine for me. The UI had a single job
and a single stage (plus the tasks related to that stage), and the same was
true in memory (checked with jvisualvm).




--
Marcelo


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

Patrick Brown
I believe I can reproduce this now; it seems to have something to do with running many jobs at once:

Spark 2.3.1

spark-shell --conf spark.ui.retainedJobs=1

scala> import scala.concurrent._
scala> import scala.concurrent.ExecutionContext.Implicits.global
scala> for (i <- 0 until 50000) { Future { println(sc.parallelize(0 until i).collect.length) } }
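
The one-liner above launches its futures on the global execution context, whose default parallelism follows the number of available cores. What follows is a minimal sketch of a variant that makes the concurrency level explicit and blocks until every job has finished, which may make the reproduction easier to compare across machines. The helper `BoundedRepro.runAll` is a hypothetical name introduced here for illustration; it is not part of Spark.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical helper, not part of Spark.
object BoundedRepro {
  // Runs job(0), ..., job(n - 1) with at most `parallelism` running at once,
  // blocks until all of them have finished, and returns the results in order.
  def runAll[A](n: Int, parallelism: Int)(job: Int => A): Seq[A] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val futures = (0 until n).map(i => Future(job(i)))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally pool.shutdown()
  }
}

// In spark-shell, where `sc` is predefined, the original loop would become:
// BoundedRepro.runAll(50000, parallelism = 4)(i => sc.parallelize(0 until i).collect.length)
```

Passing the job as a function keeps the helper independent of Spark, so the same code can be tried with a trivial function before pointing it at `sc`.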


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

Marcelo Vanzin-2
When you say many jobs at once, what ballpark are you talking about?

The code in 2.3+ does try to keep data about all running jobs and
stages regardless of the limit. If you're running into issues because
of that we may have to look again at whether that's the right thing to
do.
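
To make that behaviour concrete, here is a toy model of a store that evicts only finished jobs once the limit is exceeded. It is a sketch under assumptions, not Spark's actual implementation: `JobInfo` and `JobStore` are hypothetical names, and the real cleanup logic may differ in when and how it runs.

```scala
// Toy model only: NOT Spark's real implementation. JobInfo and JobStore are
// hypothetical names illustrating "running jobs are never evicted".
final case class JobInfo(id: Int, running: Boolean)

final class JobStore(retainedJobs: Int) {
  // Insertion-ordered map, so the oldest entries are considered first.
  private val jobs = scala.collection.mutable.LinkedHashMap.empty[Int, JobInfo]

  def start(id: Int): Unit = jobs.update(id, JobInfo(id, running = true))

  def finish(id: Int): Unit = {
    jobs.get(id).foreach(j => jobs.update(id, j.copy(running = false)))
    cleanup()
  }

  // Evict the oldest finished jobs until the total fits the limit,
  // or until only running jobs are left.
  private def cleanup(): Unit = {
    val excess = jobs.size - retainedJobs
    if (excess > 0) {
      val evictable = jobs.values.filter(!_.running).map(_.id).take(excess).toList
      evictable.foreach(id => jobs.remove(id))
    }
  }

  def size: Int = jobs.size
}
```

In this model, starting 200 jobs with `retainedJobs = 1` leaves 200 entries in the store until jobs begin to finish, and the count only drops back to the limit once nothing is running.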



--
Marcelo


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

Patrick Brown
In my production application I run ~200 jobs at once, and I sometimes keep submitting jobs in this manner for ~1 hour.

The reproduction code above generally has only about 4 jobs running at once and, as you can see, runs through 50k jobs in this manner.

I guess I should clarify my statement above: the issue seems to appear when running multiple jobs at once, as well as in sequence, for a while, and it may also have something to do with high master CPU usage (hence the collect in the code). My rough guess is that whatever manages clearing out completed jobs gets overwhelmed (my master was a 4-core machine while running this, and htop reported almost full CPU usage across all 4 cores).

The attached screenshot shows the state of the web UI after running the repro code. You can see the UI is displaying some 43k completed jobs (it takes a long time to load). After a few minutes of inactivity this clears out; however, as my production application continues to submit jobs every once in a while, the issue persists.
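
Since the jobs page is slow to load at that size, the retained job count could also be read from the monitoring REST API. A minimal sketch follows, assuming the driver UI is reachable at `sc.uiWebUrl` (this may not hold behind a YARN proxy) and that the `/api/v1/applications/[app-id]/jobs` endpoint reflects what the UI retains. `RetainedJobsCheck` is a hypothetical helper, and counting `"jobId"` keys is a crude stand-in for a real JSON parser.

```scala
import scala.io.Source

// Hypothetical helper, not part of Spark.
object RetainedJobsCheck {
  // Counts job entries in the JSON returned by the jobs endpoint by counting
  // "jobId" keys. Crude on purpose: it avoids depending on a JSON library.
  def countJobs(json: String): Int = "\"jobId\"\\s*:".r.findAllIn(json).size

  // Fetches the job list from a running application's REST API and counts it.
  def fetchJobCount(uiUrl: String, appId: String): Int = {
    val src = Source.fromURL(s"$uiUrl/api/v1/applications/$appId/jobs")
    try countJobs(src.mkString) finally src.close()
  }
}

// In spark-shell, where `sc` is predefined:
// RetainedJobsCheck.fetchJobCount(sc.uiWebUrl.get, sc.applicationId)
```

Polling this value while the reproduction runs would show whether the count tracks `spark.ui.retainedJobs` or keeps growing.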

Screen Shot 2018-10-23 at 4.40.51 PM.png (757K)

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

Marcelo Vanzin-2
Ah that makes more sense. Could you file a bug with that information
so we don't lose track of this?

Thanks



--
Marcelo


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

Patrick Brown
