Spark on Yarn, is it possible to manually blacklist nodes before running spark job?


Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

Serega Sheypak
Hi, is there any way to tell the scheduler to blacklist specific nodes in advance?

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

Felix Cheung
Not as far as I recall...

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

Li Gao-2
On YARN it is impossible, AFAIK. On Kubernetes you can use taints to keep certain nodes off limits to Spark.
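For illustration, a minimal sketch of the Kubernetes approach (the node name and taint key below are made-up examples):

  # Taint the problematic node; pods without a matching toleration
  # will no longer be scheduled on it.
  kubectl taint nodes bad-node-1 dedicated=no-spark:NoSchedule

Spark driver and executor pods would then avoid that node unless they carry a toleration for the taint.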


Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

Felix Cheung
To clarify, YARN actually supports excluding nodes right when requesting resources; it's Spark that doesn't provide a way to populate such a blacklist.

If you can change the YARN config, the equivalent is node labels: https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
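If node labels are set up, Spark can already target them through its YARN node-label expressions. A sketch, assuming a label named GOOD has been attached to the healthy nodes:

  spark-submit \
    --master yarn \
    --conf spark.yarn.am.nodeLabelExpression=GOOD \
    --conf spark.yarn.executor.nodeLabelExpression=GOOD \
    my-job.jar

Note this is a whitelist of labeled nodes rather than a blacklist of bad ones, and the queue you submit to must be allowed to use the label.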

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

Serega Sheypak
Thanks, so I'll check YARN.
Does anyone know if Spark-on-YARN plans to expose such functionality?


Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

attilapiros
Hello, I was working on this area last year (I developed the YarnAllocatorBlacklistTracker), and if you haven't found any solution for your problem, I can introduce a new config which would contain a sequence of always-blacklisted nodes. This way blacklisting would improve a bit again :)
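Purely as an illustration of what such a config might look like at submit time (the property name below is hypothetical; no such setting existed in Spark at the time of this thread):

  # hypothetical property illustrating the proposal, not an existing Spark config
  spark-submit \
    --master yarn \
    --conf spark.yarn.exclude.nodes=bad-node-1.example.com,bad-node-2.example.com \
    my-job.jar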





Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

Serega Sheypak
Hi Attila, thanks for your reply.

Is it this one: https://github.com/apache/spark/pull/23223?
Can I reach you through the Cloudera Support portal?



Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

Attila Zsolt Piros
Hi, 


No. My old development was https://github.com/apache/spark/pull/21068, which is closed.

This would be a new improvement, with a new Apache JIRA issue (https://issues.apache.org) and a new GitHub pull request.

>> Can I reach you through the Cloudera Support portal?

It is not needed. This would be an improvement to Apache Spark, whose details can be discussed in the JIRA / GitHub PR.

Attila




Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

Jörn Franke
In reply to this post by Serega Sheypak
You can try with YARN node labels; then you can whitelist nodes.
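For reference, a sketch of setting up such labels with the YARN CLI (the label name and hostnames are made-up examples):

  # define a non-exclusive label and attach it to the healthy nodes
  yarn rmadmin -addToClusterNodeLabels "GOOD(exclusive=false)"
  yarn rmadmin -replaceLabelsOnNode "node1.example.com=GOOD node2.example.com=GOOD"

Jobs submitted with a node-label expression of GOOD (see the spark-submit sketch earlier in the thread) would then run only on those nodes.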


Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

Imran Rashid-3
Serega, can you explain a bit more why you want this ability?
If the node is really bad, wouldn't you want to decommission the NM entirely?
If you've got heterogeneous resources, then node labels seem like they would be more appropriate -- and I don't feel great about adding workarounds for the node-label limitations into blacklisting.

I don't want to be stuck supporting a configuration with too limited a use case.

(It may be better to move this discussion to https://issues.apache.org/jira/browse/SPARK-26688 so it's better archived; I'm responding here in case you aren't watching that issue.)


Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

Serega Sheypak
Hi Imran,
here is my use case:
There is a 1K-node cluster, and jobs suffer performance degradation because of a single node. It's rather hard to convince Cluster Ops to decommission a node because of "performance degradation": imagine 10 dev teams chasing a single ops team, whether for a valid reason (the node has problems) or because the code has a bug, or the data is skewed, or sunspots. We can't just decommission a node because a random dev complains.

Simple solution:
- Rerun the failed / delayed job, blacklisting the "problematic" node in advance.
- Report the problem if the job then works without anomalies.
- Ops collect complaints about a node and start to decommission it when a "complaints threshold" is reached. There is a rather low probability that many loosely coupled teams with loosely coupled jobs all complain about a single node.

Results:
- Ops are not spammed with random requests from devs.
- Devs are not blocked by a really bad node.
- It's very cheap for everyone to "blacklist" a node during job submission without doing anything to the node itself.
- It's very easy to automate such behavior (see the sketch below). Many teams use countless kinds of workflow runners, and the strategy is dead simple (depending on the SLA, of course):
  - Just re-run a failed job, excluding the nodes with failed tasks (if the number of nodes is reasonable).
  - Kill a stuck job if it runs longer than XXX minutes and re-start it, excluding the nodes with long-running tasks.
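
A minimal sketch of such a wrapper, assuming the hypothetical spark.yarn.exclude.nodes property discussed above and a made-up collect_failed_nodes helper that would extract bad hosts from the event log or history server:

  #!/usr/bin/env bash
  set -euo pipefail

  EXCLUDED=""
  for attempt in 1 2 3; do
    # spark.yarn.exclude.nodes is hypothetical; substitute whatever
    # exclusion mechanism is actually available.
    if spark-submit --master yarn \
         --conf "spark.yarn.exclude.nodes=${EXCLUDED}" \
         my-job.jar; then
      exit 0
    fi
    # collect_failed_nodes is a made-up helper: it should emit a
    # comma-separated list of hosts that had failed tasks.
    EXCLUDED="$(collect_failed_nodes)"
    echo "Attempt ${attempt} failed; retrying, excluding: ${EXCLUDED}"
  done
  exit 1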


