Can Spark shuffle leverage Alluxio to obtain higher stability?


Can Spark shuffle leverage Alluxio to obtain higher stability?

Bang Xiao
In my use case, I run Spark in yarn-client mode with dynamic allocation
enabled. When a node shuts down abnormally, my Spark application fails
because tasks fail to fetch shuffle blocks from that node four times.
Why doesn't Spark leverage Alluxio (a distributed in-memory filesystem) to
write shuffle blocks with replicas? In that situation, when a node goes down,
tasks could fetch the shuffle blocks from another replica and we would obtain
higher stability.
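
For context, a minimal sketch (Scala, driver side) of the retry knobs that
decide how quickly those fetch failures abort the application; the config keys
are standard Spark settings, the values below are only illustrative
assumptions, and none of this replicates the shuffle data itself:

import org.apache.spark.sql.SparkSession

// Sketch only: loosen the retry knobs so that a transient node problem does
// not abort the application after the default number of fetch failures.
// Values are illustrative, not recommendations.
val spark = SparkSession.builder()
  .appName("shuffle-retry-sketch")
  .config("spark.shuffle.io.maxRetries", "8")          // retries per shuffle block fetch
  .config("spark.shuffle.io.retryWait", "10s")         // wait between fetch retries
  .config("spark.task.maxFailures", "8")               // task attempts before the job fails
  .config("spark.stage.maxConsecutiveAttempts", "8")   // stage retries after FetchFailed
  .getOrCreate()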





Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

vincent gromakowski
In your case you need to externalize the shuffle files to a component outside of your Spark cluster, so that they persist after a Spark worker dies: https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service
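
A minimal sketch of the driver-side settings that page describes, assuming the
NodeManagers already have Spark's YARN shuffle service deployed; the
NodeManager-side registration is summarized in the comments and comes from the
linked documentation:

import org.apache.spark.sql.SparkSession

// Driver-side settings for the external shuffle service on YARN.
// The NodeManager side also needs the aux-service registered in yarn-site.xml:
//   yarn.nodemanager.aux-services = spark_shuffle
//   yarn.nodemanager.aux-services.spark_shuffle.class =
//     org.apache.spark.network.yarn.YarnShuffleService
// plus the Spark YARN shuffle jar on the NodeManager classpath.
val spark = SparkSession.builder()
  .appName("external-shuffle-sketch")
  .config("spark.shuffle.service.enabled", "true")     // shuffle files served by the NodeManager
  .config("spark.dynamicAllocation.enabled", "true")   // executors can be released safely
  .getOrCreate()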


2017-12-20 10:46 GMT+01:00 chopinxb <[hidden email]>:
In my use case, I run Spark in yarn-client mode with dynamic allocation
enabled. When a node shuts down abnormally, my Spark application fails
because tasks fail to fetch shuffle blocks from that node four times.
Why doesn't Spark leverage Alluxio (a distributed in-memory filesystem) to
write shuffle blocks with replicas? In that situation, when a node goes down,
tasks could fetch the shuffle blocks from another replica and we would obtain
higher stability.






Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

Bang Xiao
Yes, the shuffle service is already started on each NodeManager. What I mean
by a node failure is that the whole machine is down, so every service on that
machine, including the NodeManager process, is down. In that case the shuffle
service no longer helps.





Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

vincent gromakowski
The probability of a complete node failure is low. I would rely on data lineage and accept the reprocessing overhead. Another option would be to write shuffle data to a distributed FS, but that will drastically reduce the speed of all your jobs.
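
As an illustration of the "write to a distributed FS and accept the overhead"
trade-off, a hedged sketch that checkpoints one wide intermediate result to
HDFS so that a later failure only replays the lineage from the checkpoint
rather than from the very beginning; the path, table and query are
hypothetical:

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("checkpoint-sketch").getOrCreate()
// Checkpoint directory on a replicated filesystem (hypothetical path).
spark.sparkContext.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

val wideResult: DataFrame = spark.sql(
  "SELECT key, count(*) AS cnt FROM some_table GROUP BY key")

// Eagerly materializes the result to the checkpoint dir and truncates the
// lineage; the blocks written to HDFS are replicated and survive a node loss.
val stable = wideResult.checkpoint()
stable.createOrReplaceTempView("stable_counts")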

On 20 Dec 2017 at 11:23, "chopinxb" <[hidden email]> wrote:
Yes, the shuffle service is already started on each NodeManager. What I mean
by a node failure is that the whole machine is down, so every service on that
machine, including the NodeManager process, is down. In that case the shuffle
service no longer helps.





Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

Bang Xiao
In my experience with Spark applications (mostly Spark SQL), when there is a
complete node failure in the cluster, jobs that have shuffle blocks on that
node fail completely after 4 task retries. It seems that data lineage doesn't
help here. What's more, our applications run multiple SQL statements for data
analysis; having the entire application fail after a lengthy calculation
because one job failed is unacceptable. So in a way we care more about
stability than speed.
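
One possible mitigation, sketched below under assumptions (the paths, table
names and queries are hypothetical): materialize each expensive intermediate
result to a replicated store between SQL statements, so a relaunched
application can resume from the last completed step instead of recomputing
everything:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("staged-sql-sketch").getOrCreate()

// Step 1: persist an expensive intermediate result to HDFS (replicated).
val stage1Path = "hdfs:///warehouse/tmp/stage1"
spark.sql("SELECT * FROM raw_events WHERE dt = '2017-12-20'")
  .write.mode("overwrite").parquet(stage1Path)

// Step 2: later statements read the materialized data, so a restart after a
// node loss can skip step 1 if its output already exists.
spark.read.parquet(stage1Path).createOrReplaceTempView("stage1")
spark.sql("SELECT user_id, count(*) AS events FROM stage1 GROUP BY user_id")
  .write.mode("overwrite").parquet("hdfs:///warehouse/tmp/stage2")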





Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

geoHeil
Did you try to use the YARN shuffle service?
chopinxb <[hidden email]> wrote on Thu, 21 Dec 2017 at 04:43:
In my experience with Spark applications (mostly Spark SQL), when there is a
complete node failure in the cluster, jobs that have shuffle blocks on that
node fail completely after 4 task retries. It seems that data lineage doesn't
help here. What's more, our applications run multiple SQL statements for data
analysis; having the entire application fail after a lengthy calculation
because one job failed is unacceptable. So in a way we care more about
stability than speed.





Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

vincent gromakowski
If it is not resilient at the Spark level, can't you just relaunch your job with your orchestration tool?

On 21 Dec 2017 at 09:34, "Georg Heiler" <[hidden email]> wrote:
Did you try to use the YARN shuffle service?
chopinxb <[hidden email]> wrote on Thu, 21 Dec 2017 at 04:43:
In my experience with Spark applications (mostly Spark SQL), when there is a
complete node failure in the cluster, jobs that have shuffle blocks on that
node fail completely after 4 task retries. It seems that data lineage doesn't
help here. What's more, our applications run multiple SQL statements for data
analysis; having the entire application fail after a lengthy calculation
because one job failed is unacceptable. So in a way we care more about
stability than speed.


