Spark yarn cluster

Spark yarn cluster

Diwakar Dhanuskodi
Hi ,

Would it be possible to set up Spark on a YARN cluster that does not have Hadoop?

Thanks.

Re: Spark yarn cluster

Juan Martín Guillén
Hi Diwakar,

A YARN cluster without Hadoop is a somewhat fuzzy concept.

You may well want Hadoop but not MapReduce, using Spark instead; that is the main reason to run Spark on a Hadoop cluster in the first place.

You will also most likely want to use HDFS, although it is not strictly necessary.

So, to answer your question: by using YARN you are already using Hadoop, because YARN is one of its three main components. That does not mean you need to use the other components of the Hadoop stack, namely MapReduce and HDFS.

That said, if all you need is cluster scheduling, and you are using neither MapReduce nor HDFS, you may well be fine with a Spark Standalone cluster.
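The two launch modes described above can be sketched with `spark-submit`. This is illustrative only: the application jar, class name, host names, and paths are placeholders, and the exact scripts and flags depend on your Spark version:

```shell
# Option 1: run on YARN (Hadoop's scheduler); HDFS is not required,
# but HADOOP_CONF_DIR must point at the YARN configuration directory.
export HADOOP_CONF_DIR=/etc/hadoop/conf   # placeholder path
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  myapp.jar

# Option 2: Spark Standalone -- no Hadoop components at all.
# Start a master and a worker with Spark's own scripts, then submit.
# (The worker script is named start-slave.sh on Spark releases before 3.1.)
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MyApp \
  myapp.jar
```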

Regards,
Juan Martín.


Re: Spark yarn cluster

Diwakar Dhanuskodi
Thanks, Martín.

I was not clear in my initial question 😀. Thanks for understanding and for the explanation.

The idea, as you said, is to explore using YARN for cluster scheduling, with Spark running without HDFS. Thanks again for the clarification.
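Running Spark on YARN without HDFS generally means pointing every Hadoop-filesystem path at an alternative store. A minimal `spark-defaults.conf` sketch, assuming an S3-compatible store reachable as `s3a://my-bucket` (a placeholder bucket name, and the property values are illustrative only):

```
# spark-defaults.conf -- illustrative values only
# Staging directory for YARN uploads (by default, the user's home
# directory on the cluster's default filesystem)
spark.yarn.stagingDir        s3a://my-bucket/spark-staging
# Event logs, if enabled, also need a shared location
spark.eventLog.enabled       true
spark.eventLog.dir           s3a://my-bucket/spark-events
# Make the object store the default filesystem instead of HDFS
spark.hadoop.fs.defaultFS    s3a://my-bucket
```

A shared local mount (`file://...`) would work the same way on a small cluster; the point is only that YARN and Spark need some filesystem all nodes can reach, not HDFS specifically.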
