Spark 2.4.4 with which version of Hadoop?

We've been considering using the Spark 2.4.4 download package that's
pre-built for Hadoop 2.7, together with Hadoop 2.7.7.

When used with Spark, Hadoop 2.7 is often quoted as the most stable.

However, Hadoop 2.7.7 is End of Life, so it's not supported and is no longer
available as a download. The most recent Hadoop vulnerabilities have only
been fixed in versions 2.8.5 and above, and currently only Hadoop 2.9.2 and
above are available to download.

We've searched the Spark user forum and have also been following discussions
on the development list, but the best approach to this issue is still
unclear. Current discussions about Spark 3.0.0 lean toward keeping Hadoop
2.7 as the default; given the known vulnerabilities, this is a concern.

What's our best way forward with this? Which versions of Hadoop 2.x do you
recommend when used with Spark?
Should we instead switch to the Spark 2.4.4 package built for user-provided
Apache Hadoop? If so, which supported version of Hadoop should we be using
with Spark?
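For anyone weighing the same choice: if we did go with the "without Hadoop" package, Spark's documented mechanism for wiring in a user-provided Hadoop is the SPARK_DIST_CLASSPATH variable in conf/spark-env.sh. A minimal sketch, assuming a separately installed Hadoop whose bin directory is on the PATH (the install paths shown are illustrative, not from this thread):

```shell
# conf/spark-env.sh for a "Hadoop free" Spark build.
# Point Spark at the jars of a user-provided Hadoop installation by
# asking that Hadoop for its own classpath.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# Alternatively, if hadoop is not on PATH, reference the install
# directly (path is an assumption for illustration):
# export SPARK_DIST_CLASSPATH=$(/opt/hadoop-2.9.2/bin/hadoop classpath)
```

With this set, spark-submit and spark-shell pick up the Hadoop client jars from that installation rather than from jars bundled with Spark, which is what makes it possible to track Hadoop security releases independently of the Spark release cadence.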


