We've been considering using the Spark 2.4.4 download package that's
pre-built for Hadoop 2.7, together with Hadoop 2.7.7.
Hadoop 2.7 is often cited as the most stable version to use with Spark.
However, Hadoop 2.7.7 is End of Life, so it's no longer supported or
available as a download. The most recent Hadoop vulnerabilities have only
been fixed in versions 2.8.5 and above, and currently only Hadoop 2.9.2 and
above are available to download.
We've searched the Spark user forum and have also been following discussions
on the development forum, and the best approach to this issue is still
unclear. Discussions about Spark 3.0.0 currently favour keeping Hadoop 2.7
as the default; given the known vulnerabilities, this is a concern.
What's our best way forward with this? Should we switch to the Spark 2.4.4
package built "with user-provided Apache Hadoop"? If so, which supported
version of Hadoop 2.x should we be using with Spark?
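
For context, our understanding of the "with user-provided Apache Hadoop"
(Hadoop-free) package is that Spark picks up a separately installed Hadoop
via SPARK_DIST_CLASSPATH, as described in the Spark docs. A sketch of what
we'd expect to configure (the Hadoop install path is illustrative, assuming
a 2.9.2 install whose bin/ directory is on PATH):

```shell
# conf/spark-env.sh -- sketch, not a tested configuration.
# Uses the `hadoop classpath` command from the separately installed
# Hadoop (e.g. 2.9.2 at /opt/hadoop-2.9.2) to tell Spark where the
# Hadoop jars live.
export HADOOP_HOME=/opt/hadoop-2.9.2
export PATH="$HADOOP_HOME/bin:$PATH"
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```

If that's the right direction, confirmation that this setup is supported
with a newer Hadoop 2.x line would be helpful.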