Spark 3.0 with Hadoop 2.6 HDFS/Hive

Spark 3.0 with Hadoop 2.6 HDFS/Hive

Ashika Umanga
Greetings, 

Support for Hadoop 2.6 has been removed, according to this ticket: https://issues.apache.org/jira/browse/SPARK-25016

We run our Spark cluster on K8s in standalone mode.
We access HDFS/Hive running on a Hadoop 2.6 cluster.
We've been using Spark 2.4.5 and are planning to upgrade to Spark 3.0.0.
However, we don't have any control over the Hadoop cluster, and it will remain on 2.6.

Is Spark 3.0 still compatible with HDFS/Hive running on Hadoop 2.6?

Best Regards,

Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

DB Tsai-3
In Spark 3.0, if you use the `with-hadoop` Spark distribution that
embeds Hadoop 3.2, you can set
`spark.yarn.populateHadoopClasspath=false` so that the cluster's
Hadoop classpath is not populated. In this scenario, Spark uses the
Hadoop 3.2 client to connect to Hadoop 2.6, which should work fine. In
fact, we have had production deployments running this way for a while.
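For illustration, this is roughly how that setting might be passed at submit time on a YARN deployment (the master, application class, and jar path below are placeholders, not from this thread):

```shell
# Sketch: submit using the `with-hadoop` Spark 3.0 distribution so its embedded
# Hadoop 3.2 client jars are used instead of the cluster's Hadoop 2.6 classpath.
# The class name and jar path are hypothetical.
./bin/spark-submit \
  --master yarn \
  --conf spark.yarn.populateHadoopClasspath=false \
  --class com.example.MyApp \
  my-app.jar
```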




--
Sincerely,

DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 42E5B25A8F7A82C1

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

Ashika Umanga Umagiliya
Hello

"spark.yarn.populateHadoopClasspath" is used in YARN mode, correct?
However, our Spark cluster is a standalone cluster, not using YARN.
We only connect to HDFS/Hive to access data; computation is done on our Spark cluster running on K8s (not YARN).
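For context, a standalone or K8s Spark image typically reaches an external HDFS and Hive metastore through client config files rather than any YARN setting. A minimal sketch, assuming the standard `/opt/spark` layout (all hostnames and paths are placeholders):

```shell
# Sketch: expose an external HDFS/Hive to a standalone Spark 3.0 image.
# All hostnames and paths are hypothetical.
export HADOOP_CONF_DIR=/etc/hadoop/conf   # holds core-site.xml and hdfs-site.xml
cp hive-site.xml /opt/spark/conf/         # points Spark at the Hive metastore,
                                          # e.g. thrift://metastore.example.com:9083
# No Hadoop 2.6 jars are copied in; Spark uses its own bundled Hadoop client.
```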



Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

Prashant Sharma
Hi Ashika,

Hadoop 2.6 is no longer supported, and since it has not been maintained for the last two years, it may have unpatched security issues. From Spark 3.0 onwards we no longer support it; in other words, we have modified our codebase in a way that Hadoop 2.6 won't work. However, if you are determined, you can always apply a custom patch to the Spark codebase to support it. I would recommend moving to a newer Hadoop.

Thanks,


Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

DB Tsai-3
If it's standalone mode, it's even easier. You should be able to
connect to Hadoop 2.6 HDFS using the 3.2 client. In your K8s cluster,
just don't put Hadoop 2.6 on your classpath.
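One way to sanity-check such an image is to confirm that only the bundled Hadoop jars are present and that nothing injects an external Hadoop classpath (the image name below is a placeholder):

```shell
# Sketch: inspect a K8s Spark 3.0 image for stray Hadoop jars.
# The image name is hypothetical.
docker run --rm my-spark3-image ls /opt/spark/jars | grep '^hadoop-'
# Expect only Hadoop 3.2 jars; then confirm no external classpath is injected:
docker run --rm my-spark3-image printenv SPARK_DIST_CLASSPATH \
  || echo "SPARK_DIST_CLASSPATH unset"
```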




--
Sincerely,

DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 42E5B25A8F7A82C1
