Reg - Why does Apache Hadoop need to be installed separately for running Apache Spark?


Praveen Kumar Ramachandran
I'm learning Apache Spark and trying to run a basic Spark program written in Java. I've installed Apache Spark (spark-2.4.3-bin-without-hadoop), downloaded from https://spark.apache.org/.

I've created a Maven project in Eclipse and added the following dependency:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.4.3</version>
    </dependency>

After building the project, I tried to run the program by setting the Spark master to "local" through the Spark configuration, and encountered the following error:

    java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
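
For reference, this is roughly what the program looks like; the class name and the sample job are just placeholders for my actual code:

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SimpleSparkApp {
        public static void main(String[] args) {
            // "local" runs Spark inside this JVM instead of on a cluster
            SparkConf conf = new SparkConf()
                    .setAppName("SimpleSparkApp")
                    .setMaster("local");

            JavaSparkContext sc = new JavaSparkContext(conf);

            // Trivial placeholder job: sum a handful of numbers
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
            int sum = numbers.reduce((a, b) -> a + b);
            System.out.println("Sum = " + sum);

            sc.close();
        }
    }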

After referring to a few sites, I installed hadoop-2.7.7 and added HADOOP_HOME to my .bash_profile.
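
This is roughly the line I added to .bash_profile (the install path is only an example):

    # Path below is just an example of where I unpacked Hadoop
    export HADOOP_HOME=/path/to/hadoop-2.7.7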

After that, I'm able to execute my Spark program.

Now I'd like to understand where and how Hadoop is necessary for Spark.

I posted the same question on Stack Overflow a while back, but still haven't received a response.

Regards,
Praveen Kumar Ramachandran