Using hadoop-cloud_2.12 jars

Rahij Ramsharan
Hello,

I am trying to use the new S3A committers (https://spark.apache.org/docs/latest/cloud-integration.html#committing-work-into-cloud-storage-safely-and-fast) in Spark 3.0.0. According to https://spark.apache.org/docs/latest/cloud-integration.html#installation, I need to include "org.apache.spark:hadoop-cloud_2.12:3.0.0" on my classpath. However, I cannot find where this artifact is published: https://mvnrepository.com/artifact/org.apache.spark/hadoop-cloud returns a 404, and https://mvnrepository.com/artifact/org.apache.spark/spark-hadoop-cloud lists only jars from CDH/Cloudera builds, none of them for Spark 3.0.0.
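
For reference, this is the dependency declaration I am trying to use, as a minimal sbt sketch (the coordinates are the ones the installation page lists; resolving them is exactly what fails):

    // Dependency coordinates as given by the cloud-integration installation docs.
    // This currently fails to resolve from Maven Central, as described above.
    libraryDependencies += "org.apache.spark" % "hadoop-cloud_2.12" % "3.0.0"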

Is this intentional, or is there a bug in the Spark publishing process?

Thanks
Rahij

Re: Using hadoop-cloud_2.12 jars

Jorge Machado
You can build it from source. 

Clone the Spark git repo and run:

    ./build/mvn clean package -DskipTests -Phadoop-3.2 -Pkubernetes -Phadoop-cloud
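
If you run install instead of package, the module lands in your local Maven repository, and a downstream sbt build can resolve it from there. A minimal sketch, assuming the hadoop-cloud profile produces a spark-hadoop-cloud_2.12 artifact (verify the exact artifactId against your build output):

    // Resolve the locally built module from the local Maven repo (~/.m2).
    // The artifactId below is an assumption; check it against the build output.
    resolvers += Resolver.mavenLocal
    libraryDependencies += "org.apache.spark" % "spark-hadoop-cloud_2.12" % "3.0.0"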

Regards

Re: Using hadoop-cloud_2.12 jars

Rahij Ramsharan
Thanks for the response. Since the docs I linked tell consumers to depend on this artifact, could the jar be published to Maven Central?
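
For context, this is roughly what I want to run once the jar is available, following the committer settings in the cloud-integration docs (a sketch; it assumes the hadoop-cloud jar is on the classpath):

    import org.apache.spark.sql.SparkSession

    // Enable the S3A directory committer via the cloud commit protocol classes
    // shipped in the hadoop-cloud module, per the cloud-integration docs.
    val spark = SparkSession.builder()
      .config("spark.hadoop.fs.s3a.committer.name", "directory")
      .config("spark.sql.sources.commitProtocolClass",
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
      .config("spark.sql.parquet.output.committer.class",
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
      .getOrCreate()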
