SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

k.tham
I'm trying to save an RDD as a Parquet file through the saveAsParquetFile() API, with code that looks something like this:

val sc = ...
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._  // implicit conversion from RDD[SomeCaseClass] to SchemaRDD

val someRDD: RDD[SomeCaseClass] = ...
someRDD.saveAsParquetFile("someRDD.parquet")

However, I get the following error:
java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

I'm trying to figure out what the issue is; any help is appreciated, thanks!

My sbt configuration has the following:

val sparkV = "1.0.0"
// ...
"org.apache.spark"      %% "spark-core"               % sparkV,
"org.apache.spark"      %% "spark-mllib"              % sparkV,
"org.apache.spark"      %% "spark-sql"                % sparkV,

Here's the stack trace:

java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
        at org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:256)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
        at org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:224)
        at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:242)
        at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:242)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
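
For context on what this error means: org.apache.hadoop.mapreduce.TaskAttemptContext is a concrete class in Hadoop 1.x but became an interface in Hadoop 2.x, so an IncompatibleClassChangeError like this one shows up whenever bytecode compiled against one Hadoop generation runs against the other. A quick diagnostic sketch, pasteable into spark-shell, to see which generation is actually on the runtime classpath:

// Print the Hadoop version found on the classpath
println(org.apache.hadoop.util.VersionInfo.getVersion)
// true on Hadoop 2.x (interface), false on Hadoop 1.x (class)
println(classOf[org.apache.hadoop.mapreduce.TaskAttemptContext].isInterface)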
Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

Michael Armbrust
This thread seems to be about the same issue:



Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

k.tham
Oh, I missed that thread. Thanks
Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

k.tham
In reply to this post by Michael Armbrust
I've read through that thread, and it seems that in his case he needed to add a particular hadoop-client dependency.
However, I don't think I should be required to do that, as I'm not reading from HDFS.

I'm just running a straight-up minimal example, in local mode, out of the box.

Here's a minimal example project that reproduces this error:

https://github.com/ktham/spark-parquet-example
Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

sowen
All of that support code uses Hadoop-related classes, like OutputFormat, to do the writing in the Parquet format. There's a Hadoop code dependency in play here even if the bytes aren't going to HDFS.
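
In other words, the Hadoop generation on the runtime classpath has to match the one the Parquet write path was compiled against. Since the error says an interface was expected, the failing code was compiled against Hadoop 2.x, so one way out is to declare a matching hadoop-client explicitly. A sketch for sbt, with 2.2.0 as an assumed version (it should match the Spark build and any cluster in use):

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0"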

Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

k.tham
I see, thanks
Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

lefromage
SOLVED:
build.sbt:
name := "project_name"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.0.0"

libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.0.0"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

then: sbt package
then: $SPARK_HOME/bin/spark-submit --class "ParquetExample" --master local[4] target/scala-2.10/project_name_2.10-1.0.jar
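
One caveat about the build above: it only declares spark-core and spark-mllib. If ParquetExample calls SchemaRDD's saveAsParquetFile(), spark-sql would also need to be on the classpath; the extra line would look like:

libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.0.0"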