Standalone Cluster: ClassNotFound org.apache.kafka.common.serialization.ByteArrayDeserializer

Standalone Cluster: ClassNotFound org.apache.kafka.common.serialization.ByteArrayDeserializer

Geoff Von Allmen

I am trying to deploy a streaming job to a standalone cluster but am running into ClassNotFound errors.

I have tried a myriad of different approaches, ranging from packaging all dependencies into a single JAR to using the --packages and --driver-class-path options.

I’ve got a master node started, a slave node running on the same system, and am using spark-submit to kick off the streaming job.

Here is the error I’m getting:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
    at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
    at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:376)
    at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
    at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:323)
    at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:60)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:198)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:88)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:88)
    at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
    at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
    at com.Customer.start(Customer.scala:47)
    at com.Main$.main(Main.scala:23)
    at com.Main.main(Main.scala)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 18 more

Here is the spark-submit command I’m using:

./spark-submit \
    --master spark://<domain>:<port> \
    --files jaas.conf \
    --deploy-mode cluster \
    --driver-java-options "-Djava.security.auth.login.config=./jaas.conf" \
    --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
    --packages org.apache.spark:spark-sql-kafka-0-10_2.11 \
    --driver-class-path ~/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.1.jar \
    --class <class_main> \
    --verbose \
    my_jar.jar

I’ve tried all sorts of combinations of including different packages and driver-class-path jar files. As far as I can tell, the deserializer should be in the kafka-clients jar file, which I’ve tried including, without success.

Pom Dependencies are as follows:

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.8-dmr</version>
        </dependency>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>2.9.9</version>
        </dependency>
    </dependencies>

If I remove --deploy-mode and run it in client mode … it works just fine.

Thanks Everyone -

Geoff V.


Re: Standalone Cluster: ClassNotFound org.apache.kafka.common.serialization.ByteArrayDeserializer

Eyal Zituny
Hi,
It seems that you're missing the kafka-clients jar (and probably some other dependencies as well).
How did you package your application jar? Does it include all the required dependencies (as an uber jar)?
If it's not an uber jar, you need to pass all the files/dirs where your dependencies can be found via the driver class path and the executor class path (note that those must be accessible from each node in the cluster); see the sketch below.
I suggest going over the manual.
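
A minimal sketch of what that could look like (the /opt/spark/deps directory and the exact jar versions are placeholders; the same path has to exist on every node in the cluster):

    ./spark-submit \
        --master spark://<domain>:<port> \
        --deploy-mode cluster \
        --driver-class-path "/opt/spark/deps/kafka-clients-0.10.0.1.jar:/opt/spark/deps/spark-sql-kafka-0-10_2.11-2.2.1.jar" \
        --conf spark.executor.extraClassPath="/opt/spark/deps/kafka-clients-0.10.0.1.jar:/opt/spark/deps/spark-sql-kafka-0-10_2.11-2.2.1.jar" \
        --class <class_main> \
        my_jar.jar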

Eyal


Re: Standalone Cluster: ClassNotFound org.apache.kafka.common.serialization.ByteArrayDeserializer

Geoff Von Allmen

I’ve tried it both ways.

The uber jar gives me the following:

If I only do minimal packaging and add org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar as a --package and then add it to the --driver-class-path, I get past that error, but I still get the error I showed in the original post.

I agree it seems the kafka-clients jar file is missing, since that is where ByteArrayDeserializer lives, though as far as I can tell it is present.

I can see the following two packages in the Classpath Entries on the history server (though the source shows ********(redacted); not sure why):

  • spark://<ip>:<port>/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar
  • spark://<ip>:<port>/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
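
One way to double-check that the class really is inside that kafka-clients jar (using the copy in the local ivy cache; adjust the path as needed):

    jar tf ~/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar | grep ByteArrayDeserializer
    # if the class is present, this should print:
    # org/apache/kafka/common/serialization/ByteArrayDeserializer.class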

As a side note, I'm running both a master and a worker on the same system just to test running in cluster mode. Not sure if that has anything to do with it. I would think it would make things easier, since everything has access to the same file system... but I'm pretty new to Spark.

I have also read through and followed those instructions as well as many others at this point.

Thanks!


Re: Standalone Cluster: ClassNotFound org.apache.kafka.common.serialization.ByteArrayDeserializer

Shixiong(Ryan) Zhu
Cluster mode doesn't upload jars to the driver node. This is a known issue: https://issues.apache.org/jira/browse/SPARK-4160
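
A possible workaround sketch, assuming the dependency jars can be staged at an identical local path on every worker (any worker may be chosen to run the driver in cluster mode); the host names and the /opt/spark/deps path below are placeholders:

    # copy the resolved jars to the same path on each worker
    for h in worker1 worker2; do
        scp ~/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar \
            ~/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.1.jar \
            "$h":/opt/spark/deps/
    done

    # then point the driver and executors at that local copy instead of relying on
    # --packages being uploaded:
    #   --driver-class-path "/opt/spark/deps/*"
    #   --conf spark.executor.extraClassPath="/opt/spark/deps/*"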
