HDP 3.1 Spark Kafka dependency


HDP 3.1 Spark Kafka dependency

William R
Hi,

I am having difficulty finding the right Kafka libraries for Spark. The HDP version is 3.1; I tried the libraries below, but they produce the error shown further down.

POM entry:
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>2.0.0.3.1.0.0-78</version>
</dependency>
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.11</artifactId>
  <version>2.0.0.3.1.0.0-78</version>
</dependency>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_${scala.compat.version}</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.3.2.3.1.0.0-78</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>2.3.2.3.1.0.0-78</version>
</dependency>
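
[Editor's note] None of the artifacts above contains the SQL "kafka" data source that spark.read / spark.readStream look up; in Spark 2.3 it ships in the separate spark-sql-kafka-0-10 module. A minimal sketch of the extra POM entry, assuming the HDP build matching the other Spark artifacts is available from the same Hortonworks repository:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <!-- assumed HDP build number, matching spark-core above; the stock Apache version would be 2.3.2 -->
  <version>2.3.2.3.1.0.0-78</version>
</dependency>

Unless it is shaded into the application jar, this jar still has to reach the driver and executors at submit time; see the --packages sketch after the spark-submit command below.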
Error during spark-submit:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
        at com.example.ReadDataFromKafka$.main(ReadDataFromKafka.scala:18)
        at com.example.ReadDataFromKafka.main(ReadDataFromKafka.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
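
[Editor's note] The failing lookup is not specific to this application: format("kafka") is a short name that Spark resolves through DataSource.lookupDataSource, and when no jar on the classpath registers that name it falls back to looking for a class literally called kafka.DefaultSource, which is the ClassNotFoundException in the trace. A minimal sketch of the kind of call that triggers it (this is not the original ReadDataFromKafka; the broker address and topic are placeholders):

import org.apache.spark.sql.SparkSession

// Minimal sketch only; broker and topic names are placeholders, not taken from the thread.
object ReadDataFromKafkaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReadDataFromKafkaSketch").getOrCreate()

    // With spark-sql-kafka-0-10 on the classpath, the short name "kafka" resolves to its
    // KafkaSourceProvider; without it, Spark tries the fallback class "kafka.DefaultSource"
    // and fails exactly as shown in the trace above.
    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092") // placeholder
      .option("subscribe", "example-topic")               // placeholder
      .load()

    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show(10, truncate = false)

    spark.stop()
  }
}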

 
Could someone help me figure out whether I am doing something wrong?

Spark Submit:

export KAFKA_KERBEROS_PARAMS="-Djava.security.auth.login.config=kafka.consumer.properties"
export KAFKA_OPTS="-Djava.security.auth.login.config=kafka.consumer.properties"
export SPARK_KAFKA_VERSION=NONE

spark-submit --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.conf=kafka.consumer.properties" --files "kafka.consumer.properties" --class com.example.ReadDataFromKafka HelloKafka-1.0-SNAPSHOT.jar
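
[Editor's note] Two details stand out in the command above, offered as observations rather than a definitive fix: the driver option spells the JAAS property as java.security.auth.login.conf, while the standard system property name is java.security.auth.login.config (the exports above do use .config), and nothing in the command puts a Kafka connector jar on the classpath. A hedged sketch of the same submit with the connector pulled in via --packages; the coordinate shown is the stock Apache build matching Spark 2.3.2 and is an assumption about what fits this cluster (the HDP build 2.3.2.3.1.0.0-78 from the Hortonworks repository would be the alternative):

# Sketch only: the original submit plus --packages, which also pulls in kafka-clients transitively.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.2 \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=kafka.consumer.properties" \
  --files "kafka.consumer.properties" \
  --class com.example.ReadDataFromKafka \
  HelloKafka-1.0-SNAPSHOT.jar

Alternatively, shading spark-sql-kafka-0-10_2.11 into HelloKafka-1.0-SNAPSHOT.jar (it would not be marked provided, unlike spark-core and spark-sql) achieves the same thing without --packages, provided the shade plugin merges META-INF/services entries.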



Regards,
William R

 
 

Re: HDP 3.1 Spark Kafka dependency

Zahid Rahman
I have run into many library incompatibility issues myself, including headless JVM problems where I had to uninstall the headless JVM and install the full JDK, and then work through them one by one. Anyway, this page shows the same error as yours; you may get away with making the changes to your pom.xml as suggested.

Good luck!

¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}

