Creating Dataframe by querying Impala

Creating Dataframe by querying Impala

morfious902002
Hi,
I am trying to create a DataFrame by querying an Impala table. It works fine in my local environment, but when I try to run it on the cluster I get either

Error:java.lang.ClassNotFoundException: com.cloudera.impala.jdbc41.Driver

or

No Suitable Driver found.

Can someone help me or direct me to how I can accomplish this?

I am using Spark 1.6.1. Here is my command (the "No Suitable Driver found" error):

/appserver/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name Livy \
  --jars hdfs:///user/lib/ImpalaJDBC41.jar,hdfs:///user/lib/TCLIServiceClient.jar,hdfs:///user/lib/libfb303-0.9.0.jar,hdfs:///user/lib/libthrift-0.9.0.jar,hdfs:///user/lib/hive_metastore.jar,hdfs:///user/lib/hive_service.jar \
  --class Main.class \
  --driver-memory 5G \
  --driver-cores 2 \
  --executor-memory 8G \
  --executor-cores 3 \
  --num-executors 2 \
  my.jar arg arg arg
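For reference, the two errors come from different stages of JDBC driver resolution, and that can help narrow down where the classpath is wrong. A minimal plain-JDBC sketch (no Spark involved; it assumes the Impala driver jar is absent from the classpath, as in the failing cluster run, and the host name is a placeholder) reproduces both:

```java
import java.sql.DriverManager;
import java.sql.SQLException;

public class DriverLookup {
    public static void main(String[] args) {
        // Failure mode 1: the driver class is not on the JVM classpath at all,
        // so an explicit load fails with ClassNotFoundException.
        try {
            Class.forName("com.cloudera.impala.jdbc41.Driver");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }

        // Failure mode 2: the driver class was never loaded/registered, so
        // DriverManager has no registered driver that accepts the JDBC URL
        // and reports "No suitable driver found".
        try {
            DriverManager.getConnection("jdbc:impala://impala-host:21050/default");
        } catch (SQLException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In other words, ClassNotFoundException means the jar never reached the JVM's classpath, while "No suitable driver" means the class may be present but was not registered with DriverManager before the connection attempt.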

Re: Creating Dataframe by querying Impala

morfious902002
The issue seems to be with the primordial class loader. I cannot copy the driver jars to the same location on every node, but I have uploaded them to HDFS. I have tried SPARK_YARN_DIST_FILES as well as SPARK_CLASSPATH on the edge node with no luck. Is there another way to load these jars through the primordial class loader in YARN cluster mode, or do I have to add them inside the Spark assembly jar?

Thank you for all the help.
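One approach worth trying (a sketch, not verified on this cluster): `spark.driver.extraClassPath` and `spark.executor.extraClassPath` are prepended to the JVM classpath before the application starts, so the system class loader can see the driver. Since `--jars` copies the HDFS jars into each container's working directory in YARN cluster mode, the extraClassPath entries can be bare jar names. The conf values below are assumptions; the jar names and paths come from the original command (trimmed to the two driver jars for brevity):

```shell
# Sketch: ship the driver jars with --jars, then put them on the driver and
# executor JVM classpaths so the primordial class loader loads them.
/appserver/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars hdfs:///user/lib/ImpalaJDBC41.jar,hdfs:///user/lib/TCLIServiceClient.jar \
  --conf spark.driver.extraClassPath=ImpalaJDBC41.jar:TCLIServiceClient.jar \
  --conf spark.executor.extraClassPath=ImpalaJDBC41.jar:TCLIServiceClient.jar \
  --class Main.class \
  my.jar arg arg arg
```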