Spark with HBase on Spark Runtime 2.2.1


SparkUser6
I wrote a simple program to read data from HBase. The program works fine on
Cloudera backed by HDFS, on Spark runtime 1.6. But it does NOT work on EMR
with Spark runtime 2.2.1: I get an exception while testing the data on EMR
with S3.

        // Imports needed by the snippet (HBase client + Spark)
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
        import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
        import org.apache.spark.SparkConf;
        import org.apache.spark.api.java.JavaPairRDD;
        import org.apache.spark.api.java.JavaSparkContext;

        // Spark conf
        SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("My App");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // HBase conf: point the client at ZooKeeper
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        conf.set("hbase.zookeeper.property.client.port", "2181");

        // Alternatively, submit a full Scan into the HBase conf
        // (requires a Scan object named `scan`):
        // conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));

        conf.set(TableInputFormat.INPUT_TABLE, "mytable");
        conf.set(TableInputFormat.SCAN_ROW_START, "startrow");
        conf.set(TableInputFormat.SCAN_ROW_STOP, "endrow");

        // Get RDD over (row key, row) pairs
        JavaPairRDD<ImmutableBytesWritable, Result> source =
                jsc.newAPIHadoopRDD(conf, TableInputFormat.class,
                        ImmutableBytesWritable.class, Result.class);

        // Process RDD: force a full scan and print the row count
        System.out.println("&&&&&&&&&&&&&&&&&&&&&&& " + source.count());
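For reference, TableMapReduceUtil.convertScanToString (the commented-out alternative above) works by serializing the Scan and Base64-encoding it, so that it fits into a String-valued Configuration property. Conceptually it does something like the following sketch (plain JDK Base64 on arbitrary bytes, not the actual HBase code):

```java
import java.util.Base64;

public class ScanStringDemo {
    // Stand-in for convertScanToString: Base64-encode a serialized object
    // so it can travel inside a Hadoop Configuration string property.
    static String encode(byte[] serialized) {
        return Base64.getEncoder().encodeToString(serialized);
    }

    // Stand-in for the reverse step performed when the Scan is read back.
    static byte[] decode(String prop) {
        return Base64.getDecoder().decode(prop);
    }

    public static void main(String[] args) {
        byte[] fakeSerializedScan = {1, 2, 3};
        String prop = encode(fakeSerializedScan);
        System.out.println(prop); // prints AQID
    }
}
```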



18/05/04 00:22:02 INFO MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
18/05/04 00:22:02 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
Caused by: java.lang.IllegalAccessError: tried to access class org.apache.hadoop.metrics2.lib.MetricsInfoImpl from class org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry
	at org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry.newGauge(DynamicMetricsRegistry.java:139)
	at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.<init>(MetricsZooKeeperSourceImpl.java:59)
	at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.<init>(MetricsZooKeeperSourceImpl.java:51)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
	... 42 more

Exception in thread "main" java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:270)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:256)
	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
	at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
	at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
	at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
	at HbaseScan.main(HbaseScan.java:60)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:652)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:265)
	... 20 more

With ALL Apache HBase libs on the classpath, the error changes to:
e.hadoop.hbase.metrics.impl.MetricRegistriesImpl
18/05/04 04:05:54 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormat.initialize(TableInputFormat.java:202)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:259)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:256)
	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
	at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
	at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
	at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
	at HbaseScan.main(HbaseScan.java:60)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
	... 24 more
Caused by: java.lang.RuntimeException: Could not create interface org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSource Is the hadoop compatibility jar on the classpath?
	at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:75)
	at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeper.<init>(MetricsZooKeeper.java:38)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.<init>(RecoverableZooKeeper.java:130)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:143)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:181)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:155)
	at org.apache.hadoop.hbase.client.ZooKeeperKeepAliveConnection.<init>(ZooKeeperKeepAliveConnection.java:43)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(ConnectionManager.java:1737)
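The last trace explicitly asks "Is the hadoop compatibility jar on the classpath?", and the IllegalAccessError on MetricsInfoImpl in the first trace is consistent with mismatched Hadoop/HBase jar versions on the driver's classpath. One quick sanity check is to probe the classpath directly; this is only a sketch, and the HBase class names below are taken from the stack traces and may differ between HBase versions:

```java
// Hypothetical classpath sanity check, run inside the same JVM as the driver.
public class ClasspathCheck {

    // Returns true if the named class is loadable (without initializing it).
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] needed = {
            // hbase-client
            "org.apache.hadoop.hbase.client.ConnectionFactory",
            // the "hadoop compatibility" classes the error message asks about
            "org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSource",
            "org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl",
        };
        for (String c : needed) {
            System.out.println(c + " -> " + (isPresent(c) ? "found" : "MISSING"));
        }
    }
}
```

If a class reports MISSING, the corresponding jar was not shipped with the job; if all report found but the IllegalAccessError persists, the more likely cause is two incompatible versions of the same jar on the classpath.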



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
