PySpark Error: Unable to read a Hive table with transactional property set as 'True'

Debabrata Ghosh
Hi All,
                       Greetings! I need some help reading a Hive table via PySpark when the table's transactional property is set to 'true' (in other words, ACID is enabled). The full stack trace and the table description follow. Could you please help me resolve the error?

18/03/01 11:06:22 INFO BlockManagerMaster: Registered BlockManager
18/03/01 11:06:22 INFO EventLoggingListener: Logging events to hdfs:///spark-history/local-1519923982155
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.3
      /_/

Using Python version 2.7.12 (default, Jul  2 2016 17:42:40)
SparkContext available as sc, HiveContext available as sqlContext.
>>> from pyspark.sql import HiveContext
>>> hive_context = HiveContext(sc)
>>> hive_context.sql("select count(*) from load_etl.trpt_geo_defect_prod_dec07_del_blank").show()
18/03/01 11:09:45 INFO HiveContext: Initializing execution hive, version 1.2.1
18/03/01 11:09:45 INFO ClientWrapper: Inspected Hadoop version: 2.7.3.2.6.0.3-8
18/03/01 11:09:45 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.6.0.3-8
18/03/01 11:09:46 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/03/01 11:09:46 INFO ObjectStore: ObjectStore, initialize called
18/03/01 11:09:46 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/03/01 11:09:46 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/03/01 11:09:50 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/03/01 11:09:50 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/03/01 11:09:50 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/03/01 11:09:53 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/03/01 11:09:53 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/03/01 11:09:54 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/03/01 11:09:54 INFO ObjectStore: Initialized ObjectStore
18/03/01 11:09:54 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/03/01 11:09:54 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/03/01 11:09:54 INFO HiveMetaStore: Added admin role in metastore
18/03/01 11:09:54 INFO HiveMetaStore: Added public role in metastore
18/03/01 11:09:55 INFO HiveMetaStore: No user is added in admin role, since config is empty
18/03/01 11:09:55 INFO HiveMetaStore: 0: get_all_databases
18/03/01 11:09:55 INFO audit: ugi=[hidden email]   ip=unknown-ip-addr      cmd=get_all_databases
18/03/01 11:09:55 INFO HiveMetaStore: 0: get_functions: db=default pat=*
18/03/01 11:09:55 INFO audit: ugi=[hidden email]   ip=unknown-ip-addr      cmd=get_functions: db=default pat=*
18/03/01 11:09:55 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/03/01 11:09:55 INFO SessionState: Created local directory: /tmp/22ea9ac9-23d1-4247-9e02-ce45809cd9ae_resources
18/03/01 11:09:55 INFO SessionState: Created HDFS directory: /tmp/hive/hdetldev/22ea9ac9-23d1-4247-9e02-ce45809cd9ae
18/03/01 11:09:55 INFO SessionState: Created local directory: /tmp/hdetldev/22ea9ac9-23d1-4247-9e02-ce45809cd9ae
18/03/01 11:09:55 INFO SessionState: Created HDFS directory: /tmp/hive/hdetldev/22ea9ac9-23d1-4247-9e02-ce45809cd9ae/_tmp_space.db
18/03/01 11:09:55 INFO HiveContext: default warehouse location is /user/hive/warehouse
18/03/01 11:09:55 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/03/01 11:09:55 INFO ClientWrapper: Inspected Hadoop version: 2.7.3.2.6.0.3-8
18/03/01 11:09:55 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.6.0.3-8
18/03/01 11:09:56 INFO metastore: Trying to connect to metastore with URI thrift://ip.com:9083
18/03/01 11:09:56 INFO metastore: Connected to metastore.
18/03/01 11:09:56 INFO SessionState: Created local directory: /tmp/24379bb3-8ddf-4716-b68d-07ac0f92d9f1_resources
18/03/01 11:09:56 INFO SessionState: Created HDFS directory: /tmp/hive/hdetldev/24379bb3-8ddf-4716-b68d-07ac0f92d9f1
18/03/01 11:09:56 INFO SessionState: Created local directory: /tmp/hdetldev/24379bb3-8ddf-4716-b68d-07ac0f92d9f1
18/03/01 11:09:56 INFO SessionState: Created HDFS directory: /tmp/hive/hdetldev/24379bb3-8ddf-4716-b68d-07ac0f92d9f1/_tmp_space.db
18/03/01 11:09:56 INFO ParseDriver: Parsing command: select count(*) from load_etl.trpt_geo_defect_prod_dec07_del_blank
18/03/01 11:09:57 INFO ParseDriver: Parse Completed
18/03/01 11:09:57 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 813.6 KB, free 510.3 MB)
18/03/01 11:09:57 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 57.5 KB, free 510.3 MB)
18/03/01 11:09:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:35508 (size: 57.5 KB, free: 511.1 MB)
18/03/01 11:09:57 INFO SparkContext: Created broadcast 0 from showString at NativeMethodAccessorImpl.java:-2
18/03/01 11:09:58 INFO PerfLogger: <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
18/03/01 11:09:58 INFO deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark-client/python/pyspark/sql/dataframe.py", line 257, in show
    print(self._jdf.showString(n, truncate))
  File "/var/opt/teradata/anaconda4.1.1/anaconda/lib/python2.7/site-packages/py4j-0.10.6-py2.7.egg/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/hdp/current/spark-client/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/var/opt/teradata/anaconda4.1.1/anaconda/lib/python2.7/site-packages/py4j-0.10.6-py2.7.egg/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o44.showString.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#60L])
+- TungstenExchange SinglePartition, None
   +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#63L])
      +- HiveTableScan MetastoreRelation load_etl, trpt_geo_defect_prod_dec07_del_blank, None

        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
        at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:187)
        at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
        at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
        at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1500)
        at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1500)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
        at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2087)
        at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1499)
        at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1506)
        at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1376)
        at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
        at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2100)
        at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1375)
        at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1457)
        at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenExchange SinglePartition, None
+- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#63L])
   +- HiveTableScan MetastoreRelation load_etl, trpt_geo_defect_prod_dec07_del_blank, None

        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
        at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:86)
        at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:80)
        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
        ... 36 more
Caused by: java.lang.RuntimeException: serious problem
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
        at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
        at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
        at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
        ... 44 more
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0003024_0000"
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
        ... 75 more
Caused by: java.lang.NumberFormatException: For input string: "0003024_0000"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:310)
        at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:379)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more

Here are the details of the table definition:

Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
0: jdbc:hive2://toplxhdmd001.rights.com> show create table load_etl.trpt_geo_defect_prod_dec07_del_blank;
+-----------------------------------------------------------------------------------------------+--+
|                                        createtab_stmt                                         |
+-----------------------------------------------------------------------------------------------+--+
| CREATE TABLE `load_etl.trpt_geo_defect_prod_dec07_del_blank`(                                 |
|   `line_seg_nbr` int,                                                                         |
|   `track_type` string,                                                                        |
|   `track_sdtk_nbr` string,                                                                    |
|   `mile_post_beg` double,                                                                     |
|   `ss_nbr` int,                                                                               |
|   `ss_len` int,                                                                               |
|   `ris1mpb` double,                                                                           |
|   `mile_label` string,                                                                        |
|   `test_dt` string,                                                                           |
|   `def_prty` string,                                                                          |
|   `def_nbr` int,                                                                              |
|   `def_type` string,                                                                          |
|   `def_ampltd` double,                                                                        |
|   `def_lgth` int,                                                                             |
|   `car_cd` string,                                                                            |
|   `tsc_cd` string,                                                                            |
|   `class` string,                                                                             |
|   `test_fspd` string,                                                                         |
|   `test_pspd` string,                                                                         |
|   `restr_fspd` string,                                                                        |
|   `restr_pspd` string,                                                                       |
|   `def_land_mark` string,                                                                     |
|   `repeat_cd` string,                                                                         |
|   `mp_incr_cd` string,                                                                        |
|   `test_trk_dir` string,                                                                      |
|   `eff_dt` string,                                                                            |
|   `trk_file` string,                                                                          |
|   `dfct_cor_dt` string,                                                                       |
|   `dfct_acvt` string,                                                                         |
|   `dfct_slw_ord_ind` string,                                                                  |
|   `emp_id` string,                                                                            |
|   `eff_ts` string,                                                                            |
|   `dfct_cor_tm` string,                                                                       |
|   `dfct_freight_spd` int,                                                                     |
|   `dfct_amtrak_spd` int,                                                                      |
|   `mile_post_sfx` string,                                                                     |
|   `work_order_id` string,                                                                     |
|   `loc_id_beg` string,                                                                        |
|   `loc_id_end` string,                                                                        |
|   `link_id` string,                                                                           |
|   `lst_maint_ts` string,                                                                      |
|   `del_ts` string,                                                                            |
|   `gps_longitude` double,                                                                     |
|   `gps_latitude` double,                                                                      |
|   `geo_car_nme` string,                                                                       |
|   `rept_gc_nme` string,                                                                       |
|   `rept_dfct_tst` string,                                                                     |
|   `rept_dfct_nbr` int,                                                                        |
|   `restr_trk_cls` string,                                                                     |
|   `tst_hist_cd` string,                                                                       |
|   `cret_ts` string,                                                                           |
|   `ylw_grp_nbr` int,                                                                          |
|   `geo_dfct_grp_nme` string,                                                                  |
|   `supv_rollup_cd` string,                                                                    |
|   `dfct_stat_cd` string,                                                                      |
|   `lst_maint_id` string,                                                                      |
|   `del_rsn_cd` string,                                                                        |
|   `umt_prcs_user_id` string,                                                                  |
|   `gdfct_vinsp_srestr` string,                                                                |
|   `gc_opr_init` string)                                                                       |
| CLUSTERED BY (                                                                                |
|   geo_car_nme)                                                                                |
| INTO 2 BUCKETS                                                                                |
| ROW FORMAT SERDE                                                                              |
|   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'                                                 |
| STORED AS INPUTFORMAT                                                                         |
|   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'                                           |
| OUTPUTFORMAT                                                                                  |
|   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'                                          |
| LOCATION                                                                                      |
|   'hdfs://HADOOP02/apps/hive/warehouse/load_etl.db/trpt_geo_defect_prod_dec07_del_blank'  |
| TBLPROPERTIES (                                                                               |
|   'numFiles'='4',                                                                             |
|   'numRows'='0',                                                                              |
|   'rawDataSize'='0',                                                                          |
|   'totalSize'='2566942',                                                                      |
|   'transactional'='true',                                                                     |
|   'transient_lastDdlTime'='1518695199')                                                       |
+-----------------------------------------------------------------------------------------------+--+


Thanks,
D

Re: PySpark Error: Unable to read a Hive table with transactional property set as 'True'

ayan guha
Hi,

A couple of questions:

1. It seems the error is due to a number-format failure while generating the ORC splits:
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0003024_0000"
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
        ... 75 more
Why do you think it is due to ACID?
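
For what it's worth, the failing string looks like the tail of an ACID delta directory name. Hive 1.2's AcidUtils expects directories named delta_<minTxn>_<maxTxn>, so a name carrying an extra suffix (for example delta_0000001_0003024_0000, the layout written by newer Hive writers and streaming ingest) would fail exactly at Long.parseLong, as in your trace. One quick way to check is to list the table's HDFS location; this is only a sketch, with the path taken from the DDL in your original message:

import subprocess

# List everything under the table location quoted in the original
# message's "show create table" output. ACID data lives in base_N and
# delta_* subdirectories, so any delta_* entry with more than two
# numeric parts after the prefix is suspect.
print(subprocess.check_output([
    "hdfs", "dfs", "-ls", "-R",
    "hdfs://HADOOP02/apps/hive/warehouse/load_etl.db/trpt_geo_defect_prod_dec07_del_blank",
]))

If such delta directories are present, the workaround usually suggested is a major compaction from Hive (ALTER TABLE load_etl.trpt_geo_defect_prod_dec07_del_blank COMPACT 'major'), which folds the deltas into a base directory that Spark 1.6 can read. I have not verified this on your HDP build.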

2. You should not be creating a HiveContext again in the REPL; there is no need for that. The REPL already reports: "SparkContext available as sc, HiveContext available as sqlContext." (See the sketch below.)
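
For example, in the same shell this should be all that is needed, reusing the context the shell already created:

>>> sqlContext.sql("select count(*) from load_etl.trpt_geo_defect_prod_dec07_del_blank").show()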

3. Have you tried the same query with Spark 2.x? (A minimal 2.x sketch follows.)
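
For reference, a minimal PySpark 2.x sketch would look like this (assuming a Spark 2.x client configured against the same metastore; the pyspark 2.x shell already provides such a session as spark):

from pyspark.sql import SparkSession

# Build (or reuse) a Hive-enabled session; in the pyspark 2.x shell
# one is already available as `spark`.
spark = (SparkSession.builder
         .appName("acid-read-check")  # hypothetical app name
         .enableHiveSupport()
         .getOrCreate())

spark.sql("select count(*) from load_etl.trpt_geo_defect_prod_dec07_del_blank").show()

Note that plain Spark 2.x still does not read Hive ACID delta files natively, so the table may need compaction (or a vendor connector on newer platform releases) either way.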

--
Best Regards,
Ayan Guha