Not able to query on partitioned table in hive which is created using Spark dataframe


Sunil Kumar
This post has NOT been accepted by the mailing list yet.
I have created a partitioned table in Hive using a Spark DataFrame, but I am not able to query it from Hive.

SPARK Code

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)

// Read a CSV file with a header row into a DataFrame, inferring the column types
val df = hc.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("file:///root/username/dataframedemo.csv")

// Create a table in Hive, partitioned by day and month
df.write.partitionBy("day", "month").saveAsTable("emp_table")


HIVE Code

hive> describe formatted emp_table;
OK
# col_name              data_type               comment

col                     array<string>           from deserializer

# Detailed Table Information
Database:               default
Owner:                  root
CreateTime:             Thu Jun 02 13:03:56 UTC 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/emp_table
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   false
        EXTERNAL                FALSE
        numFiles                5
        numRows                 -1
        rawDataSize             -1
        spark.sql.sources.provider      org.apache.spark.sql.parquet
        spark.sql.sources.schema.numPartCols    2
        spark.sql.sources.schema.numParts       1
        spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\":[{\"name\":\"eid\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"ename\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"year\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"day\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"month\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}]}
        spark.sql.sources.schema.partCol.0      day
        spark.sql.sources.schema.partCol.1      month
        totalSize               3540
        transient_lastDdlTime   1464872636

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe
InputFormat:            org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        path                    hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/emp_table
        serialization.format    1


hive> select * from emp_table;
OK
Failed with exception java.io.IOException:java.io.IOException: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/emp_table/day=21/month=1/part-r-00000-44b98c97-7993-4e09-bfaa-bb44e13a43c9.gz.parquet not a SequenceFile

I am not able to access this table from Hive.



Re: Not able to query on partitioned table in hive which is created using Spark dataframe

rajeshspark
This post has NOT been accepted by the mailing list yet.
Hi,

Is there any solution or workaround for the above issue? Why does Hive behave this way?
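
One possible explanation, judging from the describe formatted output above: the table parameters record the Spark provider as org.apache.spark.sql.parquet, while the storage section that Hive relies on lists SequenceFileInputFormat and a single placeholder column. That would suggest Spark saved the partitioned table in its own metastore format, so Hive tries to read the Parquet files as SequenceFiles. A workaround that may help is to let Hive own the table definition and load it with a dynamic-partition insert. The sketch below only collects the statements and hands them to whatever executes SQL. The names emp_table_hive and emp_staging and the helper object are hypothetical, the column list is taken from the schema shown in the table parameters, and STORED AS PARQUET assumes a Hive version that supports it.

```scala
// Sketch only: the object, table name and temp-table name are hypothetical.
object HiveCompatibleWrite {
  val target  = "emp_table_hive" // hypothetical Hive-owned table
  val staging = "emp_staging"    // hypothetical temp table registered from the DataFrame

  // Statements to run in order. Column names are backtick-quoted to avoid
  // any clash with date-related keywords.
  val statements: Seq[String] = Seq(
    // Define the table through HiveQL so the metastore records Parquet storage.
    s"""CREATE TABLE IF NOT EXISTS $target (eid INT, ename STRING, `year` INT)
       |PARTITIONED BY (`day` INT, `month` INT)
       |STORED AS PARQUET""".stripMargin,
    // Allow partitions to be derived from the data being inserted.
    "SET hive.exec.dynamic.partition=true",
    "SET hive.exec.dynamic.partition.mode=nonstrict",
    // Partition columns go last in the SELECT, in the PARTITION clause order.
    s"""INSERT INTO TABLE $target PARTITION (`day`, `month`)
       |SELECT eid, ename, `year`, `day`, `month` FROM $staging""".stripMargin
  )

  // `runSql` is whatever executes a single statement, e.g. s => hc.sql(s).
  def run(runSql: String => Any): Unit = statements.foreach(runSql)
}
```

In the spark-shell session from the original post this could be driven with df.registerTempTable("emp_staging") followed by HiveCompatibleWrite.run(s => hc.sql(s)), assuming the same Spark 1.x HiveContext. The original emp_table should remain readable through Spark itself (for example hc.table("emp_table")), since Spark understands the metadata it wrote.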