Not able to query on partitioned table in hive which is created using Spark dataframe

Sunil Kumar
I created a partitioned table in Hive using a Spark DataFrame, but I am not able to query it from Hive.

SPARK Code

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)

// Read a CSV file and create a DataFrame (uses the spark-csv package)
val df = hc.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("file:///root/username/dataframedemo.csv")

// Create a partitioned table in Hive
df.write.partitionBy("day", "month").saveAsTable("emp_table")


HIVE Code

hive> describe formatted emp_table;
OK
# col_name              data_type               comment

col                     array<string>           from deserializer

# Detailed Table Information
Database:               default
Owner:                  root
CreateTime:             Thu Jun 02 13:03:56 UTC 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/emp_table
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   false
        EXTERNAL                FALSE
        numFiles                5
        numRows                 -1
        rawDataSize             -1
        spark.sql.sources.provider      org.apache.spark.sql.parquet
        spark.sql.sources.schema.numPartCols    2
        spark.sql.sources.schema.numParts       1
        spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\":[{\"name\":\"eid\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"ename\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"year\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"day\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"month\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}]}
        spark.sql.sources.schema.partCol.0      day
        spark.sql.sources.schema.partCol.1      month
        totalSize               3540
        transient_lastDdlTime   1464872636

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe
InputFormat:            org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        path                    hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/emp_table
        serialization.format    1


hive> select * from emp_table;
OK
Failed with exception java.io.IOException:java.io.IOException: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/emp_table/day=21/month=1/part-r-00000-44b98c97-7993-4e09-bfaa-bb44e13a43c9.gz.parquet not a SequenceFile

I am not able to read this table from Hive.


Re: Not able to query on partitioned table in hive which is created using Spark dataframe

rajeshspark
Hi,

Is there any solution or workaround for the above issue? Why does Hive behave this way?
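
One likely explanation, going by the describe output above: when a partitioned DataFrame is saved with saveAsTable in Spark 1.x, the table appears to be persisted in a Spark SQL specific format. The real schema and partition columns live in the spark.sql.sources.* table parameters, while the metastore entry carries a placeholder SerDe and SequenceFileInputFormat. Hive reads only the placeholder, so it tries to open the Parquet files as SequenceFiles and fails. A possible workaround, sketched below, is to create the table through HiveQL so the metastore records a Parquet storage descriptor, then load it with a dynamic-partition insert. This is a sketch under assumptions, not a verified fix: the table name emp_table_hive and the temp table name emp_staging are hypothetical, and the column list is taken from the schema shown in the table parameters.

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes a Spark 1.x shell where `sc` (SparkContext) already exists.
val hc = new HiveContext(sc)

// Same CSV load as in the original post.
val df = hc.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("file:///root/username/dataframedemo.csv")

// Allow Hive dynamic partitioning, so partition values come from the data.
hc.setConf("hive.exec.dynamic.partition", "true")
hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

// Create the table with HiveQL so Hive sees a Parquet SerDe and real columns.
// "emp_table_hive" is a hypothetical name chosen for this sketch.
hc.sql(
  """CREATE TABLE IF NOT EXISTS emp_table_hive (eid INT, ename STRING, year INT)
    |PARTITIONED BY (day INT, month INT)
    |STORED AS PARQUET""".stripMargin)

// Expose the DataFrame to SQL under a hypothetical temp table name.
df.registerTempTable("emp_staging")

// Partition columns must come last in the SELECT, in PARTITION clause order.
hc.sql(
  """INSERT INTO TABLE emp_table_hive PARTITION (day, month)
    |SELECT eid, ename, year, day, month FROM emp_staging""".stripMargin)
```

If this works as intended, describe formatted emp_table_hive in Hive should list eid, ename and year as regular columns with day and month as partition columns, and select * should read the Parquet files directly.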