External table partitioned by date using Spark SQL


External table partitioned by date using Spark SQL

ehalpern
I have large JSON files stored in S3, grouped under a sub-key for each year, like this:
logs/
  1980/
    1-1-1980.json
    1-2-1980.json
  1981/
    ... 
I've defined an external table that's partitioned by year to keep year-limited queries efficient.  The table definition looks like this:
CREATE EXTERNAL TABLE contributions (contributer string, amount int)
PARTITIONED BY (year int)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION 's3://spark-data/logs';

ALTER TABLE contributions ADD PARTITION (year=1980);
But alas, a simple query like:
SELECT * FROM contributions LIMIT 1;
yields no results.  

If I remove the "PARTITIONED BY" clause and point directly at s3://spark-data/logs/1980, everything works fine.  Am I running into a bug or are range partitions not supported on external tables?  

Re: External table partitioned by date using Spark SQL

ehalpern
I just realized my mistake.  The name of the partition subdirectory needs to include the field name and value.  Instead of
logs/1980
it should be
logs/year=1980

With this fix, the partitioned table is working as expected.
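
For anyone who hits the same thing: with the Hive naming convention in place, the layout becomes

logs/
  year=1980/
    1-1-1980.json
    1-2-1980.json
  year=1981/
    ...

and each year still has to be registered as a partition. A minimal sketch, either adding partitions one at a time or discovering them in bulk (MSCK scans the table's location for year=NNNN directories):

ALTER TABLE contributions ADD PARTITION (year=1980);
ALTER TABLE contributions ADD PARTITION (year=1981);
-- or, equivalently for this layout:
MSCK REPAIR TABLE contributions;

And if renaming the S3 keys isn't an option, a partition can be pointed at the original directory explicitly:

ALTER TABLE contributions ADD PARTITION (year=1980) LOCATION 's3://spark-data/logs/1980';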

Re: External table partitioned by date using Spark SQL

santhavathi
I am creating an external table as below from Spark Beeline.
0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE test_table  (name string,phone string) PARTITIONED BY(dt string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3a://test-bucket/test/';
+---------+--+
| result  |
+---------+--+
+---------+--+
No rows selected (0.67 seconds)


Now I am getting the error below when I select from the table.
0: jdbc:hive2://localhost:10000> select * from test_table;
 Error: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table test_table. Bucket 172.31.26.109 does not exist (state=,code=0)


I am also unable to drop the table.

0: jdbc:hive2://localhost:10000> drop table test_table;
Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED: SemanticException Unable to fetch table test_table. Bucket 172.31.26.109 does not exist (state=,code=0)
0: jdbc:hive2://localhost:10000> show tables;
+--------------+--------------+--+
|  tableName   | isTemporary  |
+--------------+--------------+--+
| test_table   | false        |
+--------------+--------------+--+


So I manually deleted it from the Hive metastore (Cassandra-backed in this deployment):
cassandra@cqlsh:HiveMetaStore> delete from "HiveMetaStore".sparkmetastore where key='_1_default' and entity='org.apache.hadoop.hive.metastore.api.Table::test_table';

0: jdbc:hive2://localhost:10000> show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
+------------+--------------+--+
No rows selected (0.285 seconds)


When I try to create the same table again, it does not get created, although I can create a table with a new name. In short, after deleting the table directly from the Hive metastore, I can no longer recreate a table with the same name.

0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE test_table  (name string,phone string) PARTITIONED BY(dt string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3a://test-bucket/test/';
+---------+--+
| result  |
+---------+--+
+---------+--+
No rows selected (0.464 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
+------------+--------------+--+
No rows selected (0.251 seconds)


My S3 directory structure looks like this:
test-bucket/test/dt=date20170707/file1.csv
test-bucket/test/dt=date20170708/file2.csv
test-bucket/test/dt=date20170709/file3.csv
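
Once the table itself is queryable, I understand from the earlier replies that these partitions would still need to be registered; a sketch of what I plan to run (partition values taken from the layout above):

ALTER TABLE test_table ADD PARTITION (dt='date20170707');
ALTER TABLE test_table ADD PARTITION (dt='date20170708');
ALTER TABLE test_table ADD PARTITION (dt='date20170709');
-- or discover them in one pass:
MSCK REPAIR TABLE test_table;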


What am I missing here?
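
One thing I still need to rule out is the S3A client configuration itself, since the error above shows an IP address (172.31.26.109) being treated as a bucket name. A rough sketch of the properties that usually matter (placeholder values, not my actual settings):

# spark-defaults.conf; spark.hadoop.* entries are passed through to the Hadoop S3A client
spark.hadoop.fs.s3a.endpoint    s3.amazonaws.com
spark.hadoop.fs.s3a.access.key  YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key  YOUR_SECRET_KEY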