HiveThriftserver does not seem to respect partitions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

HiveThriftserver does not seem to respect partitions

Yana
Hi folks, I have created a table in the following manner:

CREATE EXTERNAL TABLE IF NOT EXISTS  rum_beacon_partition (

     .... list of columns

)

COMMENT 'User Infomation'

PARTITIONED BY (account_id String,

product String,

group_id String,

year String,

month String,

day String)

STORED AS PARQUET

LOCATION '/stream2store/nt_tp_collation'


I then ran MSCK REPAIR TABLE to generate the partition information.


I think partitions got generated correctly -- here is a query and it's output:


"show table extended like 'rum_beacon_partition' partition(account_id='XXXX',product='rum',group_id='XXXX',year='2017',month='09',day='12')


 location:ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id= XXXX/year=2017/month=09/day=12  


However, it does appear that when I issue a SQL query, the predicates do not correctly limit the files touched:

explain extended select uri from rum_beacon_partition where account_id='XXXX' and product='rum' and group_id='XXXX' and year='2017' and month='09' limit 2


Produces output that seems to indicate that every file is being touched (unless I'm misreading the output). It also crashes my filesystem so I suspect there is some truth to it. 

Optimized logical plan looks fine I think:

== Optimized Logical Plan == |

| Limit 2 |

| Project [uri#16519] |

| Filter (((((account_id#16511 = XXXX) && (product#16512 = rum)) && (group_id#16513 = XXXX)) && (year#16514 = 2017)) && (month#16515 = 09)) |


But in the physical plan it seems that a ton of files are touched (both in account and date partitions)

Scan ParquetRelation[ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=16,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=17,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=18,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=19,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=20,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=21,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=22,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=23,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=24,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=25,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=26,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=27,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=28,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=29,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=30,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=31,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=06/day=01,ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=06/day=02



I am hoping someone can offer debugging tips / advice on what to look for in the logs. I'm on a pretty old version of Spark (1.5.2) but this seems like something that I'm doing wrong.