SparkSQL returns all null fields when FIELDS TERMINATED BY '\t' and the table has a partition.


Liu Yiding
Hi, all

I am using CDH 5.5 (Spark 1.5 and Hive 1.1) and have run into a strange problem.
In hive:
hive> create table `tmp.test_d`(`id` int, `name` string) PARTITIONED BY (`dt` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> load data local inpath '/var/lib/hive/dataimport/mendian/target/test.txt' OVERWRITE into table tmp.test_d partition(dt='2016-01-25');
hive> select * from tmp.test_d;
1       test    2016-01-25
2       xxxx    2016-01-25
Time taken: 0.267 seconds, Fetched: 2 row(s)

But in spark:
scala> sqlContext.sql("select * from tmp.test_d").collect
res9: Array[org.apache.spark.sql.Row] = Array([null,null,2016-01-25], [null,null,2016-01-25])

All fields return null.

But if I change the field delimiter to '\u0001', or create the table without partitions, it works normally.
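As a possible workaround until the serde issue is sorted out (this is a sketch of my own, not something confirmed for this bug): read the partition's underlying text file directly and split on the tab delimiter yourself, bypassing the Hive serde path entirely. The warehouse path below is a guess for a default CDH layout; adjust it to wherever `tmp.test_d` actually lives.

```scala
// Workaround sketch for spark-shell. Assumption: the partition's data file
// sits under the default Hive warehouse path shown below -- adjust as needed.
case class TestRow(id: Int, name: String, dt: String)

val dt = "2016-01-25"
// Hypothetical location of the dt='2016-01-25' partition:
val path = s"/user/hive/warehouse/tmp.db/test_d/dt=$dt"

// Read the raw tab-delimited text and parse the fields manually,
// so the Hive delimiter handling that yields nulls is never involved.
val rows = sc.textFile(path)
  .map(_.split("\t"))
  .filter(_.length >= 2)
  .map(a => TestRow(a(0).trim.toInt, a(1), dt))

import sqlContext.implicits._
rows.toDF().show()
```

This only helps for reading; writes should still go through Hive so the partition metadata stays consistent.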