Spark wrote to Hive table: file content format and file format in metadata don't match
We are currently trying to replace Hive with the Spark Thrift Server.
We have run into a problem with the following SQL: create table
test_db.test_sink as select [some columns] from test_db.test_source
After the SQL ran successfully, we queried data from test_db.test_sink.
The data was gibberish.
After some inspection, we found that test_db.test_sink has ORC files (they can
be read with spark.read.orc) on HDFS, but the metadata for it says text.
Also, the output column names are not the column names from
test_db.test_source, but something like:
|_col0|_col1|_col2|_col3|_col4|_col5|_col6|_col7|_col8|_col9|_col10|_col11|_col12|
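
For anyone who wants to run the same check, here is a minimal sketch of how the mismatch between the files and the metadata can be confirmed without Spark. It assumes one data file of the table has been copied to local disk (for example with hdfs dfs -get) and that the InputFormat string has been copied by hand from the output of DESCRIBE FORMATTED test_db.test_sink. The function names and the hint table are made up for illustration; the only format facts relied on are that ORC files start with the bytes "ORC" and that Hive's input format class names end in OrcInputFormat, TextInputFormat, and so on.

```python
# Hypothetical helpers for illustration; not part of Spark or Hive.

# Lower-cased fragments of Hive InputFormat class names mapped to a short label.
# The list is illustrative, not exhaustive.
_FORMAT_HINTS = (
    ("orcinputformat", "orc"),
    ("parquetinputformat", "parquet"),
    ("textinputformat", "text"),
)

ORC_MAGIC = b"ORC"  # an ORC file begins with these three bytes


def declared_format(input_format: str) -> str:
    """Map the InputFormat class name stored in the metastore to a short label."""
    name = input_format.lower()
    for fragment, label in _FORMAT_HINTS:
        if fragment in name:
            return label
    return "unknown"


def looks_like_orc(local_path: str) -> bool:
    """Return True if the local file starts with the ORC magic bytes."""
    with open(local_path, "rb") as f:
        return f.read(len(ORC_MAGIC)) == ORC_MAGIC


def format_mismatch(input_format: str, local_path: str) -> bool:
    """True when the file on disk is ORC but the metastore declares another format."""
    return looks_like_orc(local_path) and declared_format(input_format) != "orc"
```

With a table in the broken state described here, format_mismatch("org.apache.hadoop.mapred.TextInputFormat", path_to_local_copy) should come back True, where path_to_local_copy is wherever the data file was downloaded to.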
What is mysterious is that after rerunning the same SQL, without any changes,
the table is fine (the file content and the file format in the metadata
match).
I wonder if anyone has encountered the same problem.