Spark ORC store written timestamp as column

Manjunath Shetty H
Hi All,

Is there any way to store the exact write timestamp in an ORC file through Spark?
The use case is something like the `current_timestamp()` function in SQL. A timestamp generated in the program will not be equal to the actual time the ORC/HDFS file is written.

Any suggestions will be helpful.


Thanks
Manjunath

Re: Spark ORC store written timestamp as column

ZHANG Wei
As far as I understand, OrcOutputWriter uses orc-core to write the
file. I'm not sure whether ORC supports row-level metadata. If it does
not, maybe org.apache.orc.Writer::addRowBatch() can be overridden to
record the metadata after the row batch is written.
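
To make the idea concrete, here is a rough sketch of the "wrap the batch write and note the time afterwards" pattern. It is plain Python and does not touch orc-core: `InMemoryBatchWriter`, `TimestampingWriter`, `add_row_batch` and `write_times` are all made-up names standing in for the real writer, so treat it as an illustration of the approach rather than working ORC code.

```python
from datetime import datetime, timezone


class InMemoryBatchWriter:
    """Hypothetical stand-in for an ORC writer; this is NOT the orc-core API."""

    def __init__(self):
        self.batches = []

    def add_row_batch(self, batch):
        # Pretend to persist the batch by keeping a copy in memory.
        self.batches.append(list(batch))


class TimestampingWriter:
    """Delegating wrapper that notes the time right after each batch is written."""

    def __init__(self, delegate, clock=lambda: datetime.now(timezone.utc)):
        self._delegate = delegate
        self._clock = clock      # injectable so the behaviour can be tested
        self.write_times = []    # one entry per batch, in write order

    def add_row_batch(self, batch):
        self._delegate.add_row_batch(batch)
        # Taken only after the delegate returns, so it reflects the write
        # call finishing, not the moment the rows were created.
        self.write_times.append(self._clock())


writer = TimestampingWriter(InMemoryBatchWriter())
writer.add_row_batch([(1, "a"), (2, "b")])
writer.add_row_batch([(3, "c")])
print(len(writer.write_times))  # one timestamp per batch
```

Note that this gives one timestamp per batch, not per row, and that a real writer may buffer batches in memory before anything reaches HDFS, so even this would only approximate the actual write time.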

--
Cheers,
-z

On Thu, 16 Apr 2020 04:47:31 +0000
Manjunath Shetty H <[hidden email]> wrote:

> Hi All,
>
> Is there anyway to store the exact written timestamp in the ORC file through spark ?.
> Use case something like `current_timestamp()` function in SQL. Generating in the program will not be equal to actual write time in ORC/hdfs file.
>
> Any suggestions will be helpful.
>
>
> Thanks
> Manjunath
