State of datasource api v2

Vladimir Prus
Hi,

I am trying to understand the state of datasource v2, and I'm a bit lost. On one hand, it is supposed to be a more flexible approach, as described, for example, here:


On the other hand, it appears that both the Parquet and ORC file readers still do not use the v2 interface. There's an umbrella issue to address that:


but it has no sub-issue for Parquet, and the issue about ORC:


includes this text: "Not supported( due to limitation of data source V2): (1) Read multiple file path (2) Read bucketed file.".

Is there any up-to-date information on whether datasource v2 will indeed become the primary datasource API, whether the Parquet reader will be converted to v2, and whether the limitations above will be fixed?

Thanks in advance,

--

Re: State of datasource api v2

Arnaud LARROQUE
Hi Vladimir,

I tried to do the same when I was writing a Spark connector for remote files.
From my point of view, there were a lot of changes in the V2 API, with better semantics at least!

My understanding is that only continuous streaming uses DataSourceV2 (I'm not sure that's correct). File streaming falls back to the V1 datasource, and so does file reading.

I would also be glad to get an update on the state of this.

Regards
Arnaud

In reply to Vladimir Prus's message of Mon, Jan 14, 2019 at 9:48 AM.