[Datasource API V2] Creating datasource - no step for final cleanup on read

Alex Rehnby
Hello,

I'm currently working on a custom datasource built on the Spark Datasource API V2. On read, our datasource uses some temporary files in a distributed store, and we'd like to run a cleanup step on them once the entire read operation is done. However, the API doesn't seem to call anything when a whole read finishes; the only hook is the close() function on individual PartitionReaders.
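One tempting workaround is "last reader out cleans up": count the open partition readers and run the cleanup when the count reaches zero in close(). The sketch below shows that idea in plain Java so it runs without Spark. `SimpleReader`, `CountingReader`, `LastReaderCleanupDemo` and `simulateRead` are hypothetical names; `SimpleReader` only mimics the next()/get()/close() shape of Spark's PartitionReader and is not the Spark interface. The shared `AtomicInteger` only works when every reader of the scan runs in one JVM (for example local mode). On a cluster each executor has its own counter, and a retried or speculative task creates an extra reader, so this is not a general substitute for a read-level hook.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in with the same shape as Spark's PartitionReader
// (next/get/close); this is NOT the Spark interface itself.
interface SimpleReader<T> extends Closeable {
    boolean next();
    T get();
    @Override
    void close();
}

final class CountingReader implements SimpleReader<Integer> {
    private final int[] rows;
    private int pos = -1;
    private final AtomicInteger openReaders; // shared, JVM-local counter
    private final Runnable cleanup;          // hypothetical cleanup step

    CountingReader(int[] rows, AtomicInteger openReaders, Runnable cleanup) {
        this.rows = rows;
        this.openReaders = openReaders;
        this.cleanup = cleanup;
    }

    public boolean next() { return ++pos < rows.length; }

    public Integer get() { return rows[pos]; }

    @Override
    public void close() {
        // The last reader to close runs the cleanup. Only valid if every reader
        // of the scan lives in this JVM and no task is retried or speculated.
        if (openReaders.decrementAndGet() == 0) {
            cleanup.run();
        }
    }
}

final class LastReaderCleanupDemo {
    // Simulates one reader per partition, each closed by try-with-resources.
    static List<String> simulateRead(int[][] partitions) {
        List<String> log = new ArrayList<>();
        AtomicInteger open = new AtomicInteger(partitions.length);
        for (int p = 0; p < partitions.length; p++) {
            try (CountingReader r = new CountingReader(partitions[p], open, () -> log.add("cleanup"))) {
                int sum = 0;
                while (r.next()) {
                    sum += r.get();
                }
                log.add("partition " + p + " sum=" + sum);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [partition 0 sum=3, partition 1 sum=3, partition 2 sum=15, cleanup]
        System.out.println(simulateRead(new int[][] {{1, 2}, {3}, {4, 5, 6}}));
    }
}
```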

What I was looking for is the equivalent of the commit() and abort() functions in BatchWrite, but for the Scan or Batch class. Is there a good way to run something at the end of a read operation with the current API? If not, I'd like to ask whether this might be a useful addition, or whether there are design reasons for not including such a step.
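To make the request concrete, here is a rough sketch of what a read-side analogue of BatchWrite's commit()/abort() could look like, simulated in plain Java. `ReadCompletionHook`, `onReadComplete`, `onReadAborted` and `ScanDriverSimulation` are hypothetical names and are not part of Spark's DataSource V2 API; the driver loop only stands in for Spark's scheduling. The intended semantics are that exactly one of the two callbacks runs once, on the driver, after all partitions have finished or the read has failed, which is the single place a temp-file cleanup could live.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical read-side hook, NOT part of Spark's DataSource V2 API.
// It mirrors the role BatchWrite.commit()/abort() play for writes:
// exactly one of these runs once, on the driver, after the read ends.
interface ReadCompletionHook {
    void onReadComplete();
    void onReadAborted(Throwable cause);
}

final class ScanDriverSimulation {
    // Simulates a driver running one task per partition; failAt < 0 means no failure.
    static void runScan(int partitions, int failAt, ReadCompletionHook hook) {
        try {
            for (int p = 0; p < partitions; p++) {
                if (p == failAt) {
                    throw new IllegalStateException("partition " + p + " failed");
                }
                // Per-partition reading and PartitionReader.close() would happen here.
            }
        } catch (RuntimeException e) {
            hook.onReadAborted(e); // abort path: cleanup still runs once
            throw e;
        }
        hook.onReadComplete(); // success path: all partitions finished
    }

    static List<String> demo(int failAt) {
        List<String> log = new ArrayList<>();
        ReadCompletionHook cleanup = new ReadCompletionHook() {
            public void onReadComplete() {
                log.add("complete: delete temp files");
            }

            public void onReadAborted(Throwable cause) {
                log.add("aborted (" + cause.getMessage() + "): delete temp files");
            }
        };
        try {
            runScan(3, failAt, cleanup);
        } catch (IllegalStateException expected) {
            log.add("job failed");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo(-1)); // [complete: delete temp files]
        System.out.println(demo(1));  // [aborted (partition 1 failed): delete temp files, job failed]
    }
}
```

Until something like this exists, the closest equivalent is for the application to wrap the action itself (for example in a try/finally on the driver) and call the datasource's cleanup there, at the cost of pushing that responsibility onto the caller.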

Thanks,
Alex