Setup/Cleanup for RDD closures?

Stephen Boesch

Suppose a connection or other external resource must be accessed or mutated by every row processed within a single worker thread. That connection should be opened only once, before the first row is accessed, and closed only once, after the last row is completed.

My understanding is that work is under way (Reynold Xin and others) on an external resources API to address this. What is the recommended approach in the meantime?

Re: Setup/Cleanup for RDD closures?

Mayur Rustagi
The current approach is to use mapPartitions: open the connection at the start of the partition, iterate through the rows, and close the connection at the end.
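A minimal sketch of that pattern, written as a plain Python function of the shape that PySpark's mapPartitions accepts (it takes an iterator over one partition's rows and returns an iterable of results). FakeConnection, process_partition, and the connect parameter are hypothetical names used only for illustration; a real job would substitute its own client. This version collects the results into a list before closing, which assumes one partition's output fits in memory.

```python
class FakeConnection:
    """Hypothetical stand-in for a real external resource (database, socket, ...)."""

    def __init__(self):
        self.is_open = True
        self.written = []

    def write(self, row):
        if not self.is_open:
            raise RuntimeError("connection already closed")
        self.written.append(row)
        return row

    def close(self):
        self.is_open = False


def process_partition(rows, connect=FakeConnection):
    """Open one connection, push every row through it, then close it."""
    conn = connect()  # setup: runs once per partition, not once per row
    try:
        # Build the full result list while the connection is still open.
        return [conn.write(row) for row in rows]
    finally:
        conn.close()  # cleanup: runs once per partition, even on error


# With Spark, the function would be passed as: rdd.mapPartitions(process_partition)
# Here it is exercised on an ordinary iterator standing in for one partition.
results = process_partition(iter(["a", "b", "c"]))
```

If the rows are only written out and no result is needed, foreachPartition is the closer fit, since it runs the function for its side effects.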



On Fri, Oct 3, 2014 at 10:16 AM, Stephen Boesch <[hidden email]> wrote:

> Consider there is some connection / external resource allocation required to be accessed/mutated by each of the rows from within a single worker thread. That connection should only be opened/closed before the first row is accessed / after the last row is completed.
>
> It is my understanding that there is work presently underway (Reynold Xin and others) on defining an external resources API to address this. What is the recommended approach in the meanwhile?


Re: Setup/Cleanup for RDD closures?

sowen
Yes, though it's a little more complex than that:

http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAPH-c_O9kQO6yJ4khXUVdO=+D4vj=JfG2tP9eqn5RPko=dRNAg@...%3E
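The linked message is not reproduced here, so the following is only an assumption about the kind of complication meant: when the per-partition function returns a lazy iterator, a close() placed right after building that iterator runs before any row has been processed. The sketch below shows the failure and one possible workaround using a generator with try/finally. It assumes Python 3, where map is lazy; FakeConnection, broken_partition, and lazy_partition are hypothetical names.

```python
class FakeConnection:
    """Hypothetical stand-in for a real external resource."""

    def __init__(self):
        self.is_open = True

    def write(self, row):
        if not self.is_open:
            raise RuntimeError("connection already closed")
        return row

    def close(self):
        self.is_open = False


def broken_partition(rows, connect=FakeConnection):
    conn = connect()
    result = map(conn.write, rows)  # lazy: no row has been written yet
    conn.close()                    # closes before the caller consumes anything
    return result


def lazy_partition(rows, connect=FakeConnection):
    # Generator body: nothing here runs until the caller starts iterating.
    conn = connect()
    try:
        for row in rows:
            yield conn.write(row)
    finally:
        conn.close()  # runs after the last row has been yielded


# Consuming the broken version fails on the first row.
try:
    list(broken_partition(iter([1, 2, 3])))
    broken_failed = False
except RuntimeError:
    broken_failed = True

# The generator version keeps the connection open for the whole iteration.
ok_rows = list(lazy_partition(iter([1, 2, 3])))
```

One caveat with the generator form: if the consumer stops iterating early, the finally block runs only when the generator is closed or garbage-collected, so cleanup timing is less predictable than in the eager version.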

On Fri, Oct 3, 2014 at 9:58 AM, Mayur Rustagi <[hidden email]> wrote:

> Current approach is to use mappartition, initialize the connection in the
> beginning, iterate through the data & close off the connector.
