Any distributed cache mechanism available in Spark?


I have been writing MapReduce jobs on Hadoop using Pig, and I am now trying to migrate to Spark.

My cluster consists of multiple nodes, and my jobs depend on a native library (.so files).
In Hadoop and Pig I could distribute such files across the nodes using the "-files" or "-archives" option, but I could not find a similar mechanism for Spark.
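For reference, this is roughly how I ship the library today on Hadoop (jar name, class name, and paths below are placeholders, and the generic options assume the driver goes through ToolRunner):

```shell
# Ship a native library to every task node, plus an archive that
# Hadoop unpacks on each node under the symlink name after '#'.
hadoop jar myjob.jar MyJobClass \
  -files /local/path/libnative.so \
  -archives /local/path/deps.zip#deps \
  /input /output
```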

Can someone please explain the best way to distribute dependent files across nodes?
I have seen SparkContext.addFile(), but it looks like this will copy large files again for every job.
Moreover, I am not sure whether addFile() can automatically unpack archive files.
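This is roughly what I tried with addFile() (a sketch only; the paths are placeholders and it needs a running Spark cluster):

```python
from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="native-lib-test")

# Ship the .so to every executor node. As far as I can tell, Spark
# copies it again each time a new SparkContext is created.
sc.addFile("/local/path/libnative.so")

def use_native_lib(x):
    # On the executor, resolve the node-local copy of the shipped file.
    lib_path = SparkFiles.get("libnative.so")
    # ... load lib_path with ctypes and call into it ...
    return lib_path

paths = sc.parallelize(range(2)).map(use_native_lib).collect()
```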

Thanks in advance.