Is there any distributed cache mechanism available in Spark?
I have been writing MapReduce jobs on Hadoop using Pig, and am now trying to migrate to Spark.
My cluster consists of multiple nodes, and my jobs depend on a native library (.so files).
In Hadoop and Pig I could distribute the files across nodes using the "-files" or "-archives" option, but I could not find any similar mechanism for Spark.
Can someone please explain the best way to distribute dependent files across the nodes?
I have seen SparkContext.addFile(), but it looks like it will re-copy big files for every job.
Moreover, I am not sure whether addFile() can automatically unpack archive files.
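For reference, the closest thing I could find in the Spark docs is spark-submit's `--files` and `--archives` options; the class name, jar, and paths below are placeholders, and I am not sure whether this behaves like Hadoop's distributed cache:

```shell
# Placeholders: com.example.MyJob, myjob.jar, and the local paths.

# Ship individual files to each executor's working directory
# (the file shows up as ./libnative.so next to the running task):
spark-submit \
  --master yarn \
  --class com.example.MyJob \
  --files /local/path/libnative.so \
  myjob.jar

# Ship an archive; YARN unpacks it on each node, and the "#natives"
# suffix exposes the extracted contents as ./natives:
spark-submit \
  --master yarn \
  --class com.example.MyJob \
  --archives /local/path/natives.zip#natives \
  myjob.jar
```

From application code, files shipped this way (or via SparkContext.addFile()) can apparently be located on the executors with SparkFiles.get("libnative.so"), but I would like to know if this is the recommended approach.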