Hi everyone, I need some information regarding MongoDB and Spark. I have a Spark cluster of 3 nodes (1 worker and two data nodes) running under the YARN scheduler, and a MongoDB server that is outside the cluster but on the same LAN. I need to load all the data stored in the MongoDB database into that cluster (HDFS). Is this possible? Using the mongo-hadoop connector and the pymongo_spark connector, I can load data from the Mongo server, process it on the Spark cluster, and write the computed result back to the Mongo server. But my problem is that I want to transfer my whole data set to the Spark cluster.
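To make the workflow concrete, here is a minimal sketch of reading an entire collection through pymongo_spark and writing it to HDFS as JSON lines instead of back to Mongo. This assumes pymongo_spark and the mongo-hadoop jars are available on the cluster; the URI, database, collection, and HDFS path names are placeholders, not values from my setup.

```python
import json


def serialize_doc(doc):
    """Render one MongoDB document as a JSON line.

    default=str handles BSON types (ObjectId, datetime) that the
    json module cannot encode natively.
    """
    return json.dumps(doc, default=str)


def copy_collection_to_hdfs(sc, mongo_uri, hdfs_path):
    """Read a whole collection via the mongo-hadoop connector and
    save it to HDFS as text (one JSON document per line).

    sc:        an active SparkContext
    mongo_uri: e.g. 'mongodb://mongo-host:27017/mydb.mycollection'
    hdfs_path: e.g. 'hdfs://namenode:8020/data/mycollection'
    """
    import pymongo_spark
    pymongo_spark.activate()  # patches SparkContext with mongoRDD()
    rdd = sc.mongoRDD(mongo_uri)
    rdd.map(serialize_doc).saveAsTextFile(hdfs_path)
```

You would run this with spark-submit, making sure the mongo-hadoop connector jar is on the driver and executor classpaths (e.g. via --jars).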