saveAsNewAPIHadoopFile and Relative Paths on Mesos
When running with Spark 0.9.0 against Mesos, I can't use
saveAsNewAPIHadoopFile to save to a relative path (i.e. on the local
filesystem, relative to the master process's current working
directory). I'm writing in Parquet, so I see that no .parquet files
end up in that directory, and I get an error about the footer not
being written (presumably because none of the data files were actually
written in the first place).
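For concreteness, here's a minimal sketch of the kind of call that fails.
The master URL, data, and output format are all illustrative (my real job
writes Parquet), but any new-API output format shows the same path behavior:

    import org.apache.hadoop.io.Text
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    val sc = new SparkContext("mesos://master:5050", "relative-path-test")

    // Build the Writables inside the map so they're created on the workers
    // (Hadoop Writables aren't Java-serializable).
    val pairs = sc.parallelize(Seq("a" -> "1", "b" -> "2"))
      .map { case (k, v) => (new Text(k), new Text(v)) }

    // Relative path: under Mesos, each worker resolves "output" against its
    // own working directory, so nothing shows up under the master's CWD.
    pairs.saveAsNewAPIHadoopFile(
      "output",
      classOf[Text],
      classOf[Text],
      classOf[TextOutputFormat[Text, Text]])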
Relative paths work when running Spark with a local or local[N] master,
and absolute paths on the local filesystem work when running on Mesos. And
both relative and absolute paths work perfectly fine for reading from
the master's filesystem with newAPIHadoopFile.
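For example, a read like this works with a relative path (reusing sc from
the sketch above; TextInputFormat stands in for my actual Parquet input
format):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // "data" is resolved from the master's filesystem without trouble.
    val lines = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("data")
    lines.map(_._2.toString).take(5).foreach(println)

(Presumably reads are fine because the input paths are resolved on the
master when the splits are computed.)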
I think the issue here is that the workers are resolving the relative
path against whatever *their* current working directory happens to be,
which, since Mesos launches them, isn't necessarily the same as the
master process's. Since the worker nodes have the filesystem I'm working
with mounted at the same location as the master, an absolute path gets
the data to the same place from both the workers and the master.
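Since absolute paths do work, resolving the path on the driver before
kicking off the save is a workable workaround. Continuing the sketch above:

    import java.io.File

    // Resolve against the master's CWD once, before the job is distributed,
    // so every worker writes to the same absolute location on the shared mount.
    val outPath = new File("output").getAbsolutePath
    pairs.saveAsNewAPIHadoopFile(
      outPath,
      classOf[Text],
      classOf[Text],
      classOf[TextOutputFormat[Text, Text]])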
I think Spark should resolve relative paths on the master into absolute
paths that the workers can use no matter what their working directories
happen to be.