saveAsNewAPIHadoopFile and Relative Paths on Mesos

Adam Novak

When running with Spark 0.9.0 against Mesos, I can't use
saveAsNewAPIHadoopFile to save to a relative path (i.e. on the local
filesystem, relative to the master process's current working
directory). I'm writing Parquet, and no .parquet files end up in that
directory; I also get an error about the footer not getting written
(presumably because none of the data files were written).

Relative paths work when running Spark against local or local[10], and
absolute paths on the local filesystem work when running on Mesos. And
both relative and absolute paths work perfectly fine for reading from
the master's filesystem with newAPIHadoopFile.

I think the issue here is that the workers are evaluating the relative
path relative to whatever *their* current directory happens to be,
which, since Mesos runs them, isn't necessarily the same as that of
the master process. Since the worker nodes have the filesystem I am
working on mounted at the same location as the master does, an
absolute path works to get the data to the same place from both worker
and master nodes.
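
To illustrate the suspected mechanism, here is a minimal Java sketch (plain JDK, not Spark code) showing that one relative path names different locations depending on the working directory it is resolved against. The class name, method name, and both directory names are hypothetical, and POSIX-style paths are assumed.

```java
import java.nio.file.Paths;

final class RelativePathDemo {
    // Resolve a relative path against an explicitly given working directory.
    static String resolveAgainst(String workingDir, String relative) {
        return Paths.get(workingDir).resolve(relative).normalize().toString();
    }

    public static void main(String[] args) {
        String relative = "output/data";
        // Hypothetical working directory of the master process.
        System.out.println(resolveAgainst("/home/user/job", relative));
        // Hypothetical working directory of a worker launched by Mesos.
        System.out.println(resolveAgainst("/tmp/mesos/sandbox", relative));
    }
}
```

The two printed paths differ, which would match the symptom described above: the data files land somewhere other than where the master later looks for them.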

I think Spark should handle the conversion of relative paths on the
master to absolute paths that the workers can use no matter what their
working directories happen to be.
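
A sketch of what that conversion might look like when done by hand on the master, before the path is passed to saveAsNewAPIHadoopFile. This is only an assumption about a possible workaround, not something Spark is known to do internally; the helper name `DriverPaths.toAbsolute` is hypothetical, and it only makes sense for plain local filesystem paths (not URIs such as hdfs://...) on a filesystem mounted at the same location on every node.

```java
import java.nio.file.Paths;

final class DriverPaths {
    // Turn a possibly relative local path into an absolute one, resolved
    // against the current working directory of the calling (master) process.
    // Absolute inputs are returned unchanged apart from normalization.
    static String toAbsolute(String path) {
        return Paths.get(path).toAbsolutePath().normalize().toString();
    }

    public static void main(String[] args) {
        // The result is what would be handed to saveAsNewAPIHadoopFile
        // in place of the original relative path.
        System.out.println(toAbsolute("output/data"));
    }
}
```

Because the path is made absolute once, on the master, the workers never need to interpret it relative to their own working directories.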