I don't see why you need Docker for your data analysis work.
If it is just simple data analysis, you only need to install Spark on your Mac and load the data from your local disk.
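For the local case, something like the following might be enough. It is only a minimal sketch, assuming PySpark is installed (for example via pip) and that your data is a CSV file with a header row. The helper names `local_master` and `load_local_csv` are made up for illustration.

```python
def local_master(cores=None):
    """Return a Spark master URL for local mode (all cores by default)."""
    return "local[*]" if cores is None else f"local[{cores}]"


def load_local_csv(path, cores=None):
    """Start a local Spark session and read a CSV file from local disk."""
    # Imported inside the function so local_master() works without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(local_master(cores))   # run Spark inside this one process
        .appName("local-analysis")     # arbitrary name, shown in the Spark UI
        .getOrCreate()
    )
    # Assumption: the file has a header row; column types are inferred.
    df = spark.read.csv(path, header=True, inferSchema=True)
    return spark, df
```

Calling `load_local_csv("data/sample.csv")` (a hypothetical path) returns the session and a DataFrame you can inspect with `df.printSchema()` or `df.show(5)`. Call `spark.stop()` when you are done.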
If you want to use Docker to simulate a multi-node cluster, first install Docker and then create 3-4 containers. Treat each container as a VM node and install Java and Spark on each one. In a distributed environment, you may also have to install HDFS as the backend data storage.
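The container setup could be scripted roughly as below. This is a sketch under several assumptions, not a tested recipe: `my-spark-image` stands for an image you have built yourself with Java and Spark installed, `/opt/spark` is assumed to be the Spark install path inside it, and the container and network names are made up. The script only builds the `docker` commands for one standalone master plus workers. It does not run them, and it leaves HDFS out.

```python
import shlex


def cluster_commands(image, workers=3, network="spark-net"):
    """Build docker CLI commands for one Spark master and N worker containers."""
    spark_class = "/opt/spark/bin/spark-class"  # assumed path inside the image
    # A user-defined network lets containers reach each other by name.
    cmds = [["docker", "network", "create", network]]
    # Master node: publish the web UI (8080) and the cluster port (7077).
    cmds.append([
        "docker", "run", "-d", "--name", "spark-master",
        "--hostname", "spark-master", "--network", network,
        "-p", "8080:8080", "-p", "7077:7077",
        image, spark_class, "org.apache.spark.deploy.master.Master",
    ])
    # Worker nodes: each one registers with the master by its container name.
    for i in range(1, workers + 1):
        cmds.append([
            "docker", "run", "-d", "--name", f"spark-worker-{i}",
            "--network", network,
            image, spark_class, "org.apache.spark.deploy.worker.Worker",
            "spark://spark-master:7077",
        ])
    return cmds


def as_shell(cmds):
    """Render the commands as shell lines for review before running them."""
    return "\n".join(shlex.join(c) for c in cmds)
```

Printing `as_shell(cluster_commands("my-spark-image"))` gives five lines (one network, one master, three workers) that you can check and then paste into a terminal.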