I want to understand how best to deploy Spark close to a data source or sink.
Say I have a Vertica cluster that I need to run Spark jobs against. In that
case, how should the Spark cluster be set up?
1. Should we run a Spark worker on each Vertica cluster node?
2. How does shuffling play out in that setup?
3. Also, what would the deployment look like in a managed cluster environment
such as Kubernetes?
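For context, here is a minimal sketch of the kind of job I have in mind: Spark pulling a Vertica table over the generic JDBC source. The hostname, database, table, and credentials are all placeholders, and I'm assuming the Vertica JDBC driver jar is on the Spark classpath.

```python
# Placeholder connection details for the Vertica cluster (not real values).
VERTICA_JDBC_OPTIONS = {
    "url": "jdbc:vertica://vertica-node1:5433/mydb",  # hypothetical host/db
    "dbtable": "public.events",                       # hypothetical table
    "user": "dbadmin",
    "password": "secret",
    "driver": "com.vertica.jdbc.Driver",
}


def read_events(spark):
    """Read the Vertica table into a DataFrame via Spark's generic JDBC source.

    `spark` is an existing SparkSession; the read is distributed across
    whatever executors the cluster manager has placed.
    """
    return spark.read.format("jdbc").options(**VERTICA_JDBC_OPTIONS).load()
```

My question is essentially where those executors should live relative to the Vertica nodes, and what happens once a wide transformation forces a shuffle of the data read this way.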