Spark Deployment Strategy

I want to understand how best to deploy Spark close to a data source or sink.

Let's say I have a Vertica cluster that I need to run a Spark job against. In
that case, how should the Spark cluster be set up?

1. Should we run a Spark worker on each Vertica cluster node?
2. How does that change once shuffling comes into play?
3. Also, what would the deployment look like on a managed cluster platform
such as Kubernetes?
