We want to resize our Apache Spark cluster on Amazon EC2 depending on the data load.
We currently set up the cluster using the spark-ec2.py script, and we would like to add or remove worker nodes in the existing cluster as the data load changes.
Can we achieve this by modifying the spark-ec2.py script?
Also, with the current spark-ec2.py script, is it possible to launch some worker nodes as spot instances and others as on-demand EC2 instances? In the current script it seems that if we pass any value to the "--spot-price" option, all of the worker nodes are allocated as spot instances.
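For reference, this is roughly how we launch the cluster today (the key pair, slave count, instance type, bid price, and cluster name below are placeholders, not our real values):

```shell
# Launch a Spark cluster with the bundled spark-ec2 script.
# Because --spot-price is set, ALL worker nodes are requested
# as spot instances; we have found no option to mix spot and
# on-demand workers in a single launch.
./spark-ec2 \
  --key-pair=my-key \
  --identity-file=my-key.pem \
  --slaves=4 \
  --instance-type=m3.large \
  --spot-price=0.05 \
  launch my-spark-cluster
```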
In other words, how can I achieve elastic scaling of an existing Spark cluster on Amazon? Would StarCluster (http://star.mit.edu/cluster/) be helpful in this case?