AWS has two managed offerings built on top of Spark: EMR and Glue. You can, of course, also spin up your own EC2 instances and deploy Spark on them yourself. These three options let you trade off flexibility against infrastructure management. EC2 gives you the most flexibility, because it's basically a bunch of nodes and you can configure Spark any way you want. The con is that you need to manage both your EC2 instances and Spark. EMR is a step up: you still manage your EC2 instances, but you don't need to manage Spark. With Glue, you don't need to manage infrastructure at all. Glue is serverless (for you).
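As a rough illustration of that trade-off, here is a minimal Python sketch that encodes who manages what for each option and picks the least-managed one that still meets your needs. The class, function, and parameter names are made up for this example, and reducing each option to two yes/no flags is a simplification of the description above, not a statement about the AWS services themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    you_manage_instances: bool  # do you look after the EC2 instances?
    you_manage_spark: bool      # do you install and maintain Spark yourself?


# The three options as described above (simplified to two flags each).
OPTIONS = [
    Option("Spark on EC2", you_manage_instances=True, you_manage_spark=True),
    Option("EMR", you_manage_instances=True, you_manage_spark=False),
    Option("Glue", you_manage_instances=False, you_manage_spark=False),
]


def management_burden(option):
    """Count how many layers you have to manage yourself (0 to 2)."""
    return int(option.you_manage_instances) + int(option.you_manage_spark)


def least_managed(need_own_spark_install=False, need_instance_control=False):
    """Return the option with the least management that still gives the
    control asked for. Both flags are hypothetical requirements."""
    candidates = [
        o for o in OPTIONS
        if (o.you_manage_spark or not need_own_spark_install)
        and (o.you_manage_instances or not need_instance_control)
    ]
    # "Spark on EC2" satisfies every combination, so candidates is never empty.
    return min(candidates, key=management_burden)


if __name__ == "__main__":
    print(least_managed().name)
    print(least_managed(need_instance_control=True).name)
    print(least_managed(need_own_spark_install=True).name)
```

In other words: ask for nothing special and you land on Glue, ask for control over the instances and you land on EMR, and ask for your own Spark install and you are back on plain EC2.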
Besides those, you also have other choices. For example, if your usage is spiky, you could implement this in Kinesis. Or you could have your reporting application query Athena.
If you are currently on-premises, then you should also consider Google Cloud Platform (GCP). As a practitioner, I see a number of customers migrating from other platforms to GCP.
Databricks on GCP will be available (if I am correct) in April this year. GCP already offers Google Compute Engine as IaaS, which can run Spark on YARN. In addition, you have the cost-saving 'preemptible instances' that can run Spark on affordable tin boxes, so to speak. GCP also offers BigQuery as a data warehouse (DW) with ML models built in. So there is a fair bit of 'either-or' choice here. There is also the question of the migration path from GCP artifacts to Databricks. Will Databricks provide all of these as a service? For example, BigQuery is a fully managed, serverless warehouse. Will the Lakehouse provide the same on GCP? Besides ML, BigQuery provides functions and procedures similar to Oracle's PL/SQL, so some are migrating from classic on-premises Oracle to BigQuery.
However, neither BigQuery nor Compute Engine is cheap. Personally, I believe the cloud landscape is getting congested, and unless there is a clear motivation to move from one provider to another, many will choose to stay where they are. If you are already using Spark on a private cloud, then the journey to GCP should be pretty smooth. As ever, your mileage will vary. You may also decide to go for a best-of-breed multi-cloud mixture.
Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction
of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from such
loss, damage or destruction.