Separating storage from compute layer with Spark and data warehouses offering ML capabilities

Mich Talebzadeh
This is a generic question with regard to an optimum design.

Many cloud data warehouses, such as Google BigQuery (BQ) or Oracle Autonomous Data Warehouse (ADW), now offer ML capabilities based on models built within the storage itself. This is great, as it allows people who know SQL, but are not necessarily data scientists, to build and run models. These data warehouses are hosted in the cloud.

However, I see some limitations if the data warehouse itself is used for both storage and model building. The fundamental issue arises when you want to scale this up: reading from multiple sources (on-prem or already in a data warehouse), serving concurrent users, enriching data, and storing only what is needed (data or model results) in the data warehouse itself.

This is where Spark comes into play. It can connect to multiple sources over JDBC, combine the data from these sources within Spark itself, and provide in-memory enrichment and computation at the compute layer. Additionally, and perhaps more importantly, you can scale the compute layers (some of them dedicated) up and down to suit your needs without adversely impacting the storage and model building layer.
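
To make the point above concrete, here is a minimal PySpark sketch of reading two sources over JDBC and combining them in the compute layer. It is only an illustration under assumptions: the host names, table names, credentials, join column and the helper names `jdbc_options`, `read_jdbc` and `combine_sources` are all hypothetical, and the matching JDBC driver jars are assumed to be on the Spark classpath.

```python
from typing import Dict


def jdbc_options(url: str, table: str, user: str, password: str, driver: str) -> Dict[str, str]:
    """Build the option map for Spark's built-in JDBC data source (hypothetical helper)."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,
    }


def read_jdbc(spark, options: Dict[str, str]):
    """Load one JDBC table as a DataFrame; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


def combine_sources(spark, left_opts: Dict[str, str], right_opts: Dict[str, str], join_key: str):
    """Read two independent sources and join them inside Spark, not in either database."""
    left = read_jdbc(spark, left_opts)
    right = read_jdbc(spark, right_opts)
    return left.join(right, on=join_key, how="inner")


# Hypothetical connection details, for illustration only.
oracle_opts = jdbc_options(
    "jdbc:oracle:thin:@//onprem-host:1521/ORCLPDB",
    "sales.orders", "scott", "change_me", "oracle.jdbc.OracleDriver",
)
postgres_opts = jdbc_options(
    "jdbc:postgresql://other-host:5432/crm",
    "public.customers", "reader", "change_me", "org.postgresql.Driver",
)
```

With a running session this would be used roughly as `combine_sources(spark, oracle_opts, postgres_opts, "customer_id")`. The join and any enrichment then run on the Spark executors, so the compute can be resized without touching either source system.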

In summary, I cannot see how one can rely on the storage layer alone to

  1. read data from multiple sources
  2. combine storage and compute at scale
  3. avoid concurrency bottlenecks in a meaningful way

I would be interested to hear other views on this.

Thanks

Mich


LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.