Spark on your Oracle Data Warehouse

hbutani
I have been developing 'Spark on Oracle', a project to provide better integration of Spark into an Oracle Data Warehouse. You can read about it at https://hbutani.github.io/spark-on-oracle/blog/Spark_on_Oracle_Blog.html

The key features are Catalog Integration, translation and pushdown of Spark SQL to Oracle SQL/PL-SQL, Language Integration and Runtime Integration.

These are provided as Spark extensions via a Catalog Plugin, v2 DataSource, Logical and Physical Planner Rules, Parser Extension, automatic Function Registration and Spark SQL Macros (a generic Spark capability we have developed).
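To make the idea of "translation and pushdown of Spark SQL to Oracle SQL" concrete, here is a toy sketch. It is not the project's code: the expression classes (`Col`, `Lit`, `Cmp`, `And`, `Udf`) and the functions `to_oracle_sql` and `push_down` are hypothetical. It mirrors the contract of Spark's DataSource v2 `SupportsPushDownFilters`, where filters the source cannot evaluate are handed back to Spark to apply after the scan.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical, minimal filter expression tree (illustration only).
@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Lit:
    value: Union[int, float, str]

@dataclass(frozen=True)
class Cmp:
    op: str
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Udf:
    # A Spark-side function with no Oracle equivalent in this sketch.
    name: str
    arg: "Expr"

Expr = Union[Col, Lit, Cmp, And, Udf]
SUPPORTED_OPS = {"=", "<>", "<", "<=", ">", ">="}

class NotTranslatable(Exception):
    """Raised when an expression has no Oracle SQL translation here."""

def to_oracle_sql(e: Expr) -> str:
    """Render a filter expression as an Oracle SQL predicate string."""
    if isinstance(e, Col):
        # Double-quoted upper-case identifier; Oracle stores unquoted names in upper case.
        return '"' + e.name.upper() + '"'
    if isinstance(e, Lit):
        v = e.value
        if isinstance(v, bool):
            raise NotTranslatable("booleans are not handled in this sketch")
        if isinstance(v, str):
            # Escape embedded single quotes by doubling them.
            return "'" + v.replace("'", "''") + "'"
        return str(v)
    if isinstance(e, Cmp) and e.op in SUPPORTED_OPS:
        return f"({to_oracle_sql(e.left)} {e.op} {to_oracle_sql(e.right)})"
    if isinstance(e, And):
        return f"({to_oracle_sql(e.left)} AND {to_oracle_sql(e.right)})"
    raise NotTranslatable(repr(e))

def push_down(filters: List[Expr]) -> Tuple[str, List[Expr]]:
    """Split filters into an Oracle WHERE clause and residual Spark-side filters."""
    pushed, residual = [], []
    for f in filters:
        try:
            pushed.append(to_oracle_sql(f))
        except NotTranslatable:
            residual.append(f)
    return " AND ".join(pushed), residual

filters = [
    Cmp(">", Col("amount"), Lit(100)),
    Cmp("=", Col("region"), Lit("O'Hare")),
    Cmp("=", Udf("my_udf", Col("name")), Lit(1)),
]
where_clause, residual = push_down(filters)
print("WHERE " + where_clause)  # WHERE ("AMOUNT" > 100) AND ("REGION" = 'O''Hare')
print(residual)                 # the Udf comparison stays with Spark
```

The design choice worth noting is that untranslatable filters are not an error: they are simply evaluated by Spark on the rows Oracle returns, so correctness does not depend on how much gets pushed down.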

The vision is to enable Oracle customers to deploy Spark applications that take full advantage of the data and capabilities of their Oracle Data Warehouse, and to make Spark cluster operations simpler and unified with their existing Oracle warehouse operations.

I am looking for suggestions and comments from the Spark community.

regards,
Harish Butani.

Re: Spark on your Oracle Data Warehouse

Mich Talebzadeh
Hi,

I just posted some material on using Spark with Oracle. If you want to do distributed processing against a DW of your choice, be it Oracle, Hive or BigQuery, in my experience it is best to create Spark DataFrames on top of the underlying storage, either through JDBC or through the Spark API (for Hive or BigQuery).
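A minimal sketch of the JDBC route, assuming PySpark, the Oracle JDBC driver jar on the Spark classpath, and a service-name style connection. The helper names (`oracle_jdbc_options`, `read_oracle_table`) and the host, service and table values are hypothetical. The partitioning options are what make the read distributed: without `partitionColumn` and its companions, Spark reads the table through a single connection into one partition.

```python
def oracle_jdbc_options(host, port, service_name, table, user, password,
                        partition_column=None, lower_bound=None,
                        upper_bound=None, num_partitions=None):
    """Build options for Spark's built-in JDBC source against Oracle (hypothetical helper)."""
    opts = {
        # Oracle thin driver URL in //host:port/service_name form.
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service_name}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
    }
    if partition_column is not None:
        # Spark requires these four options together; the bounds only set the
        # partition stride and do not filter rows.
        if None in (lower_bound, upper_bound, num_partitions):
            raise ValueError("partition_column needs lower_bound, upper_bound and num_partitions")
        opts.update({
            "partitionColumn": partition_column,
            "lowerBound": str(lower_bound),
            "upperBound": str(upper_bound),
            "numPartitions": str(num_partitions),
        })
    return opts

def read_oracle_table(spark, options):
    """Create a DataFrame over an Oracle table via Spark's JDBC data source."""
    return spark.read.format("jdbc").options(**options).load()

if __name__ == "__main__":
    opts = oracle_jdbc_options("dbhost.example.com", 1521, "ORCLPDB1", "SALES.ORDERS",
                               "app_user", "change_me", partition_column="ORDER_ID",
                               lower_bound=1, upper_bound=1_000_000, num_partitions=8)
    print(opts["url"])
    # With an active SparkSession named `spark`:
    # df = read_oracle_table(spark, opts)
```

The partition column should be numeric, date or timestamp and reasonably evenly distributed, otherwise some of the parallel queries will do most of the work.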

Your mileage may vary, as usual.

HTH

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

In reply to Harish Butani's post of Tue, 23 Mar 2021 at 15:52.