Sharing ideas on using Databricks Delta Lake

Mich Talebzadeh
I upgraded my Spark to 2.4.3, which allows using the Delta Lake storage layer. Actually, I wish Databricks had chosen a different name for it :)

Anyhow, although most examples store the data on a normal file system (/tmp/<TABLE>), I managed to put the data on HDFS itself. I assume this should work on any Hadoop Compatible File System (HCFS), such as GCP buckets?

According to the link above:

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

So, in a nutshell, with ACID compliance we have an Oracle-type DW on HDFS with snapshots. I am thinking aloud here: besides its compatibility with Spark (which is great), where can I use this product to gain a strategic advantage?

Also, how much functional programming will this support? I gather that once you have created a DataFrame on top of the storage, windowing analytics etc. can be used as normal (BAU).

I am sure someone can explain this.

Regards,

Dr Mich Talebzadeh
LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.