Data Lakes using Spark

Boris Litvak

Hi Friends,

I’d like to publish an article on Medium about building data lakes with Spark.

Its later sections cover material that is not widely known unless you have hands-on experience with data lakes.

https://github.com/borislitvak/datalake-article/blob/initial_comments/Building%20a%20Real%20Life%20Data%20Lake%20in%C2%A0AWS.md

I hope it’s OK to ask you to review the draft.

You can respond here or contact me directly.

If there are topics I should add (for example, the effect of compaction on downstream reads with Structured Streaming) or errors I should fix, please point them out before the article is published.

Also, if any points are unclear or misleading, please say so.

Thanks,

Boris Litvak