ETL Using Spark


ETL Using Spark

Avadhut Narayan Joshi

Hello Team

 

I am working on ETL using Spark.

  • I am fetching streaming data from Confluent Kafka.
  • I want to do aggregations by combining the streaming data with data from SQL Server.

To achieve this use case:

  1. Can I fetch data from SQL Server into Spark based on WHERE conditions?
  2. Can the data fetched from SQL Server be combined with the streaming data and then streamed back into SQL Server?

Is this use case valid? Are there any examples of it?

 

Regards

Avadhut




Re: ETL Using Spark

Mich Talebzadeh
OK.
  1. What information are you fetching from MSSQL? Is this reference data?
  2. What information are you processing through Spark via the Kafka topics?
  3. Assuming you are combining the data from MSSQL with the data in Spark and enriching it, are you posting the result back to another table in the same database?

Specifically, you can fetch data from MSSQL through a JDBC connection. The enriched data can also be written back to MSSQL through JDBC.
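As a rough illustration of the JDBC round trip described above, here is a minimal PySpark sketch. It assumes the Kafka source package and the Microsoft SQL Server JDBC driver are available to Spark. The table names (dbo.customers, dbo.enriched_events), the column names, the use of the Kafka key as customer_id, and the checkpoint path are all hypothetical and not from this thread. The WHERE filter is passed through the JDBC source's "query" option so that SQL Server evaluates it.

```python
from typing import Dict

# Driver class of the Microsoft JDBC driver for SQL Server.
SQLSERVER_DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def jdbc_read_options(url: str, where: str) -> Dict[str, str]:
    """Build JDBC reader options whose query carries the WHERE filter."""
    # Hypothetical table and columns; "query" is an option of Spark's JDBC source.
    return {
        "url": url,
        "driver": SQLSERVER_DRIVER,
        "query": f"SELECT customer_id, region FROM dbo.customers WHERE {where}",
    }


def start_enrichment(spark, url, user, password, bootstrap_servers, topic):
    """Join a Kafka stream with SQL Server reference data and write it back."""
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark.sql import functions as F

    # Static (batch) read of the reference data, filtered on the database side.
    reference = (
        spark.read.format("jdbc")
        .options(**jdbc_read_options(url, "region = 'EMEA'"))
        .option("user", user)
        .option("password", password)
        .load()
    )

    # Streaming read from Kafka; assumes the message key holds the customer id.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
        .select(
            F.col("key").cast("string").alias("customer_id"),
            F.col("value").cast("string").alias("payload"),
        )
    )

    # Stream-static inner join: each micro-batch is enriched with reference rows.
    enriched = events.join(reference, on="customer_id", how="inner")

    def write_batch(batch_df, batch_id):
        # Each micro-batch is written with the ordinary batch JDBC writer.
        (
            batch_df.write.format("jdbc")
            .option("url", url)
            .option("driver", SQLSERVER_DRIVER)
            .option("dbtable", "dbo.enriched_events")  # hypothetical target table
            .option("user", user)
            .option("password", password)
            .mode("append")
            .save()
        )

    return (
        enriched.writeStream.foreachBatch(write_batch)
        .option("checkpointLocation", "/tmp/checkpoints/etl")  # hypothetical path
        .start()
    )
```

Calling start_enrichment(spark, "jdbc:sqlserver://host:1433;databaseName=mydb", user, password, "broker:9092", "events") would return the running streaming query. The host, database, broker and topic values here are placeholders.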


HTH




LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

 



On Thu, 21 May 2020 at 16:15, Avadhut Narayan Joshi <[hidden email]> wrote:



Re: ETL Using Spark

VP
Hi Avadhut Narayan Joshi,

The use case is achievable using Spark. Connecting to SQL Server is possible, as Mich mentioned, as long as there is a JDBC driver that can connect to SQL Server.

For production workloads, some important points to consider:

  • What are the QoS requirements for your case: at-least-once, at-most-once, or exactly-once?
  • How will you handle Spark Streaming job restarts (because of an error, or because you have to deploy a new version of the application)?
  • What are your error handling strategies?
  • How do you deal with late-arriving data, since you are doing aggregations?

It is best to make downstream systems idempotent; that is the least troublesome way to keep production workloads maintainable.
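To make the late-data and idempotency points a little more concrete, here is a small sketch under stated assumptions. The windowed_counts function assumes a streaming DataFrame with hypothetical event_time (timestamp) and customer_id columns, and uses a watermark to bound how late data may arrive. The sink part uses Python's built-in sqlite3 purely as a stand-in for SQL Server so the idea can be run locally; on SQL Server an upsert would typically be written with a MERGE statement instead. The table and column names are made up for illustration.

```python
import sqlite3
from typing import Iterable, Tuple

# (window_start, customer_id, event_count); hypothetical result shape.
Row = Tuple[str, str, int]


def windowed_counts(events):
    """Count events per customer in 5-minute windows, allowing 10 minutes of lateness."""
    # Imported lazily so the sink helpers below work without PySpark installed.
    from pyspark.sql import functions as F

    return (
        events.withWatermark("event_time", "10 minutes")
        .groupBy(F.window("event_time", "5 minutes"), "customer_id")
        .count()
    )


def create_sink(conn: sqlite3.Connection) -> None:
    """Create the target table with a primary key on the aggregation key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_counts ("
        " window_start TEXT NOT NULL,"
        " customer_id TEXT NOT NULL,"
        " event_count INTEGER NOT NULL,"
        " PRIMARY KEY (window_start, customer_id))"
    )


def upsert_batch(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Write one micro-batch so that replaying it leaves the table unchanged."""
    with conn:  # one transaction per micro-batch: commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO customer_counts"
            " (window_start, customer_id, event_count) VALUES (?, ?, ?)",
            list(rows),
        )
```

Because each row is keyed by (window_start, customer_id), writing the same micro-batch twice after a restart overwrites the existing rows rather than duplicating them. With a sink like this, at-least-once delivery from the streaming job is usually enough, which is the practical benefit of an idempotent downstream system.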
Best Regards, VP