On Thu, 21 May 2020 at 16:15, Avadhut Narayan Joshi <[hidden email]> wrote:
I am working on ETL using Spark.
I am fetching streaming data from Confluent Kafka.
I want to do aggregations by combining the streaming data with data from SQL Server.
To achieve this use case:
Can I fetch data from SQL Server into Spark based on WHERE conditions?
Can the data fetched from SQL Server be combined with the streaming data and then streamed back into SQL Server?
Is this use case valid? Are there any examples of it?
Hi Avadhut Narayan Joshi
The use case is achievable using Spark.
Connecting to SQL Server is possible, as Mich mentioned below, as long as there is a JDBC driver
that can connect to SQL Server.
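
To make the "fetch based on WHERE conditions" part concrete, here is a minimal sketch using Spark's built-in JDBC data source. It assumes Spark 2.4 or later (for the `query` option) and that the Microsoft JDBC driver jar is on the classpath. The host, database, credentials, table and column names are all made up for illustration.

```python
def sqlserver_jdbc_options(host, port, database, user, password, query):
    """Build the option map for Spark's JDBC data source (all values are placeholders)."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "user": user,
        "password": password,
        # The whole statement, WHERE clause included, is sent to SQL Server,
        # so only the matching rows are transferred to Spark.
        "query": query,
    }


def read_reference_data(spark, options):
    """Load the filtered rows as a static DataFrame; `spark` is a SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical connection details and query, for illustration only.
opts = sqlserver_jdbc_options(
    host="dbhost",
    port=1433,
    database="sales",
    user="etl_user",
    password="change-me",
    query="SELECT customer_id, region FROM dbo.customers WHERE region = 'EMEA'",
)
```

With a live session this would be used as `reference_df = read_reference_data(spark, opts)`. On older Spark versions the same effect can be had by passing a parenthesised subquery with an alias through the `dbtable` option.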
For production workloads, important points to consider:
>> What are the QoS requirements for your case: at-least-once, at-most-once, or exactly-once?
>> How will you handle Spark Streaming job restarts
(because of an error, or because you have to deploy a new version of the application)?
>> What are your error handling strategies?
>> How do you deal with late-arriving data, since you are doing aggregations?
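
A rough Structured Streaming sketch of how some of these points could map onto Spark settings: a checkpoint location for restarts, a watermark for late data, and `foreachBatch` to write each micro-batch to SQL Server over JDBC. It is only an outline under several assumptions: the `spark-sql-kafka` package is available, the Kafka key holds a customer id and the value a plain numeric string, and the broker, topic, table and column names are invented. The Kafka record timestamp stands in for a real event-time field.

```python
def build_streaming_query(spark, reference_df, jdbc_url, jdbc_props, checkpoint_dir):
    """Join a Kafka stream with static SQL Server data, aggregate, write back via JDBC."""
    # Imported here so this module can be loaded on a machine without Spark installed.
    from pyspark.sql import functions as F

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "orders")                     # hypothetical topic
        .load()
        .select(
            F.col("key").cast("string").alias("customer_id"),
            F.col("value").cast("string").cast("double").alias("amount"),
            F.col("timestamp").alias("event_time"),  # Kafka record timestamp
        )
    )

    aggregated = (
        events
        # Late data: events older than the watermark (10 minutes here) are dropped.
        .withWatermark("event_time", "10 minutes")
        # Stream-static join against the rows read from SQL Server.
        .join(reference_df, "customer_id")
        .groupBy(F.window("event_time", "5 minutes"), "region")
        .agg(F.sum("amount").alias("total_amount"))
    )

    def write_batch(batch_df, batch_id):
        # After a restart the same batch_id may be delivered again (at-least-once),
        # so a plain append like this one can produce duplicate rows.
        (
            batch_df.select(
                F.col("window.start").alias("window_start"), "region", "total_amount"
            )
            .write.jdbc(jdbc_url, "dbo.region_totals", mode="append", properties=jdbc_props)
        )

    return (
        aggregated.writeStream
        .outputMode("append")  # a window is emitted once the watermark has passed it
        # Restarts: offsets and aggregation state are recovered from this directory.
        .option("checkpointLocation", checkpoint_dir)
        .foreachBatch(write_batch)
        .start()
    )
```

Error handling is left out here; in practice it would go inside `write_batch` (retries, dead-letter table, and so on), depending on the QoS you settle on.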
It is best to make the downstream systems idempotent; that is a far less troublesome way to keep things maintainable.
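
As a small illustration of what "idempotent" means for the sink, the sketch below writes aggregate rows keyed by window start and region, so replaying the same batch after a restart leaves the table unchanged. SQLite from the Python standard library is used purely as a stand-in so the example runs anywhere; on SQL Server the equivalent would typically be a `MERGE` (or update-then-insert) on the same key. Table and column names are invented.

```python
import sqlite3


def upsert_batch(conn, rows):
    """Write (window_start, region, total_amount) rows; a replayed batch overwrites itself."""
    with conn:  # one transaction per batch: either all rows land or none do
        conn.executemany(
            # INSERT OR REPLACE is SQLite syntax; it replaces the row with the same key.
            "INSERT OR REPLACE INTO region_totals (window_start, region, total_amount) "
            "VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE region_totals ("
    " window_start TEXT, region TEXT, total_amount REAL,"
    " PRIMARY KEY (window_start, region))"
)

batch = [
    ("2020-05-21 16:15:00", "EMEA", 120.0),
    ("2020-05-21 16:15:00", "APAC", 75.5),
]
upsert_batch(conn, batch)
upsert_batch(conn, batch)  # simulated replay of the same batch after a restart

rows_after_replay = conn.execute(
    "SELECT window_start, region, total_amount FROM region_totals ORDER BY region"
).fetchall()
```

Because the key identifies a result rather than a delivery attempt, at-least-once delivery from Spark is enough to get effectively-once results in the table.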