Bulk / Fast Read and Write with MSSQL Server and Spark

9 messages
Bulk / Fast Read and Write with MSSQL Server and Spark

Chetan Khatri
All,

I am looking for an approach to do bulk reads/writes between MS SQL Server and Apache Spark 2.2. Please let me know if there is a library or driver for this.

Thank you.
Chetan

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

kedarsdixit
Hi,

I came across this a while ago:
<https://stephanefrechette.com/connect-sql-server-using-apache-spark/#.WwVVosThXIU>
Check if it is helpful.

Regards,
~Kedar Dixit
Data Science @ Persistent Systems Ltd.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Silvio Fiorito
In reply to this post by Chetan Khatri

Try this https://docs.microsoft.com/en-us/azure/sql-database/sql-database-spark-connector
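Under the hood, both that connector and plain Spark JDBC read through a single connection unless you give Spark a partition column. Below is a minimal sketch of a parallel JDBC read in Spark 2.2; the URL, table, column, and bounds are placeholder values. The pyspark call is commented out because it needs a live server, and the helper is a simplified version of the striding Spark applies (Spark's real planner additionally leaves the first and last partitions open-ended).

```python
# Sketch of a parallel JDBC read from SQL Server in Spark 2.2.
# The helper mirrors, in simplified form, how Spark splits the
# [lowerBound, upperBound) range into one query per partition.

def jdbc_strides(lower, upper, num_partitions):
    """Return non-overlapping (start, end) bounds, one per partition."""
    stride = (upper - lower) // num_partitions
    bounds = []
    start = lower
    for i in range(num_partitions):
        # Last partition absorbs any remainder from integer division.
        end = upper if i == num_partitions - 1 else start + stride
        bounds.append((start, end))
        start = end
    return bounds

# df = (spark.read.format("jdbc")
#     .option("url", "jdbc:sqlserver://myhost:1433;databaseName=SourceDb")  # placeholder
#     .option("dbtable", "dbo.Orders")          # placeholder table
#     .option("partitionColumn", "order_id")    # must be numeric or date
#     .option("lowerBound", "1")
#     .option("upperBound", "1000000")
#     .option("numPartitions", "8")
#     .load())
```

Each (start, end) pair becomes its own SELECT, so 8 partitions means 8 concurrent queries against the source table.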

 

 

From: Chetan Khatri <[hidden email]>
Date: Wednesday, May 23, 2018 at 7:47 AM
To: user <[hidden email]>
Subject: Bulk / Fast Read and Write with MSSQL Server and Spark

 



Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Chetan Khatri
Thank you Kedar Dixit, Silvio Fiorito.

Just one question: even though it's not an Azure cloud SQL Server, it should support SQL Server installed on a local machine, right?

Thank you.

On Wed, May 23, 2018 at 6:18 PM, Silvio Fiorito <[hidden email]> wrote:




Re: Bulk / Fast Read and Write with MSSQL Server and Spark

kedarsdixit
Yes.

Regards,
Kedar Dixit
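(For reference: the connector speaks plain JDBC underneath, so nothing about it is Azure-specific. A connection URL for a locally installed SQL Server might look like the following sketch; the host, port, database name, and login are all placeholders.)

```python
# Placeholder connection details for an on-premises SQL Server instance.
local_url = "jdbc:sqlserver://localhost:1433;databaseName=LocalDb"
props = {
    "user": "spark_etl",   # placeholder login
    "password": "***",
    # Class name of the Microsoft JDBC driver for SQL Server:
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}
```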





Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Chetan Khatri
Super. Just to give a high-level idea of what I want to do: I have one source schema on MS SQL Server 2008, and the target is also MS SQL Server 2008. Currently there is a C#-based ETL application which does the extract, transform, and load into a customer-specific schema, including indexing etc.


Thanks

On Wed, May 23, 2018 at 7:11 PM, kedarsdixit <[hidden email]> wrote:



Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Ajay-2
Do you worry about Spark overloading the SQL Server? We have had this issue in the past, where all the Spark workers tend to send lots of data at once to SQL Server, which hurts latency for the rest of the system. We overcame this by using Sqoop and running it in a controlled environment.
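One way to limit that load without leaving Spark is to cap the number of concurrent writers with `coalesce` and set the JDBC `batchsize` option. A sketch follows; the write itself is commented out since it needs a live server, and the URL and table names are placeholders. The helper only estimates the resulting number of INSERT round trips.

```python
import math

def jdbc_round_trips(row_count, writers, batchsize):
    """Rough count of INSERT round trips for a coalesced JDBC write."""
    rows_per_writer = math.ceil(row_count / writers)
    return writers * math.ceil(rows_per_writer / batchsize)

# (df.coalesce(4)  # at most 4 concurrent connections to SQL Server
#    .write.format("jdbc")
#    .option("url", "jdbc:sqlserver://myhost:1433;databaseName=TargetDb")  # placeholder
#    .option("dbtable", "dbo.OrdersStage")  # placeholder staging table
#    .option("batchsize", "10000")          # rows sent per batched INSERT
#    .mode("append")
#    .save())
```

For example, 1M rows written by 4 writers with a 10,000-row batch size is roughly 100 round trips, spread over only 4 connections, rather than one round trip per row from every executor.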

On Wed, May 23, 2018 at 7:32 AM Chetan Khatri <[hidden email]> wrote:




--
Thanks,
Ajay

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

ayan guha
Curious question: what is the reason for using Spark here? Why not simple SQL-based ETL?

On Thu, May 24, 2018 at 5:09 AM, Ajay <[hidden email]> wrote:



--
Best Regards,
Ayan Guha

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Chetan Khatri
Ajay, you can use Sqoop if you want to ingest data into HDFS. This is a POC where the customer wants to prove that Spark ETL would be faster than C#-based raw SQL statements. That's all. There are no timestamp-based columns in the source tables to make it an incremental load.
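Without a timestamp column, incremental loads are out, but full reloads can still be read in parallel by bucketing on any numeric key via the `predicates` argument of `DataFrameReader.jdbc`. A sketch, where the table and column names are made up:

```python
def modulo_predicates(column, buckets):
    """Non-overlapping WHERE clauses, one per Spark partition.
    Assumes a NOT NULL numeric key: NULL % n is NULL in SQL, so
    rows with a NULL key would fall into no bucket."""
    return ["{} % {} = {}".format(column, buckets, i) for i in range(buckets)]

preds = modulo_predicates("customer_id", 4)
# Each predicate becomes its own SELECT against the source table:
# df = spark.read.jdbc(url, "dbo.Customers", predicates=preds, properties=props)
```

This spreads a full-table read across partitions even when the key range is sparse or skewed, at the cost of a non-indexed modulo scan per query.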

On Thu, May 24, 2018 at 1:08 AM, ayan guha <[hidden email]> wrote: