Structured Streaming: Custom Source and Sink Development and PySpark

Structured Streaming: Custom Source and Sink Development and PySpark

Ramaswamy, Muthuraman

I would like to develop a custom Source and Sink, so I have a couple of questions:

  1. Do I have to use Scala or Java to develop these custom Sources/Sinks?

  2. Also, once the source/sink has been developed, do I have to write any Py4J modules to use it from PySpark/Python? Any pointers, good documentation, or a GitHub source to use as a reference would be of great help.

Please advise.

Thank you,


Re: Structured Streaming: Custom Source and Sink Development and PySpark

Russell Spitzer
Yes, Scala or Java. 

No. Once you have written the implementation, it is usable from all of the DataFrame APIs, PySpark included.
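For orientation, here is a minimal sketch of what the Scala side could look like with the Spark 2.x StreamSinkProvider/Sink interfaces. Note these are internal, evolving APIs, and the package, class names, and short name below are made up for illustration:

package com.example.sink // hypothetical package, not from this thread

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.{DataSourceRegister, StreamSinkProvider}
import org.apache.spark.sql.streaming.OutputMode

// Toy sink: prints every row of each micro-batch to stdout.
class PrintlnSink extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // collect() is also how the built-in console sink materializes a batch
    data.collect().foreach(row => println(s"batch=$batchId row=$row"))
  }
}

// Provider that Spark instantiates; DataSourceRegister gives it a short name
// so any DataFrame API (Scala, Java, Python, R) can reach it via .format("println-sink").
class PrintlnSinkProvider extends StreamSinkProvider with DataSourceRegister {
  override def shortName(): String = "println-sink"

  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink = new PrintlnSink()
}

To be resolvable by the short name, the jar also needs a META-INF/services/org.apache.spark.sql.sources.DataSourceRegister file listing the provider class; otherwise the fully qualified class name can be passed to format().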

As for examples, there are many: check the Kafka source in the Spark source tree, or one of the many sources listed on the Spark Packages website.
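If a provider like the hypothetical one sketched above is built into a jar and put on the classpath (e.g. spark-submit --jars my-println-sink.jar, or --packages for a published artifact), PySpark can use it with no extra Python or Py4J code at all, with exactly the same call you would write in Scala:

df.writeStream.format("println-sink").option("checkpointLocation", "/tmp/println-ckpt").start()

(or format("com.example.sink.PrintlnSinkProvider") if you skip the META-INF/services registration). The jar name, short name, and checkpoint path here are just placeholders.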
