Spark Streaming with Kafka and Python


Hamish Whittal
Hi folks,

Thought I would ask here because it's somewhat confusing. I'm using Spark 2.4.5 on EMR 5.30.1 with Amazon MSK.

The Scala version is 2.11.12, and the Kafka library I'm using is spark-streaming-kafka-0-8_2.11-2.4.5.jar.

Now I want to read from Kafka topics using Python (I specifically need to stick to Python).

What seems confusing is that the 0.8 integration has Python support, but the 0.10 integration does not. And 0.8 seems to have been deprecated as of Spark 2.3.0, so if I'm using 2.4.5 then clearly I'm going to hit a roadblock here.

Can someone clarify these things for me? Have I got this right?

Thanks in advance,
Hamish

Re: Spark Streaming with Kafka and Python

German Schiavon Matteo
Hey,

Maybe I'm missing some restriction with EMR, but have you tried using Structured Streaming instead of Spark Streaming?

https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integration.html

Regards
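
For reference, a rough sketch of what the Structured Streaming route could look like from Python. It assumes the Kafka source package for Structured Streaming is on the classpath and that a SparkSession already exists; the helper names `kafka_source_options` and `read_kafka_stream`, the broker address and the topic name are all made up for illustration. The option keys and the "kafka" format name are the ones described in the Structured Streaming Kafka integration guide.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Build the option map for Spark's built-in "kafka" streaming source.

    The helper itself is hypothetical; only the option keys come from Spark.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # comma-separated host:port list
        "subscribe": topic,                            # topic name(s) to read
        "startingOffsets": starting_offsets,           # "latest" or "earliest"
    }


def read_kafka_stream(spark, options):
    """Return a streaming DataFrame with key and value decoded as strings.

    `spark` is an existing SparkSession. Kafka delivers key and value as
    binary columns, so they are cast to strings here for convenience.
    """
    raw = spark.readStream.format("kafka").options(**options).load()
    return raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )


# Possible usage (placeholder broker and topic, not taken from this thread):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("kafka-demo").getOrCreate()
#   df = read_kafka_stream(spark, kafka_source_options("broker1:9092", "events"))
#   query = df.writeStream.format("console").start()
#   query.awaitTermination()
```

Whether extra settings are needed for MSK (for example TLS) depends on how the cluster is configured, so treat this as a starting point only.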


Re: Spark Streaming with Kafka and Python

srowen
What supports Python in (Kafka?) 0.8? I don't think Spark ever had a
specific Python-Kafka integration. But you have always been able to
use it to read DataFrames as in Structured Streaming.
Kafka 0.8 support is deprecated (gone in 3.0), but 0.10 means 0.10+, so it
works with the latest 2.x.
What is the issue?
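
To make the naming concrete: the jar name in the original question packs the Kafka integration version, the Scala binary version and the Spark version into one string, and the Structured Streaming Kafka source follows the same pattern. A small sketch of building the matching Maven coordinate for the versions mentioned in this thread; the helper names are hypothetical, and the coordinate format is the one given in the Structured Streaming Kafka integration guide.

```python
def scala_binary_version(full_version):
    """Reduce a full Scala version such as "2.11.12" to its binary version "2.11"."""
    return ".".join(full_version.split(".")[:2])


def kafka_sql_package(scala_version, spark_version):
    """Build the Maven coordinate of the Structured Streaming Kafka source.

    "0-10" names the integration, which targets Kafka brokers 0.10 and later.
    """
    return "org.apache.spark:spark-sql-kafka-0-10_{}:{}".format(
        scala_binary_version(scala_version), spark_version
    )


print(kafka_sql_package("2.11.12", "2.4.5"))
# prints: org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5
```

The resulting string can be passed to `spark-submit --packages`, assuming the cluster is able to resolve packages from a Maven repository.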

On Wed, Aug 12, 2020 at 7:53 AM German Schiavon
<[hidden email]> wrote:

>
> Hey,
>
> Maybe I'm missing some restriction with EMR, but have you tried to use Structured Streaming instead of Spark Streaming?
>
> https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integration.html
>
> Regards
