Kafka Direct Stream - dynamic topic subscription

Kafka Direct Stream - dynamic topic subscription

Ramanan, Buvana (Nokia - US/Murray Hill)

Hello,

 

I am using Spark 2.2.0 and would like to see dynamic topic subscription in action.

 

I tried the streaming.DirectKafkaWordCount example (which uses org.apache.spark.streaming.kafka010).

 

I started with 8 Kafka partitions in my topic and found that Spark Streaming executes 8 tasks (one per partition), as expected. While the example was running, I increased the topic to 16 partitions and started producing data to the new partitions as well.

 

I expected the Kafka consumer that Spark uses to detect this change and spawn new tasks for the new partitions. Instead, it keeps reading only from the original 8 partitions and ignores the new ones. After a restart, it reads from all 16 partitions.

 

Is this expected?

 

What is meant by dynamic topic subscription?  

 

Does it apply only to topics whose names match a regular expression, and not to partitions added to an existing topic?

 

Thanks,

Buvana

 

Re: FW: Kafka Direct Stream - dynamic topic subscription

Cody Koeninger
As it says in SPARK-10320 and in the docs at
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#consumerstrategies,
you can use SubscribePattern.
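
[Editor's sketch] For readers following along, here is roughly what a SubscribePattern-based direct stream might look like with the spark-streaming-kafka-0-10 integration. This is a minimal sketch, not code from the thread: the broker address, group id, topic pattern "events-.*", batch interval, and the metadata.max.age.ms value are all placeholder assumptions. The integration guide linked above notes that Subscribe and SubscribePattern should respond to partitions added during a running stream; how quickly that happens presumably depends on how often the consumer refreshes its metadata, which is why the sketch lowers metadata.max.age.ms from the Kafka default of 5 minutes.

```scala
import java.util.regex.Pattern

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object SubscribePatternSketch {
  def main(args: Array[String]): Unit = {
    // Master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("SubscribePatternSketch")
    val ssc = new StreamingContext(conf, Seconds(2)) // assumed batch interval

    // All values below are placeholders chosen for illustration.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "subscribe-pattern-sketch",
      "auto.offset.reset" -> "latest",
      // Assumption: refresh consumer metadata every 30 s instead of the
      // 5-minute default, so new topics/partitions are noticed sooner.
      "metadata.max.age.ms" -> "30000"
    )

    // Subscribe to every topic whose name matches the (hypothetical) regex.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.SubscribePattern[String, String](
        Pattern.compile("events-.*"), kafkaParams)
    )

    // Each RDD partition corresponds to one Kafka topic-partition, so this
    // count should change when matching topics or partitions appear.
    stream.foreachRDD { rdd =>
      println(s"Kafka partitions in this batch: ${rdd.getNumPartitions}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Watching the printed partition count while adding partitions or creating a new matching topic is one way to check whether the running stream picks up the change without a restart.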

On Sun, Oct 29, 2017 at 3:56 PM, Ramanan, Buvana (Nokia - US/Murray
Hill) <[hidden email]> wrote:

> Hello Cody,
>
>
>
> As a stakeholder in the SPARK-10320 JIRA issue, can you please explain the
> purpose of dynamic topic subscription? Does it mean adapting the consumer to
> read from new partitions that might get created after the Spark Streaming
> job begins? Is there a succinct write-up on the dynamic topic subscription
> feature that you can share?
>
>
>
> Also, is there a way I can subscribe to topics whose names match a regular
> expression (some Kafka consumers, such as the kafka-python library, support
> that)?
>
>
>
> I am forwarding the email I sent to the Spark users group, which contains
> a little more background on my question.
>
>
>
> Thank you,
>
> Regards,
>
> Buvana
