Offset Management in Spark

Siva Samraj
Hi all,

I am using Spark Structured Streaming (version 2.3.2). I need to read from a Kafka cluster and write to a Kerberized Kafka cluster.
I want to use Kafka itself for offset checkpointing, committing offsets after the records are written to the Kerberized Kafka cluster.

Questions: 

1. Can we use Kafka for checkpointing to manage offsets, or must we use HDFS/S3 only?

Please help.

Thanks

Re: Offset Management in Spark

Gabor Somogyi
Hi,

Structured Streaming stores offsets only in HDFS-compatible filesystems; Kafka and S3 do not qualify.
Custom offset storage was an option only in DStreams.

G


On Wed, Sep 30, 2020 at 9:45 AM Siva Samraj <[hidden email]> wrote:
[original message quoted in full above]
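
For anyone landing on this thread later, here is a rough PySpark sketch of the setup the reply describes: Kafka in, Kerberized Kafka out, with offsets tracked through the checkpointLocation option on an HDFS path. It is only an illustration under assumptions, not a tested configuration: the helper names, broker addresses, topic names, checkpoint path, and the two security settings are all made up for the example, and a real Kerberized cluster also needs JAAS/keytab configuration that is not shown here.

```python
# Minimal sketch, assuming a Kafka -> Kerberized Kafka copy job.
# All helper names, hosts, topics and paths are hypothetical.

def source_options(bootstrap_servers, topic):
    """Options for the Kafka source (the cluster being read)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def sink_options(bootstrap_servers, topic, checkpoint_dir):
    """Options for the Kerberized Kafka sink.

    Options prefixed with "kafka." are handed to the Kafka client.
    The security values below are assumptions; adjust to your cluster.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "topic": topic,
        "kafka.security.protocol": "SASL_PLAINTEXT",
        "kafka.sasl.kerberos.service.name": "kafka",
        # Structured Streaming records its offsets under this directory,
        # which should live on an HDFS-compatible filesystem.
        "checkpointLocation": checkpoint_dir,
    }


def start_copy(spark, src, dst):
    """Start the streaming query; `spark` is an existing SparkSession."""
    records = (
        spark.readStream.format("kafka")
        .options(**src)
        .load()
        .selectExpr("key", "value")  # the Kafka sink needs a value column
    )
    return records.writeStream.format("kafka").options(**dst).start()
```

With a SparkSession in hand, usage would look something like start_copy(spark, source_options("src-broker:9092", "input-topic"), sink_options("dst-broker:9092", "output-topic", "hdfs:///checkpoints/kafka-copy")). On restart with the same checkpoint directory, the query resumes from the offsets recorded there rather than from anything stored in Kafka.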