structured streaming Kafka consumer group.id override


structured streaming Kafka consumer group.id override

Srinivas V
Hello,
1. The consumer group id for my Kafka source is being randomly generated by Spark Structured Streaming. Can I override it?
2. When testing in development, if I stop my Kafka consumer streaming job for a couple of days and then try to start it again, the job keeps failing on missing offsets, since the offsets expire after 4 hours. I have read that on restart Structured Streaming always tries to consume from the earliest offset, not the latest. How should I handle this problem?

Regards
Srini

Re: structured streaming Kafka consumer group.id override

shicheng31604@gmail.com
1. We may not be able to use a customized group id in Structured Streaming.
2. When restarting after a failure or a kill, the group id changes, but the starting offset will be the last one consumed in the previous run.
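For reference, a minimal sketch of how the group id can be influenced; this assumes Spark 3.0+, whose Kafka source documents the groupIdPrefix and kafka.group.id options (broker address, topic, and prefix below are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("group-id-demo").getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
  .option("subscribe", "my-topic")                    // placeholder
  // Prefix for the generated group ids, so an admin can tell which job
  // owns a consumer group (the default prefix is "spark-kafka-source").
  .option("groupIdPrefix", "orders-etl")
  // Or pin an exact group id; the docs warn this can conflict with other
  // consumers or concurrent queries sharing the same id.
  // .option("kafka.group.id", "orders-etl-group")
  .load()

On older releases that lack these options, the generated spark-kafka-source-<uuid> group id is, as far as I know, not overridable.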


Re: structured streaming Kafka consumer group.id override

Srinivas V
1. How would a production admin or other engineers figure out which process owns this random group id that is consuming a specific topic? Why is it designed this way?
2. I don't see the group id changing every time; it repeats across restarts, and I can't tell when or how it changes. I know the job tries to resume from the offset after the last one consumed, but it fails because that offset has expired. What is the solution for this?
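One possible mitigation, sketched below under the assumption that the checkpointed offsets have been deleted by the broker's retention policy: the documented failOnDataLoss option tells the Kafka source to skip missing offsets instead of failing (at the cost of losing the data in the gap), while startingOffsets only controls the very first run, before a checkpoint exists:

// assuming an existing SparkSession `spark`; broker and topic are placeholders
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "my-topic")
  // Consulted only when no checkpoint exists yet; afterwards the
  // checkpointed offsets always take precedence.
  .option("startingOffsets", "latest")
  // Don't fail the query when checkpointed offsets are no longer on the
  // broker; the missing range is skipped.
  .option("failOnDataLoss", "false")
  .load()

To genuinely restart from the latest offsets after a long pause, the other option is to point the query at a fresh checkpointLocation.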


Re: structured streaming Kafka consumer group.id override

shicheng31604@gmail.com
The last offset is stored in the file system (the checkpoint location) you specified, so how does it expire? I don't understand; I have never run into that condition.
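To make the two sides of this concrete: the offsets are indeed persisted in the query's checkpoint directory and do not expire there, but the messages they point to can still be deleted by Kafka's retention policy, which is presumably what produces the missing-offset failure on restart. A minimal sketch of where the offsets live (output and checkpoint paths are placeholders):

// assuming a streaming DataFrame `df` read from a Kafka source as above
val query = df.writeStream
  .format("parquet")
  .option("path", "/data/output")                        // placeholder
  // Offsets and batch progress are committed under this directory, not
  // to Kafka's __consumer_offsets topic; on restart the query resumes
  // from the last committed batch.
  .option("checkpointLocation", "/checkpoints/my-query") // placeholder
  .start()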
