We have a data-critical application that uses Spark Streaming with Kinesis. Our production environment now has a high-uptime requirement: the application must be operational 24/7.
As a result, we need to be able to recover from all kinds of failures in the system. One edge case is a bad day on which our main Spark driver becomes unresponsive or crashes, and we are unable to restart the system within the Kinesis data retention period (24 hours by default). Kinesis data may no longer be accessible once the retention period has elapsed. Increasing the retention period is an option, but not a desirable one: it costs more, and it does not help if we still cannot recover in time.
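To make the constraint concrete, here is a rough Python sketch of the arithmetic involved. It is only an illustration, not part of our application: the function names are made up, and it assumes we know the arrival time of the last record we processed and that each record expires exactly one retention period after it arrives.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 24-hour default retention period mentioned above.
RETENTION = timedelta(hours=24)


def time_until_data_loss(last_processed_arrival, now, retention=RETENTION):
    """Return how long remains before the oldest unprocessed record expires.

    Hypothetical helper: a zero or negative result means that records
    have already aged out of the stream.
    """
    return (last_processed_arrival + retention) - now


def can_still_recover(last_processed_arrival, now, retention=RETENTION):
    """True if a restart at `now` would still find every unprocessed record."""
    return time_until_data_loss(last_processed_arrival, now, retention) > timedelta(0)


if __name__ == "__main__":
    stopped = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    # Ten hours of downtime: still inside the retention window.
    print(can_still_recover(stopped, stopped + timedelta(hours=10)))
    # Twenty-five hours of downtime: the oldest unprocessed records are gone.
    print(can_still_recover(stopped, stopped + timedelta(hours=25)))
```

The case we are worried about is the second one, where the outage outlasts the retention window.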
Has anyone faced this dilemma? Any solution or approach to this problem would be highly appreciated.