Failure Threshold in Spark Structured Streaming?

2 messages

Failure Threshold in Spark Structured Streaming?

Eric Beabes
Currently my job fails on even a single bad record: if one incoming message is malformed, the whole job fails. I believe there's a property that lets us set an acceptable number of failures, but Googling didn't turn up an answer. Can someone please help? Thanks.


Re: Failure Threshold in Spark Structured Streaming?

Jungtaek Lim-2
Structured Streaming basically follows SQL semantics, which have no notion of a "max allowance of failures". If you'd like to tolerate malformed data, read it in a raw format (string or binary), which won't fail on such data, and convert it yourself. For example, from_json() produces null when the data is malformed, so you can easily filter those rows out later.
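A minimal sketch of the pattern described above, using plain Python to mimic from_json()'s null-on-malformed behavior (the function and the sample messages here are illustrative stand-ins, not Spark APIs):

```python
import json

def parse_or_none(raw):
    """Mimic from_json(): return the parsed record, or None if malformed."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None

# Raw messages read as strings; the second one is malformed JSON.
raw_messages = [
    '{"id": 1, "value": "ok"}',
    '{"id": 2, "value":',
    '{"id": 3, "value": "fine"}',
]

parsed = [parse_or_none(m) for m in raw_messages]

# Filter out the nulls instead of failing the whole job.
good = [p for p in parsed if p is not None]
```

In Structured Streaming itself, the equivalent is to read the value column as a string, apply from_json(col("value"), schema) to get a struct column that is null for malformed input, and then filter with .isNotNull() to drop the rows that failed to parse.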


On Fri, Jul 3, 2020 at 1:24 AM Eric Beabes <[hidden email]> wrote:
> Currently my job fails even on a single failure. In other words, even if one incoming message is malformed the job fails. I believe there's a property that allows us to set an acceptable number of failures. I Googled but couldn't find the answer. Can someone please help? Thanks.