[SPARK-STRUCTURED-STREAMING] IllegalStateException: Race while writing batch 4


Amit Joshi
Hi,

I have two Spark Structured Streaming queries writing to the same output path in object storage.
Once in a while I get the error "IllegalStateException: Race while writing batch 4".
I found that this error occurs because two writers are writing to the same output path. The file streaming sink doesn't support multiple writers.
It assumes there is only one writer for the path, so each query needs to use its own output directory.
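
To make the single-writer assumption concrete, here is a toy model in plain Python. It is not Spark's code: it assumes, as a simplification, that a sink records each completed batch by atomically creating an entry named after the batch id in a log directory next to the output. The names `commit_batch` and `simulate` are hypothetical. Two independent writers that each count batches from their own checkpoint will eventually both try to claim the same batch id, and the second one fails.

```python
import os
import tempfile


def commit_batch(log_dir: str, batch_id: int, writer: str) -> None:
    """Record a batch in the log; fail if that batch id is already taken."""
    entry = os.path.join(log_dir, str(batch_id))
    try:
        # O_EXCL makes creation atomic: only one caller can create the entry.
        fd = os.open(entry, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Assumption: this mirrors the wording of the error in the post,
        # not the actual exception type Spark throws.
        raise RuntimeError(f"Race while writing batch {batch_id}") from None
    with os.fdopen(fd, "w") as f:
        f.write(writer)


def simulate() -> str:
    """Two independent writers share one log directory and both reach batch 4."""
    with tempfile.TemporaryDirectory() as log_dir:
        commit_batch(log_dir, 4, "query-a")
        try:
            commit_batch(log_dir, 4, "query-b")
        except RuntimeError as err:
            return str(err)
    return "no conflict"
```

In this model the failure is intermittent for the same reason as in the post: it only shows up when both writers happen to reach the same batch id against the same directory.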

Is there a way for both queries to write their output to the same path? I need the output at the same path.

Regards
Amit Joshi
Re: [SPARK-STRUCTURED-STREAMING] IllegalStateException: Race while writing batch 4

Jungtaek Lim-2
The file stream sink doesn't support this. There are several ways to get a similar result:

1) Have both queries write to Kafka (or any intermediate storage that allows concurrent writes), and let a downstream Spark application read from it and write to the final path.
2) Have the two queries write to two different directories, and let a downstream Spark application read both and write to the final path.
3) Use an alternative data source that supports concurrent writers when writing files (you may want to check Delta Lake, Apache Hudi, or Apache Iceberg for this, though you'd probably need to learn a number of other things to keep the table in good shape).
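
Approach 2 could be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not a tested deployment: the helper names `sink_options` and `start_staging_query`, the `staging/` and `checkpoints/` layout, and the choice of Parquet are all hypothetical. The point is only that each query gets its own output directory and its own checkpoint location.

```python
import posixpath


def sink_options(base_path: str, query_name: str) -> dict:
    """Give each query its own output and checkpoint directory under base_path."""
    return {
        # posixpath keeps forward slashes, which object-store URIs expect.
        "path": posixpath.join(base_path, "staging", query_name),
        "checkpointLocation": posixpath.join(base_path, "checkpoints", query_name),
    }


def start_staging_query(df, base_path: str, query_name: str):
    """Start one streaming query that writes files to its own directory.

    `df` is expected to be a streaming pyspark.sql.DataFrame; the file
    format (Parquet) is an assumption made for this sketch.
    """
    return (
        df.writeStream
        .format("parquet")
        .options(**sink_options(base_path, query_name))
        .queryName(query_name)
        .start()
    )
```

With two streaming DataFrames, this would be called once per query with different names, for example `start_staging_query(df_a, base, "query_a")` and `start_staging_query(df_b, base, "query_b")`. A separate job then reads both staging directories and writes to the final path, as described in approach 2.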

Thanks,
Jungtaek Lim (HeartSaVioR)

(In reply to Amit Joshi's message of Sat, Aug 8, 2020, 4:19 AM.)