structured streaming bookkeeping formats

Koert Kuipers
i was reading this blog post from last year about the structured streaming run-once trigger:

it's a nice idea to replace a batch job with structured streaming because it does the bookkeeping (what's new, failure recovery, etc.) for you.

but that's also the part that scares me a bit. when it's all done for me and it breaks anyway, i am not sure i would know how to recover. and i am unsure how to upgrade.
so... are the formats that spark structured streaming uses for "bookkeeping" easily readable (like, say, json) and stable? do they consist of files i can go look at, understand, and edit/manipulate myself if needed? are there any references to the formats used?

thank you!