something happened to MemoryStream after spark 2.3

Koert Kuipers
we just started testing internally with spark 2.4 snapshots, and it seems our streaming tests are broken.

i believe it has to do with MemoryStream.

previously we could create a MemoryStream, add data to it, convert it to an unbounded streaming DataFrame, and reuse that DataFrame across queries. by reuse i mean repeating these steps: create a query (with a random uuid as its name) from the DataFrame, process all available data, stop the query. each time we did this, all the data in the MemoryStream was processed.

now with spark 2.4.0-SNAPSHOT, the second time we create a query no data is processed at all. it is as if the MemoryStream is empty. is this expected? should we refactor our tests?
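
for reference, the pattern described above could be sketched roughly as follows. this is a minimal illustration, not our actual test code: the object name `MemoryStreamReuse`, the helper `runOnce`, the use of the memory sink, and the `q_` query-name prefix are all assumptions made for the example. it assumes the spark 2.3/2.4-era API, where `MemoryStream` lives in `org.apache.spark.sql.execution.streaming` and needs an implicit `SQLContext`.

```scala
import java.util.UUID

import org.apache.spark.sql.{DataFrame, SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

object MemoryStreamReuse {

  // Hypothetical helper: start a query under a random name, process
  // everything the source currently holds, count the rows that reached
  // the sink, then stop the query.
  def runOnce(spark: SparkSession, df: DataFrame): Long = {
    // Dashes are stripped so the name is usable as a temp view name.
    val name = "q_" + UUID.randomUUID().toString.replace("-", "")
    val query = df.writeStream
      .format("memory")      // assumption: in-memory sink, for easy inspection
      .queryName(name)
      .outputMode("append")
      .start()
    try {
      query.processAllAvailable()
      spark.table(name).count()
    } finally {
      query.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("memory-stream-reuse")
      .getOrCreate()
    import spark.implicits._
    implicit val sqlContext: SQLContext = spark.sqlContext

    // One MemoryStream and one streaming DataFrame, shared by both queries.
    val input = MemoryStream[Int]
    input.addData(1, 2, 3)
    val df = input.toDF()

    val first = runOnce(spark, df)
    val second = runOnce(spark, df)
    println(s"first run: $first rows, second run: $second rows")

    spark.stop()
  }
}
```

going by the behavior reported above, both runs would see every row on spark 2.3, while on 2.4.0-SNAPSHOT the second run would see none.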