Spark structured CSV file stream not detecting new files

Imran Rajjad
Greetings,
I am running a unit test that streams a folder into which I manually copy CSV files. The files are not always picked up. They are only detected when the job starts with the files already in the folder.

I even tried the fileNameOnly option newly included in 2.2.0. Have I missed something in the documentation? This problem does not seem to occur in the DStreams examples.


// Stream CSV files from watchDir; fileNameOnly (new in 2.2.0) matches files by name only
DataStreamReader reader = spark.readStream()
    .option("fileNameOnly", true)
    .option("header", true)
    .schema(userSchema);

Dataset<Row> csvDF = reader.csv(watchDir);

// Running count per distinct value of myCol
Dataset<Row> results = csvDF.groupBy("myCol").count();
MyForEach forEachObj = new MyForEach();
query = results
    .writeStream()
    .foreach(forEachObj) // foreach never gets called
    .outputMode("complete")
    .start();
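
One thing that may be worth ruling out on the copying side (a guess, not something confirmed here): the Structured Streaming guide asks that files be placed atomically in the watched directory, and as far as I know the file source also looks at modification times, so a manual copy that keeps an old timestamp or is seen half-written might be skipped. Below is a minimal sketch of a test helper that stages the file first and then renames it into place. The class name `AtomicFileDrop`, the method `dropInto`, and the separate staging directory are hypothetical, and the sketch assumes the staging and watch directories are on the same file system so that the rename can be atomic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

// Hypothetical test helper, not part of Spark.
public class AtomicFileDrop {

    /**
     * Copies source into stagingDir, stamps the copy with the current time,
     * then moves it into watchDir with a single atomic rename.
     * Assumes stagingDir and watchDir live on the same file system.
     */
    public static Path dropInto(Path source, Path stagingDir, Path watchDir) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(watchDir);

        // Write the full content outside the watched folder first.
        Path staged = stagingDir.resolve(source.getFileName());
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);

        // Give the copy a fresh modification time instead of the source's old one.
        Files.setLastModifiedTime(staged, FileTime.fromMillis(System.currentTimeMillis()));

        // The file appears in the watched folder only once it is complete.
        Path target = watchDir.resolve(source.getFileName());
        return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

In the unit test this would replace the manual copy, for example `AtomicFileDrop.dropInto(csvFile, stagingDir, Paths.get(watchDir))`, where `csvFile` and `stagingDir` are placeholder paths chosen by the test.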

--
I.R