Basically, the other post explains that when you read from the same HDFS directory that you are writing to, and your SaveMode is "overwrite", you will get a java.io.FileNotFoundException. But here I am finding that just moving where the action occurs in the program can give very different results: the program either completes or throws this exception. Can anyone explain why Spark is not consistent here?
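For context, here is roughly the pattern the other post describes, as a minimal sketch (I'm assuming a spark-shell session; somePath is a placeholder for the HDFS directory):

import org.apache.spark.sql.SaveMode

val somePath = "hdfs:///tmp/some_dir"            // placeholder path
val df = spark.read.format("csv").load(somePath) // df's plan points at somePath
// SaveMode.Overwrite deletes the existing files before the job recomputes df
// from them, so the write itself, or any later action that re-reads somePath,
// can throw java.io.FileNotFoundException.
df.write.mode(SaveMode.Overwrite).format("csv").save(somePath)
df.show(1) // may fail with FileNotFoundException

My actual code looks like this: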
val myDF = spark.read.format("csv").load(inputPath) // inputPath: the same HDFS directory the job later overwrites
// If I cache or persist myDF here and then do an action right after the cache, it will
// occasionally not throw the error. This is when completely restarting the SparkSession,
// so there is no risk of another user interfering on the same JVM.
myDF.show(1)

// The calls below are just meant to show that we are doing other "Spark DataFrame
// transformations". Different transformations have all led to the weird behavior,
// so I'm not being specific about what exactly the transformations are.
val secondDF = mergeOtherDFsWithmyDF(myDF, otherDF, thirdDF)
val fourthDF = mergeTwoDFs(thirdDF, StringToCheck, fifthDF)
// Below is the same .show(1) action call as was done above, except this
// action ALWAYS results in a successful completion, while the myDF.show(1) above
// sometimes results in FileNotFoundException and sometimes completes successfully.
// The only thing that changes among test runs is which one is executed: exactly one
// of fourthDF.show(1) or myDF.show(1) runs, and the other is left commented out.
fourthDF.show(1)
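For what it's worth, the cache experiment mentioned in the comment above looks like this (a sketch, not my exact code, using the same placeholder inputPath; count() is just an arbitrary cheap action to populate the cache):

val myDF = spark.read.format("csv").load(inputPath).cache()
myDF.count() // action right after the cache, so the data is materialized before anything else runs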