Spark / Scala code not recognising the path?

Abhijeet Kumar
I'm modifying a CSV file which is inside HDFS and finally putting it back to HDFS in Spark.
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
csv_file.coalesce(1).write
  .format("csv")
  .mode("overwrite")
  .save("hdfs://localhost:8020/data/temp_insight")
Thread.sleep(15000)
println(fs.exists(new Path("/data/temp_insight")))

Output:

false

While the thread was sleeping for 15 seconds, I checked HDFS with this command:

hdfs dfs -ls /data/temp_insight

Output:

18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r--   3 abhijeet supergroup          0 2018-06-08 17:48 /data/temp_insight/_SUCCESS
-rw-r--r--   3 abhijeet supergroup        201 2018-06-08 17:48 /data/temp_insight/part-00000-7bffb826-f18d-4022-b089-da85565525b7-c000.csv

To cross-check whether the code resolves HDFS paths at all, I added one more println statement with a path that already exists in HDFS. It prints true in that case.

So, what could be the reason?

Thanks,

Abhijeet Kumar
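
One thing that could be worth checking here, as an assumption rather than an established cause: the write uses the fully qualified URI hdfs://localhost:8020/data/temp_insight, while the exists check uses the scheme-less path /data/temp_insight, which Hadoop resolves against the default filesystem of the configuration. The cross-check with an existing path suggests the default filesystem is already HDFS, so this may rule the idea out rather than confirm it. A minimal sketch that reports how a path string is resolved; the names PathCheck and describe are hypothetical, and only calls from the Hadoop FileSystem and Path API are used:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not part of the original post.
object PathCheck {
  // Returns (URI of the filesystem chosen for the path,
  //          fully qualified form of the path,
  //          whether the path exists on that filesystem).
  def describe(pathString: String, conf: Configuration): (String, String, Boolean) = {
    val path = new Path(pathString)
    // A path without a scheme falls back to the default filesystem of conf.
    val fs: FileSystem = path.getFileSystem(conf)
    (fs.getUri.toString, fs.makeQualified(path).toString, fs.exists(path))
  }
}
```

In the Spark job this could be called as PathCheck.describe("/data/temp_insight", spark.sparkContext.hadoopConfiguration) and compared with the result for the hdfs:// URI passed to save.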

Re: Spark / Scala code not recognising the path?

Jörn Franke
It takes some time until the information about the file creation is propagated.

On 9. Jun 2018, at 08:07, Abhijeet Kumar <[hidden email]> wrote:

I'm modifying a CSV file which is inside HDFS and finally putting it back to HDFS in Spark.

Re: Spark / Scala code not recognising the path?

Abhijeet Kumar
Can you please tell me the estimated time, so that my program can wait for that period?

Thanks,
Abhijeet Kumar
On 09-Jun-2018, at 12:01 PM, Jörn Franke <[hidden email]> wrote:

It takes some time until the information about the file creation is propagated.


Re: Spark / Scala code not recognising the path?

Jörn Franke
That would be an anti-pattern and would lead to bad software.
Please don’t do it, for the sake of the people who use your software.
What exactly do you want to achieve by knowing whether the file exists?
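
As an alternative to a fixed sleep: DataFrameWriter.save blocks until the write job has finished, and the listing earlier in the thread shows a _SUCCESS marker in the output directory. A minimal sketch of testing for that marker instead of waiting; SuccessMarker and jobCommitted are hypothetical names, and the sketch assumes the marker file has not been disabled in the job configuration:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not part of the original thread.
object SuccessMarker {
  // True if the output directory contains the _SUCCESS marker file
  // that the output committer writes when the job commits.
  def jobCommitted(fs: FileSystem, outputDir: Path): Boolean =
    fs.exists(new Path(outputDir, "_SUCCESS"))
}
```

The fs passed in should be the filesystem the data was written to, for example the one returned by outputDir.getFileSystem(conf).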

On 9. Jun 2018, at 08:34, Abhijeet Kumar <[hidden email]> wrote:

Can you please tell me the estimated time, so that my program can wait for that period?

Thanks,
Abhijeet Kumar

Re: Spark / Scala code not recognising the path?

Abhijeet Kumar
I need to rename the file. I can write a separate program for this, I think.

Thanks,
Abhijeet Kumar 
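
The rename mentioned here does not necessarily need a separate program. A minimal sketch using the Hadoop FileSystem API, assuming the output directory holds exactly one part-*.csv file (as after coalesce(1)); RenameOutput, renameSinglePart, and the target file name in the usage note are hypothetical:

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

// Hypothetical helper, not part of the original thread.
object RenameOutput {
  // Moves the single part-*.csv file in outputDir to target.
  // Returns Some(target) on success, or None if there is not exactly
  // one part file or the filesystem refuses the rename.
  def renameSinglePart(fs: FileSystem, outputDir: Path, target: Path): Option[Path] = {
    // globStatus may return null for a missing directory, hence the Option wrapper.
    val parts = Option(fs.globStatus(new Path(outputDir, "part-*.csv")))
      .getOrElse(Array.empty[FileStatus])
      .map(_.getPath)
    parts match {
      case Array(part) if fs.rename(part, target) => Some(target)
      case _                                      => None
    }
  }
}
```

After the save call it could be used as RenameOutput.renameSinglePart(fs, new Path("/data/temp_insight"), new Path("/data/temp_insight/insight.csv")), where insight.csv stands in for whatever final name is wanted.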
On 09-Jun-2018, at 1:10 PM, Jörn Franke <[hidden email]> wrote:

That would be an anti-pattern and would lead to bad software.
Please don’t do it, for the sake of the people who use your software.
What exactly do you want to achieve by knowing whether the file exists?


Re: Spark / Scala code not recognising the path?

Jörn Franke
Why don’t you write the final name from the start?
I.e., save the file under the name it should have.

On 9. Jun 2018, at 09:44, Abhijeet Kumar <[hidden email]> wrote:

I need to rename the file. I can write a separate program for this, I think.

Thanks,
Abhijeet Kumar 

Re: Spark / Scala code not recognising the path?

Abhijeet Kumar
The situation is completely different from what you are thinking. OK, thanks for your time. From here on I'll figure this out myself. Thank you again!

On Sat, 9 Jun 2018, 13:27 Jörn Franke, <[hidden email]> wrote:
Why don’t you write the final name from the start?
I.e., save the file under the name it should have.
