Write to HDFS

Write to HDFS

Uğur Sopaoğlu
Hi all,

In the word count example,

  val textFile = sc.textFile("Sample.txt")
  val counts = textFile.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)
 counts.saveAsTextFile("hdfs://master:8020/user/abc")

I want to write the collection built from "counts" in the code above to HDFS, so:

val x = counts.collect()

Actually, I want to write x to HDFS, but Spark requires an RDD in order to write something to HDFS.

How can I write an Array[(String, Int)] to HDFS?


--
Uğur
Re: Write to HDFS

Marco Mistroni
Hi
Could you just create an RDD/DataFrame out of what you want to save and store that in HDFS?
Hth
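A minimal sketch of that suggestion, assuming the existing SparkContext `sc` and the HDFS path from the original snippet (this requires a running Spark cluster, so it is illustrative only):

```scala
// Turn the collected Array[(String, Int)] back into an RDD
// with sc.parallelize, then save it exactly like the original
// "counts" RDD was saved.
val x: Array[(String, Int)] = counts.collect()
val xRdd = sc.parallelize(x)
xRdd.saveAsTextFile("hdfs://master:8020/user/abc")
```

Note that collecting to the driver and re-parallelizing is only reasonable for small results; for large data, save the original RDD directly.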

Re: Write to HDFS

Uğur Sopaoğlu
Actually, when I run the following code,

  val textFile = sc.textFile("Sample.txt")
  val counts = textFile.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)


it saves the results into more than one part file, like part-00000 and part-00001. I want to collect all of them into one file.





--
Uğur Sopaoğlu
Re: Write to HDFS

Marco Mistroni
Use  counts.repartition(1).save......
Hth


Re: Write to HDFS

Deepak Sharma
Better to use coalesce instead of repartition: when you are only shrinking the number of partitions, coalesce avoids the full shuffle that repartition triggers.
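The same single-file write with coalesce, again assuming the `counts` RDD and HDFS path from the original post:

```scala
// coalesce(1) merges the existing partitions into one without
// a full shuffle, which is cheaper than repartition(1) when
// only reducing the partition count.
counts.coalesce(1)
      .saveAsTextFile("hdfs://master:8020/user/abc")
```

Be aware that either approach funnels all output through a single task, which can be slow or memory-heavy for large datasets; an alternative is to save with multiple partitions and merge afterwards with `hdfs dfs -getmerge`.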
