Reading sequencefile

Jaonary Rabarisoa
Hi all, 

I'm trying to read a SequenceFile that represents a set of JPEG images, generated with this tool: http://stuartsierra.com/2008/04/24/a-million-little-files . According to the documentation: "Each key is the name of a file (a Hadoop “Text”), the value is the binary contents of the file (a BytesWritable)".

How do I load the generated file into Spark?

Cheers,

Jaonary
Re: Reading sequencefile

Shixiong Zhu
Hi Jaonary,

You can use "sc.sequenceFile" to load your file. E.g.,

scala> import org.apache.hadoop.io._
import org.apache.hadoop.io._

scala> val rdd = sc.sequenceFile("path_to_file", classOf[Text], classOf[BytesWritable])
rdd: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.Text, org.apache.hadoop.io.BytesWritable)] = HadoopRDD[0] at sequenceFile at <console>:15
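
Once the RDD is loaded, the values are still Hadoop BytesWritable objects. The following is a minimal sketch, not part of the original reply, of turning each record into a (file name, byte array) pair. The object name SequenceFileImages and its methods toBytes and load are hypothetical names chosen for illustration. It copies only the first getLength bytes, because getBytes returns the backing buffer, which may be longer than the valid data. Converting to fresh String and Array[Byte] values also avoids holding on to Writable objects that Hadoop's record reader may reuse between records.

```scala
import java.util.Arrays
import org.apache.hadoop.io.{BytesWritable, Text}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, named here for illustration only.
object SequenceFileImages {
  // Copy just the valid bytes: the backing array returned by getBytes
  // can be larger than getLength.
  def toBytes(value: BytesWritable): Array[Byte] =
    Arrays.copyOf(value.getBytes, value.getLength)

  // Load (Text, BytesWritable) records and convert them to plain
  // (file name, file contents) pairs.
  def load(sc: SparkContext, path: String): RDD[(String, Array[Byte])] =
    sc.sequenceFile(path, classOf[Text], classOf[BytesWritable])
      .map { case (name, contents) => (name.toString, toBytes(contents)) }
}
```

In spark-shell, where sc is already defined, this could be used as SequenceFileImages.load(sc, "path_to_file").take(3) to inspect the first few files.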


Best Regards,

Shixiong Zhu


In reply to Jaonary Rabarisoa's message of 2014-03-11 16:54 GMT+08:00.

Re: Reading sequencefile

Jaonary Rabarisoa
Thank you. I forgot the classOf[*] arguments.
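
For reference, sc.sequenceFile also has a form that takes type parameters instead of classOf arguments and relies on Spark's implicit Writable converters. The sketch below assumes the converters for String (from Text) and Array[Byte] (from BytesWritable) are in scope. On older Spark releases that may require the SparkContext._ import, which spark-shell normally performs already. TypedSequenceFile and load are hypothetical names used only for this example.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // implicit converters on older releases
import org.apache.spark.rdd.RDD

// Hypothetical helper object, named here for illustration only.
object TypedSequenceFile {
  // The key and value types are given as type parameters; Spark picks
  // the matching Writable classes (Text, BytesWritable) through its
  // implicit converters, so no classOf arguments are passed.
  def load(sc: SparkContext, path: String): RDD[(String, Array[Byte])] =
    sc.sequenceFile[String, Array[Byte]](path)
}
```

In spark-shell the equivalent one-liner would be sc.sequenceFile[String, Array[Byte]]("path_to_file").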


In reply to Shixiong Zhu's message of Tue, Mar 11, 2014 at 10:46 AM.