RDD of binary files


David Thomas
I have a set of binary files and I would like to create an RDD out of them and pipe them through an external process. So how do I create an RDD of such objects? For quick prototyping, can I do it without using HDFS?

Re: RDD of binary files

MLnick
You should be able to read them as Hadoop files with a custom input format:

sc.newAPIHadoopFile(...)
Use a FileInputFormat subclass with LongWritable as the key class and BytesWritable as the value class.

This will read the files from an input directory which can be a local file system for testing.

Take a look at the code for sc.textFile to see how it gets set up with the inputFormat and writable classes if you get stuck.
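
For the "pipe them through an external process" part, here is a minimal Python sketch of one way it could look. It does not follow the newAPIHadoopFile route described above. Instead it assumes a later PySpark release where sc.binaryFiles is available, which yields one (path, bytes) record per file. The helper names pipe_bytes and pipe_binary_files, and the upper-casing command, are hypothetical placeholders for your own tool.

```python
import subprocess
import sys


def pipe_bytes(data, cmd):
    """Feed `data` to the stdin of `cmd` and return the bytes it writes to stdout."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


def pipe_binary_files(sc, path, cmd):
    """Build an RDD of (file path, processed bytes) pairs.

    `sc` is an existing SparkContext. `path` may point at a local
    directory, so HDFS is not required for prototyping.
    """
    # binaryFiles yields one (path, whole-file bytes) record per file.
    files = sc.binaryFiles(path)
    # Run the external command once per file, on the worker processing it.
    return files.mapValues(lambda content: pipe_bytes(content, cmd))


# Hypothetical stand-in for the real external tool: a child Python process
# that upper-cases whatever arrives on stdin.
UPPERCASE_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
]
```

Each file is read whole into memory and the command is started once per file, so this only suits files that fit comfortably in a worker's memory. pipe_bytes can be tried on its own without a cluster, for example pipe_bytes(b"abc", UPPERCASE_CMD).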




On Tue, Feb 4, 2014 at 10:55 PM, David Thomas <[hidden email]> wrote:

I have a set of binary files and I would like to create an RDD out of them and pipe them through an external process. So how do I create an RDD of such objects? For quick prototyping, can I do it without using HDFS?