LazyOutputFormat in Spark


LazyOutputFormat in Spark

Mohit Singh
Hi,
  Is there an equivalent of LazyOutputFormat in Spark (PySpark)?
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.html
Basically, something that only saves output files that actually contain data, rather than writing every file, since in some cases the majority of output files can be empty.
Thanks

--
Mohit

"When you want success as badly as you want the air, then you will get it. There is no other secret of success."
-Socrates
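
For context, this is roughly how the linked Hadoop class is used in a plain MapReduce job: LazyOutputFormat wraps a real output format and only creates the underlying record writer (and hence the part file) when the first record is written. A minimal sketch in Scala against the hadoop.mapreduce API; the key/value types and output path are illustrative placeholders, not from the thread:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, LazyOutputFormat, TextOutputFormat}

// Hypothetical job setup: wrap TextOutputFormat in LazyOutputFormat so that
// reducers which emit no records leave no empty part-* files behind.
val job = Job.getInstance(new Configuration())
job.setOutputKeyClass(classOf[Text])
job.setOutputValueClass(classOf[IntWritable])
LazyOutputFormat.setOutputFormatClass(job, classOf[TextOutputFormat[Text, IntWritable]])
FileOutputFormat.setOutputPath(job, new Path("/tmp/output"))
```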
Re: LazyOutputFormat in Spark

Matei Zaharia
Administrator
You can probably use LazyOutputFormat directly. If there's a version of it for the hadoop.mapred API, you can use it with PairRDDFunctions.saveAsHadoopFile() today; otherwise, Spark 1.0 will include an equivalent for the hadoop.mapreduce API as well.

Matei
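
A minimal sketch of what Matei describes, assuming the hadoop.mapred version of LazyOutputFormat (which does exist, in org.apache.hadoop.mapred.lib). It uses saveAsHadoopDataset rather than saveAsHadoopFile because LazyOutputFormat is registered on the JobConf itself; the SparkContext `sc`, sample data, and output path are placeholders:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapred.{FileOutputFormat, JobConf, TextOutputFormat}
import org.apache.hadoop.mapred.lib.LazyOutputFormat
import org.apache.spark.SparkContext._ // implicit conversion to PairRDDFunctions

// Many partitions, few records: most partitions would normally produce empty files.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)), 8)
  .map { case (k, v) => (new Text(k), new IntWritable(v)) }

val conf = new JobConf(sc.hadoopConfiguration)
conf.setOutputKeyClass(classOf[Text])
conf.setOutputValueClass(classOf[IntWritable])
// Register LazyOutputFormat as the job's output format, delegating to
// TextOutputFormat; partitions that write no records create no part files.
LazyOutputFormat.setOutputFormatClass(conf, classOf[TextOutputFormat[Text, IntWritable]])
FileOutputFormat.setOutputPath(conf, new Path("/tmp/lazy-output"))

// saveAsHadoopDataset picks up the output format configured on the JobConf.
pairs.saveAsHadoopDataset(conf)
```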

On Feb 28, 2014, at 5:18 PM, Mohit Singh <[hidden email]> wrote:

Hi,
  Is there an equivalent of LazyOutputFormat in Spark (PySpark)?
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.html
Basically, something that only saves output files that actually contain data, rather than writing every file, since in some cases the majority of output files can be empty.
Thanks
