hadoop files in Python

Diana Carroll
Hello!  I'm exploring custom input formats, which it seems I can use from Scala via sc.newAPIHadoopFile or sc.newAPIHadoopRDD.

My question is: is it possible to do this in Python?  As far as I can tell, the Python API doesn't have the sc.hadoop* or sc.newAPIHadoop* functions.

Thanks,
Diana

Re: hadoop files in Python

Josh Rosen
There's an open pull request to add support for additional Hadoop file formats to PySpark: https://github.com/apache/incubator-spark/pull/263
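
For readers who find this thread later: PySpark releases from 1.1.0 onward expose sc.newAPIHadoopFile directly, so something along these lines should work there. This is only a rough sketch under that assumption, not something that runs on the version discussed in this thread. The helper name read_with_input_format and the path in the usage comment are made up for illustration, and the defaults simply use Hadoop's stock TextInputFormat; a custom input format would need its jar on the Spark classpath and possibly key/value converters.

```python
# Sketch only: assumes a PySpark release that provides
# SparkContext.newAPIHadoopFile (1.1.0 or later).

# Stock Hadoop classes used as defaults; swap in your own InputFormat's
# fully qualified class name (its jar must be on the Spark classpath).
TEXT_INPUT_FORMAT = "org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
KEY_CLASS = "org.apache.hadoop.io.LongWritable"
VALUE_CLASS = "org.apache.hadoop.io.Text"


def read_with_input_format(sc, path, input_format=TEXT_INPUT_FORMAT,
                           key_class=KEY_CLASS, value_class=VALUE_CLASS,
                           conf=None):
    """Hypothetical helper: read `path` through a new-API Hadoop InputFormat.

    `sc` is expected to be a pyspark.SparkContext; the result is an RDD of
    (key, value) pairs.
    """
    return sc.newAPIHadoopFile(path, input_format, key_class, value_class,
                               conf=conf)


# Usage (requires a running Spark installation; the path is a placeholder):
#   from pyspark import SparkContext
#   sc = SparkContext("local", "input-format-sketch")
#   rdd = read_with_input_format(sc, "hdfs:///path/to/input")
#   print(rdd.take(5))
```

The class names are passed as strings because the input format itself runs on the JVM side; Python only sees the converted key/value pairs.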


(In reply to Diana Carroll's message of Thu, Jan 9, 2014 at 8:15 AM.)