google cloud storage connector



pushing_ice
I am trying to use Spark with Hadoop on Google Compute Engine. Google provides a connector to Google Cloud Storage (think S3), described here: GCS Connector

Everything works fine, except I can't load a textFile using it in pyspark:
f = sc.textFile("gs://mybucketname")
f.take(1)

This throws "java.io.IOException: No FileSystem for scheme: gs". I emailed the Google Hadoop devs and they said the URIs have to start with "gs". Is support for this URI scheme something that can easily be patched into or added to Spark?
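For context, Hadoop resolves a URI scheme to a FileSystem class via the "fs.<scheme>.impl" configuration property, so this error usually means the property is unset (or the connector jar is missing from the classpath) in the Hadoop configuration that Spark picks up. A sketch of what the core-site.xml entry might look like, assuming a recent GCS connector whose filesystem class is com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem (check the connector's own docs for the exact class name in your version):

```xml
<!-- Sketch only: maps the "gs" URI scheme to the GCS connector's
     FileSystem implementation; the connector jar must also be on
     the classpath of the Spark driver and executors. -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
```

If Spark is not reading the same core-site.xml as your Hadoop install, the equivalent property can in principle be set on the Hadoop Configuration that the SparkContext uses instead.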