Reading BigQuery data from Spark in Google Dataproc

Mich Talebzadeh
Hi,

I have tested a few JDBC BigQuery drivers, such as Progress DataDirect and Simba, but none of them seems to work properly through Spark.

The only way I can read from and write to BigQuery is through the Spark BigQuery API, starting the shell as follows:

spark-shell --jars=gs://spark-lib/bigquery/spark-bigquery-latest.jar


I then use the following to read, with JDBC-style partitioning options:


val BQDF = spark.read.
    format("bigquery").
    option("credentialsFile", jsonKeyFile).
    option("project", projectId).
    option("parentProject", projectId).
    option("dataset", targetDataset).
    option("table", targetTable).
    option("partitionColumn", partitionColumn).
    option("lowerBound", lowerBound).
    option("upperBound", upperBound).
    option("numPartitions", numPartitions).
    load()


and the following to write:

    rsBatch.
      write.
      format("bigquery").
      mode(org.apache.spark.sql.SaveMode.Append).
      option("table", fullyQualifiedOutputTableId).
      save()
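
The read and write calls above rely on values that are not shown (jsonKeyFile, projectId, targetDataset, targetTable, fullyQualifiedOutputTableId). A minimal sketch of how they might be put together follows. Both helper names are hypothetical and the sample values are made up. It assumes the connector accepts the output table as project.dataset.table. It also leaves out partitionColumn, lowerBound, upperBound and numPartitions on purpose: these are options of Spark's JDBC data source and do not appear to be among the options the BigQuery connector documents, so they may simply be ignored.

```scala
// Hypothetical helper: collects the connector options used in the read call.
// JDBC-only options (partitionColumn, lowerBound, ...) are deliberately omitted.
def bigQueryReadOptions(
    jsonKeyFile: String,
    projectId: String,
    dataset: String,
    table: String): Map[String, String] =
  Map(
    "credentialsFile" -> jsonKeyFile,
    "project"         -> projectId,
    "parentProject"   -> projectId,
    "dataset"         -> dataset,
    "table"           -> table
  )

// Hypothetical helper: joins the parts into "project.dataset.table"
// (assumed to be an accepted form for the "table" option).
def fullyQualifiedTableId(projectId: String, dataset: String, table: String): String = {
  require(Seq(projectId, dataset, table).forall(_.nonEmpty), "all parts must be non-empty")
  s"$projectId.$dataset.$table"
}

// Made-up sample values, for illustration only.
val readOpts = bigQueryReadOptions("/path/to/key.json", "my-project", "my_dataset", "my_table")
val fullyQualifiedOutputTableId = fullyQualifiedTableId("my-project", "my_dataset", "my_out_table")

// In spark-shell, with the connector jar loaded:
// val BQDF = spark.read.format("bigquery").options(readOpts).load()
```

Building the options as a plain map makes it easy to print them and check exactly what is handed to the connector before load() is called.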

I would appreciate any comments from anyone who has managed to make this work through a third-party JDBC driver.

Regards,

Mich


