How can I use Google Cloud Datastore in Spark?

Yu Ishikawa
This post has NOT been accepted by the mailing list yet.
Hi all,

I am implementing a Spark Streaming application that connects to Google Cloud Datastore, but there seems to be a conflict between the protobuf-java versions required by its dependencies. I got the error below on a Spark cluster on Google Dataproc when fetching an entity from Google Cloud Datastore.

* Spark: 2.0.2
* Google Dataproc image version: 1.1
* com.google.cloud » google-cloud-datastore: 0.9.3-beta

As you know, Spark 2.0 depends on protobuf-java 2.5. On the other hand, the Google Cloud Datastore SDK depends on protobuf-java 3.0.x. I suspect this version mismatch causes the error.

How can I resolve the issue?

Best,

```
Caused by: java.lang.IllegalAccessError: tried to access field com.google.protobuf.AbstractMessage.memoizedSize from class com.google.datastore.v1.LookupRequest
        at com.google.datastore.v1.LookupRequest.getSerializedSize(LookupRequest.java:260)
        at com.google.api.client.http.protobuf.ProtoHttpContent.getLength(ProtoHttpContent.java:65)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:914)
        at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:87)
        at com.google.datastore.v1.client.Datastore.lookup(Datastore.java:92)
        at com.google.cloud.datastore.spi.DefaultDatastoreRpc.lookup(DefaultDatastoreRpc.java:144)
        at com.google.cloud.datastore.DatastoreImpl$3.call(DatastoreImpl.java:289)
        at com.google.cloud.datastore.DatastoreImpl$3.call(DatastoreImpl.java:285)
        at com.google.cloud.RetryHelper.doRetry(RetryHelper.java:179)
        at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:244)
        at com.google.cloud.datastore.DatastoreImpl.lookup(DatastoreImpl.java:284)
        at com.google.cloud.datastore.DatastoreImpl$ResultsIterator.loadResults(DatastoreImpl.java:260)
        at com.google.cloud.datastore.DatastoreImpl$ResultsIterator.<init>(DatastoreImpl.java:256)
        at com.google.cloud.datastore.DatastoreImpl.get(DatastoreImpl.java:246)
        at com.google.cloud.datastore.DatastoreImpl.get(DatastoreImpl.java:210)
        at com.google.cloud.datastore.DatastoreHelper.fetch(DatastoreHelper.java:75)
        at com.google.cloud.datastore.DatastoreImpl.fetch(DatastoreImpl.java:226)
```
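
A quick way to check the suspected version mismatch is to print which jar the protobuf classes are actually loaded from. This is only a minimal sketch, assuming it is run in `spark-shell` or inside the job on the cluster; it uses standard JVM reflection, and the result may differ between the driver and the executors.

```scala
// Look up the class named in the stack trace's error message and report its origin.
// Class.forName throws ClassNotFoundException if protobuf is not on the classpath.
val cls = Class.forName("com.google.protobuf.AbstractMessage")

// getCodeSource can be null for classes loaded by the bootstrap class loader,
// so wrap it in Option before asking for the jar location.
val source = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)

println(s"protobuf-java loaded from: ${source.getOrElse("unknown")}")
```

If the printed path points at a protobuf-java 2.5 jar shipped with Spark rather than the 3.0.x jar pulled in by the Datastore SDK, that would be consistent with the conflict described above.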

Re: How can I use Google Cloud Datastore in Spark?

Yu Ishikawa
This post has NOT been accepted by the mailing list yet.
I was able to resolve the issue by shading the conflicting dependency with SBT, following the article linked below.
Thanks to ywilkof for the excellent blog article!

http://www.yonatanwilkof.net/spark-dependency-conflict-jackson-sbt-shade-plugin/
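
For reference, a minimal sketch of what such a shade rule might look like for the protobuf conflict in this thread. It assumes the sbt-assembly plugin is already enabled in `project/plugins.sbt`; the linked article covers Jackson rather than protobuf, so the package pattern and the `shaded.` prefix here are illustrative assumptions, not the exact configuration used.

```scala
// build.sbt (fragment)
// Assumes sbt-assembly is enabled, e.g. in project/plugins.sbt:
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "<version>")

// Rename protobuf classes inside the fat jar so that the application and the
// Datastore SDK use their own bundled protobuf-java 3.0.x, while Spark keeps
// using the protobuf-java 2.5 on the cluster classpath.
// "shaded.com.google.protobuf" is a hypothetical target package name.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.protobuf.**" -> "shaded.com.google.protobuf.@1").inAll
)
```

With a rule like this, `sbt assembly` rewrites both the protobuf classes and every reference to them in the bundled dependencies, so the jar submitted to Spark no longer depends on the cluster's protobuf version. Newer sbt versions write the key as `assembly / assemblyShadeRules`.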