Spark dataset to byte array over grpc

Spark dataset to byte array over grpc

Ashwin Sai Shankar
Hi!
I'm building a Spark app that runs a Spark SQL query and sends the results to a client over gRPC (my proto file is configured to send the SQL output as "bytes"). The client then displays the output rows. When I run spark.sql, I get a Dataset<Row>. How do I convert this to a byte array?
Also, is there a better way to send this output to the client?

Thanks,
Ashwin

Re: Spark dataset to byte array over grpc

Bryan Cutler
Hi Ashwin,

This sounds like it might be a good use for Apache Arrow, if you are flexible about the exchange format. As of Spark 2.3, Dataset has a method "toArrowPayload" that will convert a Dataset of Rows to a byte array in Arrow format, although the API is currently not public. Your client could consume the Arrow data directly, or perhaps use the spark.sql ColumnarBatch to read it back as Rows.
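
If Arrow is not an option, a simpler route for small results might be to ship each row as a JSON string. The following is only a minimal sketch of that alternative, not the Arrow approach described above. It assumes the rows have already been pulled to the driver with the public calls Dataset.toJSON() and collectAsList(), and the class name RowBytes and the length-prefixed layout are made up for illustration. Only JDK classes are used, so the Spark and gRPC calls appear in comments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Spark or gRPC): packs JSON rows into one byte[].
// Layout: [int rowCount] then, per row, [int byteLength][UTF-8 bytes].
final class RowBytes {

    // Server side. The input would come from something like:
    //   List<String> jsonRows = spark.sql(query).toJSON().collectAsList();
    // and the result could be wrapped with ByteString.copyFrom(bytes)
    // for a proto field of type "bytes".
    static byte[] encode(List<String> jsonRows) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(jsonRows.size());
            for (String row : jsonRows) {
                byte[] utf8 = row.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);
                out.write(utf8);
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client side: reverses encode() and returns the JSON rows in order.
    static List<String> decode(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            int rowCount = in.readInt();
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < rowCount; i++) {
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                rows.add(new String(utf8, StandardCharsets.UTF_8));
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for rows collected from a Dataset<Row>.
        List<String> rows = Arrays.asList("{\"id\":1,\"name\":\"a\"}", "{\"id\":2,\"name\":\"b\"}");
        byte[] payload = encode(rows);
        System.out.println(decode(payload));
    }
}
```

Collecting to the driver only makes sense when the result fits comfortably in driver memory and within the gRPC message size limit; for larger results, a columnar format such as Arrow or a streamed response is likely the better fit.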

Bryan

On Mon, Apr 23, 2018 at 11:49 AM, Ashwin Sai Shankar <[hidden email]> wrote:
Hi!
I'm building a Spark app that runs a Spark SQL query and sends the results to a client over gRPC (my proto file is configured to send the SQL output as "bytes"). The client then displays the output rows. When I run spark.sql, I get a Dataset<Row>. How do I convert this to a byte array?
Also, is there a better way to send this output to the client?

Thanks,
Ashwin