Re: read image or binary files / spark 2.3

Re: read image or binary files / spark 2.3

Peter Liu
Hello experts,

I have a quick question: which API allows me to read image files or binary files (for SparkSession.readStream) from a local/Hadoop file system in Spark 2.3?

I have been browsing the following documentation and googling for it, but didn't find a good example:

Any hint/help would be very much appreciated!

Thanks!

Peter
Re: read binary files (for stream reader) / spark 2.3

Peter Liu
Hello experts,

I have one additional question: how can I read binary files into a stream reader object? (The goal is to get the data into a Kafka server.)

I looked into the DataStreamReader API (https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-DataStreamReader.html#option) and other Google results and didn't find an option for binary files.

Any help would be very much appreciated!
(Thanks again for Ilya's helpful information below - it works fine on the sparkContext object.)

Regards,

Peter
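A possible workaround for the question above, as a minimal sketch rather than a confirmed answer from the thread: the file-based streaming formats in Spark 2.3 do not include a binary one, so the sketch below reads the files in batch mode with SparkContext.binaryFiles and writes them to Kafka with the batch Kafka sink. It assumes the spark-sql-kafka-0-10 package is on the classpath. The function names, the input path, the broker address and the topic are all hypothetical.

```python
def to_kafka_record(path, content):
    """Map a (path, bytes) pair from binaryFiles to a (key, value) pair.

    The file path becomes the message key and the raw bytes the value.
    bytearray is used because PySpark's BinaryType accepts it.
    """
    return (path, bytearray(content))


def write_binary_files_to_kafka(spark, input_path, bootstrap_servers, topic):
    """Batch-read binary files and publish one Kafka message per file.

    Hypothetical helper: `input_path`, `bootstrap_servers` and `topic`
    are supplied by the caller and are not taken from the thread.
    """
    # Imported here so that to_kafka_record stays usable without Spark.
    from pyspark.sql.types import BinaryType, StringType, StructField, StructType

    # The Kafka sink expects a `value` column and an optional `key` column.
    schema = StructType([
        StructField("key", StringType(), False),
        StructField("value", BinaryType(), False),
    ])

    # binaryFiles yields one (path, content) pair per file.
    records = spark.sparkContext.binaryFiles(input_path).map(
        lambda kv: to_kafka_record(kv[0], kv[1])
    )
    df = spark.createDataFrame(records, schema)

    (df.write
       .format("kafka")
       .option("kafka.bootstrap.servers", bootstrap_servers)
       .option("topic", topic)
       .save())
```

A call might look like write_binary_files_to_kafka(spark, "hdfs:///data/blobs", "localhost:9092", "binary-files"), with all three values being placeholders. Each file becomes a single message, so large files may exceed the broker's message size limit.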


On Thu, Sep 5, 2019 at 3:09 PM Ilya Matiach <[hidden email]> wrote:

Hi Peter,

You can use the ImageSchema.readImages API (org.apache.spark.ml.image) in Spark 2.3 for reading images:

https://databricks.com/blog/2018/12/10/introducing-built-in-image-data-source-in-apache-spark-2-4.html

https://blogs.technet.microsoft.com/machinelearning/2018/03/05/image-data-support-in-apache-spark/

https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.ml.image.ImageSchema$
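As a rough illustration of the APIs linked above, here is a minimal PySpark sketch. It assumes pyspark.ml.image.ImageSchema.readImages on Spark 2.3 and the built-in "image" data source on Spark 2.4 and later. The helper names spark_major_minor and read_images are hypothetical, not part of Spark.

```python
def spark_major_minor(version):
    """Return (major, minor) from a version string such as '2.3.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def read_images(spark, path):
    """Load the images under `path` into a DataFrame with an `image` column."""
    version = spark_major_minor(spark.version)
    if version >= (2, 4):
        # Built-in image data source, introduced in Spark 2.4.
        return spark.read.format("image").load(path)
    if version == (2, 3):
        # Spark 2.3 API; imported lazily because later releases removed readImages.
        from pyspark.ml.image import ImageSchema
        return ImageSchema.readImages(path)
    # Older versions need an external package instead.
    raise NotImplementedError("no built-in image reader before Spark 2.3")
```

With an existing SparkSession named spark, read_images(spark, "hdfs:///data/images").select("image.origin", "image.height", "image.width").show() would list the loaded files and their dimensions; the path is a placeholder.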

There’s also a Spark package for Spark versions older than 2.3:

https://github.com/Microsoft/spark-images

Thank you, Ilya

From: Peter Liu <[hidden email]>
Sent: Thursday, September 5, 2019 2:13 PM
To: dev <[hidden email]>; User <[hidden email]>
Subject: Re: read image or binary files / spark 2.3

Hello experts,

I have a quick question: which API allows me to read image files or binary files (for SparkSession.readStream) from a local/Hadoop file system in Spark 2.3?

I have been browsing the following documentation and googling for it, but didn't find a good example:

Any hint/help would be very much appreciated!

Thanks!

Peter