Spark Image resizing

Spark Image resizing

Nick Dawes
Hi,

I'm new to Spark's image data source.

After creating a DataFrame with the image data source, I would like to resize the images in PySpark.

df = spark.read.format("image").load(imageDir)

Can you please help me with this?

Nick
Re: Spark Image resizing

Nick Dawes
Is there any other way of resizing the images before creating the DataFrame in Spark? I know OpenCV can do it, but I don't have OpenCV on my cluster. I do have the Anaconda Python packages installed on my cluster.

Any ideas would be appreciated. Thank you!

On Tue, Jul 30, 2019, 4:17 PM Nick Dawes <[hidden email]> wrote:
Hi 

I'm new to spark image data source. 

After creating a dataframe using Spark's image data source, I would like to resize the images in PySpark. 

df = spark.read.format("image").load(imageDir)

Can you please help me with this?

Nick
Re: Spark Image resizing

Patrick McCarthy-2
It won't be very efficient, but you could write a Python UDF using PythonMagick - https://wiki.python.org/moin/ImageMagick
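To give a rough idea of what a row-at-a-time Python UDF over the image column could look like, here is a minimal sketch. It does not use PythonMagick; it swaps in a dependency-free nearest-neighbour resize on the raw pixel buffer instead, so the quality will be worse than a real imaging library. The names `resize_nearest` and `make_resize_udf` are hypothetical, and the sketch assumes the image struct has the fields `origin`, `height`, `width`, `nChannels`, `mode` and `data`, with one byte per channel stored row by row.

```python
def resize_nearest(data, width, height, n_channels, new_width, new_height):
    """Nearest-neighbour resize of a row-major buffer with interleaved channels.

    Assumes one byte per channel (8-bit images).
    """
    src = bytes(data)
    out = bytearray(new_width * new_height * n_channels)
    for y in range(new_height):
        src_y = y * height // new_height          # nearest source row
        for x in range(new_width):
            src_x = x * width // new_width        # nearest source column
            s = (src_y * width + src_x) * n_channels
            d = (y * new_width + x) * n_channels
            out[d:d + n_channels] = src[s:s + n_channels]
    return bytes(out)


def make_resize_udf(new_width, new_height):
    """Build a Python UDF that maps an image struct to a resized image struct."""
    # Imported lazily so resize_nearest can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import (BinaryType, IntegerType, StringType,
                                   StructField, StructType)

    # Assumed to mirror the struct produced by the image data source.
    image_schema = StructType([
        StructField("origin", StringType(), True),
        StructField("height", IntegerType(), False),
        StructField("width", IntegerType(), False),
        StructField("nChannels", IntegerType(), False),
        StructField("mode", IntegerType(), False),
        StructField("data", BinaryType(), False),
    ])

    def _resize(image):
        # Pass through missing or invalid images unchanged.
        if image is None or image.width <= 0 or image.height <= 0:
            return image
        data = resize_nearest(image.data, image.width, image.height,
                              image.nChannels, new_width, new_height)
        return (image.origin, new_height, new_width,
                image.nChannels, image.mode, data)

    return udf(_resize, image_schema)
```

With the `df` from the original question, usage would be something like `df.withColumn("image", make_resize_udf(224, 224)("image"))`. Every pixel is copied in a Python loop, so expect this to be slow on large images.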

If you have PyArrow > 0.10, you might be able to get a boost by storing the images in a column as BinaryType and writing a pandas UDF.
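A minimal sketch of the pandas UDF idea follows, under a few assumptions: the binary column holds the encoded file bytes (JPEG, PNG and so on), Pillow is available on the workers (I believe the Anaconda distribution ships it, but check your environment), and the resized image is written back as PNG. The names `fit_within`, `resize_encoded` and `make_resize_pandas_udf` are hypothetical.

```python
import io


def fit_within(width, height, max_side):
    """Return (new_width, new_height) so the longer side is at most max_side.

    Keeps the aspect ratio and never upscales.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_encoded(content, max_side=256):
    """Decode encoded image bytes, resize, and re-encode as PNG (needs Pillow)."""
    from PIL import Image  # imported lazily; assumed installed on the workers

    with Image.open(io.BytesIO(content)) as img:
        new_size = fit_within(img.width, img.height, max_side)
        resized = img.convert("RGB").resize(new_size)
    buf = io.BytesIO()
    resized.save(buf, format="PNG")
    return buf.getvalue()


def make_resize_pandas_udf(max_side=256):
    """Build a scalar pandas UDF: binary column in, resized binary column out."""
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import BinaryType

    @pandas_udf(BinaryType())
    def resize_batch(contents):
        # contents is a pandas Series of bytes, one entry per row in the batch.
        return contents.apply(lambda c: resize_encoded(c, max_side))

    return resize_batch
```

Given a DataFrame `raw` with a binary column `content` (for example from the `binaryFile` data source in Spark 3.0 or later), usage would be something like `raw.withColumn("resized", make_resize_pandas_udf(256)("content"))`. The gain over a plain Python UDF comes from Arrow moving rows in batches; the per-image Pillow work is the same.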

On Wed, Jul 31, 2019 at 6:22 AM Nick Dawes <[hidden email]> wrote:
Any other way of resizing the image before creating the DataFrame in Spark? I know opencv does it. But I don't have opencv on my cluster. I have Anaconda python packages installed on my cluster. 

Any ideas will be appreciated.  Thank you!



--

Patrick McCarthy 

Senior Data Scientist, Machine Learning Engineering

Dstillery

470 Park Ave South, 17th Floor, NYC 10016