Unable to pickle pySpark PipelineModel


Pralabh Kumar
Hi Dev, User,

I want to store Spark ML models in a database so that I can reuse them later. I am
unable to pickle them. However, in Scala I am able to serialize them into a byte
array.

For example, I am able to do something like the following in Scala, but not in Python:

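 import java.io.{ByteArrayOutputStream, ObjectOutputStream}
 import com.datastax.spark.connector._  // provides saveToCassandra and SomeColumns

 // Serialize the model into an in-memory byte array
 val modelToByteArray = new ByteArrayOutputStream()
 val oos = new ObjectOutputStream(modelToByteArray)
 oos.writeObject(model)
 oos.flush()
 oos.close()

 // Store uid, name, and the serialized bytes as one row in Cassandra
 spark.sparkContext.parallelize(Seq((model.uid, "my-neural-network-model", modelToByteArray.toByteArray)))
    .saveToCassandra("dfsdfs", "models", SomeColumns("uid", "name", "model"))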


But pickle.dumps(model) in PySpark throws an error:

cannot pickle '_thread.RLock' object
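
For what it is worth, the same kind of failure can be reproduced without Spark. The sketch below assumes (I have not confirmed this in the PySpark source) that the Python model wrapper ends up referencing a thread lock somewhere, for example through its connection to the JVM. `HoldsLock` and `try_pickle` are hypothetical names used only for illustration.

```python
import pickle
import threading


class HoldsLock:
    """Hypothetical stand-in for an object that keeps a lock internally."""

    def __init__(self):
        self.name = "my-neural-network-model"
        self.lock = threading.RLock()  # locks cannot be pickled


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


message = try_pickle(HoldsLock())
print(message)  # the exact wording depends on the Python version
```

If that assumption is right, pickle fails as soon as it walks into the lock attribute, regardless of what else the object contains.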


Any help with this would be appreciated.
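
One direction I am considering, as a rough sketch only: write the model with Spark's own writer to a temporary directory, zip that directory into bytes, and store the bytes in the database column. This assumes Spark runs in local mode on a POSIX system so that the `file://` path is written on the same machine as the Python process. The helper names (`dir_to_bytes`, `bytes_to_dir`, `model_to_bytes`, `model_from_bytes`) are hypothetical, and I have not tested this against a cluster.

```python
import io
import os
import shutil
import tempfile
import zipfile


def dir_to_bytes(path):
    """Zip every file under `path` into an in-memory archive and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to `path` so the archive can be unpacked anywhere
                zf.write(full, os.path.relpath(full, path))
    return buf.getvalue()


def bytes_to_dir(data, path):
    """Unpack bytes produced by dir_to_bytes into the directory `path`."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(path)


def model_to_bytes(model):
    """Save a fitted PipelineModel to a temp directory and return it as zip bytes.

    Assumption: Spark writes to the local filesystem (local mode).
    """
    tmp = tempfile.mkdtemp()
    try:
        target = os.path.join(tmp, "model")
        model.write().overwrite().save("file://" + target)
        return dir_to_bytes(target)
    finally:
        shutil.rmtree(tmp, ignore_errors=True)


def model_from_bytes(data):
    """Rebuild a PipelineModel from bytes produced by model_to_bytes."""
    from pyspark.ml import PipelineModel  # imported lazily; needs an active SparkSession

    tmp = tempfile.mkdtemp()
    try:
        bytes_to_dir(data, tmp)
        return PipelineModel.load("file://" + tmp)
    finally:
        shutil.rmtree(tmp, ignore_errors=True)
```

The bytes returned by `model_to_bytes(model)` could then go into the same kind of blob column as in the Scala version. I would be glad to hear if there is a more direct way.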


Regards

Pralabh