Spark ML online serving


Spark ML online serving

Holden Karau
At Spark Summit some folks were talking about model serving and we wanted to collect requirements from the community.
--

Re: Spark ML online serving

Maximiliano Felice
Hi!

I know I'm late, but just to point out some highlights of our use case. We currently:

  • Use Spark as an ETL tool, followed by 
  • a Python (numpy/pandas based) pipeline to preprocess information, and 
  • use TensorFlow to train our neural networks

What we'd love to do, and why we don't:

  • Start using Spark for our full preprocessing pipeline. Because type safety. And distributed computation. And Catalyst. But mainly because not-Python.
    Our main issue:
    • We want to use the same code for online serving. We're not willing to duplicate the preprocessing operations, and Spark is not serving-friendly.
    • If we want to preprocess online, we need to copy/paste our custom transformations into MLeap.
    • It's an issue to communicate with a TensorFlow API to hand it the preprocessed data for serving.
  • Use Spark to do hyperparameter tuning. 
    We'd need:
    • GPU integration with Spark, letting us achieve finer tuning.
    • Better TensorFlow integration
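
The duplication issue above is sometimes mitigated by keeping each feature transform in one plain-Python function and reusing it on both sides: wrapped as a Spark UDF at training time, and called directly in the online server. Below is a minimal sketch of that pattern; the `log_scale` and `serve_features` functions and the column names are hypothetical examples, not part of any Spark or MLeap API.

```python
import math

def log_scale(value: float) -> float:
    """Shared feature transform (hypothetical): log1p scaling, defined once."""
    return math.log1p(max(value, 0.0))

# Offline (Spark) side -- hypothetical wiring, requires a live SparkSession:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   log_scale_udf = udf(log_scale, DoubleType())
#   df = df.withColumn("amount_scaled", log_scale_udf(df["amount"]))

# Online side: the serving process calls the very same function,
# so there is nothing to re-implement in MLeap for this step.
def serve_features(raw: dict) -> dict:
    return {"amount_scaled": log_scale(raw["amount"])}

print(serve_features({"amount": 0.0}))  # -> {'amount_scaled': 0.0}
```

Note the trade-off this sketch makes explicit: the logic lives in one place, but a Python UDF gives up exactly the type safety and Catalyst optimization that motivate moving the pipeline into Spark in the first place.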

Would love to hear about other use cases, and whether others relate to the same issues as we do.

On Wed, Jun 6, 2018 at 9:10 PM, Holden Karau (<[hidden email]>) wrote:
At Spark Summit some folks were talking about model serving and we wanted to collect requirements from the community.
--