I have trained an ML model, now what?


I have trained an ML model, now what?

Riccardo Ferrari
Hi list!

I am writing here to hear about your experience putting Spark ML models into production at scale.

I know it is a very broad topic with many different facets depending on the use case, requirements, user base, and whatever else is involved. Still, I'd like to open a thread about this topic, which is as important as properly training a model and, I feel, is often neglected.

The task is serving web users with predictions, and the main challenge I see is making it agile and swift.

I think there are mainly three general categories of such deployments:
  • Offline/batch: load a model, perform the inference, and store the results in some datastore (a DB, indexes, ...).
  • Spark in the loop: keep a long-running Spark context exposed in some way; this includes streaming as well as custom applications that wrap the context.
  • Use a different technology to load the Spark MLlib model and run the inference pipeline. I have read about MLeap and other PMML-based solutions.
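As a concrete sketch of that third category, here is what serving a Spark-trained model without Spark can look like. This is a hypothetical illustration: the model type (binary logistic regression) and the intercept/coefficient values are assumptions, not from any real pipeline; in practice the weights would be exported from a fitted Spark model (e.g. its intercept and coefficients), and the scoring function would live in a lightweight web service with no Spark dependency:

```python
import math

# Assumed values, standing in for weights exported from a fitted
# Spark MLlib binary logistic regression (illustrative only).
INTERCEPT = -1.2
COEFFICIENTS = [0.8, -0.5, 2.1]

def predict_proba(features):
    """Score one feature vector: sigmoid of the linear combination,
    mirroring what the Spark model would compute for this simple case."""
    z = INTERCEPT + sum(w * x for w, x in zip(COEFFICIENTS, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, threshold=0.5):
    """Binary decision at the given threshold."""
    return 1 if predict_proba(features) >= threshold else 0
```

Note the trade-off: hand-exporting weights like this only works for simple models, because every feature transformation in the original pipeline must be reimplemented on the serving side too; tools like MLeap exist precisely to serialize the whole pipeline instead.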
I would love to hear about open-source solutions, ideally without requiring cloud-provider-specific frameworks/components.

Again, I am aware each of the previous categories has benefits and drawbacks, so what would you pick? Why? And how?

Thanks!

Re: I have trained an ML model, now what?

Felix Cheung
About deployment/serving, see this SPIP:
https://issues.apache.org/jira/browse/SPARK-26247

 


Re: I have trained an ML model, now what?

Riccardo Ferrari
Felix, thank you very much for the link. Much appreciated.

The attached PDF is very interesting; I found myself evaluating many of the scenarios described in Q3. It's unfortunate the proposal is not being worked on; it would be great to see it become part of the code base.

It is cool to see big players like Uber trying to make open source better. Thanks!



Re: I have trained an ML model, now what?

Pola Yao
Hi Riccardo,

Right now, Spark does not support low-latency predictions in production. MLeap is an alternative, and it has been used in many scenarios. But it's good to see that the Spark community has decided to provide such support.


Re: I have trained an ML model, now what?

Felix Cheung
Please comment on the JIRA/SPIP if you are interested! That way we can gauge community support for a proposal like this.

 
