Fwd: Train multiple machine learning models in parallel

Pola Yao

Hi Community,

I have a 1 TB dataset that contains records for 50 users. Each user has about 20 GB of data on average.

I want to use Spark to train a machine learning model (e.g., an XGBoost tree model) for each user. Ideally, the result would be 50 models. However, submitting 50 separate Spark jobs through 'spark-submit' would be infeasible.

The model parameters and feature engineering steps would be exactly the same for each user's data. Is there a way to train these 50 models in parallel?
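
To make the pattern concrete, here is a minimal sketch of the shape of the problem in plain Python, not Spark: group the records by user, then train one model per user concurrently with identical parameters. Everything named here is a hypothetical placeholder. `PARAMS`, `train_model`, `train_per_user`, and the toy records are made up for illustration, and the "model" is only a per-user mean label standing in for a real learner such as XGBoost.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared settings: the same parameters are used for every user.
PARAMS = {"shrinkage": 0.0}


def train_model(user_id, rows, params):
    """Placeholder trainer: the 'model' is just the (optionally shrunk) mean label."""
    labels = [label for _features, label in rows]
    mean = sum(labels) / len(labels)
    return user_id, {"prediction": mean * (1.0 - params["shrinkage"])}


def train_per_user(records, params, max_workers=4):
    """Group (user_id, features, label) records by user, then train one model per user."""
    by_user = defaultdict(list)
    for user_id, features, label in records:
        by_user[user_id].append((features, label))

    # Each user's training task is independent, so the tasks can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(train_model, user_id, rows, params)
            for user_id, rows in by_user.items()
        ]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    # Toy records standing in for the real per-user data.
    toy_records = [("u1", [1.0], 2.0), ("u1", [2.0], 4.0), ("u2", [1.0], 10.0)]
    models = train_per_user(toy_records, PARAMS)
    print(sorted(models))
```

The open question is how to get the same "one independent training task per user" structure inside a single Spark application at 20 GB per user, where one user's data may not fit comfortably in a single task.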

Thanks!