getting started with mllib.recommendation.ALS

getting started with mllib.recommendation.ALS

Sandeep Parikh
Question on the input and output for ALS.train() and MatrixFactorizationModel.predict().

My input is a list of Rating(user_id, product_id, rating) objects, and my ratings are on a scale of 1-5 (inclusive). When I compute predictions over the superset of all (user_id, product_id) pairs, the ratings produced are on a different scale.

The question is this: do I need to normalize the data coming out of predict() to my own scale or does the input need to be different?

Thanks!

Re: getting started with mllib.recommendation.ALS

sowen
For trainImplicit(), the output is an approximation of a matrix of 0s
and 1s, so the values are generally (but not always) in [0,1].
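
To make the 0/1 target concrete, here is a minimal sketch in plain Python (not MLlib code) of how an implicit-feedback formulation typically splits a raw input value into a binary preference, which is what the model tries to reproduce, and a confidence weight. The function name to_preference_confidence and the default alpha are illustrative assumptions, not part of any Spark API.

```python
def to_preference_confidence(value, alpha=1.0):
    """Split a raw implicit-feedback value into (preference, confidence).

    Assumption: preference is 1 for any positive value and 0 otherwise,
    and confidence grows linearly with the magnitude of the value.
    """
    preference = 1.0 if value > 0 else 0.0
    confidence = 1.0 + alpha * abs(value)
    return preference, confidence


# The raw value only changes the weight; the target stays 0 or 1.
for raw in [0.0, 1.0, 5.0]:
    print(raw, to_preference_confidence(raw))
```

Under this reading, a rating of 5 does not ask the model to predict 5. It asks for a prediction near 1 and weights that target more heavily, which is why the outputs land around [0,1].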

But for train(), you should be predicting the original input matrix
as-is, as I understand. You should get output in about the same range
as the input, but again not necessarily within 1-5. If it's really different,
you could be underfitting. Try a smaller lambda or more features (a higher rank)?
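
The effect of lambda on the output scale can be seen in a toy, pure-Python rank-1 ALS. This is only a sketch under stated assumptions: it is not MLlib's implementation, it uses plain L2 regularization, and the names als_rank1 and predict as well as the sample ratings are made up for illustration.

```python
def als_rank1(ratings, lam, iterations=20):
    """Toy rank-1 ALS on (user, item, rating) triples with L2 penalty lam."""
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    x = {u: 1.0 for u in users}  # one scalar factor per user
    y = {i: 1.0 for i in items}  # one scalar factor per item
    for _ in range(iterations):
        # Fix item factors, solve the ridge problem for each user.
        for u in users:
            num = sum(r * y[i] for uu, i, r in ratings if uu == u)
            den = sum(y[i] ** 2 for uu, i, r in ratings if uu == u) + lam
            x[u] = num / den
        # Fix user factors, solve the ridge problem for each item.
        for i in items:
            num = sum(r * x[u] for u, ii, r in ratings if ii == i)
            den = sum(x[u] ** 2 for u, ii, r in ratings if ii == i) + lam
            y[i] = num / den
    return x, y


def predict(model, user, item):
    """Predicted rating is the product of the user and item factors."""
    x, y = model
    return x[user] * y[item]


# Hypothetical ratings on a 1-5 scale (chosen to be exactly rank 1).
ratings = [
    (1, 1, 1.0), (1, 2, 2.0), (1, 3, 2.5),
    (2, 1, 2.0), (2, 2, 4.0), (2, 3, 5.0),
]
light = als_rank1(ratings, lam=0.01)
heavy = als_rank1(ratings, lam=3.75)
for u, i, r in ratings:
    print(u, i, r, round(predict(light, u, i), 2), round(predict(heavy, u, i), 2))
```

On this toy data the lightly regularized model reproduces the inputs almost exactly, while the heavily regularized one shrinks every prediction to roughly half the original rating. Real data is not exactly low rank, so a rank that is too small would cause a similar mismatch even with a small lambda.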

On Tue, Jun 10, 2014 at 4:59 PM, Sandeep Parikh <[hidden email]> wrote:

> Question on the input and output for ALS.train() and
> MatrixFactorizationModel.predict().
>
> My input is a list of Rating(user_id, product_id, rating) objects, and my
> ratings are on a scale of 1-5 (inclusive). When I compute predictions over the superset
> of all (user_id, product_id) pairs, the ratings produced are on a different
> scale.
>
> The question is this: do I need to normalize the data coming out of
> predict() to my own scale or does the input need to be different?
>
> Thanks!
>
Re: getting started with mllib.recommendation.ALS

Sandeep Parikh
Thanks Sean. I realized that I was supplying train() with a very low rank, so I will retry with something higher and then play with lambda as needed.


On Tue, Jun 10, 2014 at 4:58 PM, Sean Owen <[hidden email]> wrote:
> For trainImplicit(), the output is an approximation of a matrix of 0s
> and 1s, so the values are generally (not always) in [0,1]
>
> But for train(), you should be predicting the original input matrix
> as-is, as I understand. You should get output in about the same range
> as the input but again not necessarily 1-5. If it's really different,
> you could be underfitting. Try less lambda, more features?