implicit ALS dataSet

implicit ALS dataSet

Hao Ren
Hi,

According to the paper on which MLlib's ALS is based, the model should take all user-item preferences
as an input, including those which are not related to any input observation (zero preference).

My question is:

With all positive observations in hand (similar to an explicit feedback data set), should I generate all negative observations in order to make implicit ALS work on the complete data set (positives union negatives)?

Actually, we are testing on a data set like:

| user | item | nbPurchase |

nbPurchase is non-zero, so we have no negative observations. What we did was generate every missing user-item pair with zero nbPurchase, so that all possible user-item pairs are covered, but this operation takes considerable time and storage.

I just want to make sure: do we have to do that with MLlib's ALS, or does it already handle it internally? In the latter case, I could simply pass only the positive observations, as with explicit ALS.

Hao.

Re: implicit ALS dataSet

sowen
The paper definitely does not suggest that you should include every
user-item pair in the input. The input is by nature extremely sparse,
so literally filling in all the 0s in the input would create
overwhelmingly large input. No, there is no need to do it and it would
be terrible for performance.

As far as I can see, the implementation would correctly handle an explicit input of 0, and the result would be as if it had not been included at
all. That is to say: no, you do not include the implicit 0 inputs.

That's not quite negative input, either figuratively or literally. Are
you trying to figure out how to include actual negative feedback (i.e.
a signal that a user actively does not like an item)? That, you can
include if you like; the implementation extends the original paper to
handle negative values meaningfully.

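To make that takeaway concrete, here is a minimal sketch (the file name, field layout and parameter values are placeholders of mine, not anything from this thread) of training MLlib's implicit ALS on the positive observations alone, with no zero cells materialized:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Sketch: train implicit ALS on positive observations only. There is no
// need to generate the zero user-item pairs; the solver accounts for the
// unobserved cells implicitly.
object TrainImplicitSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "implicit-als-sketch")

    // (userId, itemId, nbPurchase) triples, nbPurchase > 0 only.
    val ratings = sc.textFile("purchases.csv").map { line =>
      val Array(user, item, nbPurchase) = line.split(',')
      Rating(user.toInt, item.toInt, nbPurchase.toDouble)
    }

    // rank = 50, iterations = 10, lambda = 0.01, alpha = 1.0 (placeholders).
    val model = ALS.trainImplicit(ratings, 50, 10, 0.01, 1.0)
    sc.stop()
  }
}
```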

Re: implicit ALS dataSet

Hao Ren
Thank you for your quick reply.

As far as I know, the update does not require negative observations, because the update rule

x_u = (Y^T C^u Y + λI)^-1 Y^T C^u p(u)

can be simplified by taking advantage of its algebraic structure, so negative observations are not needed. This is what I thought the first time I read the paper.
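If it helps to see the trick spelled out: Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y, where the f x f matrix Y^T Y is computed once for all users and the correction term is nonzero only for the items the user actually touched. A minimal pure-Scala sketch at rank 2 (all names and the tiny dimensions are mine, not MLlib's):

```scala
// Per-user solve that only ever iterates over observed (positive) items.
object AlsUpdateSketch {
  type Vec = Array[Double]          // length-2 factor vector
  type Mat = Array[Array[Double]]   // 2 x 2 matrix

  def outer(y: Vec): Mat =
    Array(Array(y(0) * y(0), y(0) * y(1)),
          Array(y(1) * y(0), y(1) * y(1)))

  // a += s * b, in place
  def axpy(a: Mat, b: Mat, s: Double): Unit =
    for (i <- 0 until 2; j <- 0 until 2) a(i)(j) += s * b(i)(j)

  // Solve (m + lambda*I) x = b via the closed-form 2x2 inverse.
  def solve(m: Mat, b: Vec, lambda: Double): Vec = {
    val (p, q, r, s) = (m(0)(0) + lambda, m(0)(1), m(1)(0), m(1)(1) + lambda)
    val det = p * s - q * r
    Array((s * b(0) - q * b(1)) / det, (p * b(1) - r * b(0)) / det)
  }

  // x_u = (Y^T C^u Y + lambda*I)^-1 Y^T C^u p(u), built from Y^T Y plus
  // corrections for observed items only.
  def userFactor(yty: Mat, itemFactors: Map[Int, Vec],
                 observed: Seq[(Int, Double)],   // (itemId, nbPurchase) > 0
                 alpha: Double, lambda: Double): Vec = {
    val m   = Array(yty(0).clone, yty(1).clone)  // shared Y^T Y
    val rhs = Array(0.0, 0.0)
    for ((item, r) <- observed) {
      val y  = itemFactors(item)
      val cu = 1.0 + alpha * r                   // confidence c_ui = 1 + alpha * r_ui
      axpy(m, outer(y), cu - 1.0)                // Y^T (C^u - I) Y correction
      rhs(0) += cu * y(0); rhs(1) += cu * y(1)   // Y^T C^u p(u), with p_ui = 1
    }
    solve(m, rhs, lambda)
  }
}
```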

What makes me confused is, after that, the paper (in Discussion section) says

"Unlike explicit datasets, here the model should take all user-item preferences as an input, including those which are not related to any input observation (thus hinting to a zero preference). This is crucial, as the given observations are inherently biased towards a positive preference, and thus do not reflect well the user profile.
However, taking all user-item values as an input to the model raises serious scalability issues – the number of all those pairs tends to significantly exceed the input size since a typical user would provide feedback only on a small fraction of the available items. We address this by exploiting the algebraic structure of the model, leading to an algorithm that scales linearly with the input size while addressing the full scope of user-item pairs without resorting to any sub-sampling."

If my understanding is right, it seems that we need the negative observations as input, but we don't use them during the update. That seems strange to me, because it would generate far too many user-item pairs, which is not feasible.

Thx for the confirmation. I will read the ALS implementation for more details.

Hao

Re: implicit ALS dataSet

sowen
On Thu, Jun 5, 2014 at 10:38 PM, redocpot <[hidden email]> wrote:
> can be simplified by taking advantage of its algebraic structure, so
> negative observations are not needed. This is what I think at the first time
> I read the paper.

Correct; a big part of the reason that is efficient is the sparsity of
the input.

> What makes me confused is, after that, the paper (in Discussion section)
> says
>
> "Unlike explicit datasets, here *the model should take all user-item
> preferences as an input, including those which are not related to any input

It is not saying that these non-observations (I would not call them
negative) should explicitly appear in the input. But their implicit
existence can and should be used in the math.

In particular, the loss function that is being minimized is minimizing
error in the implicit "0" cells of the input too, just with much less
weight.
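To spell that out, here is the objective from the Hu, Koren and Volinsky paper as I read it (my notation, not Spark code). The sum runs over all user-item cells, but an unobserved cell contributes only a p_ui = 0 term with the minimum confidence c_ui = 1, which is exactly why it never needs to appear in the input:

```latex
\min_{x_*,\, y_*} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{T} y_i\bigr)^2
  \;+\; \lambda \Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr),
\qquad c_{ui} = 1 + \alpha\, r_{ui}
```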

Re: implicit ALS dataSet

Hao Ren
Hi,

Recently, I have run an implicit ALS test on a real-world data set.

Initially, we have two data sets: purchase records over the past 3 years (training set), and the records for the 6 months immediately following those 3 years (test set).

It's a database with 1,060,080 users and 23,880 items.

According to the paper on which MLlib's ALS is based, we use expected percentile rank (EPR) to evaluate recommendation performance. We obtain an EPR of about 8% - 9%, which is considered a good result in the paper.
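For readers without the paper at hand, the metric (as defined there; my notation) is

```latex
\overline{\mathrm{rank}} \;=\;
  \frac{\sum_{u,i} r^{t}_{ui}\, \mathrm{rank}_{ui}}{\sum_{u,i} r^{t}_{ui}}
```

where rank_ui is the percentile position of item i in user u's ranked recommendation list (0% = top, 100% = bottom) and r^t_ui is the test-set value; lower is better, and about 50% is what random ordering would score.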

We did a sanity check. Each user has his own item list sorted by preference, and we picked the top 10 items for each user. As a result, we found that there were only 169 distinct items among the (1,060,080 x 10) items picked; most of them are repeated. That means that, given 2 users, the items recommended may well be the same. Nothing is personalized.
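That sanity check can be sketched like this (plain Scala, hypothetical names; in practice the factor maps would come from the trained model):

```scala
// Score every item for each user with the dot product x_u . y_i, keep
// each user's top-k items, and count how many distinct items ever appear.
object DiversityCheck {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def topK(userFactors: Map[Int, Array[Double]],
           itemFactors: Map[Int, Array[Double]],
           k: Int): Map[Int, Seq[Int]] =
    userFactors.map { case (u, xu) =>
      val ranked = itemFactors.toSeq
        .map { case (i, yi) => (i, dot(xu, yi)) }
        .sortBy(-_._2)      // highest predicted preference first
        .map(_._1)
      u -> ranked.take(k)
    }

  // If this number is tiny compared to the catalogue, every user is being
  // recommended the same best-sellers.
  def distinctRecommended(recs: Map[Int, Seq[Int]]): Int =
    recs.values.flatten.toSet.size
}
```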

It seems that the system is just recommending best-sellers, something like that. What we want is to recommend as many different items as possible, which would make the recommender system more useful.

Is this a common situation with ALS?

Thanks,

Hao


Re: implicit ALS dataSet

sowen
On Thu, Jun 19, 2014 at 3:03 PM, redocpot <[hidden email]> wrote:
> We did some sanity check. For example, each user has his own item list which
> is sorted by preference, then we just pick the top 10 items for each user.
> As a result, we found that there were only 169 different items among the
> (1060080 x 10) items picked, most of them are repeated. That means, given 2
> users, the items recommended might be the same. Nothing is personalized.

This sounds like severe underfitting -- lambda is too high or the
number of features is too small.

Re: implicit ALS dataSet

Hao Ren
One thing that needs to be mentioned: the schema is in fact (userId, itemId, nbPurchase), where nbPurchase plays the role of the rating. I found that there are many one-timers, i.e. pairs with nbPurchase = 1. These pairs make up about 85% of all positive observations.

As the paper says, low ratings get a low confidence weight, so if I understand correctly, these dominant one-timers will be less likely to be recommended compared to items with a higher nbPurchase.

In fact, lambda is also considered a potential problem: in our case, lambda is set to 300, which is confirmed by the test set. Here are the test results:

lambda = 65
EPR_in  = 0.06518592593142056
EPR_out = 0.14789338884259276

lambda = 100
EPR_in  = 0.06619274171311466
EPR_out = 0.13494609978226865

lambda = 300
EPR_in  = 0.08814703345418627
EPR_out = 0.09522125434156471


where EPR_in is computed on the training set and EPR_out on the test set. It seems 300 is the right lambda, since it overfits less.

The other parameters are shown in the following code:

val model = new ALS()
      .setImplicitPrefs(implicitPrefs = true)
      .setAlpha(1)
      .setLambda(300)
      .setRank(50)
      .setIterations(40)
      .setBlocks(8)
      .setSeed(42)
      .run(ratings_train)


We set alpha to 1, since the max nbPurchase is 1396. Not sure whether even an alpha of 1 is already too big.

 

Re: implicit ALS dataSet

sowen
On Thu, Jun 19, 2014 at 3:44 PM, redocpot <[hidden email]> wrote:
> As the paper said, the low ratings will get a low confidence weight, so if I
> understand correctly, these dominant one-timers will be more *unlikely* to
> be recommended comparing to other items whose nbPurchase is bigger.

Correct, yes.


> In fact, lambda is also considered as a potential problem, as in our case,
> the lambda is set to 300, which is confirmed by the test set. Here is test
> result :

Although people use lambda to mean different things in different
places, in every interpretation I've seen, 300 is extremely high :)  1
is very high even.

(alpha = 1 is the lowest value I'd try; it also depends on the data
but sometimes higher values work well. For the data set in the
original paper, they used alpha = 40)


> where EPR_in is given by training set and EPR_out is given by test set. It
> seems 300 is the right lambda, since less overfitting.

I take your point about your results though, hm. Can you at least try
much lower lambda? I'd have to think and speculate about why you might
be observing this effect but a few more data points could help. It may
be that you've forced the model into basically recommending globally
top items, and that does OK as a local minimum, but personalized
recommendations are better still, with a very different lambda.

Also you might consider holding out the most-favored data as the test
data. It biases the test a bit, but at least you are asking whether
the model ranks highly things that are known to rank highly, rather
than any old thing the user interacted with.
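A sketch of that lambda sweep (the `epr` evaluation function is a placeholder you would implement per the paper; rank, iterations and alpha are placeholders mirroring the settings discussed earlier):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

// Train at several regularization strengths and compare EPR on training
// vs. test data to see where the in/out gap closes.
def sweep(train: RDD[Rating], test: RDD[Rating],
          epr: (MatrixFactorizationModel, RDD[Rating]) => Double): Unit = {
  for (lambda <- Seq(0.01, 0.1, 1.0, 10.0, 100.0, 300.0)) {
    // rank = 50, iterations = 10, alpha = 1.0
    val model = ALS.trainImplicit(train, 50, 10, lambda, 1.0)
    println(f"lambda=$lambda%8.2f  EPR_in=${epr(model, train)}%.5f  EPR_out=${epr(model, test)}%.5f")
  }
}
```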

Re: implicit ALS dataSet

Hao Ren
Hi,

The real-world data set is rather large, so I tested on the MovieLens data set and found the same results:
                               
                                                                                                                               
| alpha | lambda | rank | top1 | top5 | EPR_in  | EPR_out |
| 40    | 0.001  | 50   | 297  | 559  | 0.05855 | 0.17299 |
| 40    | 0.01   | 50   | 295  | 559  | 0.05854 | 0.17298 |
| 40    | 0.1    | 50   | 296  | 560  | 0.05846 | 0.17287 |
| 40    | 1      | 50   | 309  | 564  | 0.05819 | 0.17227 |
| 40    | 25     | 50   | 287  | 537  | 0.05699 | 0.14855 |
| 40    | 50     | 50   | 267  | 496  | 0.05795 | 0.13389 |
| 40    | 100    | 50   | 247  | 444  | 0.06504 | 0.11920 |
| 40    | 200    | 50   | 145  | 306  | 0.09558 | 0.11388 |
| 40    | 300    | 50   | 77   | 178  | 0.11340 | 0.12264 |

To be clear, there are 1650 items in this MovieLens data set. Top1 and Top5 in the table are the number of distinct items appearing in the top 1 and top 5 of users' preference lists after ALS has run. Top1, Top5 and EPR_in are computed on the training set; only EPR_out is on the test set. For Top1 and Top5, all items are taken into account, whether purchased or not.

The table shows that a small lambda (< 1) always leads to overfitting, while a big lambda like 300 removes the overfitting, but the number of distinct items in the top 1 and top 5 of the preference lists becomes very small (not personalized).