I am trying to figure out how to find optimal values for the ALS userBlocks and itemBlocks parameters.
For example, I am running out of memory when fitting an ALS model to a matrix with about 100 million users and 300 items. It sounds like these block parameters should help, but I cannot find any documentation on how they should be tuned.
In my case, should we be using more blocks? How many more? Is there a recommended ratio of blocks to users/items?
Does anyone have recommendations on best practices for these two settings?
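For what it's worth, here is a rough sketch of how one might size these values. The heuristic (aim for a few million users or items per block, bounded by cluster parallelism) is my own assumption, not official Spark guidance; the only documented facts it relies on are that `pyspark.ml.recommendation.ALS` exposes `numUserBlocks` and `numItemBlocks`, both defaulting to 10:

```python
# Hypothetical sizing sketch for Spark ALS block counts.
# The target_per_block value is an assumption, not documented guidance:
# the idea is that each user/item block (and its associated ratings)
# should be small enough to fit comfortably in executor memory.

def suggest_num_blocks(n_entities, target_per_block=2_000_000,
                       min_blocks=10, max_blocks=1000):
    """Suggest a block count so each block holds roughly
    target_per_block entities (users or items).

    min_blocks matches Spark's default of 10; max_blocks is an
    arbitrary cap to avoid excessive shuffle overhead.
    """
    blocks = -(-n_entities // target_per_block)  # ceiling division
    return max(min_blocks, min(blocks, max_blocks))

# The case from the question: ~100M users, 300 items.
user_blocks = suggest_num_blocks(100_000_000)
item_blocks = suggest_num_blocks(300)
print(user_blocks, item_blocks)  # 50 10

# These would then be passed to the estimator, e.g.:
#   from pyspark.ml.recommendation import ALS
#   als = ALS(numUserBlocks=user_blocks, numItemBlocks=item_blocks, ...)
```

Note the asymmetry in this example: with only 300 items, leaving `numItemBlocks` at the default is fine, and the memory pressure most likely comes from the user-factor side, so increasing `numUserBlocks` (and executor memory) is where I would experiment first.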
I have the same question. I am trying to figure out how to get ALS to complete
with a larger dataset. It seems to get stuck on "Count" from what I can tell.
I'm running 8 r4.4xlarge instances on Amazon EMR, and the dataset is 80 GB (just
to give some idea of size). I assumed Spark could handle this, but maybe I
need to try some different settings such as userBlocks or itemBlocks. Any help