word2vec outputs Infinity,-Infinity vectors with increasing iterations
I'm running word2vec on a dataset of about 1 million documents with a vocabulary of roughly 20,000 words. Models trained for many iterations (15-20) frequently produce vectors of the form [Infinity,-Infinity,Infinity,-Infinity,...]. Up to that breaking point, increasing the number of iterations improves the quality of the representation. This looks like a bug. Can someone provide insight into the cause of this problem? My setup is below.
import org.apache.spark.ml.feature.Word2Vec
// `params` (job arguments) and `spark` (the SparkSession) are defined elsewhere in my job.
val vecSize = 200
val minFreq = 100
val winSize = 20000
val senLength = 20000
val numPartitions = 2000
val maxIter = params.maxIter.toInt
val documentData = spark.read.load(params.documentData)
val word2Vec = new Word2Vec()
  .setInputCol("domains").setOutputCol("word2vecResult")
  .setVectorSize(vecSize).setMinCount(minFreq)
  .setWindowSize(winSize).setMaxSentenceLength(senLength)
  .setNumPartitions(numPartitions).setMaxIter(maxIter)
val w2vModel = word2Vec.fit(documentData)
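To pin down the iteration count at which the vectors break, a small check for non-finite components may help. The following is only a sketch in plain Scala: `hasNonFinite` and `countNonFinite` are hypothetical helper names, not part of Spark, and the `sample` data is made up for illustration.

```scala
// Hypothetical helper (not a Spark API): true if any component is NaN or +/-Infinity.
def hasNonFinite(vec: Array[Double]): Boolean =
  vec.exists(x => x.isNaN || x.isInfinity)

// Hypothetical helper: number of vectors that contain at least one non-finite component.
def countNonFinite(vecs: Seq[Array[Double]]): Int =
  vecs.count(hasNonFinite)

// Made-up sample: one healthy vector and one of the broken form described above.
val sample = Seq(
  Array(0.12, -0.34, 0.56),
  Array(Double.PositiveInfinity, Double.NegativeInfinity, Double.PositiveInfinity)
)
println(countNonFinite(sample))
```

Assuming the fitted model exposes its vectors through `w2vModel.getVectors` (a DataFrame with `word` and `vector` columns), each `vector` could be converted with `toArray` and passed to `hasNonFinite`, and the check repeated for increasing values of `maxIter`.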