word2vec outputs Infinity,-Infinity vectors with increasing iterations

I'm running word2vec on a dataset of about 1 million documents with a vocabulary of roughly 20,000 words. Models trained for many iterations (15-20) frequently produce vectors of the form [Infinity,-Infinity,Infinity,-Infinity,...]. Up to that breaking point, increasing the number of iterations improves the quality of the representation. This looks like a bug. Can someone provide insight into the nature of this problem?
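To measure how widespread the problem is, a quick check over the learned vectors can list the affected words. This is a minimal sketch in plain Scala, assuming the vectors have already been collected into a `Map[String, Array[Double]]` (for example from the `word` and `vector` columns returned by `Word2VecModel.getVectors`, converting each vector with `toArray`). The object and method names are hypothetical.

```scala
// Hypothetical helper (not part of Spark): flags words whose learned vectors
// contain NaN or +/-Infinity in any component.
object NonFiniteCheck {

  /** True if every component of `vec` is an ordinary finite number. */
  def isFiniteVector(vec: Array[Double]): Boolean =
    vec.forall(x => java.lang.Double.isFinite(x))

  /** Words whose vectors contain NaN or +/-Infinity, sorted for stable output. */
  def nonFiniteWords(vectors: Map[String, Array[Double]]): Seq[String] =
    vectors.collect { case (word, vec) if !isFiniteVector(vec) => word }.toSeq.sorted
}
```

Comparing the count of flagged words across runs with different `maxIter` values would show whether the breakdown is sudden or gradual.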

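    import org.apache.spark.ml.feature.Word2Vec

    val vecSize = 200          // dimensionality of each word vector
    val minFreq = 100          // drop words seen fewer than 100 times
    val winSize = 20000        // context window size
    val senLength = 20000      // max words per sentence; longer ones are split into chunks
    val numPartitions = 2000   // number of partitions used during training
    val maxIter = params.maxIter.toInt  // `params` is defined elsewhere in my job

    val documentData = spark.read.load(params.documentData)
    val word2Vec = new Word2Vec()
      .setInputCol("domains")
      .setOutputCol("word2vecResult")
      .setVectorSize(vecSize)
      .setMinCount(minFreq)
      .setWindowSize(winSize)
      .setMaxSentenceLength(senLength)
      .setNumPartitions(numPartitions)
      .setMaxIter(maxIter)
    val w2vModel = word2Vec.fit(documentData)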
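One plausible mechanism, given `numPartitions = 2000` above, is that the per-partition copies of each word vector are summed rather than averaged when the partitions' results are merged after every iteration. I have not confirmed that this matches the Spark version in use, so treat it as an assumption. Under that assumption the vector magnitude grows by roughly a factor of the partition count per iteration. It eventually exceeds `Float.MaxValue` (about 3.4e38), at which point single-precision arithmetic saturates to `Infinity`/`-Infinity`. The toy model below compares summing with averaging. All names are hypothetical, and the fixed update `delta` is arbitrary.

```scala
// Toy model of one hypothesized failure mode, NOT Spark's actual code:
// each partition starts an iteration from the same global vector, adds the
// same small update `delta`, and the driver merges the partitions' copies.
object PartitionMergeSketch {

  /** Merge `parts` identical local copies of `v` (each shifted by `delta`). */
  def merge(v: Array[Float], parts: Int, delta: Float, average: Boolean): Array[Float] =
    v.map { x =>
      val summed = (x + delta) * parts // sum of `parts` identical copies
      if (average) summed / parts else summed
    }

  /** Apply `iters` merge rounds starting from `init`. */
  def run(init: Array[Float], parts: Int, iters: Int, average: Boolean,
          delta: Float = 0.001f): Array[Float] =
    (1 to iters).foldLeft(init)((v, _) => merge(v, parts, delta, average))
}
```

With 2000 partitions and starting values `Array(0.002f, -0.002f)`, the summed variant overflows to `[Infinity, -Infinity]` after 13 rounds. The averaged variant stays close to its starting values. If this mechanism applies, lowering `numPartitions` (for example to 1) should delay or avoid the overflow, at the cost of slower training.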