LDA and evaluating topic number

cbuntain
Hi, all!

Is there an example somewhere of using LDA’s logPerplexity()/logLikelihood() functions to evaluate topic counts? The existing MLlib LDA examples show how to call them, but I can’t find any documentation on how to interpret the outputs. When I graph log perplexity and log likelihood, the curves aren’t consistent with what I expected: perplexity increases and likelihood decreases as the number of topics increases, which seems odd to me.

An example of what I’m doing is here: http://www.cs.umd.edu/~cbuntain/FindTopicK-pyspark-regex.html
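For readers who can't open the notebook, here is a minimal sketch of what such a sweep over topic counts might look like. It is not the notebook's code. The helper names, the "features" column, the iteration count, the seed, and the choice to score a held-out split rather than the training data are all assumptions made for illustration. The only Spark calls used are pyspark.ml.clustering.LDA, fit(), logLikelihood(), and logPerplexity(); the Spark API docs describe the last of these as an upper bound on perplexity where lower is better.

```python
from typing import Callable, Dict, Iterable, Tuple

Scores = Dict[int, Tuple[float, float]]


def sweep_topic_counts(
    ks: Iterable[int],
    score_fn: Callable[[int], Tuple[float, float]],
) -> Scores:
    """Return {k: (log_likelihood, log_perplexity)} for each candidate k."""
    return {k: score_fn(k) for k in ks}


def best_k_by_perplexity(scores: Scores) -> int:
    """Pick the k with the lowest log perplexity (lower is better)."""
    return min(scores, key=lambda k: scores[k][1])


def make_spark_scorer(train_df, test_df, features_col="features",
                      max_iter=50, seed=42):
    """Build a score_fn that fits LDA on train_df and scores test_df.

    Hypothetical helper: the column name, max_iter, and seed are
    illustrative defaults, not values taken from the original post.
    """
    # Imported lazily so the pure-Python helpers above work without Spark.
    from pyspark.ml.clustering import LDA

    def score(k: int) -> Tuple[float, float]:
        model = LDA(k=k, maxIter=max_iter, seed=seed,
                    featuresCol=features_col).fit(train_df)
        # Both metrics are computed on the held-out split.
        return model.logLikelihood(test_df), model.logPerplexity(test_df)

    return score


# Possible usage, assuming `vectorized_df` already holds term-count vectors:
#   train_df, test_df = vectorized_df.randomSplit([0.8, 0.2], seed=42)
#   scores = sweep_topic_counts(range(5, 55, 5),
#                               make_spark_scorer(train_df, test_df))
#   print(best_k_by_perplexity(scores))
```

Passing the scoring step in as a function keeps the sweep and the selection rule separate from Spark, so the same loop can be reused with a different split or a different metric.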

Thanks very much in advance! If I can figure this out, I can post example code online, so others can see how this process is done.

-Best regards,
Cody
_________________
Cody Buntain, PhD
Postdoc, @UMD_CS
Intelligence Community Postdoctoral Fellow


Re: LDA and evaluating topic number

Stephen Boesch
I have been testing on the 20 Newsgroups dataset, which the Spark docs themselves reference. I can confirm that perplexity increases and likelihood decreases as the number of topics increases, and I am similarly confused by these results.
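One thing that may help when reading the two curves: as far as I can tell from the Spark source, logPerplexity() is the negated logLikelihood() divided by the total token count of the dataset passed in. If that reading is right, then on a fixed dataset the two curves carry the same information, and one rising while the other falls is a single signal rather than two independent ones. The sketch below just restates that relationship in plain Python. The function names are hypothetical, and the use of natural logarithms is an assumption.

```python
import math


def log_perplexity_from_likelihood(log_likelihood: float,
                                   token_count: float) -> float:
    """Per-token log perplexity: -log_likelihood / token_count.

    Assumed to mirror what Spark's logPerplexity() computes on one dataset.
    """
    if token_count <= 0:
        raise ValueError("token_count must be positive")
    return -log_likelihood / token_count


def perplexity(log_perplexity: float) -> float:
    """Convert log perplexity to perplexity, assuming natural logs."""
    return math.exp(log_perplexity)
```

Because the token count is the same for every k on a fixed dataset, plotting either quantity against k should give the same ranking of topic counts.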

In reply to Cody Buntain's message of 2017-09-28 10:50 GMT-07:00 (the post above).