Dynamic Topic Models

I just had a look at David Hall, Daniel Jurafsky, and Christopher Manning, "Studying the History of Ideas Using Topic Models," Proceedings of EMNLP 2008: Conference on Empirical Methods in Natural Language Processing, October 2008 [link]. This is a very interesting article that uses Latent Dirichlet Allocation (LDA) [link wikipedia] and some extensions to examine changing publication trends in computational linguistics. As noted in the Wikipedia entry, LDA is described in David Blei, Andrew Y. Ng, and Michael I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research 3 (January 2003) [link].

David Blei has released code [link] and has a number of samples, a listserv, etc. on his site. He also gave a great Google talk on his work, "Modeling Science: Dynamic Topic Models of Scholarly Research," in May 2007 [link video and paper]. This appears to be a powerful technique that can handle changing vocabularies over a century of scientific writing.
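
For orientation, the generative story that LDA assumes can be sketched in a few lines of Python. This is only an illustrative sketch, not Blei's released code: the function names, the symmetric hyperparameters alpha and eta, and the toy vocabulary are all assumptions made for the example. It shows where the two Dirichlet priors enter: one over each topic's word distribution and one over each document's topic proportions. Fitting a model means inverting this process, which the sketch does not attempt.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab, alpha, eta, seed=0):
    """Generate toy documents following the LDA generative process.

    alpha: symmetric Dirichlet parameter for per-document topic proportions
    eta:   symmetric Dirichlet parameter for per-topic word distributions
    (both are hypothetical values chosen by the caller, not estimated)
    """
    rng = random.Random(seed)
    # Each topic is a distribution over the vocabulary, drawn from Dirichlet(eta).
    topics = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Each document gets its own topic proportions, drawn from Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=topics[z])[0])
        docs.append(words)
    return topics, docs


if __name__ == "__main__":
    toy_vocab = ["parser", "grammar", "corpus", "speech", "model", "signal"]
    _, toy_docs = generate_corpus(3, 8, 2, toy_vocab, alpha=0.5, eta=0.5, seed=1)
    for doc in toy_docs:
        print(" ".join(doc))
```

In this sketch the priors are simply fixed by whoever calls the function. Smaller values of alpha tend to concentrate each document on fewer topics.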

Running it on OS X, I can currently get topics for the sample AP collection provided by Blei, but I cannot get inferences because it throws malloc errors. I'm looking through the mailing list to see if there are any hints about OS X.

Blei lists several implementations on his site, including one that is part of Mallet, which I think we installed here at one point. See also http://gibbslda.sourceforge.net/ for another implementation and some samples run on large Wikipedia and Medline (abstract) collections.

I also noticed a Ruby module described at http://mendicantbug.com/2008/11/17/lda-in-ruby/

1 comment:

  1. Who/What is providing the Dirichlet prior to the topic distribution?
