Topic inference using the Encyclopédie trained model

3 comments
While trying to use the Encyclopédie-trained topic model on the Mémoires de Trévoux, something quite unexpected happened: the topic modeler had trouble finding topics that matched the Trévoux articles. You can see those results here:
http://robespierre.uchicago.edu/topic_modeling/inference/encyclo2trevoux.txt
Since the topic inference feature in Mallet is relatively new, I thought of creating a model out of the Trévoux and then comparing the topic proportions generated by the topic trainer with those generated by running inference with that model. In other words, I tested the model against the very corpus of articles from which it originated. In all likelihood, the results were going to be excellent. Well, they weren't, which suggests that the topic inferencer is not yet fully operational (it is a new feature after all). On the other hand, if you compare the two sets of results, you'll notice that (mostly) the same topics are prominent in both. Only the proportion measure is off: it is approximately divided by ten when using topic inference. Here are those results:
When using topic training:
http://robespierre.uchicago.edu/topic_modeling/inference/proportions.txt
When using topic inference:
http://robespierre.uchicago.edu/topic_modeling/inference/proportions_itself.txt
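The comparison described above (do the same topics come out on top, and by what factor do the proportions differ?) can be sketched in a few lines of Python. This is only a rough illustration, not the procedure actually used here: the function names `top_topics` and `compare` are hypothetical, the input is assumed to be already parsed into one `{topic_id: proportion}` dictionary per document, and the numbers at the bottom are made up to mimic a "divided by ten" pattern rather than taken from the files linked above.

```python
from statistics import median


def top_topics(proportions, k=3):
    """Return the set of the k topic ids with the largest proportions."""
    ranked = sorted(proportions, key=proportions.get, reverse=True)
    return set(ranked[:k])


def compare(trained, inferred, k=3):
    """Compare two {topic_id: proportion} dicts for the same document.

    Returns the share of the top-k topics common to both, and the median
    ratio trained/inferred over topics with a nonzero inferred proportion.
    """
    shared = top_topics(trained, k) & top_topics(inferred, k)
    ratios = [
        trained[t] / inferred[t]
        for t in trained
        if inferred.get(t, 0) > 0
    ]
    return {
        "top_k_overlap": len(shared) / k,
        "median_ratio": median(ratios) if ratios else None,
    }


# Made-up proportions for a single document (illustration only).
trained = {0: 0.50, 1: 0.30, 2: 0.15, 3: 0.05}
inferred = {0: 0.05, 1: 0.03, 2: 0.015, 3: 0.005}

result = compare(trained, inferred)
print(result)
```

A high overlap together with a roughly constant ratio would fit the observation that the ranking of topics is preserved while the scale of the proportions is off.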
The question is: can I trust those results? My initial analysis tends to show that topic inference does work, but it's definitely not as accurate as the first experiments I did with topic modeling. Some more digging is needed, possibly including getting in touch with the Mallet developers.