Encyclopédie under KinoSearch

One of the things that I have wanted to do for a while is to examine implementations of Lucene, both as a search tool to complement PhiloLogic and possibly as a model for future PhiloLogic renovations. Late this summer, Clovis identified a particularly nice open-source Perl implementation of Lucene called KinoSearch. This looks like it will fit both bills very nicely indeed. As a little experiment, I loaded 73,000 articles (and other objects) from the Encyclopédie, and cooked up a super simple query script. This allows...

back to comparing similar documents

I mentioned a little while ago some work I did on comparing one document with the rest of the corpus it belongs to. (The examples I used in that blog post will not give the same results anymore, and the results might not be as good, since I haven't optimized the new code for the Encyclopédie yet.) The idea behind it was to use the topic proportions generated by LDA for each article, and come up with a set of calculations to decide which document(s) were closest to the original document. The reason why I'm mentioning here once more...
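
The excerpt above does not say which calculations were used, so the following is only a minimal sketch of one way such a comparison could work. It assumes each article is represented by a list of LDA topic proportions summing to 1, and it swaps in the Hellinger distance as the closeness measure. The names `hellinger`, `closest_documents`, and the toy corpus are hypothetical, not taken from the original code.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two topic-proportion vectors (0 = identical, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


def closest_documents(target, corpus, n=3):
    """Return the names of the n documents whose topic proportions are nearest to the target's."""
    scored = [
        (hellinger(corpus[target], proportions), name)
        for name, proportions in corpus.items()
        if name != target  # do not compare the document with itself
    ]
    return [name for _, name in sorted(scored)[:n]]


# Toy data (made up for illustration): three documents over three topics.
corpus = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.6, 0.3, 0.1],
    "C": [0.1, 0.1, 0.8],
}
nearest = closest_documents("A", corpus, n=1)
```

Other measures (cosine similarity, Jensen-Shannon divergence) would slot into the same structure by replacing `hellinger`.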

Supervised LDA: Preliminary Results on Homer

While Clovis has been running LDA tests on Encyclopédie texts using the Mallet code, I have been running some tests using the sLDA algorithm. After a few minor glitches, Richard and I managed to get the sLDA code, written by Chong Wang and David Blei, from Blei's website up and running. Unlike LDA, sLDA (Supervised Latent Dirichlet Allocation) requires a training set of documents paired with corresponding class labels or responses. As Blei suggests, these can be categories, responses, ratings, counts or many other things....

Encyclopédie Renvois Search/Linker

During the summer (2009), a user (UofC PhD, tenured elsewhere) wrote to ask if there was any way to search the Encyclopédie and "generate a list of all articles that cross-reference a given article". We went back and forth a bit, and I slapped a little toy together and let him play with it, to which his reply was "Oh, this is cool! Five minutes of playing with the search engine and I can tell you it shows fun stuff...". This is, of course, an excellent suggestion which we have talked about in the past, usually in the context...

Archives Parlementaires: lèse (more)

As I mentioned in my last post in this thread, I was a bit surprised to see just how prevalent the construction lèse nation had become early in the Revolution. The following is a sorted KWIC of lEse in the AP, with the object type restricted to "cahiers", resulting in 38 occurrences. These are, of course, the complaints sent to the King, reflecting relatively early developments of Revolutionary discourse. Keeping in mind all of the caveats regarding this data, we can see some interesting and possibly contradictory uses: CAHIER:...

Topic Based Text Segmentation Goodies

As you may recall, Clovis ran some experiments this summer (2009) applying a Perl implementation of Marti Hearst's TextTiling algorithm to perform topic-based text segmentation on different French documents (see his blog post and related files). Clovis reasonably suggests that some types of literary documents, such as epistolary novels, may be more suitable candidates than other types, because they do not have the same degree of structural cohesion. Now, as I mentioned in my first discussion of the Archives Parlementaires,...

Archives Parlementaires: lèse collocations

The collocation table function of PhiloLogic is a quick way to look at changes in word use. Lèse majesté, treason or injuries against the dignity of the sovereign or state, is a common expression. The collocation table below shows terms around "lese | leze | lèse | lèze | lése | léze" in ARTFL Frantext (550 documents, 1700-1787) with majesté being by far the most common. It is interesting to note that the construction "lèse nation"...
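
For readers unfamiliar with collocation tables, here is a rough sketch of the general idea: count the words that appear within a few positions of any spelling variant of the keyword. This is not PhiloLogic's implementation. The function name `collocates`, the window size, and the sample sentence are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import Counter


def collocates(text, variants, window=5):
    """Count words occurring within `window` tokens on either side of any keyword variant."""
    # Normalize to NFC so accented letters are single characters that \w can match.
    normalized = unicodedata.normalize("NFC", text.lower())
    tokens = re.findall(r"\w+", normalized)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in variants:
            start = max(0, i - window)
            # Neighbours on both sides, excluding the keyword itself.
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


# The spelling variants listed in the post; the sentence is a made-up sample.
variants = {"lese", "leze", "lèse", "lèze", "lése", "léze"}
sample = "crime de lèse majesté et crime de leze majesté"
table = collocates(sample, variants, window=1)
```

Calling `table.most_common()` would then give the collocates in descending order of frequency, which is the shape of a collocation table.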

Archives Parlementaires (I)

A couple of weeks ago, some ARTFL folks discussed the notion of outlining some research and/or development projects that we will be, or would like to be, working on in the coming months. We discussed a wide range of possibilities that could involve substantive work, using some of the systems we have already developed or are working on, or more purely technical work. Everyone came up with some pretty interesting projects and proposals, and we decided that it might be entertaining and useful for each of us to outline a specific...