During the summer (2009), a user (UofC PhD, tenured elsewhere) wrote to ask if there was any way to search the
Encyclopédie and "generate a list of all articles that cross-reference a given article". We went back and forth a bit, and I slapped a little toy together and let him play with it, to which his reply was "Oh, this is cool! Five minutes of playing with the search engine and I can tell you it shows fun stuff...". This is, of course, an excellent suggestion which we have talked about in the past, usually in the context of visualizing relationships of articles in various ways. At the highest level, visualizing the relationships of the
renvois is what Gilles and I attempted to do in our general "
cartography paper"[1] and, more recently, Robert and Glenn (et al.) tried, in a radically different way, to do in their work on "
centroids"[2].
The current implementation of the
Encyclopédie under PhiloLogic will allow users to follow
renvois links (within operational limits to be outlined below), but does not support searching and navigating the
renvois in any kind of systematic fashion. Since this is something I think warrants further consideration, I thought it might be helpful to document this toy, give some examples, let folks play with it, outline some of the current issues, and conclude with some ideas about what might be done going forward.
To construct this toy, I wrote a recognizer to extract metadata for each article in the
Encyclopédie which has one or more
renvois. As part of the original development of the
Encyclopédie, each cross-reference was automatically detected from certain typographic and lexical clues. This resulted in roughly 61,000 cross-references. Accordingly, the extracted database has 61,000 records. I loaded these into a simple MySQL database and used a standard script to support searching and reporting. The search parameters may include article headwords, authors, normalized and English classes of knowledge, as well as the term(s) being cross-referenced. For example, there are 39 cross-referenced article pairs for the headword
estomac. As you can see from the output, I'm listing the headword, author, classes of knowledge, and the cross referenced term. You can get the article of the cross referenced term or the cross-references in that article. Thus, the second example shows the link to Digestion:
ESTOMAC, ventriculus (Tarin: Anatomie, Anatomy ) ==> Digestion || renvois
[The renvois of Digestion find 56 article pairs, including one to intestins]
DIGESTION (Venel: Economie animale, Animal economy ) ==> Intestins || renvois
Intestins (unknown: Anatomie, Anatomy ) ==> Chyle || renvois
and so on ==> lymphe ==> sang ==>
ad nauseam. No, there is no
ad nauseam, just how you might feel after going round and round.
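To make the setup described above a bit more concrete, here is a minimal sketch of what such a record table and search might look like. It is not the actual script: it uses Python's built-in sqlite3 in place of MySQL so that it runs anywhere, and the table name, column names, and the `search` and `format_record` helpers are all hypothetical. The four sample rows are the article pairs quoted in this post.

```python
import sqlite3

# Hypothetical schema: one row per (article, cross-referenced term) pair.
# sqlite3 stands in for the MySQL database mentioned in the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE renvois (
           headword TEXT, author TEXT,
           class_fr TEXT, class_en TEXT,
           target TEXT)"""
)
rows = [
    ("ESTOMAC, ventriculus", "Tarin", "Anatomie", "Anatomy", "Digestion"),
    ("ESTOMAC, ventriculus", "Tarin", "Anatomie", "Anatomy", "Chyle"),
    ("DIGESTION", "Venel", "Economie animale", "Animal economy", "Intestins"),
    ("Intestins", "unknown", "Anatomie", "Anatomy", "Chyle"),
]
conn.executemany("INSERT INTO renvois VALUES (?, ?, ?, ?, ?)", rows)

ALLOWED_FIELDS = {"headword", "author", "class_fr", "class_en", "target"}


def search(conn, **criteria):
    """Return records whose named fields contain the given substrings."""
    clauses, params = [], []
    for field, value in criteria.items():
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {field}")
        clauses.append(f"{field} LIKE ?")  # LIKE ignores ASCII case in SQLite
        params.append(f"%{value}%")
    where = " AND ".join(clauses) or "1"
    sql = (
        "SELECT headword, author, class_fr, class_en, target "
        f"FROM renvois WHERE {where} ORDER BY rowid"
    )
    return conn.execute(sql, params).fetchall()


def format_record(rec):
    """Render one record in roughly the style of the listing above."""
    headword, author, class_fr, class_en, target = rec
    return f"{headword} ({author}: {class_fr}, {class_en}) ==> {target}"


for rec in search(conn, headword="estomac"):
    print(format_record(rec))
```

Searching on `target` instead of `headword` gives the reverse view the original request asked for: every article that cross-references a given term.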
Now, there are problems, but please go ahead and play with this now using the
submit form, as long as you promise to come back and read thru the rest of this and let me know about any other problems.
Problems
As noted above, the renvois were identified automatically. And as with most of these things, it worked reasonably well. But you will see link errors and other things which indicate problems. Glenn reported these to me and I was going to eliminate them. On second thought, this little toy lets us consider the
renvois rather more systematically. Where you see a link error is (probably) a recognizer error, which either failed to get a string to link or got confused by some typography. The linking mechanism itself is based on string searches. In other words, whenever you click on a
renvois, you are in fact performing a search on the headwords. This simple heuristic works reasonably well, returning string matched headwords. In some cases, you get nothing because there is no headword that has the
renvois word(s), and at other times you will get quite a list of articles, which may or may not include what the authors/editors intended. It is, of course, well known that many renvois simply don't correspond to an article and many others differ in various ways from the article headwords. I am also applying a few rules to renvois searching to try to improve recall and reduce noise. So, this also adds another level of indirection.
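The string-search linking described above can be sketched roughly as follows. This is an illustration only: the normalization (lowercasing, stripping accents and punctuation) and the "every word must appear in the headword" test are my assumptions for the example, not the rules the toy actually applies, and `normalize` and `resolve` are hypothetical names.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, and split into words."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.findall(r"[a-z]+", no_accents)


def resolve(term, headwords):
    """Return every headword that contains all words of the renvoi term.

    The result may be empty (no such headword), a single article, or a
    long list that may or may not include the intended article.
    """
    wanted = set(normalize(term))
    if not wanted:
        return []
    return [h for h in headwords if wanted <= set(normalize(h))]


# A tiny illustrative headword list taken from the examples in this post.
headwords = ["ESTOMAC, ventriculus", "DIGESTION", "Intestins", "CHYLE"]

print(resolve("Digestion", headwords))  # one match
print(resolve("Viscere", headwords))    # no headword in this small list
```

Because a click is really a search, the same renvoi can land on zero, one, or many articles, which is exactly the behavior described above.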
Now, ideally, one would go through the entire database, examine each
renvois and build a direct link to the
one article that the authors/editors intended. But we're talking 60,000+
renvois against 72,000 (or so) articles and it is not clear that humans could resolve this in many instances. When Gilles and I worked on this, we used a series of (long forgotten) heuristics to filter out noise and errors. So, this simple toy works within operational limits and gives us a way to more systematically identify possible errors and ways to improve it.
Future Work
Aside from being a quick and dirty way to get some notion of errors in the
renvois, we might be able to make this more presentable. Please feel free to play with this and suggest ways to think about it. In the long haul, I would
love a totally cool visualization. A clickable directed graph, so you could click on a node and re-center it on another article, or class of knowledge or author. Maybe something like
Tricot's representation of the classes of knowledge. Or maybe something like
DocuBurst. Marti Hearst's chapter on
visualizing text analysis is a treasure-trove of great ideas.
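As one possible starting point for a clickable directed graph, here is a small sketch that turns article pairs into Graphviz DOT text, with a URL attribute on each node. The `to_dot` function and the `base_url` value are placeholders I made up for the example; the edges are pairs quoted in this post.

```python
from urllib.parse import quote


def dot_escape(name):
    """Escape backslashes and double quotes for a quoted DOT identifier."""
    return name.replace("\\", "\\\\").replace('"', '\\"')


def to_dot(edges, base_url="https://example.org/article?head="):
    """Build DOT source for a directed graph of (source, target) pairs.

    base_url is a hypothetical search URL; each node links to
    base_url + the URL-quoted headword.
    """
    lines = ["digraph renvois {", "  rankdir=LR;"]
    nodes = sorted({name for edge in edges for name in edge})
    for node in nodes:
        lines.append(f'  "{dot_escape(node)}" [URL="{base_url}{quote(node)}"];')
    for src, dst in edges:
        lines.append(f'  "{dot_escape(src)}" -> "{dot_escape(dst)}";')
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("Estomac", "Digestion"),
    ("Digestion", "Intestins"),
    ("Intestins", "Chyle"),
    ("Estomac", "Chyle"),
    ("Chyle", "Sanguification"),
]
print(to_dot(edges))
```

Saving the output to a file and running something like `dot -Tsvg renvois.dot -o renvois.svg` should produce an SVG in which each node is a hyperlink. Re-centering on a clicked node would need another round trip to regenerate the graph.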
For the immediate term, I would like to recast this simple model to let the user specify the number of steps to follow, so that you would get something like:
ESTOMAC, ventriculus (Tarin: Anatomie, Anatomy ) ==> Digestion || renvois
DIGESTION (Venel: Economie animale, Animal economy ) ==> Intestins || renvois
Intestins (unknown: Anatomie, Anatomy ) ==> Viscere || renvois
ESTOMAC, ventriculus (Tarin: Anatomie, Anatomy ) ==> Chyle || renvois
CHYLE (Tarin: Anatomie | Physiologie, Anatomy. Physiology ) ==> Sanguification || renvois
SANGUIFICATION (unknown: Physiologie, Physiology ) ==> Respiration || renvois
RESPIRATION (unknown: Anatomie | Physiologie, Anatomy | Physiology ) ==> Air || renvois
Following these chains of
renvois either until you run out or you hit an iteration limit. I will try to follow this up with the multi-iteration model and see if I can recover some of what Liz tried to do using
GraphViz to generate clickable directed graphs.
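A rough sketch of the multi-iteration idea, assuming the renvois have already been resolved into a simple mapping from article to target articles (which, as noted under Problems, is the hard part). The `follow_chains` name and the `graph` dictionary are hypothetical; the entries are the pairs from the sample listing above.

```python
def follow_chains(graph, start, max_steps):
    """Yield each chain of renvois starting from `start`.

    A chain ends when the current article has no further renvois,
    when the only targets are already in the chain (to avoid going
    round and round), or after `max_steps` links have been followed.
    """
    def walk(path):
        targets = [t for t in graph.get(path[-1], []) if t not in path]
        if len(path) - 1 >= max_steps or not targets:
            if len(path) > 1:
                yield path
            return
        for target in targets:
            yield from walk(path + [target])

    yield from walk([start])


# Article -> cross-referenced articles, from the example listing.
graph = {
    "Estomac": ["Digestion", "Chyle"],
    "Digestion": ["Intestins"],
    "Intestins": ["Viscere"],
    "Chyle": ["Sanguification"],
    "Sanguification": ["Respiration"],
    "Respiration": ["Air"],
}

for chain in follow_chains(graph, "Estomac", max_steps=3):
    print(" ==> ".join(chain))
```

With a limit of three steps the Chyle chain stops at Respiration; raising the limit to four lets it continue on to Air.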
References
[1] Gilles
Blanchard et Mark
Olsen, « Le système de renvoi dans l’
Encyclopédie: Une cartographie des structures de connaissances au
XVIIIe siècle »,
Recherches sur Diderot et sur l'Encyclopédie, numéro 31-32
L'Encyclopédie en ses nouveaux atours électroniques: vices et vertus du virtuel, (2002) [En ligne], mis en ligne le 16 mars 2008.
[2] Charles Cooney, Russell Horton, Robert Morrissey, Mark Olsen, Glenn Roe, and Robert Voyer, "Re-engineering the tree of knowledge: Vector space analysis and centroid-based clustering in the
Encyclopédie", Digital Humanities 2008, University of Oulu, Oulu, Finland, June 25-29, 2008