Cyclopaedia to Encyclopédie

From Cyclopaedia to Encyclopédie: Experiments in Machine Translation and Sequence Alignment

It is well known that the Encyclopédie ou dictionnaire raisonné des sciences, des arts et des métiers began first as a modest translation project of Ephraim Chambers' Cyclopaedia in 1745 [1]. Over the next few years, Diderot and d'Alembert would replace the original editors and the project would be duly transformed from a simple translation into an effort to compile and organise the sum total of the world's knowledge. Over the course of their editorial work, Diderot, and most notably d'Alembert, were not shy in incorporating these translations of the Cyclopaedia as filler for the Encyclopédie, many of which were inherited from the earlier project. Indeed, "ils ont laissé une bonne partie de ces articles presque inchangés, ou avec des modifications insignifiantes" [2]. The philosophes were nonetheless conscious of their debt to their English predecessor Chambers. His name appears some 1,154 times in the text of the Encyclopédie and he is referenced as sole or contributing source to 1,081 articles, where his name appears in italics at the end of a section or article. Given the scale of the two works under consideration, systematic evaluation of the extent of the philosophes' use of Chambers has remained, even today, a daunting task. John Lough, in 1980, framed the problem nicely:

So far no one has had the patience to make a detailed study of the exact relationship between the text of Diderot's Encyclopédie and the work of Ephraim Chambers. This would no doubt require several years of arduous toil devoted to comparing the two works article by article. [3]

Recent developments in machine translation and sequence alignment now offer new possibilities for the systematic comparison of digital texts across languages. The following post outlines some recent experimental work in leveraging these new techniques in an effort to reduce the "arduous toil" of textual  comparison, giving some preliminary examples of the kinds of results that can be achieved, and providing some cursory observations on the advantages and limitations of such systems for automatic text analysis. 

Our two comparison datasets are the ARTFL Encyclopédie (v. 1117) and the recently digitised ARTFL edition of the 1741 Chambers' Cyclopaedia (link). The 1741 edition was selected as it was one of the likely sources for the original translation project and we were able to work from high quality page images provided by the University of Chicago Library [4]. In a nutshell, our approach was to generate a machine translation of all of the Cyclopaedia articles into French and then use ARTFL's Text-PAIR sequence alignment system to identify similar passages between this virtual French Cyclopaedia and the Encyclopédie, with the translation providing links back to the original English edition of the Chambers as well as links to the relevant passages in the Encyclopédie.

For the English to French machine translation of Chambers, we examined two of the most widely-used resources in this domain, Google Translate and DeepL. Both systems provide useful APIs as part of their respective subscription services, and both provide translations based on cutting-edge neural network language models. We compared results from various samples and found, in general, that both systems worked reasonably well, given the complications of eighteenth-century vocabularies (in both English and French) and many uncommon and archaic terms (this may be the subject of a future post). While DeepL provided somewhat more satisfying translations from a reader's perspective, we ultimately opted to use Google Translate for the ease of its API and its ability to parse the TEI encoding of our documents with little difficulty. The latter is of critical importance, since we wanted to keep the overall document structure of our dictionaries to allow for easy navigation between the versions. 

Operationally, we segmented the text of the Cyclopaedia into short blocks, split at paragraph breaks, and sent them for automatic translation via the Google API, with a short delay between blocks. This worked relatively well, though the system would occasionally throw timeout or other errors, which required a query resend. You can inspect the translation results here - though this virtual French edition of the Chambers is not really meant for public consumption. Each article has a link at the bottom to the corresponding English version for the sake of comparison. It is important to note that the objective here is NOT to produce a good translation of the text or even one that might serve as the basis for a human edition. Rather, this machine-generated edition exists as a "pivot-text" between the English Chambers and French Encyclopédie, allowing for an automatic comparison of the two (or three) versions using a highly fault-tolerant sequence aligner designed to pick out commonalities in very noisy document spaces [5].
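The segment-translate-resend loop described above can be sketched roughly as follows. This is a minimal illustration, not the code we ran: the `translate` argument is a hypothetical stand-in for whatever client wraps the translation service, and the delay and retry values are assumptions chosen for the example.

```python
import time
from typing import Callable, List


def split_into_blocks(text: str) -> List[str]:
    """Split article text at blank-line paragraph breaks, dropping empty blocks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def translate_blocks(
    blocks: List[str],
    translate: Callable[[str], str],  # hypothetical wrapper around a translation client
    delay: float = 0.5,               # assumed pause between blocks, in seconds
    max_retries: int = 3,             # assumed number of attempts per block
) -> List[str]:
    """Translate blocks one at a time, pausing between calls and resending on errors."""
    results = []
    for block in blocks:
        for attempt in range(1, max_retries + 1):
            try:
                results.append(translate(block))
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up on this block after the last attempt
                time.sleep(delay * attempt)  # wait a little longer before each resend
        time.sleep(delay)  # short delay between blocks
    return results
```

Keeping the translation call behind a plain function argument means the same loop could be pointed at either of the services we compared without changing the retry logic.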

The next step was to establish workable parameters for the Text-PAIR alignment system. The challenge here was to find commonalities between the French translations created by eighteenth-century authors and translators and machine translations produced by a modern automatic translation system. Additionally, the editors and authors of the Encyclopédie were not necessarily constrained to produce an exact translation of the text in question, but could, and did, make significant modifications to the original in terms of length, style, and content. To address this challenge we ran a series of tests with different matching parameters such as n-gram construction (e.g., number of words that constitute an n-gram), minimum match lengths, maximum gaps between matches, and decreasing match requirements as a match length increased (what we call a "flex gap"), among others, on a representative selection of 100 articles from the Encyclopédie where Chambers was identified as the possible source. It is important to note that even with the best parameters [6], which we adjusted to get favorable recall and precision results, we were only able to identify 81 of the 100 articles. Some articles, even where clearly affiliated, were missed by the aligner, due to the size of the articles (some are very small) and fundamental differences in the translation of the English. For example, the article Compulseur is attributed by Mallet to Chambers, but the machine translation of Compulsor is a rather more literal and direct translation of the English article than what is offered by Mallet. Further relaxing matching parameters could potentially find this example, but would increase the number of false positives, in effect drowning out the signal with increased noise.
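To make the matching parameters more concrete, here is a toy sketch of n-gram construction and a minimum-match test. It is not Text-PAIR's implementation: the function names are invented for illustration, there is no stemming, and gap handling ("maxgap", "flex gap") is left out entirely. Only the defaults (bigrams, words of at least three letters, five matching n-grams) echo the values listed in note 6.

```python
import re
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def make_ngrams(text: str, n: int = 2, min_word_len: int = 3) -> List[NGram]:
    """Lowercase the text, keep words of at least min_word_len letters, build word n-grams."""
    words = [w for w in re.findall(r"[^\W\d_]+", text.lower()) if len(w) >= min_word_len]
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def shared_ngrams(a: str, b: str, n: int = 2) -> Set[NGram]:
    """Return the set of n-grams that occur in both passages."""
    return set(make_ngrams(a, n)) & set(make_ngrams(b, n))


def is_candidate(a: str, b: str, min_matching_ngrams: int = 5) -> bool:
    """Flag a passage pair when enough n-grams are shared (an illustrative threshold test)."""
    return len(shared_ngrams(a, b)) >= min_matching_ngrams
```

Even in this simplified form the trade-off is visible: lowering `min_matching_ngrams` would catch short articles with few shared n-grams, at the cost of admitting many more chance overlaps.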

All things considered, we were quite happy with the aligner's performance given the complexity of the comparison task and the multiple potential variations between historical text and modern machine translations. To give an example of how fine-grained and at the same time highly-flexible our matching parameters needed to be, see the below article 'Gynaecocracy', which is a fairly direct translation on a rather specialised subject, but that nonetheless matched on only 8 content words. 

Other straightforward articles were however missed due to differences in the translation and sparse matching n-grams, see for example the small article on "Occult" lines in geometry below, where the 6 matching words weren't enough to constitute a match for the aligner.

Obviously, this is a rather inexact science, reliant on an outside process of automatic translation and the ability to match a virtual text that in reality never existed. Nonetheless, the 81% recall rate we attained on our sample corpus seemed more than sufficient for this experiment and allowed us to move forward towards a more general evaluation of the entirety of identified matches. 

Once settled on the optimal parameters, we then used Text-PAIR to generate both an alignment database, for interactive examination, and a set of static files. Both of these result formats are used for this project. The alignment database (link) contains some 7,304 aligned passage pairs. The system allows queries on metadata, such as author and article title, as well as words or phrases found in the aligned passages. The system also uses faceted browsing to allow the user to summarize results by the various metadata [7]. Each aligned passage is presented as a facing page representation and the user can toggle a display of all of the variations between the two aligned passages. As seen below, the variations between the texts can be extensive.

Text-PAIR also contextualises results back to the original document(s). For example, the following is the article "Almanach" by d'Alembert, showing the aligned passage from Chambers in blue.  

In this instance, d'Alembert reused almost all of Chambers' original article Almanac, with some minor variations, but does not appear to have indicated the source of the first part of his article (page image).

The alignment database is a useful first pass to examine the results of the alignment process, but it is limited in at least two ways. It identifies each aligned passage, but does not merge multiple passages identified in article pairs. Thus we find 5 shared passages between the articles "Constellation". The interface also does not attempt to evaluate the alignments or identify passages that occur between different articles. For example, d'Alembert's article ATMOSPHERE indeed has a passage from Chambers' article "Atmosphere", but also many longer passages from the article Generation.

To accumulate results and to refine evaluation, we subsequently processed the raw Text-PAIR alignment data as found in the static output files. We developed an evaluation algorithm for each alignment, with parameters based on the length of the matching passages and the degree to which the headwords were close matches. This simple evaluation model eliminated a significant number of false positives, which we found were typically short text matches between articles with different headwords. The output of this algorithm resulted in two tables, one for matches that were likely to be valid and one that was less likely to be valid, based on our simple heuristics (see a selection of the 'YES' table below). We are, of course, making this distinction based on the comparison of the machine translated Chambers headwords and the headwords found in the Encyclopédie, so we expected that some valid matches would be identified as invalid. 
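A heuristic of this kind might look something like the sketch below. The thresholds, field names, and the use of a generic string-similarity ratio for headwords are all assumptions made for illustration; they are not the values or the exact logic of our evaluation script. The rule simply encodes the observation that false positives tend to be short matches between articles with different headwords.

```python
from difflib import SequenceMatcher
from typing import Dict


def headword_similarity(a: str, b: str) -> float:
    """Character-level similarity between two headwords, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(alignment: Dict[str, str], min_words: int = 30, min_headword_sim: float = 0.7) -> str:
    """Sort an alignment into the 'YES' or 'NO' table.

    A match goes to 'NO' only when it is both short and between articles
    whose headwords differ; both thresholds are hypothetical.
    """
    n_words = len(alignment["passage"].split())
    sim = headword_similarity(alignment["source_head"], alignment["target_head"])
    if n_words < min_words and sim < min_headword_sim:
        return "NO"
    return "YES"
```

Because the headword comparison runs on machine-translated headwords, a rule like this will inevitably send some valid matches to the 'NO' table, which is why that table still needs human review.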

The next phase of the project included the necessary step of human evaluation of the identified matches. While we were able to reduce the work involved significantly by generating a list of reasonably solid matches to be inspected, there is still no way to eliminate fully the "arduous toil" of comparison referenced by Lough. More than 5,000 potential matches were scrutinised, looking in essence for 'false negatives', i.e., matches that our evaluation algorithm classed as negative (based primarily on differences in headword translations) but that were in reality valid. The results of this work were then merged into a single table of what we consider to be valid matches, a list that includes some 3,700 Encyclopédie articles with at least one matching passage from the Cyclopaedia. These results will form the basis of a longer article that is currently in preparation.


In all, we found some 3,778 articles in the Encyclopédie that upon evaluation seem highly similar in both content and structure to articles in the 1741 edition of Chambers' Cyclopaedia. Whether or not these articles constitute real acts of historical translation is the subject for another, or several other, articles. There are simply too many outside factors at play, even in this rather straightforward comparison, to make blanket conclusions about the editorial practices of the encyclopédistes based on this limited experiment [7]. What we can say, however, is that of the 1,081 articles that include a "Chambers" reference in the Encyclopédie, we only found 689 with at least one matching passage. Obviously, this recall rate of 63.7% is well below the 81% we attained on our sample corpus, probably due to overfitting the matching algorithm to the sample, which warrants further investigation. But, beyond testing this ground truth, we are also left with the rather astounding fact of 3,089 articles with no reference to Chambers whatsoever, all of which seem, at first blush, to be at least somewhat related to their English predecessors.
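For readers who want to check the arithmetic, the figures in the preceding paragraph can be reproduced in a few lines. The variable names are ours; the counts are those reported above.

```python
# Counts reported in the text above.
referenced = 1081          # Encyclopédie articles with a "Chambers" reference
referenced_matched = 689   # of those, articles with at least one matching passage
total_matched = 3778       # all Encyclopédie articles with a valid match

# Recall against the "Chambers" references taken as ground truth.
recall = referenced_matched / referenced
# Matched articles that carry no reference to Chambers.
unreferenced_matched = total_matched - referenced_matched

print(f"recall: {recall:.1%}")                      # recall: 63.7%
print(f"unreferenced: {unreferenced_matched:,}")    # unreferenced: 3,089
```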

The overall evaluation of these results remains ongoing, and the "arduous toil" of traditional textual comparison continues apace, albeit guided somewhat by the machine's heavy hand. Indeed, the use of machine translation as a bridge between documents to find similar passages, be they reuses, plagiarisms, etc. is, as we have attempted to show here, a workable approach for future research, although not without certain limitations. The Chambers --> Encyclopédie task outlined above is fairly well constrained and historically bounded. More general applications of these same methods may well yield less useful results. These reservations notwithstanding, the fact that we were able to unearth many thousands of valid potential intertextual relationships between documents in different languages is a feat that even a few years ago might not have been possible. As large-scale language models become ever more sophisticated and historically aware, the dream of intertextual bridges[8] between multilingual corpora may yet become a reality.

- Glenn Roe & Mark Olsen


1. The page image of the title page from the 1745 prospectus is taken from ARTFL's "18th" volume of the Encyclopédie

2. Paolo Quintili, "D'Alembert « traduit » Chambers. Les articles de mécanique de la Cyclopædia à l'Encyclopédie", Recherches sur Diderot et sur l'Encyclopédie 21 (1996):75. [link]

3. John Lough, "The Encyclopédie and the Chambers' Cyclopaedia", in SVEC 185, Oxford: Voltaire Foundation (1980): 221. 

4. On the possible editions of the Cyclopaedia used by the encyclopédistes, see Irène Passeron, "Quelle(s) édition(s) de la Cyclopœdia les encyclopédistes ont-ils utilisée(s) ?", Recherches sur Diderot et sur l'Encyclopédie 40-41 (2006): 287-92. [link]

5. See Clovis Gladstone, Russ Horton, and Mark Olsen, "TextPAIR (Pairwise Alignment for Intertextual Relations)", ARTFL Project, University of Chicago, 2008-2021.

6. See comparison table. The primary parameters chosen were bigrams, stemmer=true, word len=3, maxgap=12, flexmatch=true, minmatchingngrams=5.  Consult the TextPair documentation and configuration file for a description of these values.  

7. The question of the Dictionnaire de Trévoux is one such factor, as it is known that both Chambers and the encyclopédistes used it as a source for their own articles--so matches we find between the Chambers and Encyclopédie may indeed represent shared borrowings from the Trévoux and not a translation at all. Or, more interestingly, perhaps Chambers translated a Trévoux article from French to English, which a dutiful encyclopédiste then translated back to French for the Encyclopédie--in this case, which article is the 'source' and which the 'translation'? For more on these particular aspects of dictionary-making, see our previous article "Plundering Philosophers: Identifying Sources of the Encyclopédie", Journal of the Association for History and Computing 13.1 (Spring 2010) [link] and Marie Leca-Tsiomis' response, "The Use and Abuse of the Digital Humanities in the History of Ideas: How to Study the Encyclopédie", History of European Ideas 39.4 (2013): 467-76.

8. For more on 'intertextual bridges' in French, see our current NEH project [link].


Federated Search and PhiloLogic -- from works to (someday) words

Over the past several years, the ARTFL Project has been developing the code infrastructure for the Intertextual Hub reading environment that federates heterogeneous text collections, extracting data from individual PhiloLogic4 instances and exposing that data to text analysis algorithms in order to allow users to navigate between individual and larger groups of texts related through shared themes, ideas, and passages.

We have now adapted components of this infrastructure to enable federated bibliographic searching on all of the text collections running under PhiloLogic. With the PhiloLogic Federated Bibliography Search database, we offer a simple, yet flexible search system that allows users to search for texts across approximately 90 individual collections in nearly a dozen languages. We currently allow search by author, title, and collection language. Searches can be further delimited by access type and by date range. So for example, a search for titles containing the word “slavery” written in English between 1750 and 1800 yields 38 results from the American Archives Collection, ECCO-TCP, and the Evans Early American Imprint Collection:

Search results contain links to work titles and collections. In results, we note the access status of the collection, whether open or limited to subscribing institutions or to users at the University of Chicago. This same search can be expanded across French and English collections by using a Boolean “OR” and entering “slavery OR esclavage” in the title field:

This search yields several titles in the open-access Newberry French Revolution Collection, one in the Frantext collection, and one -- a play entitled “L’Esclavage des Noirs, ou L’Heureux Naufrage, Drame” -- in the Théâtre Classique collection.

We envision this bibliographic search system to be the first of many such tools that permit search across the entirety of our collections. In the Intertextual Hub, users can conduct word or topic vector searches across all seven of the 18th-century French collections included in it. Results are returned ranked by relevance. For example, see these results for a search using a topic vector that contains astronomical terms:

Taking inspiration from this federated search approach, we would create a mechanism that enables combined metadata and fulltext queries across all PhiloLogic instances -- or at least a logically coherent subset thereof -- at once, in real time. Users would no longer be constrained to working inside single collections, but could conduct searches across multiple collections and potentially in multiple languages. For example, instead of searching for “slavery OR esclavage” only in titles, users could search for those terms in any number of collections running under PhiloLogic.

The technical details of such a search scheme remain to be hashed out, of course. But the great thing about PhiloLogic4 is that its fundamental architecture makes it possible to create standalone widgets or external apps that query database instances via an API and then repackage and render search results independently. For example, ARTFL’s PhiloReader apps for both Android and iOS work in exactly this way, and from the beginning were meant to be a demonstration of PhiloLogic’s server capabilities (download the Encyclopédie reader apps here and here).

[Screenshots: Encyclopédie app search suggestions (left); Encyclopédie app metadata query results (right)]

These screenshots illustrate a simple example of the Encyclopédie app interacting with the PhiloLogic4 API. In the left screenshot, the app gets metadata search suggestions dynamically, in this case "Astronomie | Géographie". Query results for articles with that classification appear in the right screenshot.

For a federated search system, a client would send queries to however many PhiloLogic instances; gather and sort query results or links to query results; then present those results to the user. Again, we would first have to work out certain details before creating a search system like this, such as determining the exact nature of query results; whether and how to perform relevance ranking on results; whether we would need to integrate certain kinds of reporting features into PhiloLogic as a parallel development activity, etc.
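As a rough illustration of the client-side flow, the sketch below fans a query out to several instances, tolerates failures, and merges the results. Everything here is hypothetical: `search_one` stands in for whatever function would actually query a single PhiloLogic instance, and the `score` field used for sorting assumes that results arrive with some comparable relevance value, which is precisely one of the details still to be worked out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Hit = Dict[str, object]


def federated_search(
    query: str,
    instances: List[str],
    search_one: Callable[[str, str], List[Hit]],  # hypothetical per-instance query function
    max_workers: int = 8,
) -> List[Hit]:
    """Send one query to every instance, gather the hits, and sort them by score."""

    def run(instance: str) -> List[Hit]:
        try:
            hits = search_one(instance, query)
        except Exception:
            return []  # an unreachable instance should not sink the whole search
        # Record which collection each hit came from.
        return [dict(hit, collection=instance) for hit in hits]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_instance = list(pool.map(run, instances))

    merged = [hit for hits in per_instance for hit in hits]
    merged.sort(key=lambda h: h.get("score", 0.0), reverse=True)
    return merged
```

Querying the instances concurrently keeps the total wait close to that of the slowest collection rather than the sum of all of them, which matters if such a search is to run in real time.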

However we proceed, the experience of building the Intertextual Hub has taught us that we can tap into the indexing, processing, and reporting capabilities of PhiloLogic to draw together many individual, heterogeneous text collections and create larger-scale research environments that allow users to engage in text analysis of an incredibly broad scope.

Topic Models and Word Vectors



The Intertextual Hub is built around several different algorithms to facilitate document search, similarity and navigation. In previous posts in this series, I have examined the applications of sequence alignment, topic modeling, and document similarity in various contexts. A primary objective of the Hub is to direct attention to particular documents that may be of interest.  Arriving at a specific document to consult, the user is offered two views. One is a document browse mode, which provides links to similar documents and borrowed passages if detected. The second is to consult the Topic Distribution of the document. 

The left side of the image above is the top element of the Topic Model report for the Dénonciation a toutes les puissances de l'Europe : d'un plan de conjuration contre sa tranquilité général (link to text), an anonymous attack on the Club de 1789 published in 1790. As mentioned in an earlier post in this series, the first topic, number 123, is clearly about elections, which does indeed reflect a section describing elections in the club constitution. The lesser weighted topics in the document, 114, 111, 128 and so on, are all plausible topics of this document. The right side of this image shows a word cloud, with size reflecting weight, of the most distinctive vocabulary identified in this document. This simple list is a considerably better guide to the specific content of the document, a denunciation of a conspiracy against the souverains of Europe to which is appended extracts from the constitution of the club.

Below the lists of Topics and the Word Cloud of most distinctive tokens in the Topic Model report, there are two lists of 20 documents. Below Topics are the top 20 documents identified by the similarity of topic distributions, while below the Word Cloud are the top 20 documents as measured by similar vocabulary.

The first two entries on the right hand column are parts of Sieyès'  Ébauche d'un nouveau plan de société patriotique, adopté par le Club de mil sept cent quatre-vingt-neuf  (BNF), found in Dénonciation, followed by Condorcet's constitutional proposal of 1793.  The two lists represent two different ways to identify similar documents.  It is useful to note the overlaps between the two lists, since these are identified as being relevant by both measures:

The contrast between topics and most distinctive words can be very significant. Mercier's brief chapter on Vaches in Tableau de Paris is striking. There are no overlaps between the similar document links and only two words, animal and compagnie, appear in the topic words, and those for low weighted topics. Other documents are marked by the relative alignment of topics and distinctive words. The topic/word report for Lettres écrites à M. Cérutti par M. Clavière, sur les prochains arrangemens de finance (1790 text link) shows that the distinctive tokens are found frequently on the top topic model word lists and there is more overlap between the most similar documents.
It is hardly surprising to find that the representations of the contents of a specific document under a Topic Model and a Word Vector (Most Distinctive Vocabulary) can be significantly different. Topic Models attempt to identify the best fit of a document in an arbitrary number of groups. Many documents about specific things, like cows in Paris, may well fall between these groups and be assigned to topics which are only very tangentially related to the contents of the document. This weak relationship to topics is reflected by the limited number of tokens shared between the most heavily weighted topic terms and distinctive vocabulary of a document as well as limited or no overlap between lists of similar documents. Topic models are an effective technique for identifying large patterns of topic development for search and analysis and classifying documents within these large patterns. By contrast, identifying documents related by similar vocabulary, generally falling under the rubric of "nearest neighbor search" (NNS), is able to identify and leverage the particularities of a specific document to identify others closely related to it, but cannot by itself be used to aid with larger classifications or themes.

Thus we provide the user in the Intertextual Hub with these two distinct views of a document, identifying the topics in which it is situated and its most distinctive vocabulary and other documents which most closely resemble it. A quick examination of the topics, words, and document lists gives the reader a pretty good sense of the degree to which a specific document falls coherently into one of the 150 topics in this model.

Given the suggestion that humans should consider both measures and make a determination of the goodness of fit of a document to the topic model, it may be worth experimenting with the use of NNS measures as a way to evaluate Topic Models. As I have shown above, a topic model, say of 150 topics generated using a set of parameters, can cover some documents more compellingly than others. In the example, finance documents are specific enough to be well covered by several related topics. This leads to the possibility of establishing a quantitative measure by using somewhat independent measures, topics and word vectors, to assess the validity of a particular topic model. For each document in a collection, this would be assessed by 
  • the number of common tokens in the top N topics with the most distinctive words;
  • number of common documents in the two lists;
  • number of matching topics (say top 3) for each document in the two lists of documents.
For every document, one would calculate how well the topic approximates the nearest neighbors of that document, measuring 0 for not at all to 1 for perfect identity. We have two ways of dividing up an information space, topic models from effectively the top down (we're going to have 150 buckets) and the other from the bottom up (but we don't know how many buckets). Like a Venn diagram, the more these overlap, the better the coverage for that document.

Summing up this measure across all of the documents, one would arrive at a single value for all topics, and possibly a single value for every topic. You could then adjust parameters. Of course, you could overfit this, simply by saying I will have the same number of topics as documents. But it might even give you a measure of how many topics is best, by observing a decrease in the coverage, which would be theoretically possible by spreading the topics too thin across the information space.
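One possible way to turn this proposal into a number is sketched below, purely as a thought experiment. It uses Jaccard overlap (shared items divided by all items) for the first two criteria in the list above and averages them; the third criterion, matching top topics among the neighbouring documents, is left out for brevity. The function names, the equal weighting, and the choice of Jaccard are all assumptions, not something the Hub currently computes.

```python
from typing import Dict, Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Overlap of two collections as sets: 0.0 for no shared items, 1.0 for identical sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def document_coverage(doc: Dict[str, List[str]]) -> float:
    """Average the token overlap and the similar-document overlap for one document."""
    token_overlap = jaccard(doc["topic_words"], doc["distinctive_words"])
    neighbour_overlap = jaccard(doc["topic_neighbours"], doc["vocab_neighbours"])
    return (token_overlap + neighbour_overlap) / 2


def model_coverage(docs: List[Dict[str, List[str]]]) -> float:
    """Mean per-document coverage: a single value for the whole topic model."""
    if not docs:
        return 0.0
    return sum(document_coverage(d) for d in docs) / len(docs)
```

Run over models built with different numbers of topics, a score like this could be plotted against topic count to look for the point at which coverage begins to fall off.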




Comité d'agriculture et des arts


Several décades after the fall of Robespierre, the Convention nationale issued a decree reorganizing its committees on 7 fructidor II. The collections in the Hub have at least two sources for this decree, one in the Newberry French Revolution Collection which was printed by the order of the Convention, and the other reproduced in the Baudouin collection of Revolutionary Laws. One of the 16 committees, with a small but significant charge was the Comité d'agriculture et des arts:

This was, of course, nothing new. Previous revolutionary assemblies had organized agriculture committees and commissions (Mellah, 2020) and this committee simply replaced the Convention's Comité d'agriculture. Concerns with agrarian life and subsistence were pressing issues during the period. A full text search, using the PhiloLogic4 instance, of the Revolutionary Laws collection shows several hundred decrees being introduced with variations on the expression "après avoir entendu le rapport de son comité d'agriculture".

Starting from PhiloLogic search results, one can follow the links to the individual laws containing the phrase (example link) and, in many cases, navigate from the law to the legislative session in the Archives Parlementaires by clicking on a date, such as Du 11 Octobre. == 13 du même mois.

The range of activities of the agriculture committees was not simply limited to the production of reports for potential legislative action. Searching for "convention agriculture" in the Hub returns some 15 titles which reflect the scope of the interests of the Convention's agriculture committee. The committee published manuals for the cultivation, storage and use of various crops such as potatoes, cabbage, and carrots, all of which link back to the long tradition of agricultural writings. Looking at the list of similar documents found for Instruction sur la conservation et les usages des pommes-de-terre, for example, finds a relevant number of Ancien Regime texts, including a chapter on potatoes from the 1772 translation of Arthur Young's The farmer's guide.

Grain and bread were, of course, critically important. In 1794, the committee published Moyens propres a rendre plus économique l'emploi des farines : provenant des grains nouvellement récoltés ; et à augmenter la qualité du pain qu'elles doivent donner (link) which is linked by similarity measures to a variety of earlier texts, including the chapter on grains in Duhamel du Monceau's Traité de la culture des terres, suivant les principes de M. Tull, Anglois (1753). Examination of the titles most closely related to this text by topic and vocabulary shows a mixture of practical as well as more theoretical works.

Given that bread constituted at least half of the average wage earners' expenditures through the 18th century (link), it is no surprise that the conditions of the grain trade and the price of bread were of capital importance and were widely debated in the two weeks leading up to the loi du Maximum of 1793 (RevLaws AP). The first document returned by the search "convention agriculture" in the Hub is an anonymous text, Mémoire sur la fixation du maximum du prix des grains dans toute la France : remis au Comité d'agriculture de la Convention nationale, l'an premier de la République [1792], found in the Newberry FRC (link). The collections contain two other renditions of this document, which are shown as the top two most similar documents reported at the top of link. The first is an annex to the April 25, 1793 session of the Convention found in the Archives Parlementaires and the second is in the Goldsmiths-Kress collection. The anonymous author opens unequivocally: 

La subsistance du peuple est le premier objet qui doive occuper les législateurs. Il faut assurer l'existence des hommes avant de songer à régler l'usage de leurs facultés.  (AP version).  

Taking on the elder Mirabeau (the friend of man) and the physiocratic tradition of free trade before the Revolution, he writes:

Des philosophes, amis des hommes, avaient cru voir, dans la liberté indéfinie du commerce et même de l'exportation des grains, un principe de fécondité et d'abondance qu'ils regardaient comme le plus sûr préservatif contre la famine.   Pendant qu'ils se livraient à ces contemplations, un gouvernement populicide opérait la famine par le commerce et par l'exportation des grains; l'absurdité du système de la liberté indéfinie de l'exportation a été démontrée par le fait, et cette exportation a été prohibée par l'Assemblée constituante. [1]

The list of top 20 most similar documents and the document's most related topics show an interesting mix of opinions for and against free trade in grain, with many echoes back to the earlier debates as far back as the Turgot ministry. Beffroy's Rapport fait au nom de la section des subsistances chargée de combattre les économistes (1792) is equally pointed:

Ouvrez maintenant Young, consultez Smith, interrogez Turgot, voyez Beaudot, relisez Ferraud, Roland, Périés & tous les partisans de leur systême; ils ne vous parlent que de l'interest du marchand & du Spéculateur. Or, l'expérience vous a prouvé que cet intérêt mercantille ne s'alimente que des malheurs publics : jugez donc entre lui & celui du peuple que vous représentez. (link)

On the other side, Creuzé-Latouche's Sur les subsistances (1793) makes the case that the free trade in grain implemented in the Turgot administration resulted in low and stable prices across the country in spite of poor harvests in those years.  Further, he writes, Turgot was defending the liberty and sovereignty of the people:

Mais pour vous faire mieux connoître quel étoit ce ministre Turgot, qui avoit voulu établir la liberté entière du commerce des grains, il faut vous dire qu'il supprima les corvées , qu'il donna , le premier, l'idée des assemblées provinciales, qui dévoient bientôt rappeler la nation a sa souveraineté ; et qu'il se fit chasser de la cour , pour avoir voulu défendre la liberté du peuple, et abolir les fiefs. (link)

As a defense of free trade in general, and Turgot in particular, Sur les subsistances links back to many earlier discussions of this vexed subject, including Mirabeau's L'Ami des hommes and Philosophie rurale, Young's Arithmétique politique and articles from Ephemerides du Citoyen, ou Bibliotheque Raisonee des Sciences Morales et Politiques.  

By integrating heterogeneous collections, ranging from Revolutionary laws as enacted, to the debates and publications surrounding these events, to practical handbooks and theoretical treatises, we can direct attention from the specific recommendations of an important committee to a much broader context.  Working with texts in this context opens the reader to multiple crosscurrents of topics and themes that can be unexpected and illuminating. 

Returning to where we started, in 1795, the Comité d'agriculture et des arts published Instruction sur l'emploi de la lie de vin, which opens with: 

La lie de vin , rejettée dans plusieurs cantons de la France, comme un résidu sans valeur , peut cependant produire une quantité considérable de potasse utile pour les verreries, les savonneries et plusieurs antres arts , particulièrement pour la fabrication du salpêtre , le premier besoin de la liberté contre les efforts de la tyrannie.  

Here, the practical, the theoretical and the political merge.  The twenty most similar documents include discussions of gunpowder, chemistry, and manufacturing, ranging from Lavoisier's ARTICLE XII. De l'usage de la Potasse pour la fabrication du Salpêtre in Instruction sur l'établissement des nitrières et sur la fabrication du salpêtre (1777) to Dulac's L'agonie de tous les tyrans, ou, Les moyens de fabriquer la foudre qui va les exterminer (1793).  


Club de la propagande



Over three decades ago, while working with the splendid French Revolution Collection (FRC) at the Newberry Library in Chicago, I came across one of those entertaining little finds that stick in your memory and make working in a great research library so worthwhile.  "Came across" is not quite right, since the librarians at the Newberry had begun working on a database catalog of the collection, starting with the anonymous texts from the FRC.  Searching for "Club" in this early database, which I recall was running on a stand-alone IBM-PC/AT from that epoch, generated a list of titles that included a document I probably would not have found using standard printed catalogues such as Tourneux's Bibliographie... .   The Dénonciation a toutes les puissances de l'Europe : d'un plan de conjuration contre sa tranquilité général (link) is a right-wing attack on the Société de 1789, a political club founded by Condorcet and Sieyès in 1790[1].  

What stuck in my mind for all these many years is the basis of the attack; that the "Club de la Propagande" was part of an American plan to destabilize the thrones of Europe with the ultimate objective of subjugating the old world to the new:

Elle qu'une légère vapeur qui s'élève du sein de la mer, comme le vestige d'un homme, attire du plus loin tous les nuages étendus dans l'air, se condense, s'obscurcit, & éclate enfin en une furieuse tempête ; tel on a vu le spectre pâle & maigre de l’insurrection, sortant d'une terre ingrate, & du milieu d'enfans rebelles parricides, croître & s'élever en un colonne fastueux, qui, posant un de ses pieds sur l'hémisphere qui l'enfanta, essaya de l'autre de franchir l'Océan, pour porter ses ravages sur celui-ci; & comme si l’Amérique avoir encore plus à se plaindre qu'à se louer de l'Europe, elle a envoyé l'anarchie à celle-ci, pour prix du soin qu’elle a pais de la civiliser.
    C'est elle qui est le berceau des convulsions qui commencent à agiter notre continent; c'est-là qu'est né le projet de soumettre l'ancien au nouveau monde ... (link)

Like all good political invective, there were some grains of truth to the attack.  The Société was no doubt rather pro-American in its orientation, naming, for example, Franklin, mentioned by name in the Dénonciation, as an honorary member.  The anonymous author identifies the contagion of liberal ideas as the root cause of all of the disorders in France and Europe.

Les monstres! Ils ont égaré le peuple par deux mots l’ont toujours rendu la dupe des fourbes = égalité, & désobéissance = l’un, ils le lui on présenté comme un droit naturel. L’autre, comme,  un moyen légitime d’y rentrer. = II ne connoit  pas, ce malheureux peuple, le pouvoir magique  de ces deux mots, qui ont couvert la terre de crimes & de sang, qui ont rendu son séjour un objet d’horreur pour la vertu[?], & qui lui font, à la fin, désirer à lui-même un remede qu’il abhorre.
To ensure that his readers were able to identify the source of the conspiracy precisely, the author attached a 10-page extract from Sieyès' Ébauche d'un nouveau plan de société patriotique, adopté par le Club de mil sept cent quatre-vingt-neuf (BNF), which includes a discussion of l'art social as well as elements of the club's formal organization.  

The good folks at the Newberry produced a photocopy of this little treasure shortly after, which I squirreled away in my files and have kept, along with the charge slips and other notes, to this day.  Yes, I should seriously consider cleaning out the old paper files at some point.  

I had occasion to revisit this text several years ago, almost three decades after my first reading, in a completely different context.  In 2016-17, the Newberry Library made the entire collection available in digital format.  The release on GitHub consists of the Library's exceptional metadata describing each object, the OCR text data, and links to the digital facsimiles accessible from the Internet Archive, encouraging researchers and instructors to incorporate the digital collection in new kinds of scholarship and engagement.  In 2018, the ARTFL Project, in collaboration with the Newberry, released two versions of the collection under PhiloLogic4 (link).  The collection has also been extremely valuable as a corpus to test various new applications based on sequence alignment and machine learning.  In the course of this work, I was pleased to find that the Dénonciation was indeed included in this collection.  

Part of our experimental work in developing the Intertextual Hub is the deployment of various text mining and machine learning algorithms on a number of large heterogeneous collections.  As I was preparing a presentation on some of this work, I looked up the Dénonciation once more and observed that the first topic listed in the citation is topic 34, the top words of which are: "election electeur nomination assemblee scrutin majorite elu choix membre votant" (accents removed).  Closer examination of the topic model for this document reveals pretty much the kinds of subjects that I had recalled:

The notable exception is the first topic, number 34.  This unexpected topic sent me back to the text itself for the first time in decades, reminding me that significant parts of the Dénonciation contain an almost comically complex description of the election process of members, taken from Sieyès' Ébauche... . Here is just part of the involved process to elect members, the number of whom would be limited to 660:

Il est d'une bonne vue de donner au plus grand nombre possible des membres, la facilité de prendre part aux scrutins, afin qu'ils soient d'autant mieux le résultat de la volonté générale ; en conséquence on pourroit régler, que chaque scrutin se fera en quatre parties ; savoir, au premier & au deuxieme jours , & au quinze & au seize de chaque mois; de maniéré que le scrutin commence le matin du premier du mois ; par exemple , depuis onze heures jusqu'à midi , le soir pour ceux qui n'auroient pas pu se présenter le matin; le même scrutin continueroit le lendemain matin, ne se terminera que le, soir. Alors seulement on feroit le recensement. Pour prévenir les abus , il suffiroit que les feuilles de papier , remises aux membres fussent signées par un commissaire , qu'en recevant sa feuille , chaque membre s'inscrivit , ou fut inscrit par un commissaire; on connaîtroit par-là le nombre des feuilles données , ceux qui ont reçu la leur. Il faudrait encore que la boëte du scrutin fut fermée à clef, & qu’on ne pût en rien tirer jusqu’au moment du recensement.  (emphasis mine)

Trying to determine the "general will" might well require such care and management of election procedures, but I have to admit that I wondered if I had missed the joke the first time around.  Was this a spoof of Condorcet's electoral combinatorics?  

Alas, you can't make this stuff up.  Or at least the author of the Dénonciation did not have to. The current version of the Intertextual Hub is based on a number of collections, and the system provides two links to the original text by Sieyès. The Topic Model representation of the Dénonciation in the Hub shows two parts of the Ébauche as the top two most similar documents by a measure of vocabulary.  

It is followed by Condorcet's constitutional proposal of 1793.  The document read function of the Hub isolates numerous borrowed passages from the Ébauche.

The system allows the reader to compare two passages side by side to examine just how closely related they are.  

It is important to note that Sieyès' Ébauche is not part of the Newberry French Revolution collection, but is contained in the Goldsmiths-Kress collection of French works related to political economy.  The different techniques employed in our implementation of the Intertextual Hub, lexical density and sequence alignment, gave two different avenues to indicate that the two documents are related.  Being contained in different collections is important in itself. The Dénonciation does not have internal divisions (chapters or sections) while the Ébauche does. Thus the similar documents function, run from the Dénonciation, does not return the Ébauche as a single document, because the Ébauche is treated as parts of a document.  To find potential points of contact between documents, we use several measures, which are complementary and necessary, since we are trying to find relationships between items that are not all of the same kind.  Thus, some of the complexity of the Hub is an artifact of treating huge numbers of heterogeneous documents.
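To make the difference between these two avenues concrete, here is a minimal sketch in Python. It is not the Hub's implementation: the function names are hypothetical, the vocabulary measure is a plain cosine similarity over word counts, and shared word 4-grams stand in as a crude proxy for real sequence alignment. The sample strings are simplified fragments modeled on the passage quoted above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Vocabulary overlap: cosine of the two word-count vectors (word order ignored)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shared_ngrams(tokens_a, tokens_b, n=4):
    """Crude stand-in for alignment: word n-grams that occur in both texts."""
    def grams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return grams(tokens_a) & grams(tokens_b)


# Illustrative strings only (assumptions, not the collection texts).
source = tokenize(
    "pour prévenir les abus il suffiroit que les feuilles de papier "
    "fussent signées par un commissaire"
)
borrower = tokenize(
    "l'auteur cite: il suffiroit que les feuilles de papier "
    "fussent signées par un commissaire, dit-il"
)
same_words = tokenize("un commissaire et les membres: papier, feuilles, abus")

borrowed = shared_ngrams(source, borrower)
unordered = shared_ngrams(source, same_words)
vocab_score = cosine_similarity(source, same_words)
print(len(borrowed), len(unordered), round(vocab_score, 3))
```

The third string shares vocabulary with the source but no word sequence, so only the vocabulary measure registers a relationship. This is one reason the two kinds of measure are worth keeping side by side.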

My long, very intermittent relationship with Dénonciation a toutes les puissances de l'Europe..., a minor text if ever there was one, is illustrative of the progress I believe we have seen over the last three decades in digital humanities. I first found it as part of an experimental bibliographic database in the late 1980s and was able to access it only in person and store it as a photocopy. Decades later, it became a small part of an extraordinary collection, searchable as both excellent metadata and uncorrected OCR text. Our current work, reflected in the Intertextual Hub, is to build an environment which can draw connections between documents across collections, using the power of distant reading tools to help navigate and elucidate closer considerations of even minor texts.

1   Mark Olsen, "A Failure of Enlightened Politics in the French Revolution: the Société de 1789" in French History 6 (1992): 303-34. (DOI)


Topic Models in the Intertextual Hub


ARTFL’s NEH-funded Intertextual Bridges project is an effort to facilitate distant and close readings across a large heterogeneous set of collections of 18th century French documents. These range from Revolutionary pamphlets and newspapers to the great works of the Enlightenment in the original French, as well as translations of many English texts. This post and the associated slide show (see below) provide an overview of the many ways in which we attempt to use topic models to search and navigate the collections. In two previous blog posts, Tracing Revolutionary Discourses and Modeling Revolutionary Discourse, we provided an overview of some of the development and implementation work and offered some initial observations arising from our use of topic models in this effort.  While the description of the procedures and implementation in both posts is reasonably current, we have made significant progress in the intervening months.  Thus, our discussion of Topic Models in this post builds upon our previous posts.  

The Intertextual Hub makes extensive use of Topic Models to provide search services, analytics and one form of document navigation[1].  This is an extension of the TopoLogic package, which functions as an add-on to ARTFL's PhiloLogic4 text analysis system.   Topic Models are generated by invoking the ARTFL Text Preprocessing Library (ATPL) to extract metadata and word data from the standard representations generated by PhiloLogic4. This allows us to use PhiloLogic4 services to support navigation back to the text. The ATPL supports the treatment of files as either entire documents or as collections of sub-units, depending on the available data markup, and has a variety of NLP, normalization, and other parameters that can be adjusted for tasks such as Topic Modeling.  For Hub Topic Models, we use modernized unigram nouns longer than 2 letters.  These are directed to the TopoLogic generator, which supports another layer of vector parameters, typically using NMF vectors with TF-IDF weightings.  For the primary topic model in the Hub, we chose 150 topics across all of the collections, which seems to give the best balance between reasonably coherent topics and the number of obscure or meaningless ones.  In addition, we generated two Topic Models of 100 topics each, using the same parameters, based on documents from 1700-1788 and 1789-1799, which we believe will facilitate exploration of topics from each period. 
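The general shape of such a pipeline (normalize and filter words, weight them with TF-IDF, factor the matrix with NMF) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the ATPL or TopoLogic code: all function names are hypothetical, part-of-speech filtering and modernization are omitted, the TF-IDF formula is one common smoothed variant, the NMF uses textbook multiplicative updates, and the four toy documents are invented.

```python
import math
import random
import unicodedata
from collections import Counter


def strip_accents(word):
    """Remove diacritics, e.g. 'électeur' becomes 'electeur'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def prepare(tokens):
    """Keep words longer than 2 letters, lowercased and unaccented.
    (Noun selection and spelling modernization are omitted in this sketch.)"""
    return [strip_accents(t.lower()) for t in tokens if len(t) > 2]


def tfidf(docs):
    """Return (vocab, matrix) with one row per document, one column per word."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, [[Counter(d)[t] * idf[t] for t in vocab] for d in docs]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor v (docs x words) into w (docs x topics) and h (topics x words)
    using multiplicative updates; all entries stay non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


# Invented toy documents; a real model would use thousands of texts.
docs = [prepare(d) for d in (
    ["élection", "scrutin", "électeur", "scrutin", "de"],
    ["scrutin", "électeur", "majorité", "élection", "le"],
    ["pont", "canal", "rivière", "canal", "la"],
    ["canal", "navigation", "pont", "rivière", "un"],
)]
vocab, v = tfidf(docs)
w, h = nmf(v, k=2)
for topic_id, row in enumerate(h):
    top = [word for _, word in sorted(zip(row, vocab), reverse=True)[:4]]
    print(topic_id, " ".join(top))
```

Each row of `h` gives word weights for one topic and each row of `w` gives topic weights for one document. Changing `k`, the word filter, or the weighting changes the topics that come out, which is the sensitivity to parameters discussed in this post.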

It is important to note that the tuning of Topic Models is based on the selection and application of a large number of parameters, from the number of topics to which words to use, which change the nature of the resulting topics significantly.  These judgements are based to a certain degree on what we expect to observe.  
For example, a topic which contains "citoyen patrie petition commune concitoyen secours moyen defenseur arrete magistrat" (accents removed) as the most heavily weighted terms is, quite reasonably, found to be most heavily weighted during the years of the Revolution, as shown in the graph.  This reliance on expected results, even though they may be perfectly reasonable, does point to a significant limitation of the approach.  Topic Models are extremely useful heuristics which can help summarize and navigate the contents of large collections, but they should be used with due care, as parameter selection can skew the results. 

The Intertextual Hub offers several ways to use Topic Models.  From the top down, as it were, users can navigate the collections starting with topics, or select the top weighted terms from any of the 150 topics, restricted by any available bibliographic data (dates, authors, collections, etc.), returning a list of documents (which may be parts of documents or entire texts depending on available encoding) ordered by relevance to the query.  Just as important, however, is the ability to identify the most important topics for any document and to find other texts that share the same topic distributions, which is another way to measure how similar the documents are.  
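As a rough illustration of the second, bottom-up use, the sketch below ranks documents by how closely their topic distributions match that of a chosen document. It assumes cosine similarity as the comparison, which may differ from the measure the Hub actually uses; the function names, document identifiers and topic weights are all invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_topics(weights, n=3):
    """Return (topic_id, weight) pairs, heaviest topic first."""
    return sorted(enumerate(weights), key=lambda p: p[1], reverse=True)[:n]


def most_similar(target_id, doc_topics, n=2):
    """Rank the other documents by similarity of topic distribution."""
    target = doc_topics[target_id]
    scores = [(doc_id, cosine(target, vec))
              for doc_id, vec in doc_topics.items() if doc_id != target_id]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:n]


# Hypothetical documents with invented weights over four topics.
doc_topics = {
    "doc_a": [0.6, 0.3, 0.1, 0.0],
    "doc_b": [0.7, 0.2, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 0.1, 0.9],
    "doc_d": [0.4, 0.1, 0.5, 0.0],
}

print(top_topics(doc_topics["doc_a"], n=2))
print(most_similar("doc_a", doc_topics))
```

Because the comparison uses whole distributions rather than a single dominant topic, two documents can be ranked as close even when their leading topics differ slightly.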

As shown in the last few slides above, we have included two 100-topic Models derived using the same parameters from documents predating the Revolution and those from 1789-1799.  These are both full installations of TopoLogic and are not directly linked to the Intertextual Hub.   Users may block copy topic words from one Model and apply these to the full set of documents using the Search and Retrieval functions of the Hub. Some topics, such as 77 from the Revolutionary Model (pont, canal, ingenieur, navigation, riviere, chaussee, travail, construction, reparation, devis), are probably not significantly different from ancien régime considerations.  Other topics, however, are more clearly identified as having Revolutionary concerns.  Topic 46 of the Revolutionary 100 (election, scrutin, nomination, electeur, suffrage, majorite, liste, membre, votant, pluralite) reflects contemporary concerns.  Searching for this list of words in documents from 1700-1787 (run search) returns an interesting list of documents, the first six of which are chapters from La Rochefoucauld's Constitutions des treize États-Unis de l'Amérique (1783).
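The idea of carrying a topic's top words into a date-restricted search can be sketched as follows. This is only an assumption about how such a query might be scored (share of a document's words that match the query); the Hub's own relevance ranking is not described here, and the documents below are hypothetical placeholders rather than items from the collections. The query terms are the topic 46 words quoted above.

```python
from collections import Counter

# Top words of topic 46, as quoted in the post (accents removed).
TOPIC_46 = ["election", "scrutin", "nomination", "electeur", "suffrage",
            "majorite", "liste", "membre", "votant", "pluralite"]


def search(query_terms, documents, start_year, end_year):
    """Return (title, score) pairs for documents within the date range,
    scored by the share of their tokens that match a query term."""
    results = []
    for doc in documents:
        if not start_year <= doc["year"] <= end_year:
            continue  # bibliographic restriction: skip out-of-range dates
        counts = Counter(doc["tokens"])
        hits = sum(counts[term] for term in query_terms)
        if hits:
            results.append((doc["title"], hits / len(doc["tokens"])))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical documents, invented for illustration.
documents = [
    {"title": "text A", "year": 1783,
     "tokens": ["election", "scrutin", "suffrage", "etat"]},
    {"title": "text B", "year": 1762,
     "tokens": ["suffrage", "peuple", "loi", "volonte"]},
    {"title": "text C", "year": 1791,
     "tokens": ["election", "electeur", "scrutin", "votant"]},
    {"title": "text D", "year": 1750,
     "tokens": ["canal", "pont"]},
]

results = search(TOPIC_46, documents, 1700, 1787)
for title, score in results:
    print(title, score)
```

Text C matches the query best but falls outside the date range, so it is excluded. That is the point of the exercise: the words of a later topic are used to surface earlier texts.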

Running one's eye down the list of documents suggests that the discourse regarding elections found its origins in a number of examples from England, the emerging US states, and some other European states.   There is also an interesting mix of well-known names, such as Rousseau and Voltaire, authors who would become better known during the Revolution, such as Brissot, and numerous lesser-known writers.  

The Intertextual Hub is designed to offer potentially interesting texts to consider.  We employ Topic Models to provide granular search across the collections as well as to point to similar documents based on the current context.  Finally, we can track topics derived from documents of a later period back to earlier instances, potentially revealing connections that can offer new evaluations of these texts.  


[1] There is an extensive literature on the use of topic models in digital humanities including JDH 2012.  
