Some Notes on Theme-Rheme in PhiloLogic

One of the more arcane, and probably rarely used, functions in PhiloLogic is an experimental reporting scheme that I rather tentatively named "word in clause position analysis" or "theme-rheme," which is briefly described in the PhiloLogic user manual. I proposed this in a talk titled "Making Space: Women's Writing in France, 1600-1950," which I gave at the ACH-ALLC and COCH/COSH conferences in 2004 (and for which I drafted a good chunk of a paper), and implemented it in PhiloLogic around that time. Since we are now thinking of using this kind of analysis as a possible way to identify "interesting" or "illustrative" uses of words as part of another project, I thought it might be helpful to back-track a bit, give a bit more overview of how it works, outline some of the theoretical background as I understand it, and provide some useful links and papers.

As noted in the user manual entry, the "theme-rheme" function generates a standard concordance, which it then attempts to sort by where your search term occurs in a clause, where a clause is defined by punctuation. It segregates the occurrences by front of clause, back of clause, middle of clause, and instances where the clause is too short. By default, it displays only those occurrences that are clause initial. In the current implementation of ARTFL-Frantext a search for "tradition" results in 4,692 occurrences, which roughly break down as follows:

Front of Clause: 571 out of 4692 [12.16%] Avg. Clause length: 9.58
Last of Clause: 1056 out of 4692 [22.50%] Avg. Clause length: 8.68
Middle of Clause: 2348 out of 4692 [50.04%] Avg. Clause length: 9.56
Too Short: 717 out of 4692 [15.28%] Avg. Clause length: 2.40
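To make the sorting step concrete, here is a minimal sketch in Python of how occurrences might be binned by clause position. It is not the PhiloLogic code: the punctuation set, the `edge` width (how many words count as "front" or "last"), and the `min_len` cutoff for "too short" are all assumptions chosen for illustration, as are the function names.

```python
import re

# Assumption: clauses are delimited by this punctuation set.
# The set actually used by PhiloLogic may differ.
CLAUSE_BREAK = re.compile(r"[.,;:!?()]")
WORD = re.compile(r"\w+")


def clause_positions(text, term, edge=2, min_len=4):
    """Yield (label, clause_length) for each occurrence of term in text.

    edge and min_len are hypothetical parameters: a hit within the first
    `edge` words is "front", within the last `edge` words is "last", and
    any clause with fewer than `min_len` words is "too short".
    """
    term = term.lower()
    for clause in CLAUSE_BREAK.split(text):
        words = [w.lower() for w in WORD.findall(clause)]
        n = len(words)
        for i, w in enumerate(words):
            if w != term:
                continue
            if n < min_len:
                yield ("too short", n)
            elif i < edge:
                yield ("front", n)
            elif i >= n - edge:
                yield ("last", n)
            else:
                yield ("middle", n)


def summarize(hits):
    """Return {label: (count, percent of all hits, average clause length)}."""
    hits = list(hits)
    total = len(hits)
    rows = {}
    for label in ("front", "last", "middle", "too short"):
        lengths = [n for lab, n in hits if lab == label]
        if lengths:
            rows[label] = (
                len(lengths),
                100.0 * len(lengths) / total,
                sum(lengths) / len(lengths),
            )
    return rows
```

Calling `summarize(clause_positions(text, "tradition"))` on a text gives per-position counts, percentages, and average clause lengths of the same shape as the report above. The actual figures depend entirely on the assumed thresholds.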

The system further identifies specific documents in which your search term exceeds, by a certain percentage, the front of clause rate (in this case 12.16%), such as

55.55% (10/18): Montalembert, Charles Forbes, [1836], Histoire de Sainte Elisabeth de Hongrie, duchesse de Thuringe...
28.20% (11/39): Bossuet, Jacques Bénigne, 1627-1704. [1681], Discours sur l'histoire universelle

and it, of course, displays these in different colors, such as:

  • L'Europe ainsi déracinée s'est plus tard déracinée davantage en se séparant, dans une large mesure, de la tradition chrétienne elle-même sans pouvoir renouer aucun lien spirituel avec l'Antiquité.
  • Oui, sans doute, si cette tradition était tout entière dans Aristote et dans l'enseignement péripatéticien de la scolastique.
  • La tradition attribue à Pythagore un séjour à Babylone.
The basic notion is that clause initial instances of words are probably more important, since they tend to be the "subject" of the rest of the clause. And authors who tend to use your favorite word in clause initial position more often than average might be doing something of particular note. In other words, can we use the machine to try to isolate, from the thousands of hits, those that might be particularly noteworthy? In this case, we have isolated a small subset (12%) of the occurrences of "tradition" in a clause initial position and some authors/documents who tend to privilege this word. I also identified clause ending uses, since (I suspect) end of clause words provide a bridge to the next clause (or sentence).
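The document-level comparison can be sketched in the same spirit. This is only an illustration: the `margin` (how many percentage points a document must exceed the overall rate by) and the `min_hits` floor are assumed parameters, since the threshold the system really uses is not given here. The counts for Montalembert and Bossuet are the ones quoted above; the third document is invented.

```python
def over_front_rate(doc_counts, overall_front, overall_total,
                    margin=10.0, min_hits=5):
    """Return (document, front-of-clause %) pairs above the overall rate.

    doc_counts maps a document label to (front_hits, total_hits).
    margin and min_hits are hypothetical thresholds, not PhiloLogic's.
    """
    overall = 100.0 * overall_front / overall_total
    flagged = []
    for doc, (front, total) in doc_counts.items():
        if total < min_hits:
            continue  # too few hits to say anything about this document
        rate = 100.0 * front / total
        if rate - overall >= margin:
            flagged.append((doc, rate))
    # Highest front-of-clause rate first
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


docs = {
    "Montalembert, Histoire de Sainte Elisabeth": (10, 18),
    "Bossuet, Discours sur l'histoire universelle": (11, 39),
    "Invented document": (1, 12),  # made up for the example
}
flagged = over_front_rate(docs, overall_front=571, overall_total=4692)
```

With these inputs and the assumed 10-point margin, the Montalembert and Bossuet texts are flagged and the invented document is not.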

I set two "intertwingled" problems in the paper: women's writing and, more salient to this post, the increasing need to arrive at high orders of generalization to make sense of the results coming from ever increasing datasets. Obviously, one solution to this is the work we have been doing over the last few years in the areas of machine learning, document summarization, and text data mining (see PhiloMine and related papers). What I proposed in this paper was a move away from traditional text analysis techniques towards analytical notions based on functional linguistics or functional grammar, which are related in various ways to text linguistics or discourse analysis. This is a huge area of work and I would not begin to characterize it. Helma, of course, is a functional linguist and proposes that this is a branch of "linguistics that takes the communicative functions of language as primary as opposed to seeing form as primary." And as you might imagine, there are schools and competing views. I have to admit I like the name "West Coast Functionalists." :-)

My take on this is that meaning arises from choices, or chains of choices, with sets of goals and objectives. I also suspect that many "functionalists" would agree on a few other basic notions, such as that lexis and grammar are inseparable in meaning creation, and indeed the term "‘lexico-grammar’ is now often used in recognition of the fact that lexis and grammar are not separate and discrete, but form a continuum." (cite) It also appears that many functionalists would agree with the notion that the clause is the basic building block. There are probably other points of general agreement about just how different layers might work or be defined. For example, Simon Dik (not related to Helma) identified three layers in his Functional Grammar:
  • SEMANTIC FUNCTIONS (Agent, Patient, Recipient, etc.) which define the roles that participants play in states of affairs, as designated by predications.
  • SYNTACTIC FUNCTIONS (Subject and Object) which define different perspectives through which states of affairs are presented in linguistic expressions.
  • PRAGMATIC FUNCTIONS (Theme and Tail, Topic and Focus) which define the informational status of constituents of linguistic expressions. They relate to the embedding of the expression in the ongoing discourse, that is, are determined by the status of the pragmatic information of Speaker and Addressee as it develops in verbal interaction.


Of course, other folks will carve these things up differently. Robert de Beaugrande, whose extensive web site and papers are well worth the visit, represents the various levels of functional linguistics from nerves to text, as outlined in the image, taken from his "Functionalism and Corpus Linguistics in the ‘Next Generation’." In another paper, he argues "Corpus data are so eminently suited to informing us about 'networks' because they offer concrete displays of the constraints upon how sets of choices can interact. In the 'lexicon' part of the 'lexicogrammar' of English, these constraints constitute the collocability in the virtual system, and the textual actualisations are the lexical collocations. In the 'grammar' part of the 'lexicogrammar', these constraints constitute the colligability in the virtual system, and the textual actualisations are the grammatical colligations" and goes on to represent, in the following image, the series of "dialectics" running between text and language.



Ok, they are fun images ... now back to work... and I wanted to see how embedding images would work...

It is the level of pragmatics that I suspect interests us in this particular case. As I noted above, I borrowed the "theme-rheme" nomenclature from M.A.K. Halliday's An Introduction to Functional Grammar. Again:

Theme: "starting point of the message, what the clause is going to be about".
Rheme: everything not the Theme: new information/material

Theme contains given information, i.e. information which has already been mentioned somewhere in the text, or is familiar from the context. There is an accessible description of this, with some nice examples, in Theme and Rheme in the Thematic Organization of Text.

In English (and French), identification of the Theme is based primarily on word order. Thus, the theme is the element which comes first in the clause. (Eggins, An Introduction to Systemic Functional Linguistics, p. 275) There are, however, plenty of problems in identifying the exact boundaries of different kinds of themes.

The takeaway point from all of this is that the theme/rheme distinction is important because it is how thematic development is carried across a longer span of text. Obviously, the Rheme in one clause can become the Theme in the next.
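As a rough illustration of a Rheme becoming the next Theme, here is a small Python sketch that looks for words carried from the end of one clause to the start of the following one. It rests on loud simplifications: the Theme is approximated as the first `theme_size` words of a punctuation-delimited clause (the boundary problems mentioned above are ignored), short words are skipped as a crude stand-in for a stopword list, and the sample text is invented. None of this is part of PhiloLogic.

```python
import re

# Assumption: the same kind of punctuation-based clause split as in the report
CLAUSE_BREAK = re.compile(r"[.,;:!?]")
WORD = re.compile(r"\w+")


def split_theme_rheme(clause, theme_size=2):
    """Approximate Theme as the first theme_size words, Rheme as the rest."""
    words = [w.lower() for w in WORD.findall(clause)]
    return words[:theme_size], words[theme_size:]


def rheme_to_theme_links(text, theme_size=2, min_word_len=4):
    """Find adjacent clauses where a Rheme word reappears in the next Theme."""
    clauses = [c for c in CLAUSE_BREAK.split(text) if WORD.search(c)]
    links = []
    for prev, nxt in zip(clauses, clauses[1:]):
        _, prev_rheme = split_theme_rheme(prev, theme_size)
        next_theme, _ = split_theme_rheme(nxt, theme_size)
        # Skip very short words (articles, prepositions) as a crude filter
        shared = [w for w in next_theme
                  if len(w) >= min_word_len and w in prev_rheme]
        if shared:
            links.append((prev.strip(), nxt.strip(), shared))
    return links


# Invented example text
sample = "The monks preserved the tradition. The tradition shaped the liturgy."
links = rheme_to_theme_links(sample)
```

Here "tradition" sits in the Rheme of the first clause and in the Theme of the second, so the pair is reported as one link. A real treatment would need proper Theme boundaries and lemmatization.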

One other takeaway: Halliday makes the argument that one can use punctuation in written texts to identify clauses, which is not the case for spoken texts.

More later????? I can track down a few more bibliographic entries....

