As noted in the user manual entry, the "theme-rheme" function generates a standard concordance, which it then attempts to sort by where your search term occurs in a clause, where a clause is delimited by punctuation. It segregates the occurrences into front of clause, back of clause, middle of clause, and instances where the clause is too short. By default, it displays only those occurrences that are clause-initial. In the current implementation of ARTFL-Frantext, a search for "tradition" returns 4,692 occurrences, which break down roughly as follows:
- Front of Clause: 571 out of 4,692 [12.16%]; avg. clause length: 9.58
- Back of Clause: 1,056 out of 4,692 [22.50%]; avg. clause length: 8.68
- Middle of Clause: 2,348 out of 4,692 [50.04%]; avg. clause length: 9.56
- Too Short: 717 out of 4,692 [15.28%]; avg. clause length: 2.40
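The classification described above can be sketched in a few lines of Python. This is a hypothetical re-implementation, not the actual ARTFL code: the punctuation set, the "too short" cutoff (here, fewer than 4 words), and the rule that "front" and "back" mean the clause's first and last word are all assumptions.

```python
import re

# Assumed clause-boundary punctuation; the real function's set may differ.
CLAUSE_BOUNDARY = re.compile(r'[.,;:!?]')
TOO_SHORT = 4  # assumed minimum clause length, in words

def classify(text, term):
    """Tally where `term` (lowercase) falls in each punctuation-delimited clause."""
    counts = {'front': 0, 'back': 0, 'middle': 0, 'too_short': 0}
    for clause in CLAUSE_BOUNDARY.split(text):
        words = [w.lower() for w in clause.split()]
        if term not in words:
            continue
        if len(words) < TOO_SHORT:
            counts['too_short'] += 1
        elif words[0] == term:
            counts['front'] += 1
        elif words[-1] == term:
            counts['back'] += 1
        else:
            counts['middle'] += 1
    return counts
```

Note that this crude version counts "La tradition attribue..." as a middle occurrence, since the article precedes the search term; deciding whether a determiner belongs to the clause-initial position is exactly the kind of boundary question that comes up again below.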
The system further identifies specific documents in which the front-of-clause rate for your search term exceeds the corpus-wide rate (in this case 12.16%) by a certain percentage, such as:
- 55.55% (10/18): Montalembert, Charles Forbes, Histoire de Sainte Elisabeth de Hongrie, duchesse de Thuringe...
- 28.20% (11/39): Bossuet, Jacques Bénigne, 1627-1704, Discours sur l'histoire universelle
and it, of course, displays these in different colors, such as:
- L'Europe ainsi déracinée s'est plus tard déracinée davantage en se séparant, dans une large mesure, de la tradition chrétienne elle-même sans pouvoir renouer aucun lien spirituel avec l'Antiquité.
- Oui, sans doute, si cette tradition était tout entière dans Aristote et dans l'enseignement péripatéticien de la scolastique.
- La tradition attribue à Pythagore un séjour à Babylone.
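The over-representation flag could be sketched as follows. The 2x-baseline threshold is my assumption; the post only says the rate must exceed the baseline "by a certain percentage."

```python
def flag_documents(doc_counts, baseline=0.1216, factor=2.0):
    """Flag documents whose front-of-clause rate exceeds baseline * factor.

    doc_counts maps a document label to (front_hits, total_hits).
    Returns (rate, front, total, doc) tuples, highest rate first.
    """
    flagged = []
    for doc, (front, total) in doc_counts.items():
        rate = front / total
        if rate > baseline * factor:
            flagged.append((rate, front, total, doc))
    return sorted(flagged, reverse=True)

# Counts taken from the two examples above.
docs = {
    'Montalembert, Histoire de Sainte Elisabeth de Hongrie': (10, 18),
    "Bossuet, Discours sur l'histoire universelle": (11, 39),
}
```

With these counts, both Montalembert (10/18 = 55.55%) and Bossuet (11/39 = 28.20%) clear the assumed 2x threshold of 24.32%.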
I set out two "intertwingled" problems in the paper: women's writing and, more salient to this post, the increasing need to arrive at high orders of generalization to make sense of the results coming from ever larger datasets. Obviously, one solution is the work we have been doing over the last few years in machine learning, document summarization, and text data mining (see PhiloMine and related papers). What I proposed in this paper was a move away from traditional text analysis techniques towards analytical notions based on functional linguistics or functional grammar, which are related in various ways to text linguistics and discourse analysis. This is a huge area of work, and I would not begin to characterize it here. Helma, of course, is a functional linguist and proposes that this is a branch of "linguistics that takes the communicative functions of language as primary as opposed to seeing form as primary." And as you might imagine, there are schools and competing views. I have to admit I like the name "West Coast Functionalists." :-)
My take on this is that meaning arises from choices, or chains of choices, made with sets of goals and objectives in mind. I also suspect that many "functionalists" would agree on a few other basic notions, such as the idea that lexis and grammar are inseparable in meaning creation; indeed, the term "'lexico-grammar' is now often used in recognition of the fact that lexis and grammar are not separate and discrete, but form a continuum." (cite) It also appears that many functionalists would agree that the clause is the basic building-block unit. There are probably other points of general agreement about just how the different layers might work or be defined. For example, Simon Dik (no relation to Helma) identified three layers in his Functional Grammar:
- SEMANTIC FUNCTIONS (Agent, Patient, Recipient, etc.) which define the roles that participants play in states of affairs, as designated by predications.
- SYNTACTIC FUNCTIONS (Subject and Object) which define different perspectives through which states of affairs are presented in linguistic expressions.
- PRAGMATIC FUNCTIONS (Theme and Tail, Topic and Focus) which define the informational status of constituents of linguistic expressions. They relate to the embedding of the expression in the ongoing discourse; that is, they are determined by the status of the pragmatic information of Speaker and Addressee as it develops in verbal interaction.
Of course, other folks will carve these things up differently. Robert de Beaugrande, whose extensive web site and papers are well worth a visit, represents the various levels of functional linguistics from nerves to text, as outlined in the image taken from his "Functionalism and Corpus Linguistics in the 'Next Generation'." In another paper, he argues that "Corpus data are so eminently suited to informing us about 'networks' because they offer concrete displays of the constraints upon how sets of choices can interact. In the 'lexicon' part of the 'lexicogrammar' of English, these constraints constitute the collocability in the virtual system, and the textual actualisations are the lexical collocations. In the 'grammar' part of the 'lexicogrammar', these constraints constitute the colligability in the virtual system, and the textual actualisations are the grammatical colligations," and goes on, in the following image, to represent the series of "dialectics" running between text and language.
Ok, they are fun images ... now back to work... and I wanted to see how embedding images would work...
It is the level of pragmatics that I suspect interests us in this particular case. As I noted above, I borrowed the "theme-rheme" nomenclature from M.A.K. Halliday's An Introduction to Functional Grammar. Again:
Theme: "starting point of the message, what the clause is going to be about".
Rheme: everything not the Theme: new information/material
In English (and French), identification of the Theme is based primarily on word order: the Theme is the element which comes first in the clause (Eggins, An Introduction to Systemic Functional Linguistics, p. 275). There are plenty of problems in identifying the exact boundaries of different kinds of Themes.
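A word-order Theme split can be sketched trivially, which also makes the boundary problem concrete. Taking a fixed number of leading words (here two, a purely illustrative assumption) is wrong in general, since real Themes are grammatical constituents of varying length, not fixed-width word spans.

```python
def split_theme_rheme(clause, theme_words=2):
    """Naive word-order split: the first `theme_words` words are the Theme,
    everything after them is the Rheme. The fixed width is an assumption."""
    words = clause.split()
    return ' '.join(words[:theme_words]), ' '.join(words[theme_words:])

theme, rheme = split_theme_rheme(
    "La tradition attribue à Pythagore un séjour à Babylone")
# theme == "La tradition"
```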
The take-away point from all of this is that the Theme/Rheme distinction is important because it is how you get thematic development across a longer span of text. Obviously, the Rheme of one clause can become the Theme of the next.
One other take-away: Halliday argues that one can use punctuation to identify clauses in written texts, an approach that does not carry over to spoken texts.
More later????? I can track down a few more bibliographic entries....